Robots.txt Tips For Dealing With Bots
Jan 15, 2010, 03:02
[ Thanks to Andrew Weber for this link. ]
"The robots.txt file provides crawling
instructions to web robots using the Robots Exclusion Protocol.
When a web robot visits your site, it checks this file,
robots.txt, to discover any directories or pages you want to
exclude from the search engine's listing. This file matters for
SEO because it shapes which parts of your site search engines crawl,
and that can help rankings.
"User-agent: *
Disallow: /administrator
Disallow: /media
Disallow: /topsecret
"The text above tells the robot not to visit the /administrator
directory, the /media directory or the /topsecret directory. Robots
do not have to follow your suggestions; they can ignore your
“disallows”. It is important to understand that you
do not really control the robots; you are only making suggestions.
So, do not count on keeping that /topsecret directory secret. This
is especially true of malware robots, which are actively looking for
stuff like the /topsecret directory."
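
To make the quoted rules concrete, here is a minimal sketch of how a well-behaved robot might check them, using Python's standard urllib.robotparser module. The crawler name ExampleBot and the sample paths are assumptions made up for this illustration; they do not come from the original article.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the article, as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /administrator
Disallow: /media
Disallow: /topsecret
"""

# Hypothetical crawler name, chosen only for this illustration.
USER_AGENT = "ExampleBot"


def is_allowed(path: str) -> bool:
    """Return True if the rules above permit USER_AGENT to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(RULES.splitlines())
    return parser.can_fetch(USER_AGENT, path)


if __name__ == "__main__":
    # Sample paths are assumptions; only the three directories come from the article.
    for path in ("/index.html", "/media/logo.png", "/topsecret/plans.txt"):
        verdict = "allowed" if is_allowed(path) else "disallowed"
        print(f"{path}: {verdict}")
```

Note that this check runs inside the robot, not on your server. Nothing enforces the result, so a robot that skips the check can still request /topsecret, which is the article's point.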