Robots.txt Tips For Dealing With Bots
Jan 15, 2010, 03:02
[ Thanks to Andrew
Weber for this link. ]
"The robots.txt is used to provide crawling
instructions to web robots using the Robots Exclusion Protocol.
When a web robot visits your site it will check this file,
robots.txt, to discover any directories or pages you want to
exclude from the web robot listing on the search engine. This is an
important file which determines SEO for search engines and can help
"The text above tells the robot not to visit the /administrator
directory, the /media directory or the /topsecret directory. The
robots do not have to follow your suggestions; they can ignore your
“disallows”. It is important to understand that you
really do not control the robots; you are only making suggestions.
So, do not count on keeping that /topsecret directory secret. This
is especially true of malware robots who are really looking for
stuff like the /topsecret directory."
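
The robots.txt listing that the second excerpt refers to is not reproduced here. As a rough illustration only, the sketch below assumes a file that disallows the three directories named in the quote and uses Python's standard urllib.robotparser module to show how a well-behaved robot might interpret it. The file contents, the "ExampleBot" user agent, and the sample paths are all hypothetical and are not taken from the linked article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt covering the three directories named in the excerpt.
# The original article's actual file is not shown, so this is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /media/
Disallow: /topsecret/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and these paths are made up for the demonstration.
for path in (
    "/index.html",
    "/administrator/login.php",
    "/media/logo.png",
    "/topsecret/plans.txt",
):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "disallowed"
    print(f"{path}: {verdict}")
```

Note that the check happens entirely on the robot's side: a crawler that never calls something like can_fetch simply requests whatever it wants, which is why a Disallow line should not be treated as access control.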