Robots.txt Tips For Dealing With Bots

Jan 15, 2010, 03:02


[ Thanks to Andrew Weber for this link. ]

"The robots.txt is used to provide crawling instructions to web robots using the Robots Exclusion Protocol. When a web robots visits your site it will check this file, robots.txt, to discover any directories or pages you want to exclude from the web robot listing on the search engine. This is an important file which determines SEO for search engines and can help rankings.

"User-agent: *
Disallow: /administrator
Disallow: /media
Disallow: /topsecret

"The text above tells the robot not to visit the /administrator directory, the /media directory or the /topsecret directory. The robots do not have to follow your suggestions, they can ignore your “disallows”. It is important to understand that you really do not control the robots, you only are making suggestions. So, do not count on keeping that /topsecret directory secret. This is especially true of malware robots who are really looking for stuff like the /topsecret directory."
