Linux Today: Linux News On Internet Time.

Robots.txt Tips For Dealing With Bots

Jan 15, 2010, 03:02

[ Thanks to Andrew Weber for this link. ]

"The robots.txt is used to provide crawling instructions to web robots using the Robots Exclusion Protocol. When a web robots visits your site it will check this file, robots.txt, to discover any directories or pages you want to exclude from the web robot listing on the search engine. This is an important file which determines SEO for search engines and can help rankings.

"User-agent: *
Disallow: /administrator
Disallow: /media
Disallow: /topsecret

"The text above tells the robot not to visit the /administrator directory, the /media directory or the /topsecret directory. The robots do not have to follow your suggestions, they can ignore your “disallows”. It is important to understand that you really do not control the robots, you only are making suggestions. So, do not count on keeping that /topsecret directory secret. This is especially true of malware robots who are really looking for stuff like the /topsecret directory."
