[ Thanks to Justin
Klein Keane for this link. ]
“For instance, if your site has an administrative interface at /admin you might want to list a couple hundred non-existent sub-directories and sift /admin into the list near the middle or end. This would prove frustrating for an attacker looking through the robots.txt entries by hand. If an attacker is using an automated tool, however, they likely won’t be slowed down by false entries in the robots.txt file.”

“The system I’m describing can be implemented in a number of ways. The basic idea is the same, though. You fill your robots.txt file with numerous false entries. Each of these false entries leads to a server response that triggers a blacklisting of the offending IP address. This means that real subdirectories and files can still safely be embedded in the robots.txt, but the time to search each entry becomes exhaustive for an attacker.”

“In principle the system functions in a fairly straightforward manner. Assume we have an administrative login page at /admin that we want to hide from attackers. We create a robots.txt file that contains the following entries:”
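A sketch of what such a robots.txt can look like, with the decoy directory names invented here for illustration (only /admin is real; in practice there would be a couple hundred decoys rather than a handful):

    User-agent: *
    Disallow: /backup/
    Disallow: /old-site/
    Disallow: /data/
    Disallow: /private/
    Disallow: /admin/
    Disallow: /staging/
    Disallow: /archive/
    Disallow: /internal/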
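And a minimal sketch of the blacklisting side of the scheme, assuming a standalone Python handler; the trap paths and in-memory block list are placeholders for illustration, and a real deployment would more likely push offenders into a firewall or web-server deny rule:

    # Sketch of the trap described above: decoy paths advertised in
    # robots.txt that do not really exist, plus an in-memory block list.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Decoy entries from robots.txt; hitting any of them marks the
    # client as an attacker.
    TRAP_PATHS = {"/backup/", "/old-site/", "/data/", "/private/", "/staging/"}

    blacklist = set()  # IP addresses that have tripped a trap

    class TrapHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ip = self.client_address[0]

            # Blacklisted clients get nothing, whatever they ask for.
            if ip in blacklist:
                self.send_error(403)
                return

            # A request for a decoy entry can only come from someone who
            # harvested it from robots.txt, so blacklist the source IP.
            if self.path in TRAP_PATHS:
                blacklist.add(ip)
                self.send_error(403)
                return

            # Normal content (including the real /admin/) is served here.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok\n")

    if __name__ == "__main__":
        HTTPServer(("", 8080), TrapHandler).serve_forever()

Only a request for a decoy path, which no legitimate user or well-behaved crawler should ever make, trips the blacklist, so the real /admin entry remains reachable for the site’s administrators.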