Apache Security: A Watched Bot Never Spoils (Your Server)
Jan 28, 2009, 12:04
(Other stories by Ken Coar)
"At the time that robots.txt is being processed, there is no way
of telling which of these five cases will apply. For this reason,
robots.txt merely checks the ID, as it were, of the spider making
the request, for use by intelligence in subsequent requests.
"We can handle case number three by emitting stanzas that only
apply to the robot making the request. That way, there are no other
robots mentioned whose permissions it can record and later
abuse.
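
As a rough illustration of the per-robot idea in the paragraph above, here is a minimal Python sketch. It is not the author's code. The function name robots_txt_for, the rules mapping, and the fallback of denying unknown robots are all assumptions made for this example. The point it shows is that the response contains exactly one stanza, for the robot that asked, so no other robot's permissions are exposed.

```python
def robots_txt_for(user_agent, rules, default_disallow=("/",)):
    """Build a robots.txt body holding one stanza for the requesting robot.

    `rules` is a hypothetical mapping of robot token -> list of disallowed
    paths. Robots that match no token receive a generic "*" stanza using
    `default_disallow` (an assumption; the article may handle this differently).
    """
    ua = user_agent.lower()
    for token, disallow in rules.items():
        # Case-insensitive substring match of the token in the User-Agent.
        if token.lower() in ua:
            break
    else:
        token, disallow = "*", default_disallow

    lines = [f"User-agent: {token}"]
    if disallow:
        lines += [f"Disallow: {path}" for path in disallow]
    else:
        # An empty Disallow value means nothing is off limits.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example_rules = {"Googlebot": ["/private/"], "Slurp": ["/"]}
    print(robots_txt_for("Googlebot/2.1", example_rules), end="")
```

A request handler would call this with the incoming User-Agent header and return the result as text/plain in place of a static file.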
"Now that robots.txt is actually a dynamic document, let's put
it to work and actually do something with it. For starters, and for
performance, I use a MySQL database to record bot activity and
access rules. Let's begin by having it record the particulars of
the current request. Here is the first MySQL table I use for
that:"
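
The excerpt ends before the author's table, which is not reproduced here. Purely to illustrate what recording "the particulars of the current request" might involve, the sketch below logs one row per request. Everything in it is an assumption: the table name bot_requests, its columns, and the use of Python's built-in sqlite3 module as a stand-in for MySQL so the example runs without a server.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the article's actual MySQL table is not shown in the excerpt.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bot_requests (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    requested_at TEXT NOT NULL,
    remote_addr  TEXT NOT NULL,
    user_agent   TEXT NOT NULL,
    request_uri  TEXT NOT NULL
)
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_request(conn, remote_addr, user_agent, request_uri):
    """Insert one row describing the current request and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO bot_requests "
            "(requested_at, remote_addr, user_agent, request_uri) "
            "VALUES (?, ?, ?, ?)",
            (
                datetime.now(timezone.utc).isoformat(),
                remote_addr,
                user_agent,
                request_uri,
            ),
        )
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_log()
    record_request(conn, "192.0.2.10", "ExampleBot/1.0", "/robots.txt")
    print(conn.execute("SELECT COUNT(*) FROM bot_requests").fetchone()[0])
```

Parameterized queries are used because the User-Agent string comes from the client and should never be interpolated into SQL directly.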