Linux Today: Linux News On Internet Time.

Apache Security: A Watched Bot Never Spoils (Your Server)

Jan 28, 2009, 12:04
By Ken Coar

"At the time that robots.txt is being processed, there is no way of telling which of these five cases will apply. For this reason, robots.txt merely checks the ID, as it were, of the spider making the request, for use by intelligence in subsequent requests.
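To illustrate what "checking the ID" of a spider could look like, the Python sketch below notes the client address and User-Agent string whenever robots.txt is requested, so that later requests from the same client can be recognized. The function names (`note_robots_fetch`, `has_fetched_robots`) and the in-memory dictionary are hypothetical stand-ins chosen for illustration. The article itself works with Apache and a MySQL database.

```python
from datetime import datetime, timezone

# Hypothetical in-memory registry: (client IP, User-Agent) -> time robots.txt was fetched.
# A real deployment would persist this somewhere durable (the article uses MySQL).
_robots_fetches: dict[tuple[str, str], datetime] = {}


def note_robots_fetch(client_ip: str, user_agent: str) -> None:
    """Record that this client identified itself while fetching robots.txt."""
    _robots_fetches[(client_ip, user_agent)] = datetime.now(timezone.utc)


def has_fetched_robots(client_ip: str, user_agent: str) -> bool:
    """Return True if this IP/User-Agent pair previously requested robots.txt."""
    return (client_ip, user_agent) in _robots_fetches


# Called from the robots.txt handler, then consulted on later requests.
note_robots_fetch("192.0.2.10", "ExampleBot/1.0")
print(has_fetched_robots("192.0.2.10", "ExampleBot/1.0"))  # True
print(has_fetched_robots("192.0.2.10", "OtherBot/2.0"))    # False
```

Keying on the IP and User-Agent pair is an assumption of this sketch. A bot can lie about its User-Agent, which is exactly why recording the identity it claims is useful when judging its later behaviour.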

"We can handle case number three by emitting stanzas that only apply to the robot making the request. That way, there are no other robots mentioned whose permissions it can record and later abuse.
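Here is a minimal Python sketch of emitting only the stanza that applies to the requesting robot, so the response never reveals rules written for other bots. The `RULES` table, the robot names in it, and the substring-matching strategy are all hypothetical assumptions; they do not reproduce the article's own rule storage.

```python
# Hypothetical access rules: robot name token -> list of disallowed path prefixes.
# "*" holds the default rules for robots that match no specific entry.
RULES: dict[str, list[str]] = {
    "*": ["/cgi-bin/", "/private/"],
    "Googlebot": ["/private/"],
    "BadBot": ["/"],
}


def robots_txt_for(user_agent: str) -> str:
    """Build a robots.txt body that mentions only the robot making the request.

    Matching is a simple case-insensitive substring test of each rule's token
    against the User-Agent header (an assumption made for this sketch).
    """
    ua = user_agent.lower()
    token = next(
        (name for name in RULES if name != "*" and name.lower() in ua),
        "*",
    )
    lines = [f"User-agent: {token}"]
    lines += [f"Disallow: {path}" for path in RULES[token]]
    return "\n".join(lines) + "\n"


print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"), end="")
# User-agent: Googlebot
# Disallow: /private/
```

An unrecognized client falls through to the `*` stanza, so it learns only the default rules and never the names of other robots.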

"Now that robots.txt is actually a dynamic document, let's put it to work and actually do something with it. For starters, and for performance, I use a MySQL database to record bot activity and access rules. Let's begin by having it record the particulars of the current request. Here is the first MySQL table I use for that:"
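The article's actual MySQL table definition is in the complete story and is not reproduced in this excerpt. As a rough stand-in, the Python sketch below logs the particulars of each request into a hypothetical `bot_requests` table. It uses the standard-library `sqlite3` module instead of MySQL so that it runs without a server. The table and column names are assumptions. A MySQL version would use `AUTO_INCREMENT` and whatever placeholder style its driver expects.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; not the article's own MySQL table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bot_requests (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    requested_at TEXT NOT NULL,
    client_ip    TEXT NOT NULL,
    user_agent   TEXT NOT NULL,
    request_uri  TEXT NOT NULL
)
"""


def record_request(conn: sqlite3.Connection, client_ip: str,
                   user_agent: str, request_uri: str) -> None:
    """Insert one row describing the current request.

    Values are passed as parameters rather than pasted into the SQL string,
    so a hostile User-Agent header cannot inject SQL.
    """
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO bot_requests (requested_at, client_ip, user_agent, request_uri) "
            "VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), client_ip, user_agent, request_uri),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_request(conn, "192.0.2.10", "ExampleBot/1.0", "/robots.txt")
for row in conn.execute("SELECT client_ip, user_agent, request_uri FROM bot_requests"):
    print(row)  # ('192.0.2.10', 'ExampleBot/1.0', '/robots.txt')
```

Parameter binding matters especially here, because the User-Agent string is entirely under the control of the client being logged.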
