Apache Today: Apache Guide: Spiders and Robots
Nov 21, 2000, 13:40
By Rich Bowen
"A robot, also called a 'bot,' a spider, a web crawler, and a
variety of other names, is a program that automatically downloads
web pages for a variety of purposes. Because it downloads one page
and then recursively downloads every page linked from that page,
one can imagine it crawling around the web, harvesting content.
These images give rise to some of the names for these programs, as
well as to the names of particular spiders, such as WebCrawler,
Harvester, and MomSpider."
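The download-then-follow-links behavior described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular spider: the names `crawl`, `fetch`, `LinkExtractor`, and the `max_pages` limit are hypothetical; the recursion is written as a breadth-first queue rather than as recursive function calls; and the sketch does not consult robots.txt or pause between requests, which a well-behaved spider should do. The `fetch_page` parameter lets a small in-memory site stand in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and decode it as text. Does NOT check robots.txt."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch_page=fetch):
    """Download start_url, then every page it links to, and so on.

    The recursion is expressed as a breadth-first queue; max_pages is an
    assumed safety limit so the crawl always terminates.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    # A tiny in-memory "web" stands in for real HTTP requests.
    fake_web = {
        "http://example.test/": '<a href="a.html">A</a> <a href="b.html#top">B</a>',
        "http://example.test/a.html": '<a href="/">Home</a> <a href="b.html">B</a>',
        "http://example.test/b.html": "<p>No links here.</p>",
    }
    for url in crawl("http://example.test/", fetch_page=fake_web.__getitem__):
        print(url)
```

Run directly, the script prints the three URLs of the in-memory site in breadth-first order. The `seen` set is what keeps the crawl from looping forever when pages link back to each other, as `a.html` does to the home page here.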
"What they do with these web pages once they have downloaded
them varies from application to application. Most of these spiders
perform some kind of search, while some exist so that people
can read online content when they are not actually connected to the
internet. Others pre-cache content, so that end-users can more
rapidly access that content. Still others are just trying to
generate statistical information about the web."