Linux Today: Linux News On Internet Time.

Apache Today: Apache Guide: Spiders and Robots

Nov 21, 2000, 13:40
By Rich Bowen

"A robot, also called a 'bot,' a spider, a web crawler, and a variety of other names, is a program that automatically downloads web pages for a variety of purposes. Because it downloads one page, and then recursively downloads every page that is linked to from that page, one can imagine it crawling around the web, harvesting content. From these images come some of the names that these programs are called, as well as some of the names of particular spiders, like WebCrawler, Harvester, MomSpider, and so on."
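The download-then-follow-links behavior described in the excerpt can be sketched in a few lines of Python. This is only an illustration, not code from the article: the `crawl` function, the `LinkExtractor` class, the `max_pages` limit, and the in-memory `SITE` dictionary are all hypothetical names introduced here, and the traversal uses a queue rather than literal recursion so that a large site cannot exhaust the call stack.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Visit start_url, then every page reachable from it, breadth-first.

    fetch is a caller-supplied function (an assumption of this sketch)
    that returns the HTML for a URL, or None if the page is unavailable.
    """
    seen = {start_url}          # URLs already queued, to avoid loops
    queue = deque([start_url])
    visited = []                # pages actually downloaded, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site held in memory, so the sketch runs offline.
SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="/b.html">B</a>',
    "http://example.com/b.html": '<a href="/missing.html">gone</a>',
}

order = crawl("http://example.com/", SITE.get)
print(order)
```

Here `SITE.get` stands in for a real HTTP download. The `seen` set is what keeps the robot from fetching the same page twice when pages link back to each other.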

"What they do with these web pages once they have downloaded them varies from application to application. Most of these spiders are doing some variety of search, while some exist so that people can read online content when they are not actually connected to the internet. Others pre-cache content, so that end-users can more rapidly access that content. Still others are just trying to generate statistical information about the web."
