Linux Today: Linux News On Internet Time.
developerWorks: Build a Web Spider on Linux

Nov 16, 2006, 05:30
(Other stories by M. Tim Jones)

[ Thanks to An Anonymous Reader for this link. ]

"A spider is a program that crawls the Internet in a specific way for a specific purpose. The purpose could be to gather information or to understand the structure and validity of a Web site. Spiders are the basis for modern search engines, such as Google and AltaVista. These spiders automatically retrieve data from the Web and pass it on to other applications that index the contents of the Web site for the best set of search terms.

"Similar to a spider, but with more interesting legal questions, is the Web scraper. A scraper is a type of spider that targets specific content from the Web, such as the cost of products or services..."
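The crawling behavior described in the excerpt can be illustrated with a minimal Python sketch. It is not taken from the developerWorks article; it assumes a breadth-first strategy, and the names LinkParser, crawl, and PAGES, along with the example.test URLs, are hypothetical. To keep it runnable offline, an in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the absolute URL of every <a href=...> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; `fetch` maps a URL to HTML text, or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "site" standing in for real HTTP fetches.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real spider would replace PAGES.get with a function that performs HTTP requests, and would typically also honor robots.txt and rate limits. A scraper would add a step that extracts specific fields from each fetched page rather than only following links.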
