Linux Today: Linux News On Internet Time.







O'Reilly Network: How the Wayback Machine Works

Jan 23, 2002, 22:16
(Other stories by Richard Koman)

[ Thanks to Jason Greenwood for this link. ]

"The Internet Archive made headlines back in November with the release of the Wayback Machine, a Web interface to the Archive's five-year, 100-terabyte collection of Web pages. The archive is the result of the efforts of its director, Brewster Kahle, to capture the ephemeral pages of the Web and store them in a publicly accessible library. In addition to the other millions of web pages you can find in the Wayback Machine, it has direct pointers to some of the pioneer sites from the early days of the Web, including the NCSA What's New page, The Trojan Room Coffee Pot, and Feed magazine.

How big is 100 terabytes? Kahle, who serves as archive director and president of Alexa Internet, a wholly-owned subsidiary of Amazon.com, says it's about five times as large as the Library of Congress, with its 20 million books.

"What we have on the Web is phenomenal," Kahle says. "There are more than 10 million people's voices evidenced on the Web. It's the people's medium, the opportunity for people to publish about anything -- the great, the noble, the absolute picayune, and the profane.""
