O'Reilly Network: How the Wayback Machine Works

[ Thanks to Jason Greenwood for this link. ]

“The Internet Archive made headlines back in November
with the release of the Wayback Machine, a Web interface to the
Archive’s five-year, 100-terabyte collection of Web pages. The
archive is the result of the efforts of its director, Brewster
Kahle, to capture the ephemeral pages of the Web and store them in
a publicly accessible library. In addition to the other millions of
web pages you can find in the Wayback Machine, it has direct
pointers to some of the pioneer sites from the early days of the
Web, including the NCSA What’s New page, The Trojan Room Coffee
Pot, and Feed magazine.

How big is 100 terabytes? Kahle, who serves as archive director
and president of Alexa Internet, a wholly-owned subsidiary of
Amazon.com, says it’s about five times as large as the Library of
Congress, with its 20 million books.

“What we have on the Web is phenomenal,” Kahle says. “There are
more than 10 million people’s voices evidenced on the Web. It’s the
people’s medium, the opportunity for people to publish about
anything — the great, the noble, the absolute picayune, and the
profane.””

Complete Story
