Linux Today: Linux News On Internet Time.

Distributed data processing with Hadoop

May 26, 2010, 09:02
By M. Tim Jones

[ Thanks to An Anonymous Reader for this link. ]

"Although Hadoop is the core of data reduction for some of the largest search engines, it's better described as a framework for the distributed processing of data. And not just data, but massive amounts of data, as would be required for search engines and the crawled data they collect. As a distributed framework, Hadoop enables many applications that benefit from parallelization of data processing.

"This article is not meant to introduce you to Hadoop and its architecture but rather to demonstrate a simple Hadoop setup. In the Resources section, you can find more details on Hadoop architecture, components, and theory of operation. With that disclaimer in place, let's dive right into Hadoop installation and configuration."
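The excerpt describes Hadoop as a framework that parallelizes the processing of massive data sets. The following minimal sketch shows the map, shuffle, and reduce pattern behind that kind of processing, using a word count as the example. It is not Hadoop's Java API or its setup procedure. It runs entirely in one Python process: threads stand in for cluster nodes, and the in-memory strings stand in for input splits. All function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_words(split: str) -> List[Tuple[str, int]]:
    """Map step (hypothetical): emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as a framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step (hypothetical): collapse all values for one key into a total."""
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Run the map tasks concurrently; threads are a local stand-in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, splits))
    grouped = shuffle(mapped)
    # Each key is reduced independently, so this step could also run in parallel.
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(splits))
```

Each map task sees only its own split, and each reduce call sees only one key. That independence is what allows a framework like Hadoop to spread both phases across many machines.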

