---

Distributed data processing with Hadoop

[ Thanks to An Anonymous Reader for
this link. ]

“Although Hadoop is the core of data reduction for some
of the largest search engines, it’s better described as a framework
for the distributed processing of data. And not just data, but
massive amounts of data, as would be required for search engines
and the crawled data they collect. As a distributed framework,
Hadoop enables many applications that benefit from parallelization
of data processing.

“This article is not meant to introduce you to Hadoop and its
architecture but rather to demonstrate a simple Hadoop setup. In
the Resources section, you can find more details on Hadoop
architecture, components, and theory of operation. With that
disclaimer in place, let’s dive right into Hadoop installation and
configuration.”


Complete Story

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends, & analysis