bashreduce: A Bare-Bones MapReduce

[ Thanks to An Anonymous Reader for
this link. ]

“If you’ve managed to somehow miss most of the
MapReduce revolution, Wikipedia describes it pretty well: MapReduce
is a framework for computing certain kinds of distributable
problems using a large number of computers (nodes), collectively
referred to as a cluster.

Computational processing can occur on data stored either in a
filesystem (unstructured) or within a database (structured).

“Map” step: The master node takes the input, chops
it up into smaller sub-problems, and distributes those to worker
nodes. A worker node may do this again in turn, leading to a
multi-level tree structure. The worker node processes that smaller
problem, and passes the answer back to its master node.

“Reduce” step: The master node then takes the
answers to all the sub-problems and combines them in a way to get
the output – the answer to the problem it was originally trying to

“In fact, the MapReduce model has proven so useful that the
Apache Hadoop project (an Open Source implementation of the
infrastructure described in the Google paper) has become very
popular in the last few years. Yahoo, which employs numerous Hadoop
committers, recently hosted their annual Hadoop Summit which
attracted over 500 users and developers.”


Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends, & analysis