Linux Today: Linux News On Internet Time.

bashreduce: A Bare-Bones MapReduce

Jul 07, 2009, 23:34
(Other stories by Jeremy Zawodny)

[ Thanks to An Anonymous Reader for this link. ]

"If you’ve managed to somehow miss most of the MapReduce revolution, Wikipedia describes it pretty well: MapReduce is a framework for computing certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster.

Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured).

“Map” step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.

“Reduce” step: The master node then takes the answers to all the sub-problems and combines them in a way to get the output: the answer to the problem it was originally trying to solve.

"In fact, the MapReduce model has proven so useful that the Apache Hadoop project (an Open Source implementation of the infrastructure described in the Google paper) has become very popular in the last few years. Yahoo, which employs numerous Hadoop committers, recently hosted their annual Hadoop Summit which attracted over 500 users and developers."
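The "Map" and "Reduce" steps described in the excerpt can be illustrated with a small single-machine sketch. This is not bashreduce or Hadoop code; it is a minimal word-count example in Python, and the function names (split_input, map_worker, reduce_counts, word_count) are hypothetical names chosen for illustration. The workers here are ordinary function calls, whereas a real cluster would run them on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, n_workers):
    """Master: chop the input into n_workers sub-problems (round-robin)."""
    return [lines[i::n_workers] for i in range(n_workers)]


def map_worker(chunk):
    """Worker: process one sub-problem, emitting a (word, 1) pair per word."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_counts(pairs):
    """Master: combine the answers from all workers into the final output."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines, n_workers=3):
    """Run the map step on every chunk, then reduce the partial answers."""
    chunks = split_input(lines, n_workers)
    # Assumption: in a real cluster each call below would run on its own node.
    partials = [map_worker(chunk) for chunk in chunks]
    return reduce_counts(chain.from_iterable(partials))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(text))
```

Because the reduce step only sums per-word counts, the result does not depend on how the master splits the input, which is what makes this kind of problem distributable.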
