Linux Today: Linux News On Internet Time.


Distributed data processing with Hadoop - Part-3: App Build

Jul 21, 2010, 07:39
By M. Tim Jones



[ Thanks to An Anonymous Reader for this link. ]

"With configuration, installation, and the use of Hadoop in single- and multi-node architectures under your belt, you can now turn to the task of developing applications within the Hadoop infrastructure. This final article in the series explores the Hadoop APIs and data flow and demonstrates their use with a simple mapper and reducer application.

"The first two articles of this series focused on the installation and configuration of Hadoop for single- and multi-node clusters. This final article explores programming in Hadoop, in particular the development of a map and a reduce application in the Ruby language. I chose Ruby because, first, it's an awesome object-oriented scripting language that you should know, and second, you'll find numerous references in the Resources section for tutorials addressing both the Java™ and Python languages. Through this exploration of MapReduce programming, I also introduce you to the streaming application programming interface (API). This API provides the means to develop applications in languages other than the Java language.

"Let's begin with a short introduction to map and reduce (from the functional perspective), and then take a deeper dive into the Hadoop programming model and its architecture and elements that carve, distribute, and manage the work."
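To make the functional view of map and reduce concrete, here is a minimal Ruby sketch of a word count. It is an illustration only, not the code from the article: the helper names map_line and reduce_pairs and the sample input are hypothetical, and the whole thing runs in a single process. Under Hadoop streaming, the mapper and reducer would instead be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output.

```ruby
# Map step: turn one line of text into a list of [word, 1] pairs.
# (map_line is a hypothetical helper name used only for this sketch.)
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# Reduce step: sum the counts for each distinct word.
# (reduce_pairs is likewise a hypothetical helper name.)
def reduce_pairs(pairs)
  pairs.each_with_object(Hash.new(0)) do |(word, count), totals|
    totals[word] += count
  end
end

# Sample input standing in for the lines a mapper would receive.
lines = ["the quick brown fox", "the lazy dog"]

# Apply the map step to every line, then reduce the combined pairs.
pairs  = lines.flat_map { |line| map_line(line) }
counts = reduce_pairs(pairs)

# Print results as tab-separated key/value lines, the format that
# streaming jobs conventionally use on standard output.
counts.sort.each { |word, total| puts "#{word}\t#{total}" }
```

In a real cluster the framework sorts and groups the mapper output by key before it reaches the reducer; here a single hash plays that role.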
