Linux Today: Linux News On Internet Time.

Big Data, MapReduce, Hadoop, NoSQL: The Relational Technology Behind the Curtain

Oct 20, 2011, 16:02
(Other stories by Wayne Kernochan)

"The best way to understand the place of Hadoop in the computing universe is to view the history of data processing as a constant battle between parallelism and concurrency. Think of the database as a data store plus a protective layer of software that is constantly being bombarded by transactions – and often, another transaction on a piece of data arrives before the first is finished. To handle all the transactions, databases have two choices at each stage in computation: parallelism, in which two transactions are literally being processed at the same time; and concurrency, in which a processor switches between the two rapidly in the middle of the transaction.

"Pure parallelism is obviously faster, but to avoid inconsistencies in the results of the transaction, you often need coordinating software, and that coordinating software is hard to operate in parallel, because it involves frequent communication between the parallel 'threads' of the two transactions. At a global level (like that of the Internet), the choice now translates into a choice between 'distributed' and 'scale-up' single-system processing."
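
The inconsistency the excerpt mentions, and the coordination that prevents it, can be sketched in a few lines of Python. This is a minimal illustration and not code from the article: the names transfer, run_interleaved, run_serialized, and demo are hypothetical, the "database" is a plain dictionary, and each transaction is modeled as a generator so that a single "processor" can switch away from it mid-transaction. It shows only the concurrency case (one processor interleaving two transactions); true parallelism across processors would face the same problem.

```python
def transfer(store, key, amount):
    """One transaction: read a value, pause, then write back an updated value."""
    value = store[key]            # read step
    yield                         # the processor may switch to another transaction here
    store[key] = value + amount   # write step, based on the earlier read


def run_interleaved(transactions):
    """Concurrency without coordination: round-robin between unfinished transactions."""
    pending = list(transactions)
    while pending:
        for txn in list(pending):
            try:
                next(txn)         # advance this transaction by one step
            except StopIteration:
                pending.remove(txn)


def run_serialized(transactions):
    """Simplest possible coordination: finish each transaction before starting the next."""
    for txn in transactions:
        for _ in txn:
            pass


def demo():
    uncoordinated = {"balance": 100}
    run_interleaved([
        transfer(uncoordinated, "balance", 10),
        transfer(uncoordinated, "balance", 20),
    ])

    coordinated = {"balance": 100}
    run_serialized([
        transfer(coordinated, "balance", 10),
        transfer(coordinated, "balance", 20),
    ])
    return uncoordinated["balance"], coordinated["balance"]


if __name__ == "__main__":
    print(demo())  # (120, 130): the interleaved run loses the +10 update
```

In the interleaved run both transactions read 100 before either writes, so the second write overwrites the first. Serializing the transactions gives the consistent result, at the cost of giving up any overlap between them, which is the trade-off the excerpt describes.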
