---

Big Data, MapReduce, Hadoop, NoSQL: The Relational Technology Behind the Curtain

“The best way to understand the place of Hadoop in the computing
universe is to view the history of data processing as a constant
battle between parallelism and concurrency. Think of the database
as a data store plus a protective layer of software that is
constantly being bombarded by transactions – and often,
another transaction on a piece of data arrives before the first is
finished. To handle all the transactions, databases have two
choices at each stage in computation: parallelism, in which two
transactions are literally being processed at the same time; and
concurrency, in which a processor switches between the two rapidly
in the middle of the transaction.

“Pure parallelism is obviously faster, but to avoid
inconsistencies in the results of the transaction, you often need
coordinating software, and that coordinating software is hard to
operate in parallel, because it involves frequent communication
between the parallel “threads” of the two transactions. At a global
level (like that of the Internet), the choice now translates into a
choice between “distributed” and “scale-up” single-system
processing.”


Complete Story