Big Data, MapReduce, Hadoop, NoSQL: The Relational Technology Behind the Curtain
Oct 20, 2011, 16:02
(Other stories by Wayne Kernochan)
"The best way to understand the place of Hadoop in the computing
universe is to view the history of data processing as a constant
battle between parallelism and concurrency. Think of the database
as a data store plus a protective layer of software that is
constantly being bombarded by transactions – and often,
another transaction on a piece of data arrives before the first is
finished. To handle all the transactions, databases have two
choices at each stage in computation: parallelism, in which two
transactions are literally being processed at the same time; and
concurrency, in which a processor switches between the two rapidly
in the middle of the transaction.
"Pure parallelism is obviously faster, but to avoid
inconsistencies in the results of the transaction, you often need
coordinating software, and that coordinating software is hard to
operate in parallel, because it involves frequent communication
between the parallel 'threads' of the two transactions. At a global
level (like that of the Internet), the choice now translates into a
choice between 'distributed' and 'scale-up' single-system
processing."
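
The coordination problem described in the excerpt can be illustrated with a minimal Python sketch. It is not taken from the article: the `Account` class, the `run_transactions` helper, and the worker counts are hypothetical names and values chosen for illustration. Several threads stand in for overlapping transactions on one piece of data, and a lock plays the role of the "coordinating software" that keeps the result consistent.

```python
import threading


class Account:
    """Toy data store (hypothetical): one balance guarded by a lock."""

    def __init__(self, balance=0):
        self.balance = balance
        # The lock stands in for the "coordinating software" in the excerpt.
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Serialize the read-modify-write so that overlapping transactions
        # cannot overwrite each other's updates.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_transactions(account, n_workers=4, deposits_per_worker=1000):
    """Run overlapping deposit 'transactions' and return the final balance."""

    def worker():
        for _ in range(deposits_per_worker):
            account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


if __name__ == "__main__":
    print(run_transactions(Account()))
```

In standard CPython builds these threads mostly interleave rather than execute simultaneously, so the sketch is closer to the excerpt's "concurrency" than to pure parallelism. The lock also shows the cost the excerpt points to: every transaction must wait its turn at the shared data, which is the part that is hard to run in parallel.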