One Billion Dollars! Wait… I Mean One Billion Files!!!
Oct 08, 2010, 03:04
(Other stories by Jeffrey B. Layton)
"The world is awash in data. This fact is putting more and more
pressure on file systems to efficiently scale to handle
increasingly large amounts of data. Recently, Ric Wheeler from
Red Hat experimented with putting 1 billion files in a single file
system to understand what problems the Linux community might
face in the future. Let's see what happened...
"No one is going to argue that the amount of data we generate
and want to keep is growing at an unprecedented rate. In a 2008
article, blogger Dave Raffo highlighted some statistics from an IDC
model of enterprise data growth: unstructured data was
increasing at about 61.7% CAGR (Compound Annual Growth Rate). In
addition, data in the cloud (Google, Facebook, etc.) was expected
to increase at a rate of 91.8% through 2012. These are astonishing
growth rates that are causing file system developers either to bite
their fingernails to the quick or to start thinking about
some fairly outlandish file system requirements.
"As an example, on LWN, a poster mentioned that a single MRI
instrument can produce 20,000 files in a single scan. In about 9
months, they had already produced about 23 million files from a
single MRI instrument."
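
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It is not from the article: the function names are made up for illustration, and it assumes that the MRI instrument keeps producing files at the same steady rate it averaged over its first 9 months, and that every scan yields the full 20,000 files.

```python
import math


def doubling_time_years(cagr_percent):
    """Years for a quantity to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + cagr_percent / 100.0)


def months_to_reach(target_files, files_so_far, months_elapsed):
    """Months needed to reach target_files, assuming the average rate
    observed so far (files_so_far / months_elapsed) stays constant."""
    files_per_month = files_so_far / months_elapsed
    return target_files / files_per_month


# Figures quoted above; the steady-rate extrapolation is an assumption.
FILES_PER_SCAN = 20_000
FILES_SO_FAR = 23_000_000
MONTHS_ELAPSED = 9

scans = FILES_SO_FAR / FILES_PER_SCAN
months = months_to_reach(1_000_000_000, FILES_SO_FAR, MONTHS_ELAPSED)

print("Scans implied by 23 million files: %.0f" % scans)
print("Months for one instrument to reach 1 billion files: %.0f" % months)
print("Doubling time at 61.7%% CAGR: %.2f years" % doubling_time_years(61.7))
```

Under those assumptions, one instrument alone would need a few decades to reach a billion files, while data growing at 61.7% CAGR doubles in well under two years. A site with many such instruments, or many users, gets there far sooner.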