---

LinuxPlanet: An In-Depth Look at Reiserfs

“Originally designed by Hans Reiser, Reiserfs carries the
analogy between databases and filesystems to its logical
conclusion. In essence, Reiserfs treats the entire disk partition
as if it were a single database table. Directories, files, and file
metadata are organized in an efficient data structure called a
‘balanced tree.’ This differs somewhat from the way in which
traditional filesystems operate, but it offers large speed
improvements for many applications, especially those which use lots
of small files.”

“Reading and writing large files, such as CD-ROM images, is
often limited by the speed of the disk hardware or the I/O channel,
but access to small files such as shell scripts is often limited by
the efficiency of the filesystem design. The reason for this is
that opening a file requires the system first to locate the file,
and that means reading directories off the disk. Furthermore, the
system needs to examine the security metadata to see if the user
has permission to access the file, and that means additional disk
reads. The system can literally spend more time deciding whether to
allow the access, and then locating the data on the drive, than it
does actually reading such a small amount of information from the
file itself.”

“Reiserfs uses its balanced trees to streamline the process of
finding the files and retrieving their security (and other)
metadata. For extremely small files, the entire file’s data can
actually be stored physically near the file’s metadata, so that
both can be retrieved together with little or no movement of the
disk seek mechanism. If an application needs to open many small
files rapidly, this approach significantly improves
performance.”
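
The idea of fetching a small file's metadata and data in one lookup can be sketched roughly as follows. This is only an illustration, not Reiserfs's actual key format or on-disk layout: the `TinyTree` class, the path-based key, and the item fields are all hypothetical, and a sorted list with binary search stands in for a real balanced tree.

```python
import bisect


class TinyTree:
    """Sorted key/item store standing in for a balanced tree (illustration only)."""

    def __init__(self):
        self._keys = []   # kept in sorted order
        self._items = []  # item at index i belongs to key at index i

    def insert(self, key, item):
        # Binary search for the position that keeps the keys sorted.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._items[i] = item  # overwrite an existing entry
        else:
            self._keys.insert(i, key)
            self._items.insert(i, item)

    def lookup(self, key):
        # Binary search again; return None when the key is absent.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._items[i]
        return None


tree = TinyTree()
# Hypothetical item: metadata plus, for a tiny file, the file data itself.
tree.insert(
    "/usr/local/bin/hello.sh",
    {"mode": 0o755, "owner": "root", "data": b"#!/bin/sh\necho hello\n"},
)

# One lookup yields both the permission bits and the file contents.
item = tree.lookup("/usr/local/bin/hello.sh")
print(oct(item["mode"]), item["data"])
```

In this sketch a single search answers both "may the user open this file?" and "what does it contain?", which is the effect the paragraph above describes for very small files.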

“Another feature of Reiserfs is that the balanced tree stores
not just metadata, but also the file data itself. In a traditional
filesystem such as ext2, space on the disk is allocated in blocks
ranging in size from 512 bytes to 4096 bytes, or even larger. If a
file’s size happens to be anything other than an exact multiple of
the block size, space will be wasted. For example, suppose the
block size is 1024 bytes but you need to store a file that is 8195
bytes long. Eight blocks hold 8192 bytes, so almost all of the file
will fit into eight blocks. The remaining three bytes need a ninth
block of their own, which is mostly empty! The wasted space is
almost one whole
block out of nine, or about 11 percent. Now imagine a file 1025
bytes long. It almost, but not quite, fits into one block, but
requires two. The wasted space is nearly 50 percent. The worst case
is a very tiny file, such as a trivial (but useful) one-line shell
script. Such a file may be only 50 bytes or so and would fit into
just one block. But if the block is 1024 bytes, then the file
wastes about 95 percent of its allocated space. As you
can see, the wasted space (as a percentage) is smaller if the files
are larger.”
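
The arithmetic in the examples above can be checked with a short sketch. It assumes the simple model the excerpt describes, in which a file always occupies a whole number of fixed-size blocks; the function name `slack` and the 1024-byte default are illustrative choices, not part of any filesystem API.

```python
def slack(file_size, block_size=1024):
    """Return (blocks, wasted_bytes, wasted_fraction) under whole-block allocation."""
    # Ceiling division: round the block count up; even a tiny file needs one block.
    blocks = max(1, -(-file_size // block_size))
    allocated = blocks * block_size
    wasted = allocated - file_size
    return blocks, wasted, wasted / allocated


# The three file sizes used in the text above.
for size in (8195, 1025, 50):
    blocks, wasted, fraction = slack(size)
    print(f"{size:>5} bytes -> {blocks} block(s), "
          f"{wasted} bytes wasted ({fraction:.1%})")
```

Under these assumptions the three cases come out at roughly 11, 50, and 95 percent wasted, matching the figures quoted above.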

