Linux Today: Linux News On Internet Time.

LinuxPlanet: An In-Depth Look at Reiserfs

Jan 22, 2001, 13:59 (45 Talkback[s])
(Other stories by Scott Courtney)

"Originally designed by Hans Reiser, Reiserfs carries the analogy between databases and filesystems to its logical conclusion. In essence, Reiserfs treats the entire disk partition as if it were a single database table. Directories, files, and file metadata are organized in an efficient data structure called a "balanced tree." This differs somewhat from the way in which traditional filesystems operate, but it offers large speed improvements for many applications, especially those which use lots of small files."

"Reading and writing of large files, such as CDROM images, is often limited by the speed of the disk hardware or the I/O channel, but access to small files such as shell scripts is often limited by the efficiency of the filesystem design. The reason for this is that opening a file requires the system first to locate the file, and that means reading directories off the disk. Furthermore, the system needs to examine the security metadata to see if the user has permission to access the file, and that means additional disk reads. The system can literally spend more time deciding whether to allow the access, and then locating the data on the drive, than it does actually reading such a small amount of information from the file itself."

"Reiserfs uses its balanced trees to streamline the process of finding the files and retrieving their security (and other) metadata. For extremely small files, the entire file's data can actually be stored physically near the file's metadata, so that both can be retrieved together with little or no movement of the disk seek mechanism. If an application needs to open many small files rapidly, this approach significantly improves performance."

"Another feature of Reiserfs is that the balanced tree stores not just metadata, but also the file data itself. In a traditional filesystem such as ext2, space on the disk is allocated in blocks ranging in size from 512 bytes to 4096 bytes, or even larger. If a file's size happens to be anything other than an exact multiple of the block size, space will be wasted. For example, suppose the block size is 1024 bytes but you need to store a file that is 8195 bytes long. Eight blocks is 8192, so almost all of the file will fit into eight blocks. The remaining three bytes have their own block, which is mostly empty! The wasted space is almost one whole block out of nine, or about 11 percent. Now imagine a file 1025 bytes long. It almost, but not quite, fits into one block, but requires two. The wasted space is nearly 50 percent. The worst case is a very tiny file, such as a trivial (but useful) one-line shell script. Such a file may be only 50 bytes or so (for example) and would fit into just one block. But if the block is 1024 bytes, then the file has wasted about 95 percent of its allocated space. As you can see, the wasted space (as a percentage) is smaller if the files are larger."

Complete Story

Related Stories: