Red Hat: ext3 information
Oct 11, 2001, 05:41
Subject: ext3 information
From: Michael K. Johnson
Date: Tue, 21 Aug 2001 17:53:08 -0400
I wrote up a short piece that I hope to flesh out a bit more later
on why Red Hat chose to include ext3 in this release, why you want
to use it, and what we did to make it robust.
It's not an anti-any-other-filesystem tirade at all. Don't take
any part of it as meant to put down any other filesystem, even
ones we have not chosen to ship yet. No hidden agenda involving
alien abductions... :-)
Anyway, I hope it's useful. Feedback to the list, please.
"He that composes himself is wiser than he that composes a book."
Linux Application Development -- Ben Franklin
Why do you want to migrate from ext2 to ext3? Four main reasons:
availability, data integrity, speed, and easy transition.
After an unclean system shutdown (unexpected power failure, system
crash), each ext2 file system cannot be mounted until its
consistency has been checked by the e2fsck program. The amount of
time that the e2fsck program takes is determined primarily by the
size of the file system, and for today's relatively large (many
tens of gigabytes) file systems, this takes a long time. Also, the
more files you have on the file system, the longer the consistency
check takes. File systems several hundreds of gigabytes in size may
take an hour or more to check. This severely limits availability.
By contrast, ext3 does not require a file system check even
after an unclean system shutdown, except for certain rare hardware
failure cases (e.g. hard drive failures), because the data is
written to disk in such a way that the file system is always
consistent. The time to recover an ext3 file system after an
unclean system shutdown does not depend on the size of the file
system or the number of files; rather, it depends on the size of
the "journal" used to maintain consistency. The default journal
size takes about a second to recover (depends on the speed of the
hardware).
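
To illustrate why recovery time tracks the size of the journal rather than the size of the file system, here is a toy write-ahead journal in Python. It is only a sketch of the general journaling idea, not ext3's on-disk format or its actual recovery code; the `Journal` class and its `log_transaction` and `replay` methods are names invented for this illustration.

```python
# Toy write-ahead journal: NOT ext3's real format, just the general idea.
class Journal:
    def __init__(self):
        self.records = []          # append-only log of journal records

    def log_transaction(self, updates, committed=True):
        """Append block updates; a crash before commit leaves no commit record."""
        for block_no, data in updates.items():
            self.records.append(("write", block_no, data))
        if committed:
            self.records.append(("commit",))

    def replay(self, disk):
        """Apply only fully committed transactions; return records scanned."""
        pending = {}
        for record in self.records:
            if record[0] == "write":
                _, block_no, data = record
                pending[block_no] = data
            else:  # commit record: the transaction is complete, apply it
                disk.update(pending)
                pending = {}
        # Anything left in `pending` belongs to an unfinished transaction
        # and is discarded, so the disk never sees a half-applied update.
        return len(self.records)


# A "large" disk: recovery never walks it, it only scans the journal.
disk = {n: "old" for n in range(100_000)}
journal = Journal()
journal.log_transaction({1: "inode-v2", 2: "bitmap-v2"})    # committed
journal.log_transaction({3: "inode-v3"}, committed=False)   # crash mid-way
scanned = journal.replay(disk)
```

The replay loop touches only the journal records, so the work done is the same whether the disk holds a hundred blocks or a hundred thousand. The uncommitted update to block 3 is simply dropped.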
Using the ext3 file system can provide stronger guarantees about
data integrity in case of an unclean system shutdown. You have a
choice of how carefully to protect your data. Essentially, you can
choose either to keep the file system consistent but allow for
damage to data on the file system in the case of unclean system
shutdown (for a modest speed up under some but not all
circumstances) or to ensure that the data is consistent with the
state of the file system (which means that you will never see
garbage data in recently-written files after a crash). The safer
choice, keeping the data consistent with the state of the file
system, is the default.
Despite writing some data more than once, ext3 is often faster
(higher throughput) than ext2 because ext3's journaling optimizes
hard drive head motion. You can choose from three journaling modes
to optimize speed, optionally choosing to trade off some data
integrity. One mode, data=writeback, limits the data integrity
guarantees, allowing old data to show up in files after a crash,
for a potential increase in speed under some circumstances. This
mode, which is the default journaling mode for most journaling file
systems, essentially provides the more limited data integrity
guarantees of the ext2 file system and merely avoids the long file
system check at boot time. The second mode, data=ordered (the
default mode), guarantees that the data is consistent with the file
system: recently-written files will never show up with garbage
contents after a crash. The last mode, data=journal, requires a
larger journal for reasonable speed in most cases and therefore
takes longer to recover in case of unclean shutdown, but is
sometimes faster for certain database operations. The default mode
is recommended for all general-purpose computing needs.
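
The three modes can be summarized in a small Python table, together with a helper that builds an /etc/fstab entry selecting one of them. This is a sketch for illustration: the `data=` mount option names come from the text above, while `DATA_MODES`, `fstab_line`, the device, the mount point, and the trailing dump/fsck fields are assumptions made up for the example.

```python
# Summary of the three ext3 journaling modes as described in the text.
DATA_MODES = {
    "writeback": {"journals_file_data": False, "old_data_after_crash": True},
    "ordered":   {"journals_file_data": False, "old_data_after_crash": False},
    "journal":   {"journals_file_data": True,  "old_data_after_crash": False},
}
DEFAULT_MODE = "ordered"


def fstab_line(device, mount_point, mode=DEFAULT_MODE):
    """Build an /etc/fstab entry that mounts an ext3 file system in `mode`."""
    if mode not in DATA_MODES:
        raise ValueError(f"unknown journaling mode: {mode!r}")
    # The default mode needs no explicit option; the others are requested
    # with the data= mount option.
    options = "defaults" if mode == DEFAULT_MODE else f"defaults,data={mode}"
    # Fields: device, mount point, type, options, dump flag, fsck pass number
    return f"{device} {mount_point} ext3 {options} 1 2"
```

For example, `fstab_line("/dev/hda1", "/home", "journal")` produces an entry whose options field is `defaults,data=journal`.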
It is easy to change from ext2 to ext3 and gain the benefits of a
robust journaling file system, without reformatting. That's right,
no need to do a long, tedious, and error-prone backup, reformat,
restore operation in order to experience the advantages of ext3.
There are two ways to do the transition:
- The Red Hat Linux installer program will offer to transition
your file systems when you upgrade your system. All you have to do
is check one checkbox per file system.
- The tune2fs program can add a journal to an existing ext2 file
system. If the file system is already mounted when it is being
transitioned, the journal will be visible as the file ".journal" in
the root directory of the file system. If the file system is not
mounted, the journal will be hidden and will not appear in the file
system. Just run tune2fs -j /dev/hda1 (or whatever device holds the
file system you are transitioning) and change "ext2" to "ext3" on
the matching lines in /etc/fstab. If you are transitioning your
root file system, you will have to use an initrd to boot; run the
"mkinitrd" program as described in the manual and make sure that
your lilo or grub configuration loads the initrd. (If you fail to
make that change, the system will still boot, but the root file
system will be mounted as ext2 instead of ext3 -- you can tell this
by looking at the output of the command "cat /proc/mounts".) More
information on tune2fs can be found in the tune2fs man page.
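
The /etc/fstab edit and the /proc/mounts check from the tune2fs route can be sketched in Python as below. This covers only those two steps; tune2fs -j itself still has to be run separately. Both function names are hypothetical, and the helpers work on text passed in as strings rather than touching the real files, so they are safe to try out.

```python
def switch_fstab_to_ext3(fstab_text, device):
    """Return fstab_text with the type of `device` changed from ext2 to ext3."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave comments, blank lines, and other devices untouched.
        is_entry = len(fields) >= 3 and not fields[0].startswith("#")
        if is_entry and fields[0] == device and fields[2] == "ext2":
            fields[2] = "ext3"
            line = " ".join(fields)   # note: collapses column alignment
        out.append(line)
    return "\n".join(out) + "\n"


def mounted_fs_type(proc_mounts_text, mount_point):
    """Return the file system type reported for `mount_point`, or None."""
    fs_type = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]       # a later entry overrides an earlier one
    return fs_type
```

Feeding the contents of /proc/mounts to `mounted_fs_type(text, "/")` after a reboot shows whether the root file system really came up as ext3 or fell back to ext2.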
A list of reasons Red Hat chose ext3 for our first supported
journaling file system follows. Note that these reasons are not
necessarily each unique to ext3 (some other journaling file systems
share several of the points here) but the whole set of reasons
taken together is unique to ext3.
- ext3 is forwards and backwards compatible with ext2, allowing
users to keep existing file systems while very simply adding
journaling capability. Any user who wishes to un-journal a file
system can do so easily. (Not that we expect many to do so...)
Furthermore, an ext3 file system can be mounted as ext2 without
even removing the journal, as long as a recent version of e2fsprogs
(such as the one shipped in this release) is installed.
- ext3 benefits from the long history of fixes and enhancements
to the ext2 file system, and will continue to do so. This means
that ext3 shares ext2's well-known robustness, but also that as new
features are added to ext2, they can be carried over to ext3 with
little difficulty. When, for example, extended attributes or HTrees
are added to ext2, it will be relatively easy to add them to ext3.
(The extended attributes feature will enable things like access
control lists; HTrees make directory operations extremely fast and
highly scalable to very large directories.)
- ext3, like ext2, has a multi-vendor team of developers who
develop it and understand it well; its development does not depend
on any one person or organisation.
- ext3 provides and makes use of a generic journaling layer (jbd)
which can be used in other contexts, and can journal not only
within the file system, but also to other devices, so as NVRAM
devices become available and supported under Linux, ext3 will be
able to support them.
- ext3 has multiple journaling modes. It can journal all file
data and metadata (data=journal), or it can journal metadata but
not data (data=ordered or data=writeback). When not journaling file
data, you can choose whether to write file system data before
metadata (data=ordered; causes all metadata to point to valid data)
or not handle file data specially at all (data=writeback; file
system will be consistent, but old data may appear in files after
an unclean system shutdown). This gives the administrator the power
to make the trade off between speed and file data consistency, and
to tune speed for specialized usage patterns.
- ext3 has broad cross-platform compatibility, working on 32 and
64 bit architectures, and on both little-endian and big-endian
systems. Any system (currently including many Unix clones and
variants, BeOS, and Windows) capable of accessing files on an ext2
file system will also be able to access files on an ext3 file
system.
- ext3 does not require extensive core kernel changes and
requires no new system calls, thus presenting Linus no challenges
to integrating ext3 into his official Linux kernel releases; ext3
is already integrated into Alan Cox's -ac kernels, slated for
migration to Linus's official kernel soon.
- The e2fsck file system recovery program has a long and proven
track record of successful data recovery when software or hardware
faults corrupt a file system. ext3 uses this same e2fsck code for
salvaging the file system after such corruption so it has the same
robustness against catastrophic data loss as ext2 in the presence
of data-corruption faults.
Again, we don't claim that every one of these points is unique
to ext3. Most of them are shared by at least one other filesystem.
We merely claim that the set of all of them together is true only
of ext3.
Here are some of the things Red Hat has done to ensure that ext3
is safe for users to use for their data:
- We have done extensive stress testing under a large set of
configurations. This has involved many thousands of hours of
"contrived" load testing on a wide variety of hardware and file
system configurations, as well as many use case tests.
- We have audited ext3 for multiple conditions, including memory
allocation errors happening at any point. We have tested that by
forcing false errors and testing file system consistency.
- We audited and tested ext3 for poor interactions with the VM
subsystem, finding and fixing several interactions. A journaling
file system puts more stress on the VM subsystem, and we found and
fixed bugs both in ext3 and in the VM subsystem in the process of
this audit and these tests. After thousands of hours of this
testing, we are extremely confident in the robustness of the ext3
file system.
- We have done an extensive year-long-plus beta program, starting
with ext3 on the 2.2 kernel series, and then moving forwards to the
2.4 kernel series. Even before the official beta program, ext3 was
put into production use in some circumstances; ext3 has been in
production use on some widely-accessed servers, including the
rpmfind.net servers, for over two years.
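
The idea of forcing false errors at every possible point and then checking consistency can be illustrated with a small Python sketch. This is not Red Hat's test harness, and nothing here is taken from the ext3 code; `FaultInjector`, `allocate_blocks`, and `run_fault_sweep` are hypothetical names, and the "file system" is just a pair of block counters.

```python
class InjectedFailure(Exception):
    """Stands in for an allocation error forced by the test harness."""


class FaultInjector:
    def __init__(self, fail_at):
        self.fail_at = fail_at     # which checkpoint should fail (0-based)
        self.calls = 0

    def checkpoint(self):
        """Call wherever a real allocation could fail."""
        current = self.calls
        self.calls += 1
        if current == self.fail_at:
            raise InjectedFailure(f"forced failure at checkpoint {current}")


def allocate_blocks(counts, n, injector):
    """Move n blocks from free to used; all-or-nothing even if a step fails."""
    staged = dict(counts)          # work on a copy, like staging a transaction
    injector.checkpoint()
    staged["free"] -= n
    injector.checkpoint()
    staged["used"] += n
    injector.checkpoint()
    counts.update(staged)          # "commit": only reached if nothing failed


def run_fault_sweep(checkpoints=3):
    """Force a failure at every checkpoint in turn and check the invariant."""
    survived = 0
    for fail_at in range(checkpoints):
        counts = {"free": 100, "used": 0}
        try:
            allocate_blocks(counts, 30, FaultInjector(fail_at))
        except InjectedFailure:
            pass
        # Invariant: total is preserved and no half-applied update is visible.
        assert sum(counts.values()) == 100
        assert counts in ({"free": 100, "used": 0}, {"free": 70, "used": 30})
        survived += 1
    return survived
```

Calling `run_fault_sweep()` fails the operation once at each of its three checkpoints and confirms that the counters are never left half-updated.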