I’ve been writing a fair amount on file systems in Linux because I think we are in the midst of a file system Renaissance. We are seeing new file systems added to the kernel, coupled with increasingly enterprise-class features, giving us more options than ever before. What is even more remarkable is that not too long ago Linux file system development was almost non-existent, yet in a few short years we have a great deal of development and testing happening (for a brief commentary on that, read the interview with Valerie Aurora).
But getting file systems into the kernel is a very long and arduous process. It is complicated, difficult work that takes a great deal of determination. This usually means that file systems won’t address a niche requirement needed by one segment of Linux users, but instead will be general file systems addressing the broader Linux market. But what if you wanted to create a file system so you could take compressed tar files (.tar.gz) and mount them as a file system, so you could easily read and write to them? Or what if you wanted to be able to mount remote file systems using secure tools such as ssh or sftp? Even if you wrote very conforming Linux kernel code to address file system needs such as these, it would take a very long time for it to be added to the kernel, if it were included at all (it likely would not be, because such file systems are more niche than mainstream).
But some developers were still determined to create file systems that scratched their proverbial itch. So what they decided to do was create file systems in user-space, allowing them to quickly develop the capabilities they needed or to solve immediate problems. File systems in user-space, that is, the non-privileged area where user applications are executed, can be developed quickly and upgraded easily since they do not affect the kernel directly. Let’s take a look at why and how file systems can reside in user-space.
There is something of a delineation between what is called “kernel-space” and “user-space”. Kernel-space is roughly indicated by anything that is happening within the kernel code or the “space” of the kernel code and resources, typically as a privileged user. On the other hand “user-space” indicates anything that is happening outside of the kernel where any user can create and run an application and use system resources. But the delineation has implications.
For example, let’s assume a good programmer has an idea for something cool/nifty/neat that would benefit a number of people. Getting that “something” into the kernel, if that is where it belongs (another topic of discussion), is a very long and difficult process, and rightly so. If the code is in the kernel, the collective group of kernel developers has to worry about security of the code, maintainability (people come and go), general usefulness, interactions with other parts of the code, and even the coding style (having a consistent coding style makes it easier to read parts of the code for everyone). Moreover, they have to think about whether the code answers these concerns and belongs in the kernel, because adding code will make the kernel larger (i.e. “kernel bloat”) with a number of undesirable side effects. Yet there is some really cool code that people develop that can be very useful or at the very least, interesting.
One solution, which even Linus himself has suggested, is to move the code to user-space. Doing this gives the code a number of advantages, including the fact that a bug is not likely to bring down the entire system as a kernel bug might. In fact, Linus said, “… From a technical standpoint, I believe the kernel will be ‘more of the same’, and that all the _really_ interesting stuff will be going on in user space.” Of course this was said almost 9 years ago, but it makes for a good quote in an article about user-space file systems. And he does have a point: the kernel is designed to be stable and to provide proper access to resources for user applications, not to be the playground of the latest cool file system.
User-space file systems have some distinct advantages. The first one is that, since they don’t reside in the kernel, it’s fairly easy to distribute your code. That is, the code does not have to go through the extremely rigorous and sometimes aggravating process of testing, review, justification, stylistic code changes, more testing, more review, more justification, and so on. Hopefully the developer(s) of the user-space file system have paid attention to security, maintainability, interactions with other user applications, etc., so that the code is reasonably safe for testing. Being in user-space also means that the file system code can be tested sooner, since a bug usually just crashes the application and doesn’t cause a kernel panic. Developers don’t have to wait for something complete and tested before releasing to a wider test audience, and they can get very immediate feedback on the usefulness of the file system.
The second advantage is that if there is an enhancement or a bug fix, you can quickly update the file system (no waiting for a couple of kernel revisions). The file system is in user-space so you can update as you would any other user-space application using the package tools for your distribution (e.g. yum or apt-get). Of course, it is likely you will have to unmount the file system, upgrade, and then remount the file system, but this is a fairly common process that most administrators have mastered, typically performing upgrades during a maintenance period. But since it’s a file system I still recommend that you run it on a test system first before putting it into production.
A third advantage is that if the file system crashes for some reason, it doesn’t necessarily take down the entire OS. If a kernel file system crashes, there is a distinct possibility that the kernel could also throw an exception and crash. The kernel and file system developers have taken great pains to prevent this from happening, but the fact that the file system resides in kernel-space increases the possibility of a problem causing a kernel panic, compared to a user-space application, which typically crashes without taking the kernel with it. If a user-space file system crashes, you can just kill any associated processes (I love “kill -9” – just be sure you’re killing the correct process) and then remount the file system. Of course, there is the possibility of data corruption in this case, but hopefully the user-space file system designers have mechanisms to limit this (and you do make backups – right?).
So user-space has some nice pros to it. What tools/mechanisms are there to write user-space file systems? Ah – I’m glad that you asked. Meet FUSE.
One of the difficulties in writing code in user-space is interacting with the kernel. In particular, a file system will likely have to interact with the kernel VFS (Virtual File System) at some point to access the hardware. So how does a user-space file system access the VFS?
For a long time this was basically impossible. Any file system code had to live in kernel-space (i.e. as part of the kernel or a module) and that was the end of the story. But what do you do if, for example, you want to automatically encrypt/decrypt the data in a file system? Or if you want to compress/uncompress the data on a file system automatically (transparent compression), or give users the ability to see the data inside a tar file without untarring it? Or perhaps you want to present the data in an SQL table as a directory with associated files? All of these are great ideas but were unlikely to make it into the kernel. Ideally, these file systems should live in user-space, but they still need to interact with the kernel.
Several developers decided they wanted to develop file systems, or really virtual file systems such as those previously mentioned, in user-space but needed some “help” from the kernel to get there. So they created something called FUSE (Filesystem in Userspace). The concept is a simple kernel module that interacts with the kernel, particularly the VFS, on behalf of non-privileged user applications, and that exposes an API which can be accessed from user-space. Figure 1 below is the classic illustration of how this works.
The illustration corresponds to the “hello world” file system on the FUSE website. At a high level, the “hello world” file system is compiled to create a binary called “hello”. This binary is executed in the upper right-hand corner of the illustration with a file system mount point of /tmp/fuse. Then the user executes an “ls -l” command against the mount point (“ls -l /tmp/fuse”). This command goes through glibc to the VFS. The VFS then goes to the FUSE module since the mount point corresponds to a FUSE-based file system. The FUSE kernel module then goes through glibc and libfuse (libfuse is the FUSE library in user-space) and contacts the actual file system binary (“hello”). The file system binary returns the results back down the stack to the FUSE kernel module, back through the VFS, and finally back to the “ls -l” command. There are some details I have glossed over, of course, but the figure illustrates the general flow of operations and data.
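To make the flow above concrete, here is a minimal sketch in C modeled on the “hello world” example from the FUSE website. It assumes the FUSE 2.x development headers are installed; the four callbacks shown (getattr, readdir, open, read) are roughly the minimum needed for “ls -l” and “cat” to work against the mounted file system.

```c
/*
 * hello.c - a minimal FUSE file system sketch, modeled on the
 * "hello world" example from the FUSE website. It exposes a single
 * read-only file, /hello, containing "Hello World!\n".
 *
 * Build (assumes the FUSE 2.x development package is installed):
 *   gcc -Wall hello.c `pkg-config fuse --cflags --libs` -o hello
 */
#define FUSE_USE_VERSION 26

#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

static const char *hello_str  = "Hello World!\n";
static const char *hello_path = "/hello";

/* Called for stat()-like requests, e.g. by "ls -l". */
static int hello_getattr(const char *path, struct stat *stbuf)
{
    memset(stbuf, 0, sizeof(struct stat));
    if (strcmp(path, "/") == 0) {
        stbuf->st_mode  = S_IFDIR | 0755;
        stbuf->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        stbuf->st_mode  = S_IFREG | 0444;
        stbuf->st_nlink = 1;
        stbuf->st_size  = strlen(hello_str);
    } else {
        return -ENOENT;
    }
    return 0;
}

/* Lists the root directory: ".", "..", and "hello". */
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                         off_t offset, struct fuse_file_info *fi)
{
    (void) offset;
    (void) fi;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, hello_path + 1, NULL, 0);  /* skip the leading '/' */
    return 0;
}

/* Only /hello exists, and only for reading. */
static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

/* Copies the requested slice of hello_str into the caller's buffer. */
static int hello_read(const char *path, char *buf, size_t size,
                      off_t offset, struct fuse_file_info *fi)
{
    size_t len = strlen(hello_str);
    (void) fi;
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t) offset >= len)
        return 0;
    if (offset + size > len)
        size = len - offset;
    memcpy(buf, hello_str + offset, size);
    return size;
}

static struct fuse_operations hello_oper = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* libfuse parses the command line, mounts the file system at the
     * given mount point, and dispatches requests to the callbacks. */
    return fuse_main(argc, argv, &hello_oper, NULL);
}
```

Running “./hello /tmp/fuse” and then “ls -l /tmp/fuse” exercises exactly the path in Figure 1: glibc to the VFS, to the FUSE kernel module, up through libfuse to the “hello” binary, and back. “fusermount -u /tmp/fuse” unmounts it when you are done.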
Using the FUSE API you can write just about any type of file system you want, with almost any features you want. Even better, you can use almost any language you want because there are bindings between FUSE and many other languages. For example, you can create file systems in:
- C (naturally)
- Haskell (not Eddie)
…among others (I am pretty disappointed that no one has developed bindings for Fortran. Perhaps a Google Summer of Code project?).
If you look around the web you will find various introductions to writing file systems using FUSE. Some of these are very simple and some are more complex. If you are interested in writing serious file systems with FUSE, I suggest you first understand file systems and their basic designs, before moving to FUSE. But at the same time, you can still write some very interesting FUSE file systems pretty easily.
Examples of FUSE File Systems
There are many examples of file systems that use FUSE. Sometimes FUSE is used for prototyping or testing file systems or it is used as the file system itself. It is beyond the scope of this article to list all of them or even a good chunk of them, but some that you might recognize (or might not) include:
- SSHFS: A file system client that can mount and interact with directories and files on a remote system using sftp. A very handy file system for mounting remote file systems.
- GmailFS: This FUSE-based file system was written to use Google’s email storage as a file system. Originally it used the gmail web interface, but this kept changing. The previous link takes you to a new version of GmailFS that uses IMAP to use the gmail email space as a file system. One of the interesting aspects of this file system is that it’s written in Python.
- EncFS: This FUSE-based file system provides an encrypted file system for Linux. For a discussion about encrypted file systems and Linux, please read this.
- NTFS-3G: NTFS-3G gives you read/write access to a Windows NTFS file system. According to the website it works with NTFS file systems from Windows XP, Windows Server 2003, Windows 2000, Windows Vista, Windows Server 2008, and Windows 7.
- archivemount: This file system allows you to mount archive files such as tar (.tar) or gzipped tar files (.tar.gz) at a mount point and interact with them, including reading and writing. It’s a very cool way to check out the contents of a .tar.gz file before uncompressing and untarring it, especially if you only need one file from it. It also allows you to easily manipulate and create .tar.gz files.
- ZFS-Fuse: This file system allows you to create, mount, use, and manage ZFS file systems under Linux. Recall that the licensing of ZFS is not compatible with the GPL, so interfacing ZFS with FUSE keeps ZFS as a user-space application that runs on Linux and doesn’t violate any licensing. If you want ZFS on Linux, this is your best option.
- CloudStore: CloudStore is a distributed file system that is integrated with Hadoop and Hypertable.
- MountableHDFS: There are several projects that allow you to mount a Hadoop Distributed File System (HDFS) and interact with it as you would a normal POSIX-style file system. For example, you can do an “ls”, a “cp”, or a “mv” on HDFS using these FUSE-based projects. This also means you can use POSIX-conforming applications to read/write to HDFS without having to use the API.
- GlusterFS: This is a high-performance, distributed file system that uses a concept of “translators” to create file systems with various capabilities, including mirroring and replication, striping, load-balancing, disk caching, read-ahead, write-behind, and self-healing. One of the strengths of GlusterFS is that it doesn’t use a separate metadata server but instead relies on knowledge of the file layout and the underlying file system.
- MooseFS: MooseFS is a distributed, fault-tolerant file system with several unique features: when you delete files, MooseFS retains them for a period of time so they can be recovered, and it can create coherent snapshots of files even while they are being accessed or written.
- s3fs: This FUSE file system allows you to take an S3 bucket and mount it as a local file system. There is commercial support for this type of service from Subcloud.
As you can see, there are a fair number of very usable FUSE-based file systems, ranging from something as seemingly simple as mounting .tar.gz files as a file system (giving you the ability to read and write to them without having to uncompress and untar them), to encrypted file systems, to high-performance distributed parallel file systems.
This article is not a HOWTO for using FUSE to create user-space file systems, but merely a quick introduction to the wonderful world of user-space file systems and the amazingly cool things people have created. FUSE gives developers a great deal of flexibility and capability in designing file systems that would otherwise never be available, because getting such niche code into the kernel is highly unlikely. Using FUSE, some really cool file system code can be written in C or in many other languages, addressing some very interesting niche requirements.
Keep an eye out in the near future as we take a look at some FUSE based file systems. It should be lots of fun.