Linux Today: Linux News On Internet Time.

That massive filesystem thread

Apr 16, 2009, 12:32
By Jonathan Corbet

"One of the problems is at least somewhat understood: a call to fsync() on an ext3 filesystem will force the filesystem journal (and related file data) to be committed to disk. That operation can create a lot of write activity which must be waited for. But contemporary I/O schedulers tend to favor read operations over writes. Most of the time, that is a rational choice: there is usually a process waiting for a read to complete, but writes can be done asynchronously. A journal commit is not asynchronous, though, and it can cause a lot of things to wait while it is in progress. So it would be better not to put journal I/O operations at the end of the queue.

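To get a rough feel for the cost described above, the time a process spends blocked in fsync() can be measured directly. What follows is only a minimal sketch: the helper name timed_fsync is hypothetical, the payload size is arbitrary, and the figure it reports depends entirely on the filesystem, the storage device, and whatever other I/O is in flight at the time.

```python
import os
import tempfile
import time
from typing import Optional


def timed_fsync(payload: bytes, directory: Optional[str] = None) -> float:
    """Write payload to a temporary file and return the seconds spent in fsync().

    Hypothetical helper for illustration; pass a directory on the
    filesystem you want to observe (for example an ext3 mount).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        start = time.perf_counter()
        # Blocks until the kernel reports the file's data as written out.
        os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    elapsed = timed_fsync(b"x" * 4096)
    print(f"fsync() took {elapsed * 1000:.2f} ms")
```

Running this on an idle machine and again while another process streams a large write to the same filesystem is one way to see the waiting the article describes, though the numbers should be treated as anecdotal.
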
"In fact, it would be better not to make journal operations contend with the rest of the system at all. To that end, Arjan van de Ven has long maintained a simple patch which gives the kjournald thread realtime I/O priority. According to Alan Cox, this patch alone is sufficient to make a lot of the problems go away. The patch has never made it into the mainline, though, because Andrew Morton has blocked it. This patch, he says, does not address the real problem, and it causes a lot of unrelated I/O traffic to benefit from elevated priority as well. Andrew says the real fix is harder:

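The patch mentioned above changes the priority from inside the kernel and is not reproduced here. As a loose userspace analogue, the sketch below shows how an I/O priority value is encoded (class in the upper bits, level 0 to 7 in the lower bits) and how a process with sufficient privilege could apply it to a thread through the ioprio_set() system call. The function names are hypothetical, and the syscall number 251 is an assumption that holds only on x86_64 Linux.

```python
import ctypes
import os

# Constants from the Linux ioprio interface.
IOPRIO_CLASS_RT = 1       # realtime class
IOPRIO_CLASS_BE = 2       # best-effort class (the usual default)
IOPRIO_CLASS_SHIFT = 13   # the class occupies the bits above the level
IOPRIO_WHO_PROCESS = 1    # target is a single process or thread ID

# Assumption: syscall number of ioprio_set on x86_64 only.
SYS_IOPRIO_SET_X86_64 = 251


def ioprio_value(io_class: int, level: int) -> int:
    """Pack an I/O class and a level (0 is highest, 7 lowest) into one value."""
    if not 0 <= level <= 7:
        raise ValueError("level must be between 0 and 7")
    return (io_class << IOPRIO_CLASS_SHIFT) | level


def set_io_priority(pid: int, io_class: int, level: int) -> None:
    """Hypothetical wrapper around ioprio_set(); realtime class needs root."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.syscall(SYS_IOPRIO_SET_X86_64, IOPRIO_WHO_PROCESS, pid,
                      ioprio_value(io_class, level))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    # Only the encoding is exercised here; no priority is changed.
    print(ioprio_value(IOPRIO_CLASS_RT, 0))  # 8192
```

Calling set_io_priority() with the ID of a journaling thread and IOPRIO_CLASS_RT would approximate the effect under discussion, but only as root and only where the active I/O scheduler honors I/O priorities.
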
"The bottom line is that someone needs to do some serious rooting through the very heart of JBD transaction logic and nobody has yet put their hand up. If we do that, and it turns out to be just too hard to fix then yes, perhaps that's the time to start looking at palliative bandaids.

"Bandaid or not, this approach has its adherents. The ext4 filesystem has a new mount option (journal_ioprio) which can be used to set the I/O priority for journaling operations; it defaults to something higher than normal (but not realtime). More recently, Ted Ts'o has posted a series of ext3 patches which sets the WRITE_SYNC flag on some journal writes. That flag marks the operations as synchronous, which will keep them from being blocked by a long series of read operations. According to Ted, this change helps quite a bit, at least when there is a lot of read activity going on. The ext3 changes have not yet been merged for 2.6.30 as of this writing (none of Ted's trees have), but chances are they will go in before 2.6.30-rc1."

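The effect of marking a journal write as synchronous can be illustrated with a deliberately simplified model. This is not how any real Linux I/O scheduler is implemented; it assumes a toy policy in which every synchronous request is dispatched in arrival order before any asynchronous one, and the function and request names are made up for the example.

```python
from typing import List, Tuple

# A request is (name, is_sync). Reads are modeled as synchronous.
Request = Tuple[str, bool]


def dispatch_order(requests: List[Request]) -> List[str]:
    """Toy policy: serve all synchronous requests first, then the rest."""
    sync_queue = [name for name, is_sync in requests if is_sync]
    deferred = [name for name, is_sync in requests if not is_sync]
    return sync_queue + deferred


if __name__ == "__main__":
    reads = [("read1", True), ("read2", True), ("read3", True)]

    # Journal write treated as an ordinary asynchronous write.
    print(dispatch_order([("journal", False)] + reads))
    # ['read1', 'read2', 'read3', 'journal']

    # Same write flagged as synchronous: it keeps its place in line.
    print(dispatch_order([("journal", True)] + reads))
    # ['journal', 'read1', 'read2', 'read3']
```

In the first case the journal write, and everything waiting on it, sits behind the whole run of reads; in the second it is no longer pushed to the back, which is the behavior the WRITE_SYNC patches are described as aiming for.
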