Threads are programming abstractions used in concurrent processing. A kernel thread is a way to implement background tasks inside the kernel. A background task can be busy handling asynchronous events or can be asleep, waiting for an event to occur. Kernel threads are similar to user processes, except that they live in kernel space and have access to kernel functions and data structures. Like user processes, kernel threads appear to monopolize the processor because of preemptive scheduling.
In this month's "Gearheads," let's discuss kernel threads and develop an example that also demonstrates kernel concepts such as process states, wait queues, and user-mode helpers.
Built-in Kernel Threads
To see the kernel threads (also called kernel processes) running on your system, run the command ps -ef. You should see something similar to Figure One.
FIGURE ONE: A typical list of Linux kernel threads

$ ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
root         1     0  0 22:36 ?        00:00:00 init [3]
root         2     1  0 22:36 ?        00:00:00 [ksoftirqd/0]
root         3     1  0 22:36 ?        00:00:00 [events/0]
root        38     3  0 22:36 ?        00:00:00 [pdflush]
root        39     3  0 22:36 ?        00:00:00 [pdflush]
root        29     1  0 22:36 ?        00:00:00 [khubd]
root       695     1  0 22:36 ?        00:00:00 [kjournald]
…
root      3914     1  0 22:37 ?        00:00:00 [nfsd]
root      3915     1  0 22:37 ?        00:00:00 [nfsd]
…
root      4015  3364  0 22:55 tty3     00:00:00 -bash
root      4066  4015  0 22:59 tty3     00:00:00 ps -ef
The output of ps -ef is a list of user and kernel processes running on your system. Kernel process names are surrounded by square brackets ([]).
The [ksoftirqd/0] kernel thread is an aid to implement soft IRQs. Soft IRQs are raised by interrupt handlers to request "bottom half" processing of portions of the interrupt handler whose execution can be deferred. The idea is to minimize the code inside interrupt handlers, which results in reduced interrupt-off times in the system, thus resulting in lower latencies.
ksoftirqd ensures that a high load of soft IRQs neither starves the soft IRQs nor overwhelms the system. (On Symmetric Multi-Processing (SMP) machines, where multiple thread instances can run on different processors in parallel, one instance of ksoftirqd is created per processor to improve throughput. On SMP machines, the kernel processes are named ksoftirqd/n, where n is the processor number.)
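From a driver writer's perspective, the common way to ride on soft IRQs is the tasklet interface, which is layered on top of them. Below is a minimal sketch using the 2.6-era tasklet API; the handler names are illustrative and not part of any listing in this column:

#include <linux/interrupt.h>

/* The deferred "bottom half": runs later, with interrupts enabled */
static void my_bottom_half (unsigned long data)
{
  /* ... time-consuming processing goes here ... */
}

static DECLARE_TASKLET (my_tasklet, my_bottom_half, 0);

/* The "top half": do the bare minimum, then defer the rest */
static irqreturn_t my_interrupt (int irq, void *dev_id, struct pt_regs *regs)
{
  /* ... acknowledge the hardware ... */
  tasklet_schedule (&my_tasklet); /* Raises a soft IRQ; under heavy
                                     load, ksoftirqd runs it */
  return IRQ_HANDLED;
}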
The events/n threads (where n is the processor number) help implement work queues, which are another way of deferring work in the kernel. If a part of the kernel wants to defer execution of work, it can either create its own work queue or make use of the default events/n worker thread.
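If all a piece of kernel code needs is to run a function later in process context, deferring to the default worker is straightforward. Here is a minimal sketch using the 2.6-era work queue interface; the function and variable names are illustrative:

#include <linux/workqueue.h>

/* Executed later in process context by the default events/n thread */
static void my_deferred_fn (void *unused)
{
  /* ... do the deferred work here ... */
}

static DECLARE_WORK (my_work, my_deferred_fn, NULL);

/* At the point where work must be deferred (even from an
 * interrupt handler), queue it onto the default worker: */
schedule_work (&my_work);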
The pdflush kernel thread flushes dirty pages from the page cache. The page cache buffers accesses to the disk. To improve performance, actual writes to the disk are delayed until the pdflush daemon writes out dirtied data to disk. This is done if the available free memory dips below a threshold or if the page has remained dirty for a sufficiently long time. In the 2.4.* kernels, these two tasks were respectively performed by separate kernel threads, bdflush and kupdated.
You may have noticed that there are two instances of pdflush in the ps output. A new instance is created if the kernel senses that existing instances are becoming intolerably busy servicing disk queues. Launching new instances of pdflush improves throughput, especially if your system has multiple disks and many of them are busy.
The khubd thread, part of the Linux USB core, monitors the machine's USB hub and configures USB devices when they are hot-plugged into the system.
kjournald is the generic kernel journaling thread, which is used by file systems like ext3. The Linux Network File System (NFS) server is implemented using a set of kernel threads named nfsd.
Creating a Kernel Thread
To illustrate kernel threads, let's implement a simple example. Assume that you'd like the kernel to asynchronously invoke a user-mode program to send you a page or an email alert whenever it senses that the health of certain kernel data structures is unsatisfactory; for instance, free space in network receive buffers has dipped below a low watermark.
This is a candidate for a kernel thread because:
* It's a background task, since it has to wait for asynchronous events.
* It needs access to kernel data structures, since the actual detection of events must be done by other parts of the kernel.
* It has to invoke a user-mode helper program, which is a time-consuming operation.
The kernel thread relinquishes the processor till it gets woken up by parts of the kernel that are responsible for monitoring the data structures of interest. It then invokes the user-mode helper program and passes on the appropriate identity code to the program's environment. The user-mode program is registered with the kernel via the /proc file system. Listing One creates the kernel thread.
Listing One: Creating a Linux kernel thread

ret = kernel_thread (mykthread, NULL,
                     CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
The thread can be created in an appropriate place, for example, in init/main.c. The flags specify the resources to be shared between the parent and child threads: CLONE_FS requests that file system information be shared, CLONE_FILES specifies that open files are to be shared, while CLONE_SIGHAND requests that signal handlers be shared.
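If you'd rather not edit init/main.c directly, the thread can also be spawned from an initcall of its own. The following is a minimal sketch under that assumption; the setup function name is illustrative:

/* Spawn mykthread at boot via an initcall */
static int __init mykthread_setup (void)
{
  int ret = kernel_thread (mykthread, NULL,
                           CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);

  /* kernel_thread() returns the new thread's PID on
   * success and a negative error code on failure */
  return (ret < 0) ? ret : 0;
}

__initcall (mykthread_setup);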
Listing Two is the actual kernel thread. daemonize() creates the thread without attached user resources, while reparent_to_init() changes the parent of the calling thread to the init task. Each Linux thread has a single parent. If a parent process dies without waiting for its child to exit, the child becomes a zombie process and wastes resources. Re-parenting the child to the init task avoids this. In the 2.6 kernel, the daemonize() function itself internally invokes reparent_to_init().
Since daemonize() blocks all signals by default, you have to call allow_signal() to enable delivery if your thread desires to handle a particular signal. There are no signal handlers inside the kernel, so use signal_pending() to check for signals and perform the appropriate action. For debugging purposes, the code in Listing Two requests delivery of SIGKILL and dies if it's received.
Listing Two: Implementing the Kernel Thread

static DECLARE_WAIT_QUEUE_HEAD (myevent_waitqueue);
rwlock_t myevent_lock = RW_LOCK_UNLOCKED;
unsigned int myevent_id; /* Set by the monitoring code in Listing Three */

static int mykthread (void *unused)
{
  unsigned int event_id = 0;
  DECLARE_WAITQUEUE (wait, current);

  /* The stuff required to become a kernel thread
   * without attached user resources */
  daemonize ("mykthread");
  reparent_to_init (); /* In 2.4 kernels */

  /* Request delivery of SIGKILL */
  allow_signal (SIGKILL);

  /* The thread will sleep on this wait queue till it is
   * woken up by parts of the kernel in charge of sensing
   * the health of data structures of interest */
  add_wait_queue (&myevent_waitqueue, &wait);

  for (;;) {
    /* Relinquish the processor till the event occurs */
    set_current_state (TASK_INTERRUPTIBLE);
    schedule ();

    /* Die if I receive SIGKILL */
    if (signal_pending (current)) break;

    /* Control gets here when the thread is woken up */
    read_lock (&myevent_lock); /* Critical section starts */
    if (myevent_id) { /* Guard against spurious wakeups */
      event_id = myevent_id;
      read_unlock (&myevent_lock); /* Critical section ends */

      /* Invoke the registered user-mode helper and
       * pass the identity code in its environment */
      run_umode_handler (event_id); /* See Listing Five */
    } else {
      read_unlock (&myevent_lock);
    }
  }

  set_current_state (TASK_RUNNING);
  remove_wait_queue (&myevent_waitqueue, &wait);
  return 0;
}
If you compile this as part of the kernel, you can see the newly created thread, mykthread, in the ps output, as shown in Figure Two.
FIGURE TWO: The new thread, mykthread, is a child of init

$ ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
root         1     0  0 21:56 ?        00:00:00 init [3]
root         2     1  0 22:36 ?        00:00:00 [ksoftirqd/0]
…
root       111     1  0 21:56 ?        00:00:00 [mykthread]
…
Before delving further into the thread implementation, let's look at a code snippet that detects the event and awakens mykthread. Refer to Listing Three.
Listing Three: Waking up the kernel thread

/* Executed by parts of the kernel that own the
 * data structures whose health you want to monitor */
/* … */
if (my_key_datastructure looks troubled) { /* Pseudocode condition */
  write_lock (&myevent_lock);

  /* Fill in the identity of the data structure */
  myevent_id = datastructure_id;
  write_unlock (&myevent_lock);

  /* Wake up mykthread */
  wake_up_interruptible (&myevent_waitqueue);
}
/* … */
The kernel accomplishes useful work using a combination of process contexts and interrupt contexts. Process contexts aren't tied to any interrupt context and vice versa; notably, code running in process context may sleep, while code running in interrupt context may not. Listing Two executes in a process context, while Listing Three can run from both process and interrupt contexts.
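Code that can be reached from either context, like the snippet in Listing Three, can find out where it's running via the kernel's in_interrupt() predicate. A small sketch:

#include <linux/hardirq.h> /* provides in_interrupt(); <asm/hardirq.h> in older trees */

if (in_interrupt ()) {
  /* Interrupt context: sleeping (and thus schedule()) is forbidden */
} else {
  /* Process context: sleeping is allowed */
}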
Process and interrupt contexts communicate via kernel data structures. In the example, myevent_id and myevent_waitqueue are used for this communication. myevent_id contains the identity of the data structure that's in trouble. Access to myevent_id is serialized using a reader-writer spin lock (myevent_lock).
(Kernel threads are preemptible only if CONFIG_PREEMPT is turned on at compile time. If CONFIG_PREEMPT is off, or if you are running a 2.4 kernel without the preemption patch, your thread will freeze the system if it doesn't go to sleep. If you comment out schedule() in Listing Two and disable CONFIG_PREEMPT in your kernel configuration, your system will lock up, too.)
Process States and Wait Queues
Let's take a closer look at the code snippet that puts mykthread to sleep while waiting for events. The snippet is shown in Listing Four.
Listing Four: How to put a thread to sleep

add_wait_queue (&myevent_waitqueue, &wait);
for (;;) {
  /* .. */
  set_current_state (TASK_INTERRUPTIBLE);
  schedule ();
  /* Point A */
  /* .. */
}
set_current_state (TASK_RUNNING);
remove_wait_queue (&myevent_waitqueue, &wait);
Wait queues hold threads that need to wait for an event or a system resource. A thread in a wait queue sleeps until it's woken by another thread or an interrupt handler that's responsible for detecting the event. Queuing and de-queuing are done using the add_wait_queue() and remove_wait_queue() functions, while waking up queued tasks is accomplished via the wake_up_interruptible() routine.
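As an aside, the kernel also provides the wait_event_interruptible() macro, which bundles the queue/set-state/schedule() steps shown in Listing Four. A rough sketch of the equivalent sleep, reusing the variables from our example:

/* Sleeps until myevent_id becomes nonzero or a signal arrives;
 * returns a nonzero value in the signal case */
if (wait_event_interruptible (myevent_waitqueue, myevent_id != 0)) {
  /* Woken by a signal such as SIGKILL */
}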
In the above code snippet, set_current_state() is used to set the run state of the kernel thread. A kernel thread (or a normal process) can be in any of the following states: running, interruptible, uninterruptible, zombie, stopped, traced, or dead. These states are defined in include/linux/sched.h.
* A process in the running state (TASK_RUNNING) is in the scheduler run queue and is a candidate for CPU time according to the scheduling algorithm.
* A task in the interruptible state (TASK_INTERRUPTIBLE) is waiting for an event to occur and isn't in the scheduler run queue. When the task gets woken up, or if a signal is delivered to it, it re-enters the run queue.
* The uninterruptible state (TASK_UNINTERRUPTIBLE) is similar to the interruptible state, except that receipt of a signal won't put the task back into the run queue.
* A task in the zombie state (EXIT_ZOMBIE) has terminated, but its parent did not wait for the task to complete.
* A stopped task (TASK_STOPPED) has stopped execution due to receipt of certain signals.
mykthread sleeps on a wait queue (myevent_waitqueue) and changes its state to TASK_INTERRUPTIBLE, signaling that it desires to opt out of the scheduler run queue. The call to schedule() asks the scheduler to choose and run a new task from its run queue.
When another part of the kernel awakens mykthread using wake_up_interruptible() as shown in Listing Three, the thread is put back into the scheduler run queue. The process state also gets changed to TASK_RUNNING, so there's no race condition even if the wake up occurs between the time the task state is set to TASK_INTERRUPTIBLE and the schedule() function is called. The thread also gets back into the run queue if a SIGKILL signal is delivered to it. When the scheduler subsequently picks mykthread from the run queue, execution resumes at Point A.
User-Mode Helpers
The kernel supports a mechanism for invoking user-mode programs to help perform certain functions. For example, if module auto-loading is enabled, the kernel dynamically loads necessary modules on demand using a user-mode module loader. The default loader is /sbin/modprobe, but you can change it by registering your own loader in /proc/sys/kernel/modprobe. Similarly, the kernel notifies user space about hot-plug events by invoking the program registered in /proc/sys/kernel/hotplug, which is by default /sbin/hotplug.
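For example, you can inspect and replace the module loader through the same sysctl file; the replacement path below is hypothetical:

$ cat /proc/sys/kernel/modprobe
/sbin/modprobe
$ echo /path/to/my_loader > /proc/sys/kernel/modprobe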
Listing Five contains the function used by mykthread to notify user space about detected events. The user-mode program to invoke can be registered via the sysctl interface in the /proc file system. To do this, make sure that CONFIG_SYSCTL is enabled in your kernel configuration and add an entry to the kern_table array in kernel/sysctl.c:
{KERN_MYEVENT_HANDLER, "myevent_handler", &myevent_handler,
 256, 0644, NULL, &proc_dostring, &sysctl_string}
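For this entry to build, the helper path needs backing storage, and KERN_MYEVENT_HANDLER needs a value. A minimal sketch; the numeric value is illustrative and must not collide with existing constants in include/linux/sysctl.h:

/* In kernel/sysctl.c: storage for the registered helper path */
char myevent_handler[256];

/* In include/linux/sysctl.h: extend the KERN_* enumeration */
KERN_MYEVENT_HANDLER=75, /* String path of the event helper */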
This creates an entry /proc/sys/kernel/myevent_handler in the /proc file system. To register your user-mode helper, do the following:

$ echo /path/to/helper > /proc/sys/kernel/myevent_handler
This makes /path/to/helper execute when the function in Listing Five runs.
Listing Five: Invoking User-Mode Helpers

/* Called from Listing Two */
static void run_umode_handler (int event_id)
{
  int i = 0;
  char *argv[2], *envp[4], *buffer = NULL;
  int value;

  argv[i++] = myevent_handler; /* Defined earlier in kernel/sysctl.c */

  /* If no user-mode handler has been registered, return */
  if (!argv[0] || !argv[0][0]) return;

  /* Fill in the id corresponding to the data structure in trouble */
  if (!(buffer = kmalloc (32, GFP_KERNEL))) return;
  sprintf (buffer, "TROUBLED_DS=%d", event_id);

  argv[i] = NULL;

  /* Prepare the environment for /path/to/helper */
  i = 0;
  envp[i++] = "HOME=/";
  envp[i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
  envp[i++] = buffer;
  envp[i] = NULL;

  /* Execute the user-mode program, /path/to/helper */
  value = call_usermodehelper (argv[0], argv, envp, 0);

  /* Check return values */
  /* … */

  kfree (buffer);
}
The identity of the troubled kernel data structure is passed as an environment variable (TROUBLED_DS) to the user-mode helper. The helper can be a simple script like the following that sends you an email alert containing the information that it gleaned from its environment:

#!/bin/bash
echo Kernel datastructure $TROUBLED_DS is in trouble | mail -s Alert root
call_usermodehelper() has to be executed from a process context and runs with root capabilities. It's implemented using a work queue in 2.6 kernels.
Looking at the Sources
In the 2.6 source tree, the ksoftirqd, pdflush, and khubd kernel threads live in kernel/softirq.c, mm/pdflush.c, and drivers/usb/core/hub.c, respectively.
The daemonize() function can be found in kernel/exit.c in the 2.6 sources and in kernel/sched.c in the 2.4 sources. For the implementation of invoking user-mode helpers, look at kernel/kmod.c.
Sreekrishnan Venkateswaran has been working for IBM India since 1996. His recent Linux projects include putting Linux onto a wristwatch, a PDA, and a pacemaker programmer. You can reach Krishnan at krishhna@gmail.com.