try/catch in Linux Kernel

In higher level languages like Java and C#, one can recover from unexpected bahaviour using try/catch like mechanism. Things are different inside Linux kernel. The code is considered trusted and reliable. Faults and exceptions have severe penalities. For example, division by zero will likely cause a kernel oops and the whole system to hang. So how would you run some code which you know can fault? Enter extable, a kernel mechanism by which one can have try/catch like facility. It is not a high-level extension to C programming language. Instead it works at assembly level.

We will take following piece of blatant division by zero and make it safe to run inside kernel using this exception handling mechanism. We will be assuming x86_64 platform in this article.

void blatant_div_by_zero(void)
{
        /* quotient and divisor */
        int q, d;

        d = 0;
        asm ("movl $20, %%eax;"
        "movl $0, %%edx;"
        "div %1;"
        "movl %%eax, %0;"
        : "=r"(q)
        : "b"(d)
        :"%eax"
        );

        pr_debug("quotient is %d\n", q);
}

extable

extable is basically an ELF section inside Linux kernel binary image which contains mappings between potentially faulting instructions and their respective handlers. Users interface with extable through _ASM_EXTABLE* family of macros. All macros delegate to one macro:

_ASM_EXTABLE_HANDLE(from, to, handler)

`from`: address of faulting instruction, i.e. the ‘try’ part
`to`: address to which control will be transferred when fault occurs, i.e. the ‘catch’ part
`handler`: is the function which transfers execution to catch part

`handler` function is important here. It has following signature:

bool handler(const struct exception_table_entry *fixup,
        struct pt_regs *regs, int trapnr)

`fixup` contains address of catch part, i.e. ‘to’ argument of _ASM_EXTABLE_HANDLE. `regs` contains registers, as they were when the fault happened and `trapnr` is the fault number. handler transfers control to catch part by simply setting instruction pointer register in regs argument to fixup address. However, before doing that, it has an opportunity to set up environment for catch part. That’s where the different wrappers around _ASM_EXTABLE_HANDLE come in. Each wrapper uses a differnt handler function. Let’s take a couple of examples from arch/x86/include/asm/asm.h.

# define _ASM_EXTABLE(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)

# define _ASM_EXTABLE_FAULT(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)

ex_handler_default() is a handler function which simply sets instruction pointer to catch part. ex_handler_fault() moves fault number to rax register in addition to setting instruction pointer to catch part, so catch part can access fault number.

Context of handler()

At this point, some might be wondering which context does handler() function execute in. Short answer is interrupt handler context. If that works for you, you may jump to the next section. Here we will do a very quick detour of how this interrupt context comes about.

After executing an instruction and before going into next instruction, the control unit checks if the just-executed instruction resulted in an interrupt or exception. If an exception did occur, then it determines vector associated with it. In case of our divide-by-zero fault, it will be vector 0. The control unit will then read the corresponding entry – zeroth entry for divide-by-zero – of Interrupt Descriptor Table (IDT). It is an array of entries inside processor’s RAM. Address of that table is stored inside the ‘idtr’ register. That entry in IDT in turn points to an entry in another table of entries called Global Descriptor Table (GDT). The entry inside GDT then info – such as base address of segment – related to handler of the exception whose vector we started out with. Processor then performs some checks and stores context when the exception occurred (e.g. register contents) on stack. When the handler() function discussed above sets instruction pointer, it actually sets instruction pointer in this saved context. After that process jumps to interrupt handler. It is this interrupt handler which then looks up extable section to find the handler() function for the faulting instruction. If such a handler function is found, the processor jumps to it. Therefore our handler function is executed inside interrupt context. Ultimately, when execution returns from interrupt context, the context that was saved on the stack before interrupt handler was executed, is restored. Since our handler function above sets instruction pointer and potentially other registers which are part of that pre-interrupt context which will be restored, execution will jump to ‘catch’ part after interrupt handler returns.

Solution and How it Works

Using extable macro’s we can re-write our blatant_div_by_zero() function in a safe way. Let’s first see what the final solution looks like. Then we will break it down into easily understandable parts.

void blatant_div_by_zero(void)
{
        /* quotient and divisor */
        int q, d;

        d = 0;
        asm volatile ("movl $20, %%eax;"
        "movl $0, %%edx;"
        "1: div %1;"
        "movl %%eax, %0;"
        "2:\t\n"
        "\t.section .fixup,\"ax\"\n"
        "3:\tmov\t$-1, %0\n"
        "\tjmp\t2b\n"
        "\t.previous\n"
        _ASM_EXTABLE(1b, 3b)
        : "=r"(q)
        : "b"(d)
        :"%eax"
        );

        pr_debug("quotient is %d\n", q);
}

There are two key differences here. First, we are adding some code to a section called .fixup. Second, _ASM_EXTABLE() macro. Code in .fixup section is the ‘catch’ part. _ASM_EXTABLE(), as we saw above, uses a default handler which merely sets instruction pointer to ‘catch’ part:

__visible bool ex_handler_default(const struct exception_table_entry *fixup,
        struct pt_regs *regs, int trapnr)
{
        regs->ip = ex_fixup_addr(fixup);
        return true;
}

First argument of _ASM_EXTABLE is label 1b for faulting instruction “1: div %1;”. Letter ‘b’ in 1b means backwards and doesn’t have functional significance in our context. Second argument, 3b, is the catch part which is inside .fixup section. Now let’s look at the execution flow with this set up. Division by zero instruction is executed, it faults, goes into exception handler at vector 0, which finds the correct handler – in our case ex_handler_default() – which in turn, sets instruction pointer to address at label 3, our catch part. As noted, label 3 is inside .fixup section. Inside label 3, we set quotient q to -1 and jump to label 2 which is known safe point after faulting instruction. From there, execution continues to next instruction like normal. Note that the next instruction is neither “\t.section .fixup,\”ax\”\n” (which is a directive to linker) nor “3:\tmov\t$-1, %0\n” (which lives in a different section from where instruction at label 2 is).

Now let’s see what a real object file looks like when compiled with above extable code. Here is a hello-world kernel module, modified to contain blatant_div_by_zero() function. After compiling it, we can inspect its sections using readelf*:

$ readelf -S hello.ko
There are 34 section headers, starting at offset 0x10b8:

Section Headers:
[Nr] Name   Type Address Offset
        Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
       0000000000000000 0000000000000000 0 0 0
[ 1] .note.gnu.build-i NOTE 0000000000000000 00000040
       0000000000000024 0000000000000000 A 0 0 4
[ 2] .text PROGBITS 0000000000000000 00000064
       0000000000000000 0000000000000000 AX 0 0 1
[ 3] .init.text PROGBITS 0000000000000000 00000064
       0000000000000034 0000000000000000 AX 0 0 1
[ 4] .rela.init.text RELA 0000000000000000 00000cd8
       0000000000000060 0000000000000018 I 31 3 8
[ 5] .fixup PROGBITS 0000000000000000 00000098
       000000000000000a 0000000000000000 AX 0 0 1
[ 6] .rela.fixup RELA 0000000000000000 00000d38
       0000000000000018 0000000000000018 I 31 5 8
[ 7] .exit.text PROGBITS 0000000000000000 000000a2
       000000000000000c 0000000000000000 AX 0 0 1
[ 8] .rela.exit.text RELA 0000000000000000 00000d50
       0000000000000030 0000000000000018 I 31 7 8
[ 9] .rodata.str1.1 PROGBITS 0000000000000000 000000ae
       000000000000001c 0000000000000001 AMS 0 0 1
[10] __ex_table PROGBITS 0000000000000000 000000cc
       000000000000000c 0000000000000000 A 0 0 4
[11] .rela__ex_table RELA 0000000000000000 00000d80
       0000000000000048 0000000000000018 I 31 10 8
[...]

Above is partial output, containing the sections which are relevant to our discussion. First, there is .fixup section at number 5. Its type of PROGBITS means it will be loaded in memory and flag X means it will be executable, which is what we want since this will execute the catch part. Further down at number 10 is __ex_table section. This is loaded into memory (type PROGBITS) but not marked as executable. We will only use it to read address of handler function but won’t need to execute the section itself. There is a function in kernel/extable.c which searches extables for correct handler for a given faulting address. It is reproduced below:

/* Given an address, look for it in the exception tables. */
const struct exception_table_entry *search_exception_tables(unsigned long addr)
{
        const struct exception_table_entry *e;

        e = search_extable(__start___ex_table,
        __stop___ex_table - __start___ex_table, addr);
        if (!e)
                e = search_module_extables(addr);
        return e;
}

As you can see, if the search in kernel image fails, the function looks in kernel modules’ extables for a corresponding entry through the call to search_module_extables(addr). This is the function which will search our hello.ko module for entry corresponding to division-by-zero instruction.


* Apologies for the formatting. Please bear until I find a way to fix it in the wordpress editor.

Title Image: Catch taken when playing cricket in street – a popular pastime in South Asia. Taken from https://www.flickr.com/photos/flickcoolpix/

Advertisements

2 thoughts on “try/catch in Linux Kernel

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s