System Call Table in Linux

System call table is an array of function pointers. It is defined in kernel space as variable sys_call_table and it contains pointers to functions which implement system calls. Index of each function pointer in the array is the system call number for that syscall. These are denoted by NR_* macros in header files, such as /usr/include/asm/unistd_64.h for x86_64.

On x86 systems, when a user mode program makes a system call it puts the system call number in RAX register and calls sysenter assembly instruction. This instruction switches CPU from user mode into kernel mode. It sets instruction pointer RIP to the value stored in SYSENTER_EIP_MSR register and stack pointer RSP to the value stored in SYSENTER_ESP_MSR register. MSR is short for Model Specific Register.  These are registers which are present on specific models of Intel processors, such as 64-bit processors only. The above mentioned MSRs are set up by Linux kernel to contain addresses of system_call() kernel function for RIP and kernel mode stack belonging to the process which started the system call for RSP (yes each process Рor more specifically thread Рhas a kernel mode stack in addition to user mode stack).

system_call() function is like a multiplexer for syscalls. It saves hardware context on stack, performs some checks, e.g. whether the process is being syscall-traced in which case it needs to notify the tracer, and if all checks pass, ultimately jump into function pointed to by the pointer at syscall number index inside system call table. Return from syscall happens with sysexit assembly instruction. Upon return, the hardware context is restored and execution continues in user-space code which usually is a libc wrapper routine.

Advertisements