System Calls

The contract between userspace and the kernel

What system calls are

A system call is a controlled entry point into the kernel. It:

  1. Switches privilege: user mode (ring 3) → kernel mode (ring 0)
  2. Saves CPU state: the kernel entry code stores registers and flags
  3. Validates arguments: checks pointers, copies data from userspace
  4. Executes kernel code: the actual operation
  5. Returns to userspace: restores registers, switches back to ring 3
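
The argument-validation step can be observed from userspace. The sketch below is illustrative only: errno_for_bad_buffer is a hypothetical helper name, and the bad pointer (the highest possible address) is simply an address that can never belong to a user mapping. On Linux the expected result is EFAULT, because the kernel rejects the buffer before any data is copied.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper for illustration: pass write() a buffer pointer that
 * cannot be a valid user address and report the errno that results.
 * Returns 0 if the write unexpectedly succeeds, -1 if pipe() fails. */
int errno_for_bad_buffer(void)
{
    int fds[2];

    /* A pipe gives us a descriptor that is known to be valid, so the only
     * thing wrong with the call below is the buffer pointer. */
    if (pipe(fds) != 0)
        return -1;

    const void *bad = (const void *)UINTPTR_MAX;  /* never a user mapping */

    errno = 0;
    ssize_t n = write(fds[1], bad, 1);
    int err = (n == -1) ? errno : 0;   /* save errno before close() runs */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

The process is not killed for passing a bad pointer; the kernel reports the problem through the return value, which is why callers are expected to check it.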

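On x86-64, the syscall instruction triggers this transition. The kernel uses the value in rax as the syscall number to dispatch to the right handler. Arguments are passed in rdi, rsi, rdx, r10, r8, and r9; the instruction itself overwrites rcx and r11.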
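
Dispatch by number can be tried from C without writing assembly. The following is a minimal sketch, assuming a Linux system with glibc or musl: the libc function syscall() loads the number and arguments into the right registers for the current architecture. The helper names getpid_by_number and write_by_number are made up for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid, SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: invoke getpid by its number instead of through the
 * named libc wrapper. */
long getpid_by_number(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: invoke write by its number. Like the named wrapper,
 * syscall() returns -1 and sets errno on failure. */
long write_by_number(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Using the SYS_* constants rather than literal numbers matters because the numbering differs between architectures; SYS_write is 1 on x86-64 but has a different value elsewhere.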

Userspace                    Kernel
─────────────────────────────────────────────────────
glibc: write(fd, buf, len)
  │  mov rax, 1  (SYS_write)
  │  mov rdi, fd
  │  mov rsi, buf
  │  mov rdx, len
  │  syscall
  │              ──────────→  entry_SYSCALL_64
  │                           saves registers
  │                           calls sys_write()
  │                               → vfs_write()
  │              ←──────────  returns result in rax
  │  return: bytes written, or -1 with errno set (kernel returned -errno in rax)
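
The difference between the kernel's raw return value and the value a libc wrapper hands back can be sketched as follows. This is an illustration under stated assumptions, not glibc's actual code: raw_write and to_libc_convention are hypothetical names, the inline assembly applies to x86-64 only, and the fallback branch for other architectures merely reconstructs the raw value from libc's syscall().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue write with the syscall instruction and return
 * whatever the kernel left in rax (a byte count, or -errno on failure). */
long raw_write(int fd, const void *buf, unsigned long len)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");  /* clobbered by syscall */
    return ret;
#else
    /* Fallback for this sketch only: rebuild the raw value from libc. */
    long r = syscall(SYS_write, fd, buf, len);
    return r == -1 ? -(long)errno : r;
#endif
}

/* Hypothetical helper: mimic the conversion a libc wrapper performs.
 * Raw values in [-4095, -1] are error codes; everything else is a result. */
long to_libc_convention(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

For example, raw_write(-1, "x", 1) is expected to yield -EBADF, and passing that value through to_libc_convention gives -1 with errno set to EBADF, which is what a caller of write() sees.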

Pages in this section

Page                             What it covers
Syscall Entry Path               x86-64 entry, privilege switch, argument passing, vDSO
SYSCALL_DEFINE and dispatch      How syscalls are defined, the dispatch table, ABI
Adding a new syscall             Step-by-step walkthrough for kernel contributors
32-bit Compat Syscalls           Supporting 32-bit userspace on 64-bit kernels; compat types and helpers
ptrace and Syscall Interception  How strace, debuggers, and seccomp-notify intercept syscalls
vDSO and Virtual System Calls    Kernel-accelerated clock reads without ring transitions
Syscall Auditing                 audit subsystem, audit_context, NETLINK_AUDIT, auditctl/ausearch/aureport
Syscall Restart Mechanisms       ERESTARTSYS, ERESTART_RESTARTBLOCK, restart_block, restart_syscall()
Syscall War Stories              Real bugs, ABI breaks, and lessons from the syscall interface

Quick reference

# List all syscalls with numbers
ausyscall --dump | head -20

# Trace syscalls of a process
strace ls

# Summarize syscall counts, errors, and time per syscall
strace -c ls 2>&1

# Count syscall entries per tracepoint (counts calls, not time; may need root)
perf stat -e 'syscalls:sys_enter_*' -- ls

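# List syscall definitions in kernel/sys.c (run from the kernel source root)
# The x86-64 syscall table itself is arch/x86/entry/syscalls/syscall_64.tbl
grep -n "SYSCALL_DEFINE" kernel/sys.c | head -10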