System Calls
The contract between userspace and the kernel
What system calls are
A system call is a controlled entry point into the kernel. It:
- Saves CPU state — registers, flags
- Switches privilege — user mode (ring 3) → kernel mode (ring 0)
- Validates arguments — checks pointers, copies data from userspace
- Executes kernel code — the actual operation
- Returns to userspace — restores registers, switches back to ring 3
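On x86-64, the syscall instruction triggers this transition. The kernel uses the value in rax as the syscall number to dispatch to the right handler. Up to six arguments are passed in rdi, rsi, rdx, r10, r8, and r9, and the result comes back in rax.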
Userspace                              Kernel
─────────────────────────────────────────────────────
glibc: write(fd, buf, len)
  │
  │  mov rax, 1      (SYS_write)
  │  mov rdi, fd
  │  mov rsi, buf
  │  mov rdx, len
  │  syscall
  │  ──────────→     entry_SYSCALL_64
  │                    saves registers
  │                    calls sys_write()
  │                      → vfs_write()
  │  ←──────────     returns result in rax
  │
  │  rax holds bytes written, or -errno on failure
  │  glibc returns rax, or sets errno and returns -1
Pages in this section
| Page | What it covers |
|---|---|
| Syscall Entry Path | x86-64 entry, privilege switch, argument passing, vDSO |
| SYSCALL_DEFINE and dispatch | How syscalls are defined, the dispatch table, ABI |
| Adding a new syscall | Step-by-step walkthrough for kernel contributors |
| 32-bit Compat Syscalls | Supporting 32-bit userspace on 64-bit kernels; compat types and helpers |
| ptrace and Syscall Interception | How strace, debuggers, and seccomp-notify intercept syscalls |
| vDSO and Virtual System Calls | Kernel-accelerated clock reads without ring transitions |
| Syscall Auditing | audit subsystem, audit_context, NETLINK_AUDIT, auditctl/ausearch/aureport |
| Syscall Restart Mechanisms | ERESTARTSYS, ERESTART_RESTARTBLOCK, restart_block, restart_syscall() |
| Syscall War Stories | Real bugs, ABI breaks, and lessons from the syscall interface |