ftrace: Function Tracer

The kernel's built-in function tracing infrastructure

What ftrace is

ftrace is the kernel's main tracing framework, accessible via tracefs at /sys/kernel/tracing/. Created by Steven Rostedt and introduced in Linux 2.6.27, it:

  • Records every kernel function call (function tracer)
  • Traces execution paths and call graphs (function_graph tracer)
  • Provides a ring buffer for low-overhead event collection
  • Is the backend for BPF fentry/fexit (Linux 5.5), kprobes, and tracepoints

tracefs interface

tracefs was introduced as a standalone filesystem in Linux 4.1.

# Mount tracefs (if not already mounted)
mount -t tracefs tracefs /sys/kernel/tracing

# Key files:
/sys/kernel/tracing/
├── current_tracer           # active tracer (nop/function/function_graph/etc.)
├── tracing_on               # 1=enable, 0=disable
├── trace                    # read the trace buffer (non-consuming snapshot)
├── trace_pipe               # stream trace output (consuming; blocks when empty)
├── trace_options            # comma-separated options
├── available_tracers        # list of compiled-in tracers
├── available_filter_funcs   # functions that can be filtered
├── set_ftrace_filter        # limit function tracing to these functions
├── set_ftrace_notrace       # exclude these functions
├── set_graph_function       # function_graph: enter graph at these
├── set_graph_notrace        # function_graph: skip these subtrees
├── tracing_cpumask          # which CPUs to trace
├── per_cpu/                 # per-CPU ring buffers
│   └── cpu0/
│       ├── trace
│       └── stats
├── events/                  # tracepoint events
│   ├── enable               # enable all events
│   └── syscalls/
│       ├── sys_enter_read/
│       │   ├── enable
│       │   ├── filter
│       │   └── format
│       └── ...
├── instances/               # separate trace instances (non-interfering)
└── snapshot                 # freeze a snapshot of the ring buffer
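
The examples below assume tracefs is mounted at /sys/kernel/tracing; before tracefs became a standalone filesystem, the same files were exposed through debugfs at /sys/kernel/debug/tracing. A small portability sketch (find_tracefs is a helper name of my own, not a standard tool):

```shell
# Probe candidate mount points; every tracefs root contains tracing_on.
find_tracefs() {
    for d in "$@"; do
        if [ -f "$d/tracing_on" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Prefer the modern mount point, fall back to the legacy debugfs path
TRACEFS=$(find_tracefs /sys/kernel/tracing /sys/kernel/debug/tracing) \
    || TRACEFS=""
```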

Function tracer

# Trace all function calls
echo function > /sys/kernel/tracing/current_tracer
echo 1 > /sys/kernel/tracing/tracing_on
sleep 0.001
echo 0 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace | head -30
# bash-1234  [000] ..... 12345.678901: do_sys_open <-do_open_execat
# bash-1234  [000] ..... 12345.678902: _raw_spin_lock <-jiffies_to_timespec64

# Limit to specific functions
echo "tcp_*" > /sys/kernel/tracing/set_ftrace_filter
echo function > /sys/kernel/tracing/current_tracer
echo 1 > /sys/kernel/tracing/tracing_on

# Filter by module
echo ":mod:ext4" > /sys/kernel/tracing/set_ftrace_filter

# Multiple patterns (append with >>)
echo "schedule" >> /sys/kernel/tracing/set_ftrace_filter

# Clear filter (trace all)
echo "" > /sys/kernel/tracing/set_ftrace_filter

Function graph tracer

Shows call hierarchy and execution time. Added by Frédéric Weisbecker in Linux 2.6.29.

echo function_graph > /sys/kernel/tracing/current_tracer
echo vfs_read > /sys/kernel/tracing/set_graph_function  # entry point
echo 1 > /sys/kernel/tracing/tracing_on
cat /some/file  # trigger the trace
echo 0 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace
# CPU  DURATION                  FUNCTION CALLS
#  0)               |  vfs_read() {
#  0)               |    rw_verify_area() {
#  0)   0.512 us    |      security_file_permission();
#  0)   1.234 us    |    }
#  0)               |    generic_file_read_iter() {
#  0)               |      filemap_read() {
# ...
#  0)   8.234 us    |  } /* vfs_read */
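
Once a graph trace has been captured, a quick way to find hot spots is to sort entries by the duration column. A sketch (slowest is a helper of my own; the embedded lines stand in for real trace output):

```shell
# Print the N largest durations: find the "us" column, emit the number
# before it plus the whole line, then sort numerically.
slowest() {
    awk '{ for (i = 2; i <= NF; i++)
               if ($i == "us") print $(i - 1) "\t" $0 }' |
        sort -rn | head -n "${1:-5}"
}

# Demo on sample lines; live usage:
#   slowest 10 < /sys/kernel/tracing/trace
slowest 2 <<'EOF'
 0)   8.234 us    |  } /* vfs_read */
 0)   0.512 us    |      security_file_permission();
 0)   1.234 us    |    } /* rw_verify_area */
EOF
```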

How ftrace instruments functions

When CONFIG_DYNAMIC_FTRACE=y and CONFIG_HAVE_FENTRY=y, the compiler inserts a call to __fentry__ (or mcount) at the start of every traceable function:

/* Every kernel function begins with: */
func:
    callq   __fentry__      ; 5 bytes (E8 + 4-byte offset)
    ; ... actual function code ...

When tracing is disabled, these calls are patched into nop instructions at boot time (and at module load), so an untraced function pays only the cost of a few nop bytes.

When enabled:

func:
    callq   ftrace_caller   ; patched back in

ftrace_caller looks up which tracers are registered for this function and calls them. On x86, the runtime patching is done with text_poke_bp(), a breakpoint-based technique that rewrites the instruction safely without stopping all CPUs.

On architectures with CONFIG_HAVE_FENTRY (x86-64 since Linux 3.7), the compiler emits __fentry__, called at the very top of the function before the prologue sets up the stack frame, rather than the older mcount hook; this makes the callback cheaper and allows reliable stack unwinding.

Ring buffer

ftrace uses a lockless per-CPU ring buffer (kernel/trace/ring_buffer.c):

/* Simplified sketch of the real structures (struct ring_buffer was
 * renamed struct trace_buffer in Linux 5.6): */
struct trace_buffer {
    struct ring_buffer_per_cpu **buffers;   /* one buffer per CPU */
};

struct ring_buffer_per_cpu {
    /* Writer: */
    struct buffer_page *tail_page;    /* current write page */
    struct buffer_page *commit_page;  /* last committed page */
    /* Reader: */
    struct buffer_page *reader_page;  /* page swapped out for the reader */
    struct buffer_page *head_page;    /* oldest page in the ring */
    /* ... */
};

  • Each CPU writes to its own buffer, so writers never contend on a lock
  • Ring behavior: old events are overwritten when the buffer fills (unless the overwrite option is cleared in trace_options)
  • Events are timestamped with a per-CPU clock (selectable via the trace_clock file)

# Check buffer size
cat /sys/kernel/tracing/buffer_size_kb
# 1408 (per CPU)

# Increase for long captures
echo 65536 > /sys/kernel/tracing/buffer_size_kb
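
Since buffer_size_kb is per CPU, the real memory cost scales with the CPU count. A quick estimate (the values below are a hypothetical default-sized buffer on an 8-CPU machine, not read from a live system):

```shell
kb_per_cpu=1408   # value read from buffer_size_kb (hypothetical)
cpus=8            # on a live system: cpus=$(nproc)
total_kb=$((kb_per_cpu * cpus))
echo "ring buffers use ${total_kb} kB total"   # 11264 kB
```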

Dynamic events: kprobe_events and uprobe_events

# Add a kprobe on tcp_sendmsg: record the first argument (sk, passed in %di on x86-64)
echo 'p:my_probe tcp_sendmsg sk=%di' > /sys/kernel/tracing/kprobe_events
echo 1 > /sys/kernel/tracing/events/kprobes/my_probe/enable

cat /sys/kernel/tracing/trace_pipe
# bash-1234 [000] ..... 12345.0: my_probe: (tcp_sendmsg+0x0) sk=0xffff888...

# Remove the probe
echo '-:my_probe' >> /sys/kernel/tracing/kprobe_events

# Add a kretprobe: print return value
echo 'r:my_ret tcp_sendmsg retval=$retval' > /sys/kernel/tracing/kprobe_events

# Add uprobe
echo 'p:myapp /usr/bin/myapp:0x1234 arg1=%di' > /sys/kernel/tracing/uprobe_events
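
Since a probe definition is just a text line, it can be built and inspected before being written to kprobe_events (which requires root). Since Linux 4.20, $arg1..$argN name function arguments portably instead of architecture-specific registers like %di. make_kprobe below is a helper of my own devising:

```shell
# Compose a kprobe_events definition: p:<event> <function> [fetch args...]
make_kprobe() {
    name=$1 func=$2
    shift 2
    printf 'p:%s %s %s\n' "$name" "$func" "$*"
}

make_kprobe my_probe tcp_sendmsg 'sk=$arg1' 'size=$arg3'
# p:my_probe tcp_sendmsg sk=$arg1 size=$arg3
```

As root, the resulting line can then be appended to kprobe_events with >>.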

Event tracing: tracepoints

# List available tracepoint events
ls /sys/kernel/tracing/events/

# Enable all events in a subsystem
echo 1 > /sys/kernel/tracing/events/block/enable

# Enable a specific event
echo 1 > /sys/kernel/tracing/events/syscalls/sys_enter_read/enable

# Filter: only events from PID 1234
echo "common_pid == 1234" > /sys/kernel/tracing/events/syscalls/sys_enter_read/filter

# View event format
cat /sys/kernel/tracing/events/syscalls/sys_enter_read/format
# name: sys_enter_read
# ID: 682
# format:
#     field:unsigned short common_type;    offset:0; size:2; signed:0;
#     field:unsigned char common_flags;    offset:2; size:1; signed:0;
#     field:unsigned char common_preempt_count;  offset:3; size:1; signed:0;
#     field:int common_pid;                offset:4; size:4; signed:1;
#     field:int __syscall_nr;              offset:8; size:4; signed:1;
#     field:unsigned int fd;               offset:16; size:8; signed:0;
#     field:char * buf;                    offset:24; size:8; signed:0;
#     field:size_t count;                  offset:32; size:8; signed:0;
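
The field: lines are what matter when writing filters, since each declared name can appear in a filter expression. A sketch that strips a format file down to its field declarations (fields is a local helper name; the sample input mimics the output above):

```shell
# Print only the "type name" part of each field: declaration.
fields() {
    sed -n 's/.*field:\([^;]*\);.*/\1/p'
}

# Live usage:
#   fields < /sys/kernel/tracing/events/syscalls/sys_enter_read/format
fields <<'EOF'
	field:int common_pid;	offset:4;	size:4;	signed:1;
	field:unsigned int fd;	offset:16;	size:8;	signed:0;
	field:size_t count;	offset:32;	size:8;	signed:0;
EOF
# int common_pid
# unsigned int fd
# size_t count
```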

trace-cmd: the ftrace frontend

# Record syscalls for 5 seconds
trace-cmd record -e 'syscalls:sys_enter_*' sleep 5

# Record function graph for ext4
trace-cmd record -p function_graph -l "ext4_*" sleep 1

# Report
trace-cmd report | head -50

# Filter by CPU
trace-cmd report --cpu 0 | head -20

# Live tracing
trace-cmd stream -e sched:sched_switch

# kernel-shark: GUI frontend for trace-cmd output
kernelshark trace.dat

trace instances: non-interfering sessions

# Create a private instance (doesn't affect the global trace)
mkdir /sys/kernel/tracing/instances/myinstance

# Each instance has its own ring buffer and tracer
echo function > /sys/kernel/tracing/instances/myinstance/current_tracer
echo "net_*" > /sys/kernel/tracing/instances/myinstance/set_ftrace_filter
echo 1 > /sys/kernel/tracing/instances/myinstance/tracing_on

# Multiple tools can trace simultaneously without interfering
cat /sys/kernel/tracing/instances/myinstance/trace_pipe &
# ... while main trace continues normally ...

rmdir /sys/kernel/tracing/instances/myinstance  # cleanup
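
The mkdir/rmdir pair is easy to leak if the traced command fails partway, so it is worth wrapping. A sketch (with_instance and the TRACEFS variable are local conventions, not kernel interfaces; in real tracefs, mkdir under instances/ creates a complete instance):

```shell
# Run a command with a scratch instance that is always removed afterwards.
with_instance() {
    inst="${TRACEFS:-/sys/kernel/tracing}/instances/$1"
    shift
    mkdir "$inst" || return 1
    "$@"
    rc=$?
    rmdir "$inst"
    return $rc
}

# Example (as root):
#   with_instance scratch cat /sys/kernel/tracing/instances/scratch/trace
```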

Further reading