
Understanding /proc/vmstat

Counters and gauges for memory subsystem diagnostics

What is /proc/vmstat?

/proc/vmstat exposes the kernel's internal memory management statistics. Unlike /proc/meminfo, which shows only current state, /proc/vmstat mixes gauges (current values) with counters (monotonically increasing since boot).

The output is generated by vmstat_show() in mm/vmstat.c, which aggregates per-zone, per-node, and per-CPU statistics.

$ head -20 /proc/vmstat
nr_free_pages 131072
nr_zone_inactive_anon 262144
nr_zone_active_anon 524288
nr_zone_inactive_file 196608
nr_zone_active_file 131072
...
pgfault 48392017
pgmajfault 12847
pgscan_kswapd 0
pgscan_direct 0

Counters vs Gauges

This distinction is critical for monitoring. /proc/vmstat mixes both types.

Gauges (current state — can go up or down)

Gauges are prefixed with nr_ and report a current value, not a cumulative one, so they can be read directly. Unless noted otherwise, the unit is pages.

| Field | What it measures |
| --- | --- |
| nr_free_pages | Free pages in the buddy allocator |
| nr_inactive_anon / nr_active_anon | Anonymous pages on LRU lists |
| nr_inactive_file / nr_active_file | File pages on LRU lists |
| nr_dirty | Pages modified but not yet written to disk |
| nr_writeback | Pages currently being written to disk |
| nr_slab_reclaimable | Slab pages the kernel can free |
| nr_slab_unreclaimable | Slab pages that cannot be freed |
| nr_page_table_pages | Pages used for page tables |
| nr_shmem | Shared memory / tmpfs pages |
| nr_file_pages | Total pages in the page cache |
| nr_anon_pages | Mapped anonymous pages |
| nr_unevictable | Locked/unevictable pages |
| nr_swapcached | Pages in the swap cache |
| nr_kernel_stack | Kernel stack usage (in KiB) |
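Because most gauges are page counts, turning one into a human-readable size takes a multiplication by the page size. The helper below is a minimal sketch; `pages_to_mib` is a hypothetical name, and the optional file and page-size arguments exist only so it can be run against a saved snapshot. It should not be applied to nr_kernel_stack, which is already in KiB.

```shell
# pages_to_mib (hypothetical helper): convert a page-count gauge to MiB.
# Usage: pages_to_mib FIELD [FILE] [PAGE_SIZE_BYTES]
# Assumption: FIELD is counted in pages (true for most nr_* gauges,
# but not for nr_kernel_stack, which is reported in KiB).
pages_to_mib() {
    local field="$1" file="${2:-/proc/vmstat}" page_size="${3:-$(getconf PAGESIZE)}"
    awk -v f="$field" -v ps="$page_size" '
        $1 == f { printf "%s %.1f MiB\n", f, $2 * ps / 1048576; found = 1 }
        END     { exit !found }   # non-zero exit status if the field is absent
    ' "$file"
}
```

For example, `pages_to_mib nr_free_pages` prints the free memory held by the buddy allocator in MiB on the running system.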

Counters (monotonically increasing since boot)

Defined in enum vm_event_item. You must compute deltas between two readings to get rates.

Gotcha: nr_dirtied and nr_written have the nr_ prefix but are actually counters (total pages dirtied/written since boot), not gauges.
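The prefix rule and its two exceptions can be written down as a small filter. This is a sketch only: `vmstat_kind` is a hypothetical name, the exception list contains just the two fields named above, and treating every field without the nr_ prefix as a counter is a simplifying assumption that may mislabel a few fields on some kernels.

```shell
# vmstat_kind (hypothetical helper): label each field as "gauge" or "counter".
# Rule from the text: nr_* is a gauge, except nr_dirtied and nr_written.
# Assumption: everything else is treated as a counter.
vmstat_kind() {
    awk '
        $1 == "nr_dirtied" || $1 == "nr_written" { print "counter", $1, $2; next }
        $1 ~ /^nr_/                                { print "gauge", $1, $2; next }
                                                   { print "counter", $1, $2 }
    ' "${1:-/proc/vmstat}"
}
```

A monitoring agent could use a classification like this to decide which fields to export as-is and which to convert to rates.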

Page Fault Counters

| Field | What it counts | Source |
| --- | --- | --- |
| pgfault | Total page faults (minor + major) | handle_mm_fault() |
| pgmajfault | Major faults only (required disk I/O) | mm/memory.c, mm/filemap.c |

Minor faults = pgfault - pgmajfault. A minor fault means the page was already in memory (page cache or COW) and just needed a page table entry. A major fault means the kernel had to read from disk.

Diagnostic: if pgmajfault makes up more than ~1% of pgfault over a sampling interval (compare deltas, not since-boot totals), I/O latency is likely impacting application performance.
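Since both fields are counters, the ratio is most meaningful between two snapshots. The sketch below assumes the snapshots were saved to files beforehand; `majfault_ratio` and the file names in the usage comment are hypothetical.

```shell
# majfault_ratio (hypothetical helper): share of major faults between two
# saved snapshots of /proc/vmstat.
# Usage:
#   cat /proc/vmstat > /tmp/vmstat.before; sleep 10; cat /proc/vmstat > /tmp/vmstat.after
#   majfault_ratio /tmp/vmstat.before /tmp/vmstat.after
majfault_ratio() {
    awk '
        NR == FNR          { before[$1] = $2; next }   # first file: earlier snapshot
        $1 == "pgfault"    { total = $2 - before[$1] }
        $1 == "pgmajfault" { major = $2 - before[$1] }
        END {
            if (total > 0)
                printf "major fault ratio: %.2f%% (%.0f of %.0f)\n", major / total * 100, major, total
            else
                print "no page faults in interval"
        }
    ' "$1" "$2"
}
```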

Reclaim Counters

These are the most important counters for diagnosing memory pressure.

Scanning vs Stealing

| Field | What it counts |
| --- | --- |
| pgscan_kswapd | Pages examined by kswapd (background reclaim) |
| pgscan_direct | Pages examined by direct reclaim (allocating process, synchronous) |
| pgsteal_kswapd | Pages successfully reclaimed by kswapd |
| pgsteal_direct | Pages successfully reclaimed by direct reclaim |

All are incremented in mm/vmscan.c, in shrink_inactive_list() and related functions.

By page type:

| Field | What it counts |
| --- | --- |
| pgscan_anon | Anonymous pages scanned |
| pgscan_file | File-backed pages scanned |
| pgsteal_anon | Anonymous pages reclaimed |
| pgsteal_file | File-backed pages reclaimed |

Reclaim Efficiency

efficiency = pgsteal / pgscan

| Ratio | Meaning |
| --- | --- |
| > 0.8 | Efficient: plenty of cold/clean pages to reclaim |
| 0.3 - 0.8 | Moderate pressure: some pages are dirty or pinned |
| < 0.3 | Struggling: most scanned pages cannot be freed, OOM may follow |

Compute separately for kswapd and direct reclaim, and for anon vs file, to pinpoint the bottleneck.
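One way to get all four ratios at once is to pair each pgscan_* field with its pgsteal_* counterpart. The sketch below uses a hypothetical function name, `reclaim_efficiency`, and reads since-boot totals by default; for live diagnosis it would be better fed with deltas between two snapshots.

```shell
# reclaim_efficiency (hypothetical helper): pgsteal/pgscan per reclaim source
# and per page type. Reads since-boot totals from the given file.
reclaim_efficiency() {
    awk '
        $1 ~ /^pgscan_/  { key = $1; sub(/^pgscan_/, "", key);  scan[key]  = $2 }
        $1 ~ /^pgsteal_/ { key = $1; sub(/^pgsteal_/, "", key); steal[key] = $2 }
        END {
            n = split("kswapd direct anon file", src, " ")
            for (i = 1; i <= n; i++) {
                s = src[i]
                if (scan[s] > 0) printf "%-7s %.2f\n", s, steal[s] / scan[s]
                else             printf "%-7s n/a\n", s   # nothing scanned yet
            }
        }
    ' "${1:-/proc/vmstat}"
}
```

A low ratio in the anon row next to a healthy file row, for instance, would point at anonymous memory as the harder-to-reclaim part.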

Allocation Stalls

| Field | What it counts |
| --- | --- |
| allocstall_normal | Processes that entered direct reclaim in the Normal zone |
| allocstall_dma32 | Same for DMA32 zone |
| allocstall_dma | Same for DMA zone |
| allocstall_movable | Same for Movable zone |

Incremented on the direct reclaim path: __alloc_pages_direct_reclaim() in mm/page_alloc.c calls into try_to_free_pages() in mm/vmscan.c, where the counter is incremented.

Every increment means a process blocked waiting for memory. This directly causes latency spikes. If allocstall_normal is incrementing at more than a few per second sustained, the system needs more free memory headroom.
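To check a "few per second" threshold, the per-zone counters have to be turned into rates. The following is a sketch under the assumption that two snapshots were saved a known number of seconds apart; `allocstall_rate` is a hypothetical name.

```shell
# allocstall_rate (hypothetical helper): per-zone allocation stalls per second
# between two saved snapshots taken SECONDS apart.
# Usage: allocstall_rate BEFORE_FILE AFTER_FILE SECONDS
allocstall_rate() {
    local before="$1" after="$2" seconds="$3"
    awk -v secs="$seconds" '
        NR == FNR           { if ($1 ~ /^allocstall_/) prev[$1] = $2; next }
        $1 ~ /^allocstall_/ { printf "%s %.1f/s\n", $1, ($2 - prev[$1]) / secs }
    ' "$before" "$after"
}
```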

Swap Counters

| Field | What it counts |
| --- | --- |
| pswpin | Pages read from swap (swap-in) |
| pswpout | Pages written to swap (swap-out) |
| pgpgin | Data read from block devices, in KiB rather than pages; includes filesystem I/O |
| pgpgout | Data written to block devices, in KiB rather than pages |

pswpin/pswpout are specifically swap operations. pgpgin/pgpgout are broader (all block I/O).
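The two pairs also differ in units (pages for pswpin/pswpout, KiB for pgpgin/pgpgout), so comparing them needs a conversion. A minimal sketch, with `swap_io_kib` as a hypothetical name and the page size taken from `getconf PAGESIZE` unless passed explicitly:

```shell
# swap_io_kib (hypothetical helper): print swap and block I/O totals in KiB.
# pswpin/pswpout are page counts and are multiplied by the page size;
# pgpgin/pgpgout are already in KiB and are passed through unchanged.
# Usage: swap_io_kib [FILE] [PAGE_SIZE_BYTES]
swap_io_kib() {
    local file="${1:-/proc/vmstat}" page_size="${2:-$(getconf PAGESIZE)}"
    awk -v ps="$page_size" '
        $1 == "pswpin" || $1 == "pswpout" { printf "%s %.0f KiB\n", $1, $2 * ps / 1024 }
        $1 == "pgpgin" || $1 == "pgpgout" { printf "%s %.0f KiB\n", $1, $2 }
    ' "$file"
}
```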

LRU Movement

| Field | What it counts |
| --- | --- |
| pgactivate | Pages promoted: inactive -> active (accessed while inactive) |
| pgdeactivate | Pages demoted: active -> inactive (candidate for reclaim) |
| pglazyfree | Pages marked for lazy free via MADV_FREE |
| pglazyfreed | Lazyfree pages actually reclaimed |

Compaction Counters

| Field | What it counts |
| --- | --- |
| compact_stall | Processes that blocked waiting for compaction |
| compact_success | Compaction runs that created a contiguous block |
| compact_fail | Compaction runs that failed |
| compact_migrate_scanned | Pages scanned by migration scanner |
| compact_free_scanned | Pages scanned by free scanner |
| compact_isolated | Pages isolated for migration |
| compact_daemon_wake | Times kcompactd was woken |

Source: mm/compaction.c.

Failure rate: compact_fail / (compact_fail + compact_success) above 50% means severe fragmentation with unmovable pages blocking compaction.
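The failure rate formula above translates directly into awk. This sketch uses a hypothetical function name, `compaction_failure_rate`, and works on since-boot totals, so a long-running system may hide a recent change in behavior.

```shell
# compaction_failure_rate (hypothetical helper):
# compact_fail / (compact_fail + compact_success), as a percentage.
compaction_failure_rate() {
    awk '
        $1 == "compact_fail"    { fail = $2 }
        $1 == "compact_success" { ok = $2 }
        END {
            if (fail + ok > 0)
                printf "compaction failure rate: %.1f%%\n", fail / (fail + ok) * 100
            else
                print "no compaction attempts recorded"
        }
    ' "${1:-/proc/vmstat}"
}
```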

THP Counters

| Field | What it counts |
| --- | --- |
| thp_fault_alloc | THP allocated on page fault (synchronous) |
| thp_fault_fallback | THP allocation failed on fault, fell back to 4KB |
| thp_collapse_alloc | THP allocated by khugepaged (asynchronous collapse) |
| thp_collapse_alloc_failed | khugepaged failed to allocate THP |
| thp_split_page | THP split back into base pages |
| thp_split_pmd | PMD-level split (PTEs created to replace PMD entry) |
| thp_swpout | THP swapped out as a whole (since v4.13) |
| thp_swpout_fallback | THP swap-out fell back to splitting |

Source: mm/huge_memory.c, mm/khugepaged.c.

OOM Counter

| Field | What it counts |
| --- | --- |
| oom_kill | Processes killed by the OOM killer (one increment per victim) |

Incremented in mm/oom_kill.c.

Diagnostic Patterns

Detecting memory pressure

# Watch reclaim activity (refreshes every second; -d highlights values that changed)
watch -d -n 1 'grep -E "pgscan_|pgsteal_|allocstall" /proc/vmstat'

The values shown are since-boot totals, so look at how fast they change, not at their size.

| Signal | Meaning |
| --- | --- |
| pgscan_kswapd increasing | Background reclaim active: moderate pressure |
| pgscan_direct increasing | Direct reclaim active: serious pressure, processes blocking |
| allocstall_normal increasing | Allocation stalls: latency impact on applications |
| pgsteal / pgscan ratio dropping | Reclaim efficiency falling: running out of easy-to-reclaim pages |

Detecting swap thrashing

grep -E "pswpin|pswpout|pgmajfault" /proc/vmstat

High pswpin + pswpout with high pgmajfault = the system is swapping actively and applications are faulting pages back in. The working set does not fit in RAM.
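"High" here refers to rates, so the three counters need to be compared across an interval. The sketch below prints their deltas between two saved snapshots and flags the case where swap-in and swap-out are both non-zero. `swap_activity` is a hypothetical name, and the "both non-zero" check is a crude heuristic chosen for illustration, not a threshold taken from the kernel.

```shell
# swap_activity (hypothetical helper): deltas of pswpin, pswpout and pgmajfault
# between two saved snapshots of /proc/vmstat.
# Usage: swap_activity BEFORE_FILE AFTER_FILE
swap_activity() {
    awk '
        NR == FNR { prev[$1] = $2; next }   # first file: earlier snapshot
        $1 == "pswpin" || $1 == "pswpout" || $1 == "pgmajfault" {
            delta[$1] = $2 - prev[$1]
            printf "%s +%.0f\n", $1, delta[$1]
        }
        END {
            # Heuristic (assumption): simultaneous swap-in and swap-out suggests thrashing
            if (delta["pswpin"] > 0 && delta["pswpout"] > 0)
                print "swap-in and swap-out both active in this interval"
        }
    ' "$1" "$2"
}
```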

Detecting THP problems

grep "thp_" /proc/vmstat

| Pattern | Problem |
| --- | --- |
| thp_fault_fallback rising rapidly | Memory too fragmented for 2MB allocations |
| thp_fault_alloc + thp_split_page both rising | THP churn: allocated then split, wasting CPU |
| compact_stall high + thp_fault_fallback high | Stalling on compaction but still failing |

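Fix: switch THP to madvise mode, or set defrag to defer (both require root):

# Option 1: only use THP where applications request it with madvise()
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
# Option 2: keep THP enabled but defer compaction to the background
echo defer > /sys/kernel/mm/transparent_hugepage/defrag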

Detecting compaction problems

| Signal | Problem |
| --- | --- |
| compact_stall increasing | Processes blocking on compaction |
| compact_fail >> compact_success | Unmovable pages blocking defragmentation |
| compact_migrate_scanned >> compact_isolated | Migration scanner finds pages but cannot move them |

Using /proc/vmstat with Tools

Direct reading (most complete)

# Compute deltas (rate per second)
paste <(cat /proc/vmstat) <(sleep 1; cat /proc/vmstat) | \
  awk '$1 == $3 && $2 != $4 {print $1, $4-$2, "delta/s"}'

vmstat command (procps)

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

| Column | /proc/vmstat source |
| --- | --- |
| si (swap in) | pswpin delta, converted to KB/s |
| so (swap out) | pswpout delta, converted to KB/s |
| bi (block in) | pgpgin delta |
| bo (block out) | pgpgout delta |

sar command (sysstat)

# Page fault and reclaim rates
sar -B 1
# fault/s = pgfault delta, majflt/s = pgmajfault delta
# pgscank/s = pgscan_kswapd delta, pgscand/s = pgscan_direct delta

# Swap rates
sar -W 1
# pswpin/s, pswpout/s from their respective counters

Try It Yourself

# See which counters are actively changing (memory pressure test)
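# Terminal 1: generate pressure
# (stress-ng accepts a percentage for --vm-bytes; plain stress needs an absolute size such as 4G)
stress-ng --vm 2 --vm-bytes 80% --timeout 30s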

# Terminal 2: watch counters change
watch -d -n 1 'grep -E "pgscan|pgsteal|allocstall|pgmajfault|pswp|oom_kill|compact_stall" /proc/vmstat'

# Check reclaim efficiency
awk '/pgscan_kswapd/{scan=$2} /pgsteal_kswapd/{steal=$2} END{if(scan>0) printf "kswapd efficiency: %.1f%%\n", steal/scan*100; else print "No kswapd activity"}' /proc/vmstat

# Check THP success rate
awk '/thp_fault_alloc /{ok=$2} /thp_fault_fallback /{fail=$2} END{total=ok+fail; if(total>0) printf "THP success: %.1f%% (%d/%d)\n", ok/total*100, ok, total; else print "No THP faults"}' /proc/vmstat

Key Source Files

| File | What it contains |
| --- | --- |
| mm/vmstat.c | Output generation, per-CPU aggregation |
| include/linux/vm_event_item.h | Counter definitions |
| include/linux/mmzone.h | Gauge definitions (zone/node stat items) |
| mm/vmscan.c | Reclaim counters |
| mm/page_alloc.c | Allocation counters and the direct reclaim entry point (allocation stalls) |
| mm/compaction.c | Compaction counters |

Further Reading