
Why is my process slow? (memory diagnosis)

A systematic troubleshooting guide for memory-related performance problems

The problem

Your process is slow, and you suspect memory. But "memory problem" is vague -- it could be swapping, reclaim pressure, NUMA effects, THP compaction stalls, or kernel slab growth eating your available memory. This guide gives you a systematic flowchart so you stop guessing and start measuring.

Decision flowchart

Start here and follow the arrows. Each step either identifies the problem or sends you deeper.

flowchart TD
    START["Process is slow"] --> PSI{"Step 1: Check PSI<br/>/proc/pressure/memory<br/>Is there memory pressure?"}
    PSI -->|"some avg10 > 5"| MEMINFO{"Step 2: Check /proc/meminfo<br/>MemAvailable low?<br/>SwapUsed high?"}
    PSI -->|"avg10 ~ 0"| NOTMEM["Not a memory problem.<br/>Check CPU, I/O, locks."]

    MEMINFO -->|"MemAvailable < 10%"| PERPROC{"Step 3: Which process?<br/>Check /proc/pid/status<br/>VmRSS, VmSwap"}
    MEMINFO -->|"SwapUsed growing"| SWAP{"Step 5: Swapping<br/>pswpin/pswpout rates"}
    MEMINFO -->|"MemAvailable OK"| VMSTAT{"Step 4: Check /proc/vmstat<br/>pgscan_direct? pgmajfault?"}

    PERPROC --> VMSTAT

    VMSTAT -->|"pgscan_direct high"| RECLAIM["Direct reclaim stalls.<br/>See: Page Reclaim"]
    VMSTAT -->|"pgmajfault high"| SWAP
    VMSTAT -->|"thp_fault_fallback"| THP{"Step 6: THP compaction<br/>compact_stall?"}
    VMSTAT -->|"numa_miss high"| NUMA{"Step 7: NUMA effects<br/>numastat"}

    SWAP --> SWAPFIX["Swapping bottleneck.<br/>See: Swapping"]

    THP --> THPFIX["Compaction stalls.<br/>See: THP"]

    NUMA --> NUMAFIX["NUMA misplacement.<br/>See: NUMA"]

    MEMINFO -->|"SUnreclaim growing"| SLAB{"Step 8: Kernel slab<br/>slabtop"}
    SLAB --> SLABFIX["Kernel memory leak.<br/>See: SLUB"]

Step 1: Is it memory at all?

Before digging into memory internals, confirm that memory pressure actually exists. Since Linux 4.20 (commit 0e94682b73bf), PSI (Pressure Stall Information) tells you directly whether tasks are stalling on memory.

cat /proc/pressure/memory
# some avg10=3.42 avg60=1.09 avg300=0.37 total=123456789
# full avg10=0.12 avg60=0.04 avg300=0.01 total=12345678

What the numbers mean:

| Metric | Meaning |
|---|---|
| some | Percentage of time at least one task is stalled on memory |
| full | Percentage of time all non-idle tasks are stalled simultaneously (nothing productive is happening) |
| avg10 | 10-second moving average |
| avg60 / avg300 | 60-second and 5-minute moving averages |

How to interpret:

  • some avg10 = 0: No memory pressure. Your slowness is elsewhere -- check CPU (/proc/pressure/cpu), I/O (/proc/pressure/io), or application-level contention.
  • some avg10 > 5: Meaningful memory pressure. Continue to Step 2.
  • full avg10 > 1: Severe. The entire system is stalling on memory.
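
The thresholds above are easy to check mechanically. Here is a minimal sketch (psi_classify is an illustrative helper name, and the severity bands are the rules of thumb from this guide, not a kernel standard); it reads PSI-format text on stdin so the logic can be exercised against sample data as well as the live file:

```shell
#!/bin/bash
# psi-classify.sh -- bucket the "some avg10" value into rough severity bands.
# Reads PSI-format text on stdin; on a live system, feed it /proc/pressure/memory.
psi_classify() {
    awk '/^some/ {
        split($2, kv, "=")          # $2 looks like "avg10=3.42"
        avg10 = kv[2] + 0           # force numeric comparison
        if (avg10 > 5)      print "PRESSURE: continue to Step 2"
        else if (avg10 > 0) print "MILD: some tasks occasionally stall"
        else                print "OK: look elsewhere (CPU, I/O, locks)"
    }'
}

# Sample data; on a real system: psi_classify < /proc/pressure/memory
echo "some avg10=6.20 avg60=2.10 avg300=0.80 total=99" | psi_classify
```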

PSI is implemented in kernel/sched/psi.c. It tracks stall states per task and aggregates them into time-weighted averages. Facebook developed PSI because load average and MemAvailable alone were not sufficient -- they needed to know if anyone was actually waiting for memory. See the LWN coverage: Tracking pressure-stall information.

Why PSI first?

Traditional metrics like MemAvailable tell you about supply. PSI tells you about impact. A system can have low available memory and still perform fine if the working set fits in the remaining pages. Conversely, a system with "enough" free memory can still stall if reclaim is thrashing.

Step 2: Check /proc/meminfo

Once PSI confirms memory pressure, understand where the memory is going.

grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|SReclaimable|SUnreclaim|Active|Inactive)' /proc/meminfo

Key fields:

| Field | What it tells you |
|---|---|
| MemAvailable | How much memory is available for new allocations without swapping. This is not MemFree -- it includes reclaimable page cache and slab. Added in kernel 3.14 (commit 34e431b0ae39). |
| SwapTotal - SwapFree | How much swap is in use. If this is large and growing, you are actively swapping. |
| Cached | Page cache size. Large Cached is normal and healthy -- it means the kernel is caching file data. |
| SReclaimable | Slab memory the kernel can free under pressure (dentry cache, inode cache). |
| SUnreclaim | Slab memory the kernel cannot free. If this is large and growing, suspect a kernel memory leak. |

How to interpret:

# Quick health check: what percentage of memory is available?
awk '/MemTotal/{t=$2} /MemAvailable/{a=$2} END{printf "Available: %.1f%%\n", a/t*100}' /proc/meminfo

  • MemAvailable < 10% of MemTotal: System is under pressure. The kernel is likely reclaiming or swapping.
  • Large Cached but low MemAvailable: Much of "Cached" may be shmem/tmpfs pages, which are counted in Cached but cannot be reclaimed; otherwise, anonymous memory is crowding out the reclaimable cache.
  • SwapTotal - SwapFree > 0 and growing: Active swapping. Jump to Step 5.
  • SUnreclaim growing over time: Kernel slab leak. Jump to Step 8.
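
The swap-in-use and available-percentage checks can be folded into one pass over meminfo-format input. A sketch (meminfo_summary is an illustrative name); it reads from stdin so the arithmetic is verifiable against sample data:

```shell
#!/bin/bash
# meminfo-summary.sh -- SwapUsed and MemAvailable% in one pass.
# Reads /proc/meminfo-format text on stdin.
meminfo_summary() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        /^SwapTotal:/    { swap_total = $2 }
        /^SwapFree:/     { swap_free = $2 }
        END {
            printf "SwapUsed: %d kB\n", swap_total - swap_free
            printf "Available: %.1f%%\n", avail / total * 100
        }'
}

# Sample data; on a real system: meminfo_summary < /proc/meminfo
meminfo_summary <<'EOF'
MemTotal:       16384000 kB
MemAvailable:    1638400 kB
SwapTotal:       8192000 kB
SwapFree:        6192000 kB
EOF
```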

The MemAvailable estimate is calculated in mm/show_mem.c (via si_mem_available()). It accounts for the zone watermarks, reclaimable page cache, and reclaimable slab.

Step 3: Check per-process memory

Now find which process is the culprit.

# Top memory consumers by RSS
ps aux --sort=-%mem | head -20

# For a specific process, get detailed breakdown
grep -E '^(VmSize|VmRSS|VmSwap|RssAnon|RssFile|RssShmem)' /proc/<pid>/status

Key fields in /proc/<pid>/status:

| Field | Meaning |
|---|---|
| VmRSS | Resident Set Size -- physical memory currently used by this process |
| VmSwap | How much of this process has been swapped out |
| RssAnon | Anonymous pages (heap, stack) -- these can only go to swap |
| RssFile | File-backed pages -- can be dropped and re-read from disk |
| RssShmem | Shared memory and tmpfs pages |

For a more detailed per-mapping breakdown:

# Aggregated summary (fast, available since kernel 4.14)
cat /proc/<pid>/smaps_rollup

The smaps_rollup file (commit 493b0e9d945f) gives you totals across all VMAs without the overhead of reading the full smaps file. Key fields to watch:

| Field | Meaning |
|---|---|
| Pss | Proportional Set Size -- accounts for shared pages by dividing each by its number of sharers |
| Swap | Total swapped pages across all VMAs |
| Referenced | Pages accessed since the last clear -- indicates working set size |

PSS vs RSS

RSS double-counts shared pages. If two processes share a 100MB library, both show 100MB in RSS. PSS divides shared pages proportionally, so each shows ~50MB. For capacity planning, PSS is more accurate. For "is this process the problem?", RSS is usually enough.
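To see the double-counting in practice, you can total Pss across all readable processes and compare it with the naive RSS total. A rough sketch (without root, only your own processes contribute Pss, so both totals are partial):

```shell
#!/bin/bash
# pss-vs-rss.sh -- compare summed PSS against summed RSS across /proc.
# RSS double-counts shared pages, so the RSS total comes out inflated.
total_pss=0; total_rss=0
for dir in /proc/[0-9]*; do
    pss=$(awk '/^Pss:/ {print $2}' "$dir/smaps_rollup" 2>/dev/null)
    rss=$(awk '/^VmRSS:/ {print $2}' "$dir/status" 2>/dev/null)
    [ -n "$pss" ] && total_pss=$((total_pss + pss))
    [ -n "$rss" ] && total_rss=$((total_rss + rss))
done
echo "Total PSS: ${total_pss} kB"
echo "Total RSS: ${total_rss} kB (inflated by shared pages)"
```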

Step 4: Check /proc/vmstat for reclaim activity

This is where you identify what the kernel is doing about memory pressure.

# Snapshot key counters
grep -E '^(pgscan_direct|pgscan_kswapd|pgsteal_direct|pgsteal_kswapd|allocstall|pgmajfault|pgfault)' /proc/vmstat

These are cumulative counters. Take two snapshots and compare:

# Watch the rate of change
watch -d -n1 'grep -E "pgscan_direct|allocstall|pgmajfault" /proc/vmstat'

Key counters:

| Counter | What it means | Why it matters |
|---|---|---|
| pgscan_direct | Pages scanned by direct reclaim | Process is blocking to free memory. This is the latency killer. |
| pgscan_kswapd | Pages scanned by kswapd | Background reclaim -- less harmful but indicates pressure. |
| pgsteal_direct / pgsteal_kswapd | Pages actually freed | Compare with pgscan to get reclaim efficiency. A low steal/scan ratio means the kernel is scanning many pages but few are reclaimable. |
| allocstall | Number of direct reclaim events | Each one means a process blocked waiting for memory. |
| pgmajfault | Major page faults (required disk I/O) | If high, pages are being faulted in from swap or disk. |
| pgfault | All page faults (minor + major) | Minor faults are normal. Focus on major faults. |

How to interpret:

  • pgscan_direct growing: Direct reclaim is happening. Processes are stalling. See Page Reclaim for the full reclaim path.
  • allocstall increasing: Each increment means at least one allocation had to wait for reclaim. High rates (>10/sec) will cause visible latency.
  • pgmajfault high: Pages are being read from swap or disk. If the process is faulting on anonymous pages, it is thrashing swap.
  • High pgscan_direct but low pgsteal_direct: Reclaim is inefficient. The kernel is scanning pages but cannot free them -- they are all in active use. This is the worst case: high CPU overhead with no benefit.
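
The steal/scan ratio is straightforward to compute from two snapshots. A sketch (reclaim_efficiency is an illustrative helper, not an existing tool); it takes two vmstat-format files so the arithmetic can be checked offline:

```shell
#!/bin/bash
# reclaim-efficiency.sh -- direct-reclaim steal/scan ratio between two snapshots.
reclaim_efficiency() {   # args: before-file after-file
    awk '
        /^pgscan_direct /  { if (scan1 == "")  scan1 = $2;  else scan2 = $2 }
        /^pgsteal_direct / { if (steal1 == "") steal1 = $2; else steal2 = $2 }
        END {
            dscan = scan2 - scan1; dsteal = steal2 - steal1
            if (dscan <= 0) { print "no direct scanning in interval"; exit }
            printf "reclaim efficiency: %.0f%%\n", dsteal / dscan * 100
        }' "$1" "$2"
}

# Sample snapshots; live: cat /proc/vmstat > before; sleep 5; cat /proc/vmstat > after
before=$(mktemp); after=$(mktemp)
printf 'pgscan_direct 1000\npgsteal_direct 400\n'  > "$before"
printf 'pgscan_direct 3000\npgsteal_direct 1400\n' > "$after"
reclaim_efficiency "$before" "$after"
rm -f "$before" "$after"
```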

These counters are maintained in mm/vmscan.c. Direct reclaim happens in __alloc_pages_direct_reclaim() within mm/page_alloc.c.

Step 5: Is it swapping?

Swapping is often the first suspect when a process is slow, and it is usually correct -- swap I/O is orders of magnitude slower than memory access.

# Current swap activity rates
vmstat 1 5
# Look at 'si' (swap in) and 'so' (swap out) columns

# Or from /proc/vmstat (cumulative pages)
grep -E '^(pswpin|pswpout)' /proc/vmstat

# Which processes are swapped?
for pid in /proc/[0-9]*; do
    swap=$(grep VmSwap "$pid/status" 2>/dev/null | awk '{print $2}')
    name=$(cat "$pid/comm" 2>/dev/null)
    if [ -n "$swap" ] && [ "$swap" -gt 0 ] 2>/dev/null; then
        echo "$swap kB $name (pid $(basename $pid))"
    fi
done | sort -rn | head -20

How to interpret:

  • pswpout growing, pswpin stable: The kernel is evicting pages to swap. Processes are losing resident pages.
  • pswpin growing: Processes are faulting pages back from swap -- this means swapped-out pages are still being accessed. Each swap-in is a disk I/O that blocks the faulting process.
  • Both high: Active thrashing. The working set does not fit in memory and the kernel is churning pages between RAM and swap.
  • Per-process VmSwap large: That specific process has been victimized by reclaim.
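
The same two-snapshot technique turns the cumulative pswpin/pswpout counters into rates. A sketch (swap_rate is an illustrative helper); deterministic inputs so the math is checkable:

```shell
#!/bin/bash
# swap-rate.sh -- pages/sec swapped in and out between two vmstat snapshots.
swap_rate() {   # args: before-file after-file interval-seconds
    awk -v dt="$3" '
        /^pswpin /  { if (in1 == "")  in1 = $2;  else in2 = $2 }
        /^pswpout / { if (out1 == "") out1 = $2; else out2 = $2 }
        END { printf "swap-in: %d pages/s, swap-out: %d pages/s\n",
                     (in2 - in1) / dt, (out2 - out1) / dt }
    ' "$1" "$2"
}

# Sample snapshots; live: cat /proc/vmstat > before; sleep 5; cat /proc/vmstat > after
before=$(mktemp); after=$(mktemp)
printf 'pswpin 100\npswpout 200\n'  > "$before"
printf 'pswpin 600\npswpout 1200\n' > "$after"
swap_rate "$before" "$after" 5
rm -f "$before" "$after"
```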

Common causes and fixes:

| Cause | Evidence | Fix |
|---|---|---|
| Working set exceeds RAM | All processes have VmSwap > 0 | Add RAM or reduce workload |
| One process leaked memory | One process has huge VmRSS + VmSwap | Fix the application |
| Swappiness too high | pswpout high even though MemAvailable is healthy | Tune vm.swappiness (see Swap) |

For the full lifecycle of a swapped page, see What happens during swapping.

Step 6: Is it THP compaction?

Transparent Huge Pages can cause surprise latency spikes. When a process faults on memory and the kernel tries to allocate a 2MB huge page, it may stall to compact memory.

grep -E '^(thp_fault_alloc|thp_fault_fallback|thp_collapse_alloc|thp_collapse_alloc_failed|compact_stall|compact_success|compact_fail)' /proc/vmstat

Key counters:

| Counter | Meaning |
|---|---|
| thp_fault_alloc | Successful THP allocations on page fault |
| thp_fault_fallback | THP allocation failed, fell back to 4KB pages |
| compact_stall | Number of times a process stalled waiting for compaction |
| compact_fail | Compaction attempted but failed to create a huge page |

How to interpret:

  • compact_stall growing rapidly: Processes are blocking while the kernel rearranges pages. Each stall can be milliseconds to tens of milliseconds.
  • thp_fault_fallback >> thp_fault_alloc: Memory is too fragmented for huge pages. The kernel keeps trying and failing.
  • compact_fail high: Compaction is running but not helping. The kernel is doing expensive page migration for nothing.
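
The fallback ratio is one number worth tracking. A sketch (thp_fallback_pct is an illustrative helper); it reads vmstat-format input on stdin so it works on sample data or the live file:

```shell
#!/bin/bash
# thp-fallback.sh -- fraction of THP page faults that fell back to 4KB pages.
# Reads vmstat-format text on stdin.
thp_fallback_pct() {
    awk '
        /^thp_fault_alloc /    { alloc = $2 }
        /^thp_fault_fallback / { fallback = $2 }
        END {
            if (alloc + fallback == 0) { print "no THP faults recorded"; exit }
            printf "THP fallback rate: %.0f%%\n", fallback / (alloc + fallback) * 100
        }'
}

# Sample data; live: thp_fallback_pct < /proc/vmstat
printf 'thp_fault_alloc 300\nthp_fault_fallback 700\n' | thp_fallback_pct
```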

Mitigations:

# Option 1: Switch THP to madvise-only (no automatic THP on faults)
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

# Option 2: Disable THP compaction on fault (still allow khugepaged)
echo never > /sys/kernel/mm/transparent_hugepage/defrag

The compaction code lives in mm/compaction.c. The defrag knob controls whether the kernel stalls on allocation to attempt compaction, defers it to khugepaged, or skips it entirely. See the LWN article Proactive compaction for background on how compaction has evolved.

For the full THP picture, see Transparent Huge Pages.

Step 7: Is it NUMA effects?

On multi-socket systems, memory access latency depends on which node the memory lives on. A process running on node 0 accessing memory on node 1 pays a 30-50% latency penalty on each access.

# System-wide NUMA statistics
numastat

# Per-process NUMA allocation
numastat -p <pid>

# From /proc/vmstat
grep -E '^numa_' /proc/vmstat

Key counters:

| Counter | Meaning |
|---|---|
| numa_hit | Allocation satisfied from the preferred/local node |
| numa_miss | Allocation went to a non-preferred node |
| numa_foreign | This node was preferred but the allocation went elsewhere |
| numa_interleave | Allocation via interleave policy (intentional) |

How to interpret:

  • numa_miss / (numa_hit + numa_miss) > 10%: Significant remote memory access. The process's memory is spread across nodes.
  • numastat -p <pid> shows memory on non-local nodes: The process's working set is not on the node where it is running.
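
The miss ratio above can be computed directly from the counters. A sketch (numa_miss_pct is an illustrative helper); it reads vmstat-format input on stdin:

```shell
#!/bin/bash
# numa-miss.sh -- fraction of allocations that landed on a non-preferred node.
# Reads vmstat-format text on stdin.
numa_miss_pct() {
    awk '
        /^numa_hit /  { hit = $2 }
        /^numa_miss / { miss = $2 }
        END {
            if (hit + miss == 0) { print "no NUMA counters"; exit }
            printf "NUMA miss rate: %.1f%%\n", miss / (hit + miss) * 100
        }'
}

# Sample data; live: numa_miss_pct < /proc/vmstat
printf 'numa_hit 900000\nnuma_miss 100000\n' | numa_miss_pct
```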

Common causes:

Cause Evidence Fix
Process migrated between nodes Memory on old node, CPU on new Pin with numactl --cpunodebind=N --membind=N
Memory interleaved by default Even spread across nodes Use numactl --localalloc or --membind
Allocation overflow Local node was full Add memory or balance workload across nodes

The kernel's NUMA allocation policy is implemented in mm/mempolicy.c. AutoNUMA (automatic NUMA balancing) was added in kernel 3.8 (commit 217db1ef6c47) and periodically scans pages to migrate them closer to the accessing CPU. See LWN: AutoNUMA: the other approach to NUMA scheduling.

For the full NUMA story, see NUMA Memory Management.

Step 8: Is it a kernel memory issue?

Sometimes the problem is not user-space memory at all -- the kernel itself is consuming memory through slab caches.

# Watch SUnreclaim over time
watch -n5 'grep SUnreclaim /proc/meminfo'

# See which slab caches are largest
sudo slabtop -o -s c | head -20

# Detailed per-cache stats (readable by root only)
sudo head -5 /proc/slabinfo

Key indicators:

  • SUnreclaim in /proc/meminfo growing over time: Kernel objects are accumulating and cannot be freed.
  • SReclaimable is large but not being reclaimed: The kernel has not been under enough pressure to shrink dentry/inode caches. This is usually benign.
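
"Growing over time" means comparing snapshots taken hours apart. A sketch (slab_growth is an illustrative helper); it diffs two meminfo-format snapshots so the logic is testable offline:

```shell
#!/bin/bash
# slab-growth.sh -- flag SUnreclaim growth between two meminfo snapshots.
slab_growth() {   # args: before-file after-file
    awk '
        /^SUnreclaim:/ { if (before == "") before = $2; else after = $2 }
        END {
            delta = after - before
            if (delta > 0) printf "SUnreclaim grew by %d kB -- suspect a slab leak\n", delta
            else           print "SUnreclaim stable"
        }' "$1" "$2"
}

# Sample snapshots; live: grep SUnreclaim /proc/meminfo > before  (repeat hours later)
before=$(mktemp); after=$(mktemp)
printf 'SUnreclaim:   100000 kB\n' > "$before"
printf 'SUnreclaim:   150000 kB\n' > "$after"
slab_growth "$before" "$after"
rm -f "$before" "$after"
```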

Common offenders:

| Cache | What it is | Why it grows |
|---|---|---|
| dentry | Directory entry cache | Many files accessed, deep directory trees |
| inode_cache | Inode metadata cache | Same as dentry |
| kmalloc-* | General-purpose allocations | Possible kernel memory leak |
| sock_inode_cache | Socket structures | Many network connections |
To check whether the growth is reclaimable, ask the kernel to shrink its caches:

# Force the kernel to reclaim slab caches (drops page cache + slab)
echo 3 > /proc/sys/vm/drop_caches
# WARNING: This drops the page cache too, which can cause a temporary I/O spike.
# Use echo 2 to drop reclaimable slab only (dentries + inodes).

drop_caches is for diagnosis, not production

If you need to regularly drop caches to keep a system healthy, you have a leak or a sizing problem. Fix the root cause.

The slab allocator is implemented in mm/slub.c. Slab shrinking during reclaim is handled by shrinker callbacks registered via register_shrinker(). See the LWN article The slab and protected memory allocations for background.

For the full slab story, see Slab Allocator (SLUB).

Try It Yourself

Run this all-in-one diagnostic script to get a snapshot of your system's memory health:

#!/bin/bash
# memory-diag.sh -- Quick memory health snapshot

echo "=== PSI (Pressure Stall Information) ==="
if [ -f /proc/pressure/memory ]; then
    cat /proc/pressure/memory
else
    echo "PSI not available (kernel < 4.20, CONFIG_PSI=n, or PSI disabled at boot)"
fi

echo ""
echo "=== Memory Overview ==="
grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree|SReclaimable|SUnreclaim)' /proc/meminfo

echo ""
echo "=== Top 10 Processes by RSS ==="
ps aux --sort=-%mem | head -11

echo ""
echo "=== Reclaim Activity (rates need two samples to compare) ==="
grep -E '^(pgscan_direct|pgscan_kswapd|allocstall|pgmajfault|pswpin|pswpout)' /proc/vmstat

echo ""
echo "=== THP / Compaction ==="
grep -E '^(thp_fault_alloc|thp_fault_fallback|compact_stall|compact_fail)' /proc/vmstat

echo ""
echo "=== NUMA (if applicable) ==="
grep -E '^numa_' /proc/vmstat 2>/dev/null || echo "No NUMA counters"

echo ""
echo "=== Top 10 Processes with Swap ==="
for pid in /proc/[0-9]*; do
    swap=$(grep VmSwap "$pid/status" 2>/dev/null | awk '{print $2}')
    name=$(cat "$pid/comm" 2>/dev/null)
    if [ -n "$swap" ] && [ "$swap" -gt 0 ] 2>/dev/null; then
        echo "$swap kB $name (pid $(basename $pid))"
    fi
done | sort -rn | head -10

For continuous monitoring, combine PSI with vmstat:

# Terminal 1: Watch PSI
watch -n1 cat /proc/pressure/memory

# Terminal 2: Watch vmstat (si/so = swap in/out, free = free pages)
vmstat 1

# Terminal 3: Watch reclaim counters
watch -d -n1 'grep -E "pgscan_direct|allocstall|compact_stall" /proc/vmstat'

Quick reference: which metric points where

| Symptom | Where to measure | Likely cause | Deep-dive doc |
|---|---|---|---|
| PSI some > 0, MemAvailable low | /proc/meminfo | System-wide memory shortage | Page Reclaim |
| allocstall increasing | /proc/vmstat | Direct reclaim stalls | Page Reclaim |
| pswpin / pswpout high | /proc/vmstat, vmstat | Active swapping | Swapping |
| pgmajfault high | /proc/vmstat | Pages faulted from disk/swap | Swapping |
| compact_stall high | /proc/vmstat | THP compaction latency | THP, Compaction |
| numa_miss high | /proc/vmstat | Remote NUMA access | NUMA |
| SUnreclaim growing | /proc/meminfo | Kernel slab leak | SLUB |
| Large VmSwap on one process | /proc/<pid>/status | Process swapped out | Swapping |

Kernel source reference

| File | What it contains |
|---|---|
| kernel/sched/psi.c | PSI tracking and averaging |
| mm/vmscan.c | Page reclaim, direct reclaim, kswapd |
| mm/page_alloc.c | Page allocator, watermarks, alloc stalls |
| mm/compaction.c | Memory compaction for huge pages |
| mm/mempolicy.c | NUMA memory policy |
| mm/slub.c | SLUB slab allocator |
| fs/proc/meminfo.c | /proc/meminfo implementation |
| fs/proc/array.c | /proc/pid/status implementation |

Further reading