Page Allocator (Buddy System)
The foundation of Linux memory management - allocating physical page frames
What is the Page Allocator?
The page allocator manages physical memory at page granularity (typically 4KB). It's the lowest-level allocator in the kernel - every other allocator (SLUB, vmalloc) ultimately gets its pages from here.
struct page *alloc_pages(gfp_t gfp, unsigned int order);
void __free_pages(struct page *page, unsigned int order);
unsigned long __get_free_pages(gfp_t gfp, unsigned int order);
void free_pages(unsigned long addr, unsigned int order);
The Buddy System
Linux uses a buddy allocator, an algorithm dating back to 1965 (Knowlton). The key insight: manage memory in power-of-2 sized blocks, and when a block is freed, merge it with its "buddy" if that buddy is also free.
How It Works
Memory is divided into power-of-2 blocks, identified by order:
Order 0 = 1 page (4KB)
Order 1 = 2 pages (8KB)
Order 2 = 4 pages (16KB)
...
Order 10 = 1024 pages (4MB) - max order (MAX_ORDER=11, so 0-10 valid)
Free lists per order:
+-------------------------------------------+
| Order 0: [4KB] [4KB] [4KB] ... |
| Order 1: [8KB] [8KB] ... |
| Order 2: [16KB] ... |
| ... |
| Order 10: [4MB] ... |
+-------------------------------------------+
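The order/size mapping is just a shift: block size = PAGE_SIZE << order. Below is a minimal user-space model of both directions; the kernel's get_order() handles the size-to-order direction, and this standalone version is an illustration, not kernel code.
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Bytes covered by one block of the given order */
static unsigned long order_to_bytes(unsigned int order)
{
    return PAGE_SIZE << order;
}

/* Smallest order whose block covers `bytes` (what the kernel's get_order() computes) */
static unsigned int bytes_to_order(unsigned long bytes)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}

int main(void)
{
    /* 3 pages don't fit in an order-1 block (8KB), so order 2 is needed */
    printf("3 pages -> order %u\n", bytes_to_order(3 * PAGE_SIZE));
    return 0;
}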
Allocation Example
Request for 3 pages (rounded up to order 2 = 4 pages); a toy version of the search-and-split loop follows the steps:
1. Check order 2 free list
- If available: remove block, return it
- If empty: go to step 2
2. Check order 3 free list (8 pages)
- Split: one 4-page block to satisfy request
- Other 4-page block goes to order 2 free list
3. Continue splitting from higher orders if needed
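A sketch of that loop in C. The model tracks free blocks by page frame number (pfn) in per-order lists; the helper names are invented for the sketch, and the kernel's real version is __rmqueue_smallest() in mm/page_alloc.c.
#include <stdio.h>
#include <stdlib.h>

#define MAX_PAGE_ORDER 10   /* orders 0..10 */

/* A free block is identified by the page frame number (pfn) of its first page */
struct blk { unsigned long pfn; struct blk *next; };

static struct blk *free_lists[MAX_PAGE_ORDER + 1];

static void push_blk(unsigned long pfn, unsigned int order)
{
    struct blk *b = malloc(sizeof(*b));

    b->pfn = pfn;
    b->next = free_lists[order];
    free_lists[order] = b;
}

static int pop_blk(unsigned int order, unsigned long *pfn)
{
    struct blk *b = free_lists[order];

    if (!b)
        return 0;
    free_lists[order] = b->next;
    *pfn = b->pfn;
    free(b);
    return 1;
}

/* Allocate 2^order pages: search upward for a free block, then split
 * back down, pushing the upper half of each split onto its free list. */
static long alloc_block(unsigned int order)
{
    unsigned long pfn;

    for (unsigned int cur = order; cur <= MAX_PAGE_ORDER; cur++) {
        if (!pop_blk(cur, &pfn))
            continue;
        while (cur > order) {
            cur--;
            push_blk(pfn + (1UL << cur), cur);  /* return the upper half */
        }
        return (long)pfn;
    }
    return -1;  /* nothing big enough at any order */
}

int main(void)
{
    push_blk(0, 3);  /* seed: one free order-3 block (8 pages) at pfn 0 */

    printf("order-2 -> pfn %ld\n", alloc_block(2));  /* 0: splits the 8-page block */
    printf("order-2 -> pfn %ld\n", alloc_block(2));  /* 4: the leftover half */
    return 0;
}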
Free Example (Buddy Merging)
Freeing a 4-page block at address 0x4000 (order-2 blocks must be 4-page aligned):
1. Find buddy address: 0x4000 XOR (4 * PAGE_SIZE) = 0x0000
2. Is buddy free and same order?
- Yes: merge into 8-page block, try to merge again
- No: add to order 2 free list
Merging continues up to the maximum order or until the buddy isn't free; the sketch below continues the toy model from above.
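In pfn terms the XOR mask is simply 1 << order (the kernel computes this in __find_buddy_pfn()). steal_blk() below is an invented helper that searches the list; the real allocator avoids any search by checking the buddy page's PageBuddy flag and its stored order.
/* Buddy of the block at `pfn` with size 2^order pages: flip the single
 * bit that distinguishes the two halves of the parent block. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

/* Remove a specific block from its free list; returns 1 if found. */
static int steal_blk(unsigned long pfn, unsigned int order)
{
    for (struct blk **p = &free_lists[order]; *p; p = &(*p)->next) {
        if ((*p)->pfn == pfn) {
            struct blk *victim = *p;

            *p = victim->next;
            free(victim);
            return 1;
        }
    }
    return 0;
}

/* Free a block, merging upward while the buddy is also free. */
static void free_block(unsigned long pfn, unsigned int order)
{
    while (order < MAX_PAGE_ORDER && steal_blk(buddy_pfn(pfn, order), order)) {
        pfn &= ~(1UL << order);  /* merged block starts at the lower half */
        order++;
    }
    push_blk(pfn, order);
}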
Why Buddies?
The buddy's address can be computed with a single XOR operation - no searching required. Two adjacent blocks of the same size are buddies only if they came from splitting the same larger block.
Memory Zones
Physical memory is divided into zones based on hardware constraints:
| Zone | Purpose | Typical Range (x86-64) |
|---|---|---|
| ZONE_DMA | Legacy ISA DMA devices (24-bit addressing) | 0 - 16MB |
| ZONE_DMA32 | Devices limited to 32-bit DMA addressing | 16MB - 4GB |
| ZONE_NORMAL | Regular kernel allocations | Above 4GB |
| ZONE_MOVABLE | Migratable pages (memory hotplug, isolation) | Configurable |
Each zone has its own set of buddy free lists.
GFP Flags
GFP (Get Free Pages) flags control allocation behavior:
/* Common combinations */
GFP_KERNEL /* Normal allocation, can sleep, can reclaim */
GFP_ATOMIC /* Cannot sleep (interrupt context) */
GFP_USER /* User-space allocation */
GFP_DMA /* Must be in ZONE_DMA */
GFP_DMA32 /* Must be in ZONE_DMA32 */
/* Modifiers */
__GFP_ZERO /* Zero the memory */
__GFP_NOWARN /* Suppress allocation failure warnings */
__GFP_RETRY_MAYFAIL /* Try hard but may fail */
__GFP_NOFAIL /* Never fail - use sparingly, can deadlock */
See include/linux/gfp.h for the full list.
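A typical pairing in kernel code: the flags choose the allocation behavior and zone, the order chooses the size. (The function names here are illustrative.)
#include <linux/gfp.h>
#include <linux/mm.h>

/* Allocate two zeroed, physically contiguous pages (order 1 = 8KB).
 * GFP_KERNEL may sleep; in interrupt context GFP_ATOMIC would be used. */
static struct page *grab_buffer(void)
{
    return alloc_pages(GFP_KERNEL | __GFP_ZERO, 1);
}

static void drop_buffer(struct page *page)
{
    if (page)
        __free_pages(page, 1);  /* order must match the allocation */
}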
History & Evolution
Origins (1991)
The buddy allocator has been in Linux since the beginning. The basic algorithm is from:
Knowlton, K. C. "A Fast Storage Allocator" Communications of the ACM, 1965
Note: Pre-LKML era; kernel git history only begins at v2.6.12, so earlier implementation details aren't captured there.
Per-CPU Page Lists (v2.5/2.6)
Problem: The zone lock became a bottleneck on SMP systems - every allocation/free needed it.
Solution: Per-CPU page lists (per_cpu_pages) cache single pages, reducing lock contention.
Note: Developed incrementally during the 2.5 cycle; predates searchable git history (which begins at v2.6.12).
CPU 0: [page] [page] [page] ... (hot cache)
CPU 1: [page] [page] ...
...
|
v
Zone free lists (locked)
NUMA Awareness (v2.6)
Problem: On NUMA systems, accessing remote memory is slower than local memory.
Solution: Per-node zones. Allocations prefer local node memory.
Note: NUMA support evolved over many releases. See mm/mempolicy.c for allocation policies.
Compaction (v2.6.35, 2010)
Commit: 748446bb6b5a ("mm: compaction: memory compaction core")
Author: Mel Gorman
Problem: External fragmentation - free pages scattered, can't satisfy high-order allocations.
Solution: Memory compaction - move pages to create contiguous free blocks.
CMA (Contiguous Memory Allocator, v3.5, 2012)
Commit: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator")
Author: Marek Szyprowski (Samsung)
Note: Pre-2013 LKML archives on lore.kernel.org are sparse. See LWN article for design discussion.
Problem: Devices need large contiguous buffers but memory fragments over time.
Solution: Reserve regions that can be used by movable pages normally, but reclaimed for contiguous allocation when needed.
Try It Yourself
View Buddy Allocator State
# Free blocks per order, per zone
cat /proc/buddyinfo
# Example output:
# Node 0, zone Normal 1024 512 256 128 64 32 16 8 4 2 1
# Columns run from order 0 (left) to order 10 (right);
# each number is the count of free blocks at that order
View Zone Information
# Detailed zone stats
cat /proc/zoneinfo
# Key fields:
# - min/low/high: watermarks for reclaim
# - free: current free pages
# - managed: pages managed by this zone
Memory Pressure Test
# Watch buddy state while allocating memory
watch -n 1 cat /proc/buddyinfo
# In another terminal, consume memory
stress --vm 1 --vm-bytes 1G
Trace Allocations
# Enable page allocation tracing (requires debugfs)
echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
cat /sys/kernel/debug/tracing/trace_pipe
# Output shows: page address, order, gfp flags, migratetype
Key Data Structures
struct zone
struct zone {
unsigned long watermark[NR_WMARK]; /* min/low/high thresholds */
unsigned long nr_reserved_highatomic;
struct per_cpu_pages __percpu *per_cpu_pageset;
struct free_area free_area[NR_PAGE_ORDERS]; /* buddy free lists */
unsigned long zone_start_pfn;
unsigned long managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;
const char *name;
/* ... */
};
Note: NR_PAGE_ORDERS replaced MAX_ORDER in recent kernels. When reading older code, MAX_ORDER is the same concept.
Watermarks
Each zone has watermarks that control reclaim behavior:
| Watermark | Meaning |
|---|---|
| high | Free pages above this: zone is healthy, kswapd can sleep |
| low | Free pages below this: kswapd is woken to reclaim in the background |
| min | Free pages below this: direct reclaim - the allocating process must help free pages |
| promo | Tiered memory (CXL/NUMA): threshold controlling page promotion |
Watermarks are derived from vm.min_free_kbytes:
# View watermarks
cat /proc/zoneinfo | grep -E "pages free|min|low|high"
# Adjust minimum free memory (affects all watermarks)
sysctl vm.min_free_kbytes=65536
Zone fallback: When a zone is exhausted, allocations fall back to other zones in order: NORMAL → DMA32 → DMA (preferring higher zones first).
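A simplified model of that fast-path decision, combining the watermarks with zone fallback. This is a sketch only; the real logic is get_page_from_freelist() and zone_watermark_ok() in mm/page_alloc.c, which also account for the allocation order and lowmem reserves.
/* Toy model of per-zone watermark checks during allocation. */
struct zone_model {
    const char *name;
    unsigned long free_pages;
    unsigned long min, low, high;  /* watermarks, in pages */
};

static void wake_kswapd_model(struct zone_model *z)
{
    /* the kernel would wake this node's kswapd thread here */
}

/* Zones ordered most-preferred first (e.g. NORMAL, DMA32, DMA). */
static struct zone_model *pick_zone(struct zone_model *zones, int nr)
{
    for (int i = 0; i < nr; i++) {
        struct zone_model *z = &zones[i];

        if (z->free_pages <= z->min)
            continue;  /* below min: only direct reclaim could help here */
        if (z->free_pages <= z->low)
            wake_kswapd_model(z);  /* background reclaim; allocation proceeds */
        return z;
    }
    return NULL;  /* all zones below min: direct reclaim or OOM path */
}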
struct free_area
struct free_area {
struct list_head free_list[MIGRATE_TYPES]; /* per-migratetype lists */
unsigned long nr_free;
};
Migrate Types
Pages are grouped by mobility to reduce fragmentation:
| Type | Description |
|---|---|
| MIGRATE_UNMOVABLE | Kernel allocations that can't move |
| MIGRATE_MOVABLE | User pages, can be migrated/compacted |
| MIGRATE_RECLAIMABLE | Caches that can be freed under pressure |
| MIGRATE_HIGHATOMIC | Reserved for high-priority atomic allocations |
| MIGRATE_CMA | Contiguous Memory Allocator regions |
| MIGRATE_ISOLATE | Pages being isolated for migration/offlining |
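Putting struct zone and struct free_area together: a free block lives on exactly one list, indexed first by order, then by migratetype. The sketch mirrors the kernel's get_page_from_free_area(); the buddy_list linkage is how recent kernels chain free pages (older kernels used page->lru).
/* First free block of a given order and migratetype, or NULL. */
static struct page *first_free(struct zone *zone, unsigned int order,
                               int migratetype)
{
    struct free_area *area = &zone->free_area[order];

    return list_first_entry_or_null(&area->free_list[migratetype],
                                    struct page, buddy_list);
}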
Common Issues
High-Order Allocation Failures
Large contiguous allocations (order > 3) can fail due to fragmentation even with free memory.
Solutions:
- Use __GFP_RETRY_MAYFAIL to try harder before giving up
- Use vmalloc (or kvmalloc, sketched below) for virtual rather than physical contiguity
- Trigger compaction manually: echo 1 > /proc/sys/vm/compact_memory
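kvmalloc() packages the vmalloc fallback: it first attempts a physically contiguous allocation and falls back to vmalloc if that fails. (struct entry and the function names below are illustrative.)
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/types.h>

struct entry { u64 key; u64 val; };

static struct entry *alloc_table(size_t nr)
{
    /* Physically contiguous if possible, vmalloc-backed otherwise */
    return kvmalloc_array(nr, sizeof(struct entry), GFP_KERNEL | __GFP_ZERO);
}

static void free_table(struct entry *table)
{
    kvfree(table);  /* correct for both backing types */
}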
Zone Imbalance
Memory pressure in one zone while others have free memory.
Check: cat /proc/zoneinfo | grep -E "zone|free|min"
OOM Despite Free Pages
Free pages exist but in wrong zone or wrong migratetype.
Debug: Check /proc/buddyinfo and /proc/pagetypeinfo
References
Key Code
| File | Description |
|---|---|
| mm/page_alloc.c | Core allocator |
| include/linux/gfp.h | GFP flags |
| include/linux/mmzone.h | Zone structures |
Key Commits
| Commit | Kernel | Description |
|---|---|---|
| 748446bb6b5a | v2.6.35 | Memory compaction |
| c64be2bb1c6e | v3.5 | CMA introduction |