Why can't I allocate contiguous memory?
The fragmentation problem that plagues every long-running system
The question
You have 4GB of free memory. You need a 4MB contiguous buffer for DMA. The allocation fails. Why?
// This might fail even with gigabytes free
struct page *pages = alloc_pages(GFP_KERNEL, 10); // order 10 = 4MB (1024 pages)
// kmalloc for large sizes falls back to the page allocator internally,
// so it's subject to the same fragmentation limits
void *buf = kmalloc(4 * 1024 * 1024, GFP_KERNEL);
Note: MAX_PAGE_ORDER is 10 on most configs (4MB with 4KB pages). kmalloc() has no smaller size cap of its own - large requests become high-order page allocations internally (see __kmalloc_large_noprof() in mm/slub.c), so they fail under the same conditions as alloc_pages().
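The order arithmetic in the note can be checked with a small userspace sketch. The function names order_to_bytes() and bytes_to_order() are made up for this illustration; the second is only a rough analogue of the kernel's get_order(), not its implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by one buddy block of the given order: page_size * 2^order. */
size_t order_to_bytes(unsigned int order, size_t page_size)
{
    assert(order < 32);   /* keep the shift well-defined in this sketch */
    return page_size << order;
}

/* Smallest order whose block can hold `bytes` (hypothetical helper). */
unsigned int bytes_to_order(size_t bytes, size_t page_size)
{
    unsigned int order = 0;
    size_t block = page_size;

    assert(bytes > 0 && page_size > 0);
    while (block < bytes) {
        block <<= 1;      /* each order doubles the block size */
        order++;
    }
    return order;
}
```

With 4KB pages, order_to_bytes(10, 4096) gives the 4MB figure quoted above. A request just over one page already needs an order-1 block, because the buddy allocator only hands out power-of-2 sizes.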
The answer is external fragmentation - free memory exists, but not in contiguous chunks.
Why contiguous memory matters
Most kernel allocations don't need physical contiguity. Virtual memory lets you map scattered physical pages into a contiguous virtual address range (that's what vmalloc() does).
But some things genuinely need physically contiguous memory:
| Use case | Why contiguous? |
|---|---|
| DMA buffers | Devices without scatter-gather support (and no IOMMU in front of them) need one physical range |
| Huge pages | 2MB/1GB pages require aligned contiguous regions |
| Network buffers | Some devices/paths benefit from contiguity (though many use scatter-gather) |
| GPU memory | Graphics drivers often need large contiguous allocations |
The fragmentation problem
The buddy allocator manages free memory in blocks of 2^order pages, and a free block can only merge with its equally sized, aligned "buddy". After days or weeks of uptime, a system's memory can look like this:
Physical memory after extended use:
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│U│F│U│U│F│U│F│F│U│U│U│F│U│F│U│F│F│F│U│U│F│U│F│U│U│F│F│U│F│U│F│U│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
U = Used page, F = Free page
Free pages: 15 (60KB)
Largest contiguous block: 3 pages (12KB)
You have 60KB free but can't allocate a 16KB block. This is external fragmentation.
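The numbers under the diagram can be reproduced with a short userspace sketch. The struct and function names are invented for this example. It only measures runs of free pages, whereas a real buddy allocation also needs the run to be aligned to the block size, so the true situation is at least this bad.

```c
#include <assert.h>
#include <stddef.h>

struct frag_summary {
    size_t free_pages;    /* total number of 'F' entries */
    size_t largest_run;   /* longest run of consecutive 'F' entries */
};

/* Scan a map of 'U' (used) and 'F' (free) pages, one character per page. */
struct frag_summary summarize_page_map(const char *map)
{
    struct frag_summary s = {0, 0};
    size_t run = 0;

    assert(map != NULL);
    for (const char *p = map; *p; p++) {
        if (*p == 'F') {
            s.free_pages++;
            if (++run > s.largest_run)
                s.largest_run = run;
        } else {
            run = 0;      /* any used page ends the current free run */
        }
    }
    return s;
}
```

Feeding it the row from the diagram as a string of U and F characters gives the free-page total and a largest run of 3 pages, which is why a 4-page (16KB) request cannot be satisfied.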
How it happens
- System boots with large contiguous free regions
- Allocations of various sizes come and go
- Some allocations persist (kernel structures, pinned pages)
- Free memory becomes scattered around persistent allocations
The persistent allocations act like rocks in a stream - free memory flows around them but can't coalesce.
Migrate types: the kernel's first defense
The kernel groups pages by mobility to reduce fragmentation:
MIGRATE_UNMOVABLE - Kernel structures that can't move
MIGRATE_MOVABLE - User pages that can be migrated
MIGRATE_RECLAIMABLE - Caches that can be freed
The buddy allocator maintains separate free lists per migrate type (mm/page_alloc.c):
┌─────────────────────────────────────────────────────────────┐
│ Physical memory │
├───────────────────┬────────────────────┬────────────────────┤
│ UNMOVABLE │ MOVABLE │ RECLAIMABLE │
│ (kernel stuff) │ (user pages) │ (caches) │
└───────────────────┴────────────────────┴────────────────────┘
This helps because:
- Movable pages can be migrated to create contiguous free space
- Reclaimable pages can be freed under pressure
- Unmovable pages are grouped together, limiting fragmentation spread
But it's not perfect. Under pressure, the allocator will "steal" pages from other migrate types, polluting movable regions with unmovable allocations.
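The "stealing" behaviour can be illustrated with a toy model of per-type free lists. Everything here (the toy_ names, the single counter per type, the fallback order) is an assumption made for illustration; the real logic lives in mm/page_alloc.c and works on pageblocks and orders rather than simple counters.

```c
#include <assert.h>
#include <stddef.h>

enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE, TOY_NR_TYPES };

/* Number of free blocks currently sitting on each type's free list. */
struct toy_zone {
    int free_blocks[TOY_NR_TYPES];
};

/* Where to look when the preferred list is empty (order is an assumption). */
static const enum toy_migratetype toy_fallbacks[TOY_NR_TYPES][2] = {
    [TOY_UNMOVABLE]   = { TOY_RECLAIMABLE, TOY_MOVABLE },
    [TOY_MOVABLE]     = { TOY_RECLAIMABLE, TOY_UNMOVABLE },
    [TOY_RECLAIMABLE] = { TOY_UNMOVABLE,   TOY_MOVABLE },
};

/*
 * Take one block for `want`. Returns the type whose list supplied it,
 * or -1 if every list is empty. A result different from `want` means
 * the block was stolen from another migrate type.
 */
int toy_alloc_block(struct toy_zone *z, enum toy_migratetype want)
{
    assert(z != NULL && want < TOY_NR_TYPES);
    if (z->free_blocks[want] > 0) {
        z->free_blocks[want]--;
        return (int)want;
    }
    for (int i = 0; i < 2; i++) {
        enum toy_migratetype donor = toy_fallbacks[want][i];
        if (z->free_blocks[donor] > 0) {
            z->free_blocks[donor]--;
            return (int)donor;
        }
    }
    return -1;
}
```

In this model, an unmovable request against a zone whose only free blocks are on the movable list succeeds by taking a movable block. That is exactly the pollution described above: an allocation that can never move now sits in a region meant to stay migratable.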
Memory compaction: defragmentation for Linux
When high-order allocations fail, the kernel can try compaction - moving pages to create contiguous free regions.
Before compaction:
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ M │ F │ M │ F │ M │ F │ M │ F │ M = Movable, F = Free
└───┴───┴───┴───┴───┴───┴───┴───┘
After compaction:
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ M │ M │ M │ M │ F │ F │ F │ F │ Contiguous free region!
└───┴───┴───┴───┴───┴───┴───┴───┘
Compaction was added by Mel Gorman in kernel 2.6.35 (commit 748446bb6b5a | LKML, 2010). Before this, high-order allocation failures on long-running systems were common, and often the only reliable remedy was a reboot.
See compaction for details on the algorithm.
Limitations of compaction
Compaction can only move movable pages. Unmovable pages (kernel allocations, pinned memory) are permanent obstacles:
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ U │ M │ F │ U │ M │ F │ U │ F │ U = Unmovable (can't move!)
└───┴───┴───┴───┴───┴───┴───┴───┘
│
▼ After compaction
┌───┬───┬───┬───┬───┬───┬───┬───┐
│ U │ M │ M │ U │ F │ F │ U │ F │ Still fragmented around U
└───┴───┴───┴───┴───┴───┴───┴───┘
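Both diagrams can be reproduced with a toy compaction pass over a character map. This is a deliberately simplified sketch with an invented function name, written to match the pictures; the real algorithm in mm/compaction.c works on pageblocks with separate migration and free scanners and is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy compaction over a page map: 'M' movable, 'U' unmovable, 'F' free.
 * Movable pages found near the high end are moved into free slots near
 * the low end. 'U' pages are never touched. Returns the number of pages
 * moved.
 */
size_t toy_compact(char *map)
{
    size_t moved = 0;
    size_t dst = 0;        /* scans upward for free slots */
    size_t src;            /* one past the page being examined, scans downward */

    assert(map != NULL);
    src = strlen(map);
    for (;;) {
        while (dst < src && map[dst] != 'F')
            dst++;
        while (src > dst && map[src - 1] != 'M')
            src--;
        if (dst >= src)    /* the two scanners met: nothing left to move */
            break;
        map[dst] = 'M';    /* "migrate" the page into the free slot */
        map[src - 1] = 'F';
        moved++;
    }
    return moved;
}
```

Run on "MFMFMFMF" it produces "MMMMFFFF". Run on "UMFUMFUF" it produces "UMMUFFUF", where the largest free run is still only two pages because the U entries stay where they are.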
CMA: the solution for device drivers
The Contiguous Memory Allocator (CMA) solves the device driver problem by reserving regions that serve double duty:
- Normally used for movable pages (page cache, user memory)
- Reclaimed for contiguous allocation when devices need them
CMA region:
┌─────────────────────────────────────────────────────────────┐
│ Normal operation: filled with movable pages (page cache) │
└─────────────────────────────────────────────────────────────┘
│
▼ Device requests contiguous buffer
┌─────────────────────────────────────────────────────────────┐
│ Pages migrated out, region available for DMA buffer │
└─────────────────────────────────────────────────────────────┘
CMA was developed by Marek Szyprowski at Samsung and merged in kernel 3.5 (commit c64be2bb1c6e, 2012). See the LWN article for design discussion.
Using CMA
From a driver:
#include <linux/dma-mapping.h>
#include <linux/cma.h>
// Preferred: the DMA API may satisfy this from CMA when a CMA area is configured
dma_addr_t dma_handle;
void *buf = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
if (!buf)
	return -ENOMEM;
// Or directly (rarely needed; cma, count, and align are supplied by the caller):
struct page *pages = cma_alloc(cma, count, align, false /* no_warn */);
CMA regions are configured at boot:
# Reserve 256MB for CMA
cma=256M
# Or in device tree (ARM):
reserved-memory {
	linux,cma {
		compatible = "shared-dma-pool";
		size = <0x10000000>; /* 256MB */
		reusable;
	};
};
Best practices for driver developers
1. Use the DMA API
Don't allocate contiguous memory directly. Use dma_alloc_coherent() or dma_alloc_attrs():
// Good: Let the DMA API handle it
void *buf = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
// Avoid: Direct contiguous allocation
void *buf = kmalloc(size, GFP_KERNEL); // May fail with large sizes
The DMA API can use CMA, IOMMU remapping, or bounce buffers as appropriate.
2. Use scatter-gather when possible
Many modern devices support scatter-gather DMA:
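// Instead of one large contiguous buffer, describe the scattered pages
// (pages[] is assumed to hold nents already-allocated pages, nents <= MAX_SEGS)
struct scatterlist sg[MAX_SEGS];
int i;
sg_init_table(sg, nents);
for (i = 0; i < nents; i++)
	sg_set_page(&sg[i], pages[i], PAGE_SIZE, 0);
// Map scattered pages; dma_map_sg() returns 0 on failure
if (!dma_map_sg(dev, sg, nents, direction))
	return -ENOMEM;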
This eliminates the contiguous memory requirement entirely.
3. Allocate early or use CMA
If you need large contiguous buffers:
- Reserve memory at boot (via memblock or CMA)
- Use CMA regions sized for your needs
- Allocate during driver probe, not on-demand
4. Handle allocation failures gracefully
High-order allocations can fail. Don't use __GFP_NOFAIL for large allocations:
// Bad: Can hang the system
void *buf = kmalloc(large_size, GFP_KERNEL | __GFP_NOFAIL);
// Good: Handle failure
void *buf = kmalloc(large_size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!buf)
return -ENOMEM;
Try it yourself
# View fragmentation per order
cat /proc/buddyinfo
# Example: Node 0, zone Normal 1024 512 256 128 64 32 16 8 4 2 1
# (counts at each order 0-10)
# View fragmentation index (debugfs required, may not exist on all configs)
cat /sys/kernel/debug/extfrag/extfrag_index
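# -1.000 = an allocation of that order would currently succeed
# Otherwise: near 0 = it would fail for lack of free memory, near 1 = it would fail because of fragmentation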
# View CMA regions
grep Cma /proc/meminfo
# CmaTotal: 262144 kB
# CmaFree: 131072 kB
# Trigger manual compaction
echo 1 > /proc/sys/vm/compact_memory
# Watch compaction statistics
watch -n 1 'grep -E "compact_" /proc/vmstat'
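The /proc/buddyinfo columns shown above can be turned into "how many free pages exist in blocks of at least order N" with a small parser. This is a userspace sketch with an invented function name. It assumes the usual "Node N, zone NAME count count ..." layout, with one count per order starting at order 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum the free pages described by one /proc/buddyinfo line, counting only
 * blocks of at least `min_order`. Column i holds the number of free blocks
 * of order i, each worth 2^i pages.
 * Returns -1 if the line has no "zone <name>" part.
 */
long buddyinfo_free_pages(const char *line, unsigned int min_order)
{
    const char *p;
    char zone[32];
    int used = 0;
    long total = 0;
    unsigned long count;
    unsigned int order = 0;

    assert(line != NULL);
    p = strstr(line, "zone");
    if (!p || sscanf(p, "zone %31s%n", zone, &used) != 1)
        return -1;
    p += used;             /* skip past the zone name */

    /* Each successful scan consumes one per-order column. */
    while (sscanf(p, "%lu%n", &count, &used) == 1) {
        if (order >= min_order)
            total += (long)(count << order);
        order++;
        p += used;
    }
    return total;
}
```

Comparing the result for min_order 0 with the result for the order you actually need shows how much of the nominally free memory is usable for that request. On a fragmented system the second number can be zero while the first is large.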
Evolution
The kernel's approach to contiguous allocation has evolved significantly:
ZONE_MOVABLE (v2.6.23, 2007)
Commit: 2a1e274acf0b ("Create the ZONE_MOVABLE zone")
Author: Mel Gorman
Note: This commit predates modern LKML archiving on lore.kernel.org.
The first major anti-fragmentation feature. It created a zone that only accepts movable allocations, so memory in this zone can generally be migrated or reclaimed (long-term pinned pages are the main exception). Configured via the kernelcore= or movablecore= boot parameters.
# Reserve 4GB for unmovable kernel allocations, rest is ZONE_MOVABLE
kernelcore=4G
# Or specify movable zone size directly
movablecore=8G
Migrate types (v2.6.24, 2007)
Commit: b2a0ac8875a0 ("Split the free lists for movable and unmovable allocations")
Author: Mel Gorman
Note: This commit predates modern LKML archiving on lore.kernel.org.
Split the buddy allocator free lists by page mobility. This was a less invasive alternative to ZONE_MOVABLE that works automatically without boot parameters. The two approaches complement each other.
Memory compaction (v2.6.35, 2010)
Commit: 748446bb6b5a ("mm: compaction: memory compaction core") | LKML
Author: Mel Gorman
Active defragmentation by migrating pages. Before this, the kernel could only prevent fragmentation (via migrate types), not fix it. Compaction made transparent huge pages (THP) viable on long-running systems.
CMA (v3.5, 2012)
Commit: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator")
Author: Marek Szyprowski (Samsung)
Note: The original patch series is not archived on lore.kernel.org; see LWN coverage for discussion.
Solved the device driver problem by creating regions that serve double duty - used for movable pages normally, reclaimed for contiguous DMA buffers on demand. See LWN article.
Proactive compaction (v5.9, 2020)
Commit: facdaa917c4d ("mm: proactive compaction") | LKML
Author: Nitin Gupta (NVIDIA)
Background compaction based on fragmentation levels, reducing direct compaction stalls. Controlled via vm.compaction_proactiveness.
Further reading
- Memory compaction - How compaction works
- Page allocator - Buddy system and migrate types
- LWN: CMA - CMA design and rationale
- LWN: Memory compaction - Original compaction proposal
- LWN: In defense of fragmentation avoidance - Mel Gorman's 2007 explanation of the approach
- Kernel docs: DMA API - Using the DMA API correctly