zswap Internals: Compressed Swap Cache
A compressed RAM cache that intercepts pages on their way to the swap device — trading CPU cycles for I/O reduction
What zswap Is (and Is Not)
zswap is a write-back compressed cache that sits in front of the swap device. When the kernel decides to swap out an anonymous page, zswap intercepts it, compresses it, and stores the compressed data in a RAM-based pool. If the compressed pool fills up, zswap evicts the coldest entries back to the backing swap device.
The key insight: reading from compressed RAM is orders of magnitude faster than a disk or even SSD read. zswap trades CPU cycles for I/O reduction, which is usually a favorable trade.
zswap vs zram — they are not the same thing
These two are frequently confused:
- zswap: A compressed cache in front of any swap device. It does not appear as a block device. The backing swap device (disk, SSD, or even a zram device) still exists.
- zram: A compressed block device that acts as the swap device itself. It replaces the swap device entirely in memory.
You can stack them: use zram as the swap device and zswap in front of it. However, this means pages get compressed twice — once by zswap and once by zram — which wastes CPU with minimal additional benefit. See Interaction with zram.
Architecture Overview
graph TD
subgraph "Store path (swap-out)"
A[Anonymous page\nneeds reclaim] --> B{zswap enabled\nand pool not full?}
B -- No --> G[Write to swap device\ndirectly]
B -- Yes --> C[zswap_store]
C --> D[zswap_compress\nper-CPU acomp_ctx]
D --> E{Compresses\nsmaller than PAGE_SIZE?}
E -- Yes --> F[zs_malloc + zs_obj_write\ninto zsmalloc pool]
E -- No --> F2[Store uncompressed\nlength == PAGE_SIZE]
F --> H[zswap_entry added\nto xarray + LRU]
F2 --> H
end
subgraph "Fault path (swap-in)"
I[Page fault on\nswap PTE] --> J[zswap_load]
J --> K{Entry in\nxarray?}
K -- No --> L[Read from swap device]
K -- Yes --> M[zswap_decompress\ninto target folio]
M --> N[Invalidate entry,\nfree from pool]
end
subgraph "Writeback path (pool full)"
O[shrink_worker /\nzswap_shrinker] --> P[Walk LRU,\nshrink_memcg_cb]
P --> Q{entry.referenced?}
Q -- Yes --> R[Clear referenced,\nrotate in LRU]
Q -- No --> S[zswap_writeback_entry]
S --> T[Decompress into\nswap cache folio]
T --> U[__swap_writepage\nto backing device]
T --> V[zswap_entry_free]
end
H -.->|pool full| O
Frontswap is gone
Prior to Linux 6.x, zswap hooked into the kernel via the frontswap interface (frontswap_store, frontswap_load, etc.). That interface was removed. zswap now integrates directly: zswap_store() and zswap_load() are called explicitly from the swap writeback and swapin code paths. The entry points are declared in include/linux/zswap.h.
Key Data Structures
struct zswap_pool
struct zswap_pool {
struct zs_pool *zs_pool; /* zsmalloc pool for compressed data */
struct crypto_acomp_ctx __percpu *acomp_ctx; /* per-CPU compression contexts */
struct percpu_ref ref; /* reference count */
struct list_head list; /* entry in global zswap_pools list */
struct work_struct release_work;
struct hlist_node node; /* for cpuhp callback */
char tfm_name[CRYPTO_MAX_ALG_NAME]; /* compressor name */
};
Each pool is tied to a single compression algorithm. When you change the compressor at runtime, a new pool is created. The old pool lives on until all its entries are evicted or loaded, then it is destroyed via zswap_pool_destroy().
struct zswap_entry
struct zswap_entry {
swp_entry_t swpentry; /* swap type + offset — the lookup key */
unsigned int length; /* compressed size in bytes;
== PAGE_SIZE means incompressible */
bool referenced; /* second-chance bit for LRU writeback */
struct zswap_pool *pool; /* owning pool */
unsigned long handle; /* zsmalloc allocation handle */
struct obj_cgroup *objcg;/* cgroup charge */
struct list_head lru; /* position in global zswap_list_lru */
};
Each compressed page stored in zswap has one zswap_entry. Entries are stored in per-swap-type xarrays, sharded into 64 MiB regions:
#define ZSWAP_ADDRESS_SPACE_SHIFT 14 /* 2^14 = 16384 pages = 64 MiB */
static struct xarray *zswap_trees[MAX_SWAPFILES];
The swap_zswap_tree() helper maps a swp_entry_t to the right xarray shard.
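The shard lookup is just a right shift of the swap offset by ZSWAP_ADDRESS_SPACE_SHIFT. A quick sketch of the arithmetic (the offset value is hypothetical):

```shell
# Which 64 MiB shard does a given swap offset fall in?
offset=100000                 # hypothetical swap offset (in pages)
shard_shift=14                # ZSWAP_ADDRESS_SPACE_SHIFT
echo "shard index: $((offset >> shard_shift))"   # → shard index: 6
```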
struct crypto_acomp_ctx
struct crypto_acomp_ctx {
struct crypto_acomp *acomp; /* async compression transform */
struct acomp_req *req; /* pre-allocated request */
struct crypto_wait wait; /* completion wait */
u8 *buffer; /* PAGE_SIZE scratch buffer */
struct mutex mutex; /* one operation at a time per CPU */
};
One context per CPU per pool. acomp_ctx_get_cpu_lock() acquires the mutex for the current CPU's context; acomp_ctx_put_unlock() releases it.
The Compression Pipeline
Store Path: zswap_store()
When the MM subsystem wants to swap out a folio, it calls zswap_store(struct folio *folio):
zswap_store(folio)
├── Check zswap_enabled
├── obj_cgroup_may_zswap() ← cgroup zswap.max limit
├── zswap_check_limits() ← global max_pool_percent limit
├── zswap_pool_current_get() ← RCU-safe ref on current pool
└── for each page in folio:
zswap_store_page(page, objcg, pool)
├── zswap_entry_cache_alloc() ← kmem_cache alloc
├── zswap_compress(page, entry, pool)
│ ├── acomp_ctx_get_cpu_lock()
│ ├── crypto_acomp_compress() + crypto_wait_req()
│ ├── if dlen >= PAGE_SIZE and writeback enabled:
│ │ store uncompressed (dlen = PAGE_SIZE)
│ └── zs_malloc() + zs_obj_write()
├── xa_store() into swap xarray
└── zswap_lru_add() ← entry.referenced = true
If the store fails for any reason (pool full, allocation failure, compression error), any previously stored entry for that swap slot is invalidated via xa_erase() + zswap_entry_free(). This prevents stale compressed data from being written over by a new version of the page.
Compression: zswap_compress()
zswap uses the kernel's async compression API (crypto/acompress.h):
- The input scatter-gather list is set to the source page.
- crypto_acomp_compress() is called — despite the async API, zswap waits synchronously via crypto_wait_req().
- If dlen >= PAGE_SIZE (incompressible) and writeback is permitted for this cgroup, the page is stored uncompressed (entry->length == PAGE_SIZE). This preserves LRU ordering so cold incompressible pages can still be written back.
- zs_malloc() allocates space in the zsmalloc pool; zs_obj_write() writes the compressed bytes.
Incompressible pages
When a page cannot be compressed below PAGE_SIZE, zswap stores the raw content using kmap_local_page() and copies it directly. The entry is tracked as incompressible via zswap_stored_incompressible_pages. This counter is exposed in debugfs.
If writeback is disabled for the cgroup (memory.zswap.writeback=0), incompressible pages are rejected outright — there is no point storing an uncompressible page if it can never be evicted.
Decompression: zswap_decompress()
Called from both zswap_load() (fault path) and zswap_writeback_entry() (writeback path):
zs_obj_read_sg_begin() // map zsmalloc object as scatter-gather
if entry->length == PAGE_SIZE:
memcpy_from_sglist() // incompressible: straight copy
else:
crypto_acomp_decompress() + crypto_wait_req()
zs_obj_read_sg_end()
zsmalloc objects can span page boundaries, so the input is always a scatter-gather list of 1–2 entries.
Load Path: zswap_load()
On a page fault, swapin code calls zswap_load(struct folio *folio):
- Look up the swp_entry_t in the xarray → get the zswap_entry.
- Call zswap_decompress() into the fault's target folio.
- If loading into the swap cache (the common case), invalidate the zswap entry immediately — the swap cache becomes the authoritative owner.
- Mark the folio up-to-date and unlock it.
Large folio limitation
zswap_load() explicitly rejects large folios with WARN_ON_ONCE(folio_test_large(folio)) and returns -EINVAL. Large folios may be only partially stored in zswap (each constituent page independently), and the load path does not handle this case. The result is a SIGBUS from do_swap_page().
Pool Backend: zsmalloc
In current kernels, zswap exclusively uses zsmalloc as its memory allocator. The older backends (zbud, z3fold) are no longer used by zswap.
CONFIG_ZSWAP selects CONFIG_ZSMALLOC automatically.
How zsmalloc Works
zsmalloc is a variable-size slab allocator designed for compressed page storage. Its key properties:
| Property | Detail |
|---|---|
| Allocation granularity | 8-byte aligned size classes from ~32 bytes to PAGE_SIZE |
| Internal structure | Groups of physical pages called zspages |
| Addressing | Objects are not directly addressable — accessed via opaque handle |
| Fragmentation control | Pages grouped by fullness ratio (0%, 10%, …, 99%, 100%) |
| Maximum pages per zspage | Configurable via CONFIG_ZSMALLOC_CHAIN_SIZE |
Because zsmalloc objects are not directly addressable, zs_obj_write() and zs_obj_read_sg_begin() handle the mapping and scatter-gather setup internally.
graph LR
subgraph "zsmalloc pool"
subgraph "size class: 128 bytes"
Z1[zspage\n4KB physical\n→ 31 objects]
Z2[zspage\n4KB physical\n→ 31 objects]
end
subgraph "size class: 512 bytes"
Z3[zspage\n4KB physical\n→ 7 objects]
end
subgraph "size class: 3840 bytes"
Z4[zspage\n8KB physical\n→ 2 objects]
end
end
E1[zswap_entry\nhandle=H1] -->|zs_obj_write| Z1
E2[zswap_entry\nhandle=H2] -->|zs_obj_write| Z3
The high object density of zsmalloc (especially for small compressed outputs) is why it replaced zbud (2 pages per physical page) and z3fold (up to 3 pages per physical page) — workloads with good compression ratios benefit enormously.
Pool Management and Writeback
Global Pool Size Limit
The pool is limited to totalram_pages() * zswap_max_pool_percent / 100 physical pages of compressed storage. zswap_check_limits() is called on every store:
static bool zswap_check_limits(void)
{
unsigned long cur_pages = zswap_total_pages();
unsigned long max_pages = zswap_max_pages();
if (cur_pages >= max_pages) {
zswap_pool_limit_hit++;
zswap_pool_reached_full = true;
} else if (zswap_pool_reached_full &&
cur_pages <= zswap_accept_thr_pages()) {
zswap_pool_reached_full = false;
}
return zswap_pool_reached_full;
}
When the pool hits the limit, zswap_pool_reached_full = true. New stores are rejected until the pool drains back below accept_threshold_percent of the maximum (default 90%). This hysteresis prevents thrashing at the boundary.
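A worked example of the hysteresis, assuming a hypothetical 16 GiB machine and the default tunables:

```shell
# Defaults: max_pool_percent=20, accept_threshold_percent=90
total_kib=$((16 * 1024 * 1024))       # 16 GiB of RAM, in KiB
max_kib=$((total_kib * 20 / 100))     # stores rejected at/above this
resume_kib=$((max_kib * 90 / 100))    # stores resume below this
echo "limit: $((max_kib / 1024)) MiB, resume below: $((resume_kib / 1024)) MiB"
# → limit: 3276 MiB, resume below: 2949 MiB
```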
When the pool is full and a store fails, zswap_store() queues zswap_shrink_work to trigger background writeback.
The Global LRU
All zswap_entry objects across all pools share a single global list_lru.
The LRU is NUMA-aware and memcg-aware. Entries are added to the LRU at store time with entry->referenced = true.
The Shrinker and Writeback: shrink_worker()
Two mechanisms trigger writeback to the backing swap device:
1. Memory pressure shrinker (zswap_shrinker):
Registered as a standard kernel shrinker (SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE). When the memory allocator sees pressure, it calls zswap_shrinker_scan() → list_lru_shrink_walk() → shrink_memcg_cb() for each LRU entry.
2. Pool-full work queue (shrink_worker):
When zswap_pool_reached_full is set and a store fails, shrink_worker() is queued on the zswap-shrink workqueue. It iterates memcgs round-robin and calls shrink_memcg() on each, continuing until the pool drains below zswap_accept_thr_pages().
Second-Chance LRU Algorithm
shrink_memcg_cb() implements a second-chance eviction policy:
if (entry->referenced) {
entry->referenced = false;
return LRU_ROTATE; /* give it another chance */
}
/* referenced == false: write back to swap device */
writeback_result = zswap_writeback_entry(entry, swpentry);
New entries start with referenced = true. The shrinker clears this flag on first encounter and rotates the entry to the tail. Only on the second encounter (when referenced is already false) does writeback happen.
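The decision can be sketched as a toy walk over a tiny LRU (the entry names and referenced bits are made up):

```shell
# Each item is name:referenced. Mirrors the shrink_memcg_cb() decision:
# referenced entries get one more chance; unreferenced ones are evicted.
for e in A:1 B:0 C:1; do
  name=${e%%:*}
  ref=${e##*:}
  if [ "$ref" = 1 ]; then
    echo "$name: clear referenced, rotate to tail"
  else
    echo "$name: write back to swap device"
  fi
done
```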
Shrinker Calibration
zswap_shrinker_count() scales the number of shrinkable objects by several factors to avoid over-shrinking:
- Compression ratio: mult_frac(nr_freeable, nr_backing, nr_stored) — if the pool compresses 4:1, fewer physical pages are freed per writeback, so fewer candidates are reported.
- Disk swapin penalty: nr_disk_swapins (tracked in zswap_lruvec_state.nr_disk_swapins) is subtracted from nr_freeable. If we observe swapins from disk, it means we previously over-evicted from zswap, so we slow down.
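The ratio scaling amounts to simple integer arithmetic; a sketch with hypothetical counts:

```shell
# 1000 freeable entries compressed 4:1 occupy only 250 backing pages,
# so mult_frac(nr_freeable, nr_backing, nr_stored) reports 250 objects
nr_freeable=1000
nr_backing=250
nr_stored=1000
echo "reported: $((nr_freeable * nr_backing / nr_stored))"   # → reported: 250
```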
zswap_writeback_entry()
The writeback of a single entry:
zswap_writeback_entry(entry, swpentry)
├── swap_cache_alloc_folio() ← allocate folio in swap cache
├── xa_load() to verify entry is still valid
├── zswap_decompress(entry, folio)
├── xa_erase() from xarray
├── zswap_entry_free() ← free from pool
├── folio_mark_uptodate()
└── __swap_writepage() ← issue write to backing swap device
If a concurrent swapin allocated the folio first (!folio_was_allocated), writeback is skipped — the page just became hot, so evicting it would be wrong.
Same-Filled Page Detection
Note on this kernel version
The upstream kernel documentation for older versions describes a same-value-filled (zero page) optimization where pages are checked for uniform content before compression, and only the fill pattern is stored. However, searching the current mm/zswap.c source reveals that this optimization is not present in this version of the code. There is no page_same_filled() call or similar logic in zswap_store_page() or zswap_compress(). Incompressible pages are handled (stored uncompressed at full PAGE_SIZE when writeback is enabled), but there is no separate path for same-value detection.
The stored_incompressible_pages debugfs counter tracks pages stored at full size.
Per-Cgroup Limits
Three cgroup v2 files control zswap per cgroup:
| File | Description |
|---|---|
memory.zswap.max |
Maximum bytes of compressed memory this cgroup can use in zswap. Default: max (unlimited). |
memory.zswap.current |
Read-only: current bytes used in zswap by this cgroup. |
memory.zswap.writeback |
Whether pages can be written back from zswap to the swap device. 0 disables writeback for this cgroup. Default: 1. |
obj_cgroup_may_zswap() checks memory.zswap.max before each store. If the cgroup is over its limit, zswap_store() first tries to shrink this cgroup's pages via shrink_memcg(). If shrinking fails, the store is rejected and the folio goes directly to the swap device.
# Limit a cgroup to 512 MiB of zswap usage
echo 536870912 > /sys/fs/cgroup/myapp/memory.zswap.max
# Disable zswap writeback for a cgroup (pages stay in zswap or are rejected)
echo 0 > /sys/fs/cgroup/myapp/memory.zswap.writeback
writeback=0 and incompressible pages
If memory.zswap.writeback=0 and a page is incompressible, zswap rejects the store entirely (returns false from zswap_compress()). The page then falls through to the swap device. This can cause a feedback loop: the same incompressible page may be rejected repeatedly by zswap and keep triggering swap I/O.
Sysfs Tunables
All tunables live under /sys/module/zswap/parameters/:
| Parameter | Default | Description |
|---|---|---|
enabled |
CONFIG_ZSWAP_DEFAULT_ON |
Enable/disable zswap at runtime. Disabling stops new stores but does not flush existing entries. |
compressor |
CONFIG_ZSWAP_COMPRESSOR_DEFAULT (lzo) |
Compression algorithm. Changing creates a new pool; old pools drain and are freed. |
max_pool_percent |
20 |
Maximum percentage of total RAM the compressed pool may occupy. |
accept_threshold_percent |
90 |
After hitting max_pool_percent, resume accepting pages only when pool drops to this percentage of max. Setting to 100 disables hysteresis. |
shrinker_enabled |
CONFIG_ZSWAP_SHRINKER_DEFAULT_ON (off) |
Enable the memory-pressure-driven shrinker. When off, writeback only happens when the pool is full. |
# Enable zswap at runtime
echo 1 > /sys/module/zswap/parameters/enabled
# Switch compressor to zstd (creates a new pool; old entries remain in lzo pool)
echo zstd > /sys/module/zswap/parameters/compressor
# Increase pool limit to 30% of RAM
echo 30 > /sys/module/zswap/parameters/max_pool_percent
# Enable the memory pressure shrinker
echo Y > /sys/module/zswap/parameters/shrinker_enabled
# Reduce hysteresis — resume accepting when pool is at 70% of max
echo 70 > /sys/module/zswap/parameters/accept_threshold_percent
Alternatively, set these at boot time:
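Since zswap is normally built in, the tunables are also accepted as module parameters on the kernel command line (`zswap.<param>=`). The values below mirror the runtime examples above:

```
zswap.enabled=1 zswap.compressor=zstd zswap.max_pool_percent=30 zswap.shrinker_enabled=Y
```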
Kconfig Options
| Option | Description |
|---|---|
CONFIG_ZSWAP |
Enable zswap support. Automatically selects CONFIG_ZSMALLOC. |
CONFIG_ZSWAP_DEFAULT_ON |
Enable zswap by default at boot. |
CONFIG_ZSWAP_SHRINKER_DEFAULT_ON |
Enable the memory pressure shrinker by default. |
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO |
Default compressor: LZO |
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 |
Default compressor: LZ4 |
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC |
Default compressor: LZ4HC |
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD |
Default compressor: zstd |
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE |
Default compressor: deflate |
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 |
Default compressor: 842 |
CONFIG_ZSMALLOC |
zsmalloc allocator (required by zswap). |
CONFIG_ZSMALLOC_STAT |
Export per-size-class statistics for zsmalloc via debugfs. |
Observability
debugfs: /sys/kernel/debug/zswap/
All counters are read-only. Each individual counter is an accurate u64, but the set is not sampled atomically, so values read together may be slightly inconsistent with one another:
| File | Description |
|---|---|
pool_total_size |
Total bytes of physical RAM consumed by all zswap pools (compressed data only). |
stored_pages |
Number of pages currently stored in zswap (compressed + incompressible). |
stored_incompressible_pages |
Pages stored at full PAGE_SIZE (could not be compressed below PAGE_SIZE). |
pool_limit_hit |
Times a store was rejected because the pool was at max_pool_percent. |
written_back_pages |
Pages successfully written back to the backing swap device. |
reject_reclaim_fail |
Store rejected: pool was full and background writeback failed. |
reject_alloc_fail |
Store rejected: zs_malloc() returned an error (pool exhausted). |
reject_kmemcache_fail |
Store rejected: zswap_entry metadata allocation failed (rare). |
reject_compress_fail |
Store rejected: compression algorithm returned an error. |
reject_compress_poor |
Store rejected: compressed output did not fit in the allocator (ENOSPC). |
decompress_fail |
Decompression failed during load or writeback (data corruption indicator). |
# Quick zswap health check
for f in /sys/kernel/debug/zswap/*; do
printf "%-40s %s\n" "$(basename $f)" "$(cat $f)"
done
vmstat Events
Three events are counted in /proc/vmstat:
| Event | Description |
|---|---|
zswpout |
Pages stored into zswap (successful zswap_store()). |
zswpin |
Pages loaded from zswap (successful zswap_load()). |
zswpwb |
Pages written back from zswap to the swap device. |
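A quick way to sample all three (the fields only exist on kernels built with CONFIG_ZSWAP):

```shell
# Pull the zswap events from /proc/vmstat; print a note if absent
grep -E '^zswp(out|in|wb) ' /proc/vmstat 2>/dev/null \
  || echo "no zswap events found (kernel without CONFIG_ZSWAP?)"
```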
Per-Cgroup Stats
For cgroup v2, memory.stat includes:
| Field | Description |
|---|---|
zswap |
Bytes of compressed memory used in zswap by this cgroup. |
zswapped |
Number of pages currently stored in zswap by this cgroup. |
Estimating Compression Ratio
stored=$(cat /sys/kernel/debug/zswap/stored_pages)
pool_bytes=$(cat /sys/kernel/debug/zswap/pool_total_size)
page_size=4096
if [ "$stored" -gt 0 ]; then
uncompressed=$((stored * page_size))
echo "Stored pages: $stored"
echo "Uncompressed: $((uncompressed / 1048576)) MiB"
echo "Compressed: $((pool_bytes / 1048576)) MiB"
echo "Ratio: $(echo "scale=2; $uncompressed / $pool_bytes" | bc):1"
fi
Interaction with zram
zswap can sit in front of a zram device. The full stack looks like this:
graph TD
A[Anonymous page\nneeds to be swapped] --> B[zswap_store]
B -->|compressed + stored| C[zswap pool\nin RAM]
B -->|pool full / rejected| D[swap_writepage]
D --> E[zram block device\n/dev/zram0]
E -->|compressed again!| F[zram pool\nin RAM]
C -->|writeback| D
Double compression
When zswap sits in front of zram, pages that miss the zswap pool (or get written back by the shrinker) are compressed a second time by zram. This means:
- CPU overhead is doubled for those pages.
- The additional compression benefit is minimal — data already compressed by zswap will not compress meaningfully again.
- Memory saving is marginal because zswap already compressed the data.
Recommendation: Use either zswap (with a real swap device) or zram, not both. If your goal is to avoid all disk I/O, zram alone is simpler and more efficient. If you have a swap device and want to reduce I/O, use zswap in front of it.
If you must use both (e.g., the system has both a zram swap and a disk swap), note that zswap applies globally: mainline offers no per-swap-device toggle, so pages bound for the zram swap also pass through zswap.
Performance Tradeoffs
CPU vs. Memory vs. I/O
Low compression ratio (data is random, already compressed):
→ zswap stores at PAGE_SIZE (incompressible)
→ Pool fills quickly (no density gain over uncompressed swap)
→ Writeback to swap device happens soon anyway
→ Net result: CPU overhead for no benefit
High compression ratio (text, code, zeroed pages):
→ 4:1 or better compression is common
→ Pool holds 4x more pages than uncompressed
→ Significant reduction in swap I/O
→ Net result: CPU overhead well justified
Compressor Comparison
| Algorithm | Speed | Ratio | Best for |
|---|---|---|---|
lzo |
Fastest | Moderate | Interactive desktops, latency-sensitive |
lz4 |
Very fast | Moderate | High-throughput servers, low-overhead environments |
lz4hc |
Moderate | Better than lz4 | Balanced; slower compression, same decompression speed as lz4 |
zstd |
Moderate | Best | High memory pressure; worth the CPU cost for good ratios |
deflate |
Slow | Good | Legacy; generally superseded by zstd |
842 |
Hardware-accelerated (IBM POWER) | Moderate | POWER systems with hardware offload |
Choosing a compressor
- For latency-sensitive workloads:
lzoorlz4— fast decompression keeps fault latency low. - For memory-constrained systems where saving RAM matters most:
zstd— highest density. - The default
lzois a safe middle ground for most use cases.
The compressor can be changed at runtime without flushing existing entries. Old entries remain in their old pool and are decompressed with the old algorithm until they are evicted.

When to Enable the Shrinker
By default (CONFIG_ZSWAP_SHRINKER_DEFAULT_ON=n), writeback only triggers when the pool hits max_pool_percent. This means cold pages can accumulate in the pool indefinitely until the limit is reached.
Enabling the shrinker (shrinker_enabled=Y) allows the memory allocator to proactively write cold zswap pages to the backing swap device under memory pressure — before the pool fills. This:
- Reduces the risk of the pool becoming a "black hole" for cold memory.
- Introduces more swap I/O under sustained memory pressure.
- Is governed by the compression ratio and disk swapin penalty to avoid over-eviction.
Pool Size Tuning
The default max_pool_percent=20 is conservative. On systems with ample RAM and slow swap (spinning disk), a larger pool is beneficial:
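A back-of-envelope capacity estimate, assuming a hypothetical RAM size and compression ratio:

```shell
# How much uncompressed data fits in the pool at the default limit?
ram_mib=8192            # hypothetical 8 GiB machine
max_pool_percent=20
ratio=3                 # assumed 3:1 compression
pool_mib=$((ram_mib * max_pool_percent / 100))
echo "pool: ${pool_mib} MiB holds ~$((pool_mib * ratio)) MiB of anonymous pages"
```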
On systems where RAM is more precious (containers, embedded), keep the default or reduce it.
Initialization and Lifecycle
zswap initializes lazily via late_initcall(zswap_init). This ensures the crypto subsystem is available:
zswap_init()
└── zswap_setup()
├── KMEM_CACHE(zswap_entry, 0)
├── cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
│ zswap_cpu_comp_prepare, zswap_cpu_comp_dead)
├── alloc_workqueue("zswap-shrink", WQ_UNBOUND|WQ_MEM_RECLAIM, 1)
├── zswap_alloc_shrinker()
├── list_lru_init_memcg(&zswap_list_lru, zswap_shrinker)
├── shrinker_register(zswap_shrinker)
├── __zswap_pool_create_fallback() ← creates the initial pool
└── zswap_debugfs_init()
CPU hotplug callbacks (zswap_cpu_comp_prepare / zswap_cpu_comp_dead) allocate and free the per-CPU crypto_acomp_ctx as CPUs come online and offline.
When a swap device is enabled (swapon), zswap_swapon() allocates the xarray shards for that swap type. When disabled (swapoff), zswap_swapoff() frees them (all entries must have been invalidated by try_to_unuse() first).
Key Source Files
| File | Purpose |
|---|---|
mm/zswap.c |
Main zswap implementation: store, load, writeback, shrinker, pool management, debugfs |
include/linux/zswap.h |
Public API: zswap_store(), zswap_load(), zswap_invalidate(), zswap_swapon(), zswap_swapoff(); struct zswap_lruvec_state with nr_disk_swapins |
mm/zsmalloc.c |
zsmalloc allocator: variable-size compressed object pool used by zswap |
include/linux/zsmalloc.h |
zsmalloc API: zs_create_pool(), zs_malloc(), zs_obj_write(), zs_obj_read_sg_begin(), zs_free(), zs_destroy_pool() |
mm/memcontrol.c |
Per-cgroup zswap limits: memory.zswap.max, memory.zswap.current, memory.zswap.writeback, obj_cgroup_may_zswap() |
mm/Kconfig |
Kconfig options: CONFIG_ZSWAP, CONFIG_ZSWAP_DEFAULT_ON, CONFIG_ZSWAP_SHRINKER_DEFAULT_ON, compressor defaults |
include/linux/vm_event_item.h |
vmstat events: ZSWPIN, ZSWPOUT, ZSWPWB |
include/linux/memcontrol.h |
MEMCG_ZSWAP_B, MEMCG_ZSWAPPED, obj_cgroup_may_zswap(), mem_cgroup_zswap_writeback_enabled(); zswap_max field in struct mem_cgroup |
Documentation/admin-guide/mm/zswap.rst |
Upstream admin guide (may lag the source) |
Further reading
- mm/zswap.c — complete zswap implementation: store, load, writeback, shrinker, pool management, and debugfs counters
- Documentation/admin-guide/mm/zswap.rst — upstream admin guide covering tunables, cgroup controls, and observability
- swap — swap architecture overview including how zswap fits into the broader swap stack alongside zram and disk swap
- swap-thrashing — how enabling zswap or zram can reduce I/O-driven thrashing by keeping compressed pages in RAM
- slab — the zsmalloc allocator used internally by zswap to store variable-size compressed objects with high density
- zswap: compressed swap caching (LWN, 2013) — original LWN coverage of Seth Jennings's zswap patch at the time of its v3.11 merge
- Cleancache and frontswap (LWN, 2011) — background on the frontswap interface that zswap originally used (since removed in favor of direct integration)