
Crypto War Stories

Real incidents: timing attacks, IV reuse, key leaks, and broken RNG

These are five real-world problems rooted in kernel crypto behavior. Each follows the same structure: the problem, how it was diagnosed, the root cause in kernel internals, and the fix.


1. IV reuse in dm-crypt with plain IV mode

Problem

A system administrator configured a dm-crypt volume using the legacy aes-cbc-plain cipher specification. After deploying it to a fleet of servers, a security review found that an attacker with read access to the raw block device could determine when two sectors contained identical plaintext.

Diagnosis

The cipher specification was:

cryptsetup create myvolume /dev/sda2 --cipher aes-cbc-plain --key-size 256

The "plain" IV mode means the IV for each sector is simply the 32-bit sector number:

/* drivers/md/dm-crypt.c */
static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv,
                               struct dm_crypt_request *dmreq)
{
    memset(iv, 0, cc->iv_size);
    *(__le32 *)iv = cpu_to_le32(dmreq->iv_sector & 0xffffffff);
    return 0;
}

With AES-CBC and an IV derived only from the low 32 bits of the sector number, IV reuse is structural: two encryptions of the same sector over time, or of two sectors whose sector numbers coincide modulo 2^32 on large devices, use the same IV under the same key. This is the key CBC IV-reuse failure mode: when two messages share the same IV and key, an attacker can detect that their first plaintext blocks are identical by comparing the first ciphertext blocks.

AES_CBC(key, IV=42, plaintext_v1)[block_0] == AES_CBC(key, IV=42, plaintext_v2)[block_0]
    iff plaintext_v1[block_0] == plaintext_v2[block_0]

This is a watermarking / traffic analysis attack: an attacker with read access to the raw device can detect when two versions of the same sector begin with the same 16-byte block. This is distinct from the XOR-of-ciphertexts-equals-XOR-of-plaintexts leakage, which applies to stream cipher modes (CTR, OFB) — not CBC. In CBC, IV reuse reveals block equality, not plaintext content directly.

# Confirm the cipher mode
cryptsetup status myvolume
# cipher: aes-cbc-plain    ← vulnerable

Root cause

aes-cbc-plain was the historical default in older cryptsetup versions. It was superseded by aes-cbc-essiv:sha256 (which derives the IV as AES_encrypt(SHA256(volume_key), sector_num), preventing watermarking) and then by aes-xts-plain64 (which uses a mathematically sound tweakable block cipher mode that does not have CBC's IV-reuse vulnerability).

The "plain" suffix means: the IV is the literal sector number, no derivation. The "plain64" suffix uses a 64-bit sector number, fixing the original plain's 32-bit overflow issue.

Fix

Migrate to AES-XTS, which eliminates the IV-reuse problem by design:

# Check current setup
cryptsetup benchmark
# Testing 128 bit cipher AES-XTS...   1623.5 MiB/s (encrypt)  1607.2 MiB/s (decrypt)
# Testing 128 bit cipher AES-CBC...   1784.2 MiB/s (encrypt)   532.1 MiB/s (decrypt)
# Note: AES-XTS decrypt is ~3x faster than AES-CBC decrypt

# New volumes: use LUKS2 with the default cipher (aes-xts-plain64)
cryptsetup luksFormat --type luks2 /dev/sda2
# Default cipher since cryptsetup 2.x: aes-xts-plain64

# Existing volume: must re-encrypt (in-place re-encryption available in LUKS2)
cryptsetup reencrypt --cipher aes-xts-plain64 --key-size 512 /dev/sda2

AES-XTS uses two keys (hence 512 bits for AES-256-XTS: a 256-bit data key plus a 256-bit tweak key). The sector number is encrypted under the tweak key, and the result is multiplied by successive powers of α in GF(2^128) for each 16-byte block within the sector, so every block's encryption is position-dependent even when two sectors hold identical plaintext.


2. Timing side-channel in MAC verification: crypto_memneq

Problem

A kernel module implementing a custom authentication protocol compared authentication tags using memcmp(). Under carefully crafted network inputs, an attacker on the local network was able to distinguish valid from invalid authentication tags by measuring response latency with microsecond precision.

Diagnosis

The vulnerable code:

/* Incorrect: timing-variable comparison */
if (memcmp(received_tag, expected_tag, TAG_LEN) != 0) {
    return -EBADMSG;
}

memcmp() returns as soon as it finds a differing byte. An attacker sending MAC tags that differ at the last byte (rather than the first) causes memcmp() to run longer. With enough samples (millions of requests), the byte-by-byte comparison is statistically distinguishable, allowing a timing oracle attack to reconstruct the expected tag one byte at a time.

This is the same class of vulnerability as the 2013 Lucky13 TLS attack and various HMAC verification bugs.

Root cause

memcmp() is defined to return after the first difference — this is correct for general use, but wrong for secret comparison. The compiler is also free to optimize comparison code in ways that create timing variation.

Linux added crypto_memneq() specifically to address this:

/* include/crypto/algapi.h */
static inline int crypto_memneq(const void *a, const void *b, size_t size)
{
    return __crypto_memneq(a, b, size);
}

/* crypto/memneq.c (simplified: the real code also uses OPTIMIZER_HIDE_VAR()) */
noinline int __crypto_memneq(const void *a, const void *b, size_t size)
{
    unsigned long neq = 0;

    /* Accumulate XOR of all bytes. Result is 0 iff a == b.
     * Every byte is always read; no early exit. */
    while (size >= sizeof(unsigned long)) {
        neq |= *(unsigned long *)a ^ *(unsigned long *)b;
        a   += sizeof(unsigned long);
        b   += sizeof(unsigned long);
        size -= sizeof(unsigned long);
    }
    while (size > 0) {
        neq |= *(unsigned char *)a ^ *(unsigned char *)b;
        a++;
        b++;
        size--;
    }
    return (neq != 0) ? 1 : 0;
}

The function is compiled with special care:

# crypto/Makefile
CFLAGS_memneq.o := -Os

The -Os (optimize for size) flag prevents loop unrolling and early-exit optimizations that could introduce timing variation. Combined with the noinline attribute (which prevents inlining that could allow the surrounding code's optimization context to affect it) and OPTIMIZER_HIDE_VAR() (which hides the accumulator from the optimizer), the function maintains its constant-time property.

Fix

Replace all memcmp() calls in authentication paths with crypto_memneq():

/* Correct: constant-time comparison */
#include <crypto/algapi.h>

if (crypto_memneq(received_tag, expected_tag, TAG_LEN)) {
    return -EBADMSG;
}

The AEAD crypto API (crypto_aead_decrypt()) already uses constant-time comparison internally for authentication tag verification, so code built on the AEAD interface gets this for free. For custom comparisons of MACs, tags, or other secret values, crypto_memneq() is the right tool.


3. getrandom() blocking at boot: services hang waiting for entropy

Problem

A fleet of KVM virtual machines running a custom Linux-based appliance exhibited a reproducible hang during boot: sshd took 60–90 seconds to start, systemd journal showed services timing out, and the boot would eventually continue but with a severely delayed network stack.

Diagnosis

The symptom was traced to getrandom() blocking:

# On a slow-boot VM, check blocked processes
grep -s . /proc/[0-9]*/wchan | cut -d: -f2 | sort | uniq -c | sort -rn | head
# 12 random_read_iter     ← 12 processes blocked in random_read_iter

The blocked call stack (from a kernel oops or via sysrq-T):

[<0>] __schedule+0x3c4/0xa80
[<0>] schedule+0x4a/0xb0
[<0>] getrandom_wait+0x...     ← waiting for crng_init_done
[<0>] sys_getrandom+0x...

getrandom() blocks until 256 bits of entropy are collected into the CRNG. On this VM:

# Check whether virtio-rng is present
lsmod | grep virtio_rng
# (empty — not loaded)

# No virtio-rng, no RDRAND passthrough, no saved seed:
dmesg | grep -E "random:|crng"
# [    0.301234] random: fast init done
# [  127.441821] random: crng init done  ← 127 seconds!

The VM was:
- running under KVM without a virtio-rng device configured
- masking CPU features so that RDRAND was not visible to the guest
- loading the saved seed too late: systemd-random-seed.service ran after network.target, which itself needed sshd, which needed getrandom()...

Root cause

Circular dependency in boot:

network.target
    needs sshd
        needs getrandom() (blocks for entropy)
            needs: interrupt jitter, virtio-rng, or saved seed
                saved seed loaded by systemd-random-seed.service
                    After basic.target (late)

On virtual machines, interrupt jitter is low (the hypervisor delivers predictable timer interrupts). Without virtio-rng, the kernel collects entropy very slowly.

Fix

Three independent fixes, applied together for defense in depth:

1. Add virtio-rng to the VM definition (best fix):

<!-- QEMU/libvirt: add a virtio-rng device -->
<rng model='virtio'>
  <backend model='random'>/dev/urandom</backend>
</rng>

The guest kernel driver (drivers/char/hw_random/virtio-rng.c) registers a hw_random source; the hw_random core then feeds host entropy into the guest pool via add_hwgenerator_randomness() as soon as the device is available.

2. Ensure systemd-random-seed loads early:

# Verify the service is enabled and not in a late target
systemctl cat systemd-random-seed.service | grep -A 5 '\[Unit\]'
# Before= should include sysinit.target or similar early target

3. On bare metal with a TPM or Intel CPU:

# Enable rng-tools to harvest from hardware
apt install rng-tools
systemctl enable --now rngd
# rngd reads /dev/hwrng (which uses RDRAND/TPM) and feeds /dev/random

After adding virtio-rng:

dmesg | grep "crng init done"
# [    1.234567] random: crng init done  ← 1.2 seconds, normal

4. Kernel keyring leak via /proc

Problem

A security audit found that a non-privileged process could enumerate key descriptions from other users' keyrings by reading /proc/keys. The concern was: could this leak sensitive information about key existence or purpose?

Diagnosis

# Any process can read /proc/keys
cat /proc/keys
# 0c7ec5e3 I--Q--   1 perm 1f3f0000     0     0 keyring _uid_ses.0
# 2effa75e I--Q--   1 perm 1f3f0000  1000  1000 user    myapp:db_password
#                                    ^^^^
#                                    This key's description leaks the fact
#                                    that "myapp" has a "db_password" key

The key description (name) is visible to all processes that can read /proc/keys, even if they cannot read the payload. This is by design — /proc/keys shows all keys in the system but is filtered by the calling process's view permission.

Root cause

/proc/keys is implemented in security/keys/proc.c. The kernel calls key_task_permission() for each key, with the KEY_NEED_VIEW permission bit:

/* security/keys/proc.c */
static int proc_keys_show(struct seq_file *m, void *v)
{
    struct key *key = v;
    ...

    /* Skip keys this process can't view */
    rc = key_task_permission(make_key_ref(key, 0), current_cred(),
                              KEY_NEED_VIEW);
    if (rc < 0)
        return 0;   /* silently skip */

    /* Show description, type, permissions, uid, gid, expiry */
    /* Does NOT show payload */
    seq_printf(m, "%08x %s%s%s%s%s%s %5d %3d %s %s\n",
               key->serial, ...
               key->type->name,
               key->description);
    return 0;
}

User-created keys get default permissions that grant access only to the possessor and the owning user, and nothing to group or other. A key with perm = 0x1f3f0000 has:
- possessor: 0x1f = view|read|write|search|link (the setattr bit 0x20 is NOT set)
- user: 0x3f = all permissions (view|read|write|search|link|setattr)
- group: 0x00 = no permissions
- other: 0x00 = no permissions

But: in this case the key was created with permissive other permissions:

keyctl setperm <serial> 0x3f3f3f01  # other can "view" — bug!

Or the application used a key description that exposed sensitive information (e.g., the description contained a hostname or database name).

Root cause (continued): logon keys prevent this

Keys of type "logon" cannot be read even by the owning process — the .read callback returns -EPERM. This is why fscrypt and cryptsetup use logon keys for actual key material: even with view permission on the description, the payload is unreachable:

# logon key: visible in /proc/keys but payload unreadable
head -c 32 /dev/urandom | keyctl padd logon fscrypt:abc123 @s

# Description visible:
# 1234abcd I--Q--   1 perm 1f1f0000  1000  1000 logon   fscrypt:abc123

# Payload is inaccessible:
keyctl print 1234abcd
# keyctl_read_alloc: Operation not supported

Fix

  1. Use logon keys for sensitive material — applications that store secrets in the keyring should use "logon" type so the payload is never readable from userspace.

  2. Use opaque descriptions — if using "user" type keys, the description should not encode sensitive information (e.g., use a UUID rather than myapp:db_password).

  3. Set restrictive permissions — create keys with other=0x00:

keyctl add user myapp:token "..." @s
keyctl setperm <serial> 0x3f3f0000  # possessor and user only; group and other: nothing

  4. Use /proc/key-users (not /proc/keys) to monitor key quota without exposing descriptions; it only shows per-UID counts.

5. Hardware accelerator returning wrong results: silent ciphertext corruption

Problem

A server using an Intel QuickAssist Technology (QAT) accelerator for AES-GCM encryption in a TLS proxy began producing authentication failures on approximately 0.01% of requests. Application logs showed intermittent -EBADMSG from the kernel AEAD decrypt path. The failures were random in timing and not reproducible with specific inputs.

Diagnosis

The proxy was using the qat kernel driver which registers a high-priority AES-GCM implementation:

cat /proc/crypto | grep -A 15 "name.*gcm(aes)"
# name         : gcm(aes)
# driver       : qat_aes_gcm
# module       : intel_qat
# priority     : 4001        ← higher than AES-NI (800)
# type         : aead
# async        : yes

The hardware was selected by priority. Disabling the QAT module forced fallback to AES-NI:

# Test: bypass QAT by unloading the driver
rmmod intel_qat
# Failures stopped. The QAT hardware was producing corrupt output.

The root cause was a firmware bug in a specific QAT revision that incorrectly handled scatter-gather lists spanning a 4GB physical address boundary.

Root cause: CRYPTO_ALG_TESTED and the self-test framework

The kernel's crypto subsystem has a mandatory self-test framework (crypto/testmgr.c) that runs known-answer tests on every registered algorithm before it is marked usable. An algorithm that fails its self-test gets the CRYPTO_ALG_TESTED flag withheld and cannot be used:

/* include/linux/crypto.h */
#define CRYPTO_ALG_TESTED       0x00000400

/* An algorithm is usable only if this bit is set (or testing is disabled) */

The self-test for AES-GCM uses hardcoded test vectors from the NIST CAVP test suite:

/* crypto/testmgr.h — excerpt (illustrative; actual file has hundreds of vectors) */
static const struct aead_testvec aes_gcm_tv_template[] = {
    {
        .key   = "\x00\x00\x00\x00\x00\x00\x00\x00"
                 "\x00\x00\x00\x00\x00\x00\x00\x00",
        .klen  = 16,
        .iv    = "\x00\x00\x00\x00\x00\x00\x00\x00"
                 "\x00\x00\x00\x00",
        .ptext = "",
        .plen  = 0,
        .aad   = "",
        .alen  = 0,
        .ctext = "\x58\xe2\xfc\xce\xfa\x7e\x30\x61"
                 "\x36\x7f\x1d\x57\xa4\xe7\x45\x5a", /* auth tag only */
        .clen  = 16,
    },
    /* ... many more vectors ... */
};

The QAT driver's implementation passed the self-tests (which use small, aligned test vectors) but failed on real workloads with large scatter-gather lists crossing address boundaries. The self-test framework caught algorithm-level bugs, but this was a hardware bug that only manifested with specific DMA layouts.

Fix

Immediate: disable the QAT for AES-GCM by reducing its priority or unloading the driver.

In the driver: add a DMA boundary check in the scatter-gather setup:

/* Check that no DMA segment crosses a 4GB boundary */
static int qat_check_sg_alignment(struct scatterlist *sg, int nents)
{
    struct scatterlist *s;
    int i;

    for_each_sg(sg, s, nents, i) {
        dma_addr_t start = sg_dma_address(s);
        dma_addr_t end   = start + sg_dma_len(s) - 1;
        if ((start >> 32) != (end >> 32))
            return -EINVAL;   /* crosses 4GB boundary, use fallback */
    }
    return 0;
}

When the boundary check fails, the driver falls back to the software AES-GCM implementation using the same fallback pattern described in crypto_engine.

Longer term: the kernel's tcrypt selftest mode can be used to stress-test with larger and more varied inputs:

# Run the AEAD test suite (requires CONFIG_CRYPTO_TEST)
# Note: tcrypt mode numbers are not stable across kernel versions.
# The gcm(aes) aead test is in the mode 150s range; verify in your kernel's crypto/tcrypt.c.
modprobe tcrypt mode=154   # gcm(aes) aead test on this kernel
# testing gcm(aes)...
# test 0 (512 byte blocks): passed.
# ...

The CRYPTO_ALG_TESTED mechanism ensures that a driver with broken self-tests is never exposed to callers. However, it cannot protect against latent hardware bugs that pass self-tests but fail under real DMA conditions. Drivers should implement their own correctness checks for hardware-specific edge cases, and production deployments should validate hardware with workload-realistic test patterns before relying on new accelerators.


Summary: lessons from the field

Incident                   | Core mistake                            | Correct approach
IV reuse (dm-crypt)        | Legacy aes-cbc-plain cipher string      | Use aes-xts-plain64 (LUKS2 default)
Timing attack (memcmp)     | memcmp() for secret comparison          | crypto_memneq() from <crypto/algapi.h>
Boot entropy starvation    | No hardware RNG in VM                   | virtio-rng device + saved seed
Keyring description leak   | Sensitive data in key description       | Opaque descriptions + logon key type
Hardware silent corruption | Trusting self-tests as full validation  | DMA boundary checks + fallback

Further reading