What Happens When a Process Wakes Up

From wake_up_process() to running on a CPU

Overview

A process wakes up when something it was waiting for becomes available — a lock is released, data arrives on a socket, a timer fires, or a signal is delivered. The wakeup path must transition the task from sleeping to runnable, place it on the right CPU, and potentially preempt whatever is currently running there.

flowchart TB
    A["Event occurs<br/>(I/O complete, lock release, timer)"]
    B["wake_up_process(p)<br/>or wake_up_interruptible()"]
    C["try_to_wake_up(p, state, flags)"]
    D["select_task_rq(p)<br/>Pick target CPU"]
    E["ttwu_queue(p, cpu, flags)"]
    F["activate_task()<br/>enqueue_task() via sched class"]
    G["wakeup_preempt()<br/>Should we preempt current?"]
    H["TIF_NEED_RESCHED set<br/>on target CPU"]
    I["Task runs at next<br/>scheduling point"]

    A --> B --> C --> D --> E --> F --> G --> H --> I

Task states

Before diving in, the relevant sleeping states:

// include/linux/sched.h
#define TASK_INTERRUPTIBLE      0x00000001  // can be woken by signals
#define TASK_UNINTERRUPTIBLE    0x00000002  // only explicit wakeup
#define TASK_WAKING             0x00000200  // being woken right now

TASK_INTERRUPTIBLE is the most common sleep state — used when waiting for I/O, locks, or events that might never come (so signals should be able to abort the wait). TASK_UNINTERRUPTIBLE is for waits that must complete — disk I/O in progress, kernel critical sections. This is the "D" state in ps output.

wake_up_process() uses TASK_NORMAL = TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE and will wake either.

Entry: wake_up_process()

// kernel/sched/core.c
int wake_up_process(struct task_struct *p)
{
    return try_to_wake_up(p, TASK_NORMAL, 0);
}

The kernel has several wake_up_* variants depending on which sleep states to wake:

Function                      Wakes
wake_up_process(p)            TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE
wake_up_interruptible(x)      TASK_INTERRUPTIBLE only
wake_up(x)                    TASK_NORMAL waiters (at most one exclusive waiter)
wake_up_all(x)                All TASK_NORMAL waiters, including exclusive ones

try_to_wake_up()

The core wakeup function:

// kernel/sched/core.c
int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
{
    unsigned long flags;
    int cpu, success = 0;

    wake_flags |= WF_TTWU;

    // Fast path: waking ourselves (rare but valid)
    if (p == current) {
        if (!(READ_ONCE(p->__state) & state))
            return 0;
        WRITE_ONCE(p->__state, TASK_RUNNING);
        return 1;
    }

    // Acquire p->pi_lock to serialize with other wakers
    raw_spin_lock_irqsave(&p->pi_lock, flags);

    // Check if the task is actually in a wakeable state
    if (!(READ_ONCE(p->__state) & state))
        goto unlock;

    trace_sched_waking(p);
    success = 1;

    // Mark transitioning: concurrent wakers now see neither a sleeping
    // nor a runnable task, so the wakeup cannot be applied twice
    WRITE_ONCE(p->__state, TASK_WAKING);

    // If already on a runqueue, just fix up preemption
    if (READ_ONCE(p->on_rq)) {
        ttwu_runnable(p, wake_flags);
        goto unlock;
    }

    // Wait for the task to be fully off the CPU it last ran on
    smp_cond_load_acquire(&p->on_cpu, !VAL);  // spin until on_cpu == 0

    // Select target CPU
    cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
    if (task_cpu(p) != cpu)
        set_task_cpu(p, cpu);

    // Enqueue on target CPU
    ttwu_queue(p, cpu, wake_flags);

unlock:
    raw_spin_unlock_irqrestore(&p->pi_lock, flags);
    if (success)
        ttwu_stat(p, task_cpu(p), wake_flags);
    return success;
}

The on_cpu spin

smp_cond_load_acquire(&p->on_cpu, !VAL);

If the task is still executing on another CPU (in the middle of __schedule() or running), smp_cond_load_acquire spins until p->on_cpu becomes 0. finish_task_switch() clears it with smp_store_release() after the task is fully off the CPU. This prevents placing the task on a new runqueue while it's still executing somewhere.

CPU selection: select_task_rq()

The scheduler must decide which CPU to place the woken task on. For SCHED_NORMAL tasks, select_task_rq_fair() makes this decision:

// kernel/sched/core.c
int select_task_rq(struct task_struct *p, int cpu, int *wake_flags)
{
    if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
        cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
    else
        cpu = cpumask_any(p->cpus_ptr);

    return cpu;
}

select_task_rq_fair() considers several factors:

Wake affinity: If the waker and wakee communicate frequently, placing the wakee near the waker improves cache locality. This is especially effective when the waker calls wake_up() and then blocks — the wakee can reuse the waker's cache contents.

Load balancing: The selected CPU should be idle or lightly loaded. The function walks scheduling domains (SMT → LLC → NUMA node → system) looking for a suitable CPU.

The WF_SYNC flag: Set when the waking task is about to block immediately after the wakeup (e.g., a producer sending to a consumer). In this case, the wakee might be placed on the waker's CPU since the waker is vacating it.

// kernel/sched/sched.h
#define WF_FORK         0x04  // new task (fork)
#define WF_TTWU         0x08  // regular wakeup path
#define WF_SYNC         0x10  // waker goes to sleep after wakeup
#define WF_MIGRATED     0x20  // task was migrated to a new CPU

Enqueuing: ttwu_queue() and activate_task()

// kernel/sched/core.c
static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
{
    struct rq *rq = cpu_rq(cpu);
    struct rq_flags rf;

    rq_lock(rq, &rf);
    update_rq_clock(rq);
    ttwu_do_activate(rq, p, wake_flags, &rf);
    rq_unlock(rq, &rf);
}

static void ttwu_do_activate(struct rq *rq, struct task_struct *p,
                              int wake_flags, struct rq_flags *rf)
{
    int en_flags = ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK;

    // Activate (enqueue into sched class data structures)
    activate_task(rq, p, en_flags);

    // Check if newly runnable task should preempt current
    wakeup_preempt(rq, p, wake_flags);

    // Mark as runnable
    ttwu_do_wakeup(p);   // sets p->__state = TASK_RUNNING
}

activate_task() calls p->sched_class->enqueue_task(). For fair tasks this puts the task into the CFS RB-tree with a vruntime adjusted for how long it slept (via place_entity()).

Vruntime placement on wakeup

A task that slept for a long time has a stale (low) vruntime. If it returned to the RB-tree at that old vruntime, it could monopolize the CPU catching up. Instead, place_entity() places the waking task at approximately avg_vruntime - half_slice:

Before sleep:  [A: vr=100] [B: vr=105] [C: vr=110]
After A sleeps for 500ms:  avg_vruntime ≈ 300
A wakes, placed at: 300 - half_slice ≈ 297

A gets to run soon (it's near the front) but doesn't starve B and C (it's not so far back that it needs to run for 200ms straight to "catch up").

Preemption check: wakeup_preempt()

After enqueueing, the scheduler checks whether the newly runnable task should preempt whatever is currently running on the target CPU:

// kernel/sched/core.c
void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
{
    // Delegate to the running task's sched class
    rq->curr->sched_class->wakeup_preempt(rq, p, flags);
}

For fair_sched_class, check_preempt_wakeup_fair() checks whether the woken task's virtual deadline is earlier than the current task's. If so, it calls resched_curr(rq) to set TIF_NEED_RESCHED.

The target CPU won't preempt immediately — it sets the flag and waits for the next preemption point (interrupt exit, syscall return, or explicit schedule() call).

Cross-CPU wakeup

When waking a task on a different CPU:

sequenceDiagram
    participant W as Waker (CPU 0)
    participant T as Target (CPU 1)
    participant P as Prev task on CPU 1

    W->>W: try_to_wake_up(p)
    W->>T: set_task_cpu(p, 1)
    W->>T: ttwu_queue(p, 1, ...)
    T->>T: rq_lock(cpu_rq(1))
    T->>T: activate_task → enqueue_task_fair
    T->>T: wakeup_preempt → resched_curr(rq)
    T->>P: TIF_NEED_RESCHED set
    T->>T: rq_unlock
    P->>T: Hits preemption point
    T->>T: __schedule() → context_switch to p

The waker enqueues the task and then asks the target CPU to act: resched_curr() sets TIF_NEED_RESCHED for the task running there and, if the target is another CPU, sends an inter-processor interrupt (IPI) via smp_send_reschedule() so it reaches a preemption point promptly.

Wait queues

Most kernel code doesn't call wake_up_process() directly — it uses wait queues:

// Kernel side: wake all tasks waiting on a queue
wake_up(&wq_head);            // wake TASK_NORMAL tasks
wake_up_interruptible(&wq);   // wake TASK_INTERRUPTIBLE only

// User side (inside driver/subsystem code):
wait_event(wq, condition);           // TASK_UNINTERRUPTIBLE
wait_event_interruptible(wq, cond);  // TASK_INTERRUPTIBLE

Wait queues serialize access to the condition variable, ensuring that wakeups aren't lost between when the condition is checked and when the task sleeps.

Observing wakeups

# Wakeup latency: time from wakeup to first execution
perf sched record -a sleep 5
perf sched latency --sort=max

# Trace individual wakeup events
trace-cmd record -e sched:sched_wakeup -e sched:sched_switch ./workload
trace-cmd report

# See which CPUs tasks are waking up on
perf script -i perf.data | grep sched_wakeup | awk '{print $NF}' | sort | uniq -c

# /proc/PID/schedstat: wakeup stats per task
cat /proc/$PID/schedstat
# Fields: time on CPU (ns), time waiting on a runqueue (ns), timeslices run

Further reading

  • Life of a Context Switch — What happens after TIF_NEED_RESCHED is set
  • CFS — How vruntime is adjusted on wakeup via place_entity()
  • EEVDF — How eligibility and deadlines affect wakeup preemption decisions