Tutorial

Tuxscope Lab 5: Tracing Process Lifecycle with eBPF

Trace fork, exec, and exit events in real time to understand how Linux creates, transforms, and destroys processes.

7 min read intermediate

Prerequisites

  • Completed Tuxscope Labs 1-4
  • Basic understanding of Linux processes
  • A built tuxscope binary (see Lab 1)
  • Linux 5.8+ with root privileges

Part 5 of 7 in Tuxscope: Linux Kernel Observability with eBPF

Every program you run on Linux begins as a clone of another process. Your shell calls fork() to duplicate itself, the duplicate calls exec() to replace its memory image with the new program, and eventually the new program calls exit() to terminate. This three-step lifecycle (fork, exec, exit) is the foundation of how Linux manages work.

In this lab you will attach eBPF programs to three kernel tracepoints simultaneously and watch the complete lifecycle of processes in real time. Lab 4 loaded two programs (a kprobe and a kretprobe) for the first time; this lab extends that pattern to three tracepoint programs sharing a single event struct and ring buffer.

Note

Prerequisites: this tutorial is part of the Tuxscope series. You need a built tuxscope binary, Linux 5.8+, and root privileges.

The process lifecycle

When you type ls in a terminal, the kernel does not simply “run ls.” Three distinct operations happen in sequence.

fork: duplicating the parent

fork() (or its modern variant clone()) creates a new process by duplicating the calling process. The child gets a copy of the parent’s memory (shared copy-on-write until either side writes to it), file descriptors, and execution context. Immediately after fork, parent and child are nearly identical; only their return values from fork() differ.

  Parent (bash, PID 1000)

         │  fork()

         ├──────────────┐
         │              │
    Parent (1000)   Child (1001)
    returns 1001    returns 0

The kernel assigns a new PID to the child but preserves the parent-child relationship through the ppid (parent PID) field. This builds a process tree rooted at PID 1 (init/systemd).

exec: replacing the image

The child process is still running a copy of bash. To become ls, it calls execve("/usr/bin/ls", ...), which replaces the entire memory image (code, data, stack, and heap) with the new program. The PID stays the same. Only the contents change.

  Child (PID 1001)                 Child (PID 1001)
  ┌─────────────────┐             ┌─────────────────┐
  │  bash code       │   exec()   │  ls code         │
  │  bash data       │  ──────►   │  ls data         │
  │  bash stack      │            │  ls stack        │
  └─────────────────┘             └─────────────────┘
  same PID, completely different program

exit: terminating

When ls finishes, it calls exit(). The kernel reclaims the process’s memory and file descriptors, records the exit code, and notifies the parent. The parent calls wait() to collect the exit status, which removes the process from the process table entirely.
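The three steps can be reproduced from userspace with the Rust standard library. This is a minimal sketch: run_and_wait is a hypothetical helper name, and std::process::Command performs the fork/exec pair (or an equivalent spawn primitive) internally rather than exposing the two calls separately.

```rust
use std::io;
use std::process::Command;

/// Spawn `program` with `args`, wait for it, and return (child PID, exit code).
/// The exit code is `None` if the child was terminated by a signal.
fn run_and_wait(program: &str, args: &[&str]) -> io::Result<(u32, Option<i32>)> {
    // spawn() creates the child process and runs the new program in it.
    let mut child = Command::new(program).args(args).spawn()?;
    let pid = child.id();
    // wait() collects the exit status so the kernel can drop the
    // process-table entry for the child.
    let status = child.wait()?;
    Ok((pid, status.code()))
}

fn main() -> io::Result<()> {
    let (pid, code) = run_and_wait("sh", &["-c", "exit 3"])?;
    println!("child {pid} exited with {code:?}");
    Ok(())
}
```

Running this while tracing should produce at least one fork, one exec, and one exit event for the reported child PID.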

PID vs TGID

Linux threads share the same address space but each has a unique task ID. The kernel tracks two values:

  • PID (in kernel terms): the unique identifier for each task (thread)
  • TGID (thread group ID): the PID of the thread group leader, which is what userspace sees as the “PID”

When you call bpf_get_current_pid_tgid(), it returns a 64-bit value: the upper 32 bits are the TGID and the lower 32 bits are the PID. For single-threaded processes, they are identical. For multithreaded programs, the TGID is the main thread’s PID.

bpf_get_current_pid_tgid() returns:

  ┌──────────────────────────────────────────┐
  │  TGID (upper 32 bits) │ PID (lower 32)  │
  └──────────────────────────────────────────┘

  tgid = value >> 32
  pid  = value & 0xFFFFFFFF
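The same split, written as a small Rust function that can be tested in userspace. The name split_pid_tgid is illustrative and not part of tuxscope.

```rust
/// Split the 64-bit value returned by bpf_get_current_pid_tgid()
/// into (tgid, pid), where `pid` is the kernel's per-task ID.
fn split_pid_tgid(value: u64) -> (u32, u32) {
    let tgid = (value >> 32) as u32; // upper 32 bits
    let pid = value as u32; // the cast keeps only the lower 32 bits
    (tgid, pid)
}

fn main() {
    // Single-threaded process: both halves are equal.
    let single = (1001u64 << 32) | 1001;
    // Worker thread 1005 inside the thread group led by 1001.
    let worker = (1001u64 << 32) | 1005;
    println!("{:?}", split_pid_tgid(single));
    println!("{:?}", split_pid_tgid(worker));
}
```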

The eBPF programs

This lab attaches to three tracepoints in the sched subsystem:

Tracepoint                  Fires when
sched/sched_process_fork    A process calls fork()/clone()
sched/sched_process_exec    A process calls execve()
sched/sched_process_exit    A task exits

The event struct

All three programs push events into a shared ring buffer using the same struct:

#[repr(C)]
pub struct ProcEvent {
    pub pid: u32,
    pub ppid: u32,
    pub event_type: u8, // 0 = fork, 1 = exec, 2 = exit
    pub _padding: [u8; 3],
    pub timestamp_ns: u64,
    pub comm: [u8; 16],
}

The event_type field distinguishes which tracepoint generated the event. This is a common eBPF pattern: use a single event struct with a discriminator field rather than separate structs for each probe.
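On the userspace side, the raw discriminator is usually mapped to an enum as early as possible. The sketch below assumes the 0/1/2 encoding from the struct comment; EventKind and its methods are hypothetical names, not tuxscope's actual types.

```rust
/// Userspace view of the `event_type` discriminator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    Fork,
    Exec,
    Exit,
}

impl EventKind {
    /// Map the raw byte to a variant; unknown values yield `None`
    /// instead of being silently treated as a valid event.
    fn from_raw(raw: u8) -> Option<EventKind> {
        match raw {
            0 => Some(EventKind::Fork),
            1 => Some(EventKind::Exec),
            2 => Some(EventKind::Exit),
            _ => None,
        }
    }

    /// Label in the style of the table output (FORK / EXEC / EXIT).
    fn label(self) -> &'static str {
        match self {
            EventKind::Fork => "FORK",
            EventKind::Exec => "EXEC",
            EventKind::Exit => "EXIT",
        }
    }
}

fn main() {
    for raw in 0u8..4 {
        match EventKind::from_raw(raw) {
            Some(kind) => println!("{raw} -> {}", kind.label()),
            None => println!("{raw} -> unknown event type"),
        }
    }
}
```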

Note that ppid is only populated for fork events, where the tracepoint context provides both parent and child PIDs. The exec and exit handlers run in the context of the current task and do not look up the parent, so their ppid field is zero.
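Because the struct crosses the kernel/userspace boundary as raw bytes, it helps to check its layout once. The sketch below copies the struct from above and measures it; layout is a hypothetical helper. On targets where u64 is 8-byte aligned (x86_64 and aarch64 among them), #[repr(C)] places timestamp_ns at offset 16, so four bytes of implicit padding follow the explicit _padding field.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

#[repr(C)]
pub struct ProcEvent {
    pub pid: u32,
    pub ppid: u32,
    pub event_type: u8,
    pub _padding: [u8; 3],
    pub timestamp_ns: u64,
    pub comm: [u8; 16],
}

/// Returns (total size, offset of timestamp_ns, offset of comm) in bytes.
fn layout() -> (usize, usize, usize) {
    let e = ProcEvent {
        pid: 0,
        ppid: 0,
        event_type: 0,
        _padding: [0; 3],
        timestamp_ns: 0,
        comm: [0; 16],
    };
    // Field offsets are measured as address differences from the struct base.
    let base = addr_of!(e) as usize;
    (
        size_of::<ProcEvent>(),
        addr_of!(e.timestamp_ns) as usize - base,
        addr_of!(e.comm) as usize - base,
    )
}

fn main() {
    let (size, ts, comm) = layout();
    println!("size={size} timestamp_ns@{ts} comm@{comm}");
}
```

A userspace decoder that reads fields by hand should use these measured offsets rather than summing the declared field sizes.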

The fork handler

The sched_process_fork tracepoint provides parent and child PIDs at fixed offsets in its context structure. The eBPF program reads them directly:

fn try_proc_fork(ctx: &TracePointContext) -> Result<u32, i64> {
    let parent_pid: i32 = unsafe { ctx.read_at(24)? };
    let child_pid: i32 = unsafe { ctx.read_at(44)? };

    let event = ProcEvent {
        pid: child_pid as u32,
        ppid: parent_pid as u32,
        event_type: 0, // fork
        _padding: [0; 3],
        timestamp_ns: unsafe { bpf_ktime_get_ns() },
        comm: bpf_get_current_comm().map_err(|e| e as i64)?,
    };
    PROC_EVENTS.output(&event, 0).map_err(|_| 1i64)?;
    Ok(0)
}

The offsets 24 and 44 come from the tracepoint format file at /sys/kernel/debug/tracing/events/sched/sched_process_fork/format. You can inspect this file to see the exact field layout:

sudo cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/format

You should see something close to this (only the event-specific fields are shown; the file also lists the common header fields first):

field:char parent_comm[16];   offset:8;  size:16; signed:1;
field:pid_t parent_pid;       offset:24; size:4;  signed:1;
field:char child_comm[16];    offset:28; size:16; signed:1;
field:pid_t child_pid;        offset:44; size:4;  signed:1;

The two offsets the BPF code reads (24 for parent_pid, 44 for child_pid) must match those offset: values exactly. If they don’t, the load will still succeed, but every event will report nonsense PIDs.
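The comparison can also be automated. The sketch below parses field lines in the format shown above and looks up an offset by field name; parse_field and find_offset are hypothetical helpers, and SAMPLE simply repeats the excerpt from this section. In practice you would read the real format file into a string first.

```rust
/// Excerpt of the sched_process_fork format file, as shown in this lab.
const SAMPLE: &str = "\
field:char parent_comm[16];   offset:8;  size:16; signed:1;
field:pid_t parent_pid;       offset:24; size:4;  signed:1;
field:char child_comm[16];    offset:28; size:16; signed:1;
field:pid_t child_pid;        offset:44; size:4;  signed:1;
";

/// Parse one `field:` line into (name, offset, size).
/// Lines that are not field descriptions yield `None`.
fn parse_field(line: &str) -> Option<(String, usize, usize)> {
    let mut name: Option<String> = None;
    let mut offset: Option<usize> = None;
    let mut size: Option<usize> = None;
    for part in line.split(';') {
        let part = part.trim();
        if let Some(decl) = part.strip_prefix("field:") {
            // `decl` looks like "pid_t parent_pid" or "char parent_comm[16]";
            // keep the last token and drop any array suffix.
            let ident = decl.split_whitespace().last()?;
            name = Some(ident.split('[').next()?.to_string());
        } else if let Some(v) = part.strip_prefix("offset:") {
            offset = v.trim().parse().ok();
        } else if let Some(v) = part.strip_prefix("size:") {
            size = v.trim().parse().ok();
        }
    }
    Some((name?, offset?, size?))
}

/// Look up the byte offset of `field` in the text of a format file.
fn find_offset(format: &str, field: &str) -> Option<usize> {
    format
        .lines()
        .filter_map(parse_field)
        .find(|(name, _, _)| name == field)
        .map(|(_, offset, _)| offset)
}

fn main() {
    for field in ["parent_pid", "child_pid"] {
        println!("{field}: {:?}", find_offset(SAMPLE, field));
    }
}
```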

Warning

Validate before trusting the output. Tracepoint format offsets can shift between kernel versions and between architectures (x86_64 vs arm64 in particular). The fastest way to validate without re-deriving offsets is to compare against a known-good event:

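# In one terminal, enable the event and watch raw tracefs output
# (trace_pipe stays empty until the event is enabled):
echo 1 | sudo tee /sys/kernel/debug/tracing/events/sched/sched_process_fork/enable
sudo cat /sys/kernel/debug/tracing/trace_pipe | head -5

# In another terminal, run something forky:
bash -c 'sleep 1' &

# Compare the parent/child PIDs from trace_pipe with what your BPF program emits.
# When you are done, disable the event again:
echo 0 | sudo tee /sys/kernel/debug/tracing/events/sched/sched_process_fork/enable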

If the BPF output doesn’t match trace_pipe, your offsets are wrong for this kernel; re-read the format file before tuning anything else.

The exec handler

Unlike fork, the exec and exit handlers use bpf_get_current_pid_tgid() because the relevant process is the currently executing task:

fn try_proc_exec(_ctx: &TracePointContext) -> Result<u32, i64> {
    let tgid = (bpf_get_current_pid_tgid() >> 32) as u32;

    let event = ProcEvent {
        pid: tgid,
        ppid: 0,
        event_type: 1, // exec
        _padding: [0; 3],
        timestamp_ns: unsafe { bpf_ktime_get_ns() },
        comm: bpf_get_current_comm().map_err(|e| e as i64)?,
    };
    PROC_EVENTS.output(&event, 0).map_err(|_| 1i64)?;
    Ok(0)
}

The exit handler

The exit handler records that the current task is exiting. The sched_process_exit tracepoint does not expose an exit status directly, so this lab records the PID, timestamp, and command name only:

fn try_proc_exit(_ctx: &TracePointContext) -> Result<u32, i64> {
    let tgid = (bpf_get_current_pid_tgid() >> 32) as u32;

    let event = ProcEvent {
        pid: tgid,
        ppid: 0,
        event_type: 2, // exit
        _padding: [0; 3],
        timestamp_ns: unsafe { bpf_ktime_get_ns() },
        comm: bpf_get_current_comm().map_err(|e| e as i64)?,
    };
    PROC_EVENTS.output(&event, 0).map_err(|_| 1i64)?;
    Ok(0)
}

This detail matters for interpretation:

  • sched_process_exit is task-oriented, not strictly process-oriented
  • this lab records the TGID for exec and exit so simple single-process examples line up around one userspace PID
  • a multithreaded program can therefore emit multiple EXIT events with the same visible PID

If you need a true exit status, you would have to read additional task state or use a different hook. This lab stays focused on lifecycle transitions rather than return codes.
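If the duplicate EXIT rows are unwanted, one common heuristic is to report an exit only when the exiting task is the thread group leader, that is, when both halves of bpf_get_current_pid_tgid() are equal. The sketch below shows just that check as plain Rust; is_thread_group_leader is a hypothetical name and this is not what the lab's handler does. It is an approximation, because a leader thread can exit before the other threads in its group.

```rust
/// True when the task described by a bpf_get_current_pid_tgid() value
/// is the leader of its thread group (task ID equals TGID).
fn is_thread_group_leader(pid_tgid: u64) -> bool {
    let tgid = (pid_tgid >> 32) as u32;
    let tid = pid_tgid as u32; // lower 32 bits: per-task ID
    tgid == tid
}

fn main() {
    let leader = (4521u64 << 32) | 4521;
    let worker = (4521u64 << 32) | 4530;
    // An exit handler using this heuristic would emit an event only
    // for the first of these two tasks.
    println!("leader: {}", is_thread_group_leader(leader));
    println!("worker: {}", is_thread_group_leader(worker));
}
```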

Loading three programs from one ELF

All three programs are compiled into a single ELF object file, just as the kprobe and kretprobe were in Lab 4. The userspace loader (using aya) selectively attaches each program to its corresponding tracepoint:

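// Requires: use aya::programs::TracePoint;
// program_mut() returns the generic Program enum, so each program is
// converted to a TracePoint before load() and attach() are called.
let program_fork: &mut TracePoint = bpf.program_mut("proc_fork").unwrap().try_into()?;
program_fork.load()?;
program_fork.attach("sched", "sched_process_fork")?;

let program_exec: &mut TracePoint = bpf.program_mut("proc_exec").unwrap().try_into()?;
program_exec.load()?;
program_exec.attach("sched", "sched_process_exec")?;

let program_exit: &mut TracePoint = bpf.program_mut("proc_exit").unwrap().try_into()?;
program_exit.load()?;
program_exit.attach("sched", "sched_process_exit")?;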

This cumulative design is intentional. Each lab adds new programs to the same ELF, and the userspace code selects which ones to load based on the subcommand. You are building a single instrumentation binary that grows more capable with each lab.
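One way to express the selection step is a lookup from subcommand to the programs it needs. This is only a sketch of the idea: programs_for, Attach, and PROC_PROGRAMS are hypothetical names, and only the proc entries are taken from this lab. The real loader may be organized differently.

```rust
/// (program name in the ELF, tracepoint category, tracepoint name)
type Attach = (&'static str, &'static str, &'static str);

/// Programs used by the `proc` subcommand, as named in this lab.
const PROC_PROGRAMS: &[Attach] = &[
    ("proc_fork", "sched", "sched_process_fork"),
    ("proc_exec", "sched", "sched_process_exec"),
    ("proc_exit", "sched", "sched_process_exit"),
];

/// Return the programs to load for a subcommand, or `None` if the
/// subcommand is unknown to this sketch.
fn programs_for(subcommand: &str) -> Option<&'static [Attach]> {
    match subcommand {
        "proc" => Some(PROC_PROGRAMS),
        _ => None,
    }
}

fn main() {
    if let Some(programs) = programs_for("proc") {
        for (name, category, tracepoint) in programs {
            // A real loader would load and attach each program here.
            println!("{name} -> {category}/{tracepoint}");
        }
    }
}
```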

Running it

Start tuxscope in one terminal:

sudo tuxscope proc

In another terminal, run a simple command:

bash -c "ls /tmp"

This single command produces the complete fork-exec-exit sequence. You will see rows like:

4521      3890      bash                FORK
4521      0         bash                EXEC
4522      4521      bash                FORK
4522      0         ls                  EXEC
4522      0         ls                  EXIT
4521      0         bash                EXIT

Read this from top to bottom:

  1. Your shell (PID 3890) forks to create PID 4521
  2. PID 4521 execs into bash (the -c subshell)
  3. That subshell forks PID 4522
  4. PID 4522 execs into ls
  5. ls exits
  6. The subshell exits
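The same reading can be done mechanically: the FORK rows alone are enough to rebuild the process tree. The sketch below uses the (pid, ppid) pairs from the sample output; children_by_parent and render are hypothetical helpers, and the code assumes PIDs are not reused during the trace.

```rust
use std::collections::BTreeMap;

/// Group child PIDs under their parent, given (child, parent) fork pairs.
fn children_by_parent(forks: &[(u32, u32)]) -> BTreeMap<u32, Vec<u32>> {
    let mut tree: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(child, parent) in forks {
        tree.entry(parent).or_insert_with(Vec::new).push(child);
    }
    tree
}

/// Append an indented view of the subtree rooted at `pid` to `out`.
fn render(tree: &BTreeMap<u32, Vec<u32>>, pid: u32, depth: usize, out: &mut String) {
    out.push_str(&format!("{}{}\n", "  ".repeat(depth), pid));
    if let Some(kids) = tree.get(&pid) {
        for &kid in kids {
            render(tree, kid, depth + 1, out);
        }
    }
}

fn main() {
    // FORK rows from the sample output: (pid, ppid).
    let forks = [(4521, 3890), (4522, 4521)];
    let tree = children_by_parent(&forks);
    let mut out = String::new();
    render(&tree, 3890, 0, &mut out);
    print!("{out}");
}
```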

JSON output

For structured output suitable for piping into jq or logging systems:

sudo tuxscope proc --format json
{"pid":4521,"ppid":3890,"comm":"bash","event_type":"fork","timestamp_ns":128934567890}
{"pid":4521,"ppid":0,"comm":"bash","event_type":"exec","timestamp_ns":128934568120}
{"pid":4522,"ppid":4521,"comm":"bash","event_type":"fork","timestamp_ns":128934572340}
{"pid":4522,"ppid":0,"comm":"ls","event_type":"exec","timestamp_ns":128934572890}
{"pid":4522,"ppid":0,"comm":"ls","event_type":"exit","timestamp_ns":128934589100}
{"pid":4521,"ppid":0,"comm":"bash","event_type":"exit","timestamp_ns":128934589450}
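The timestamp_ns values make it possible to pair each fork with the exec that follows it for the same PID. The sketch below works on already-parsed events, using the values from the sample above; Kind and fork_to_exec_ns are hypothetical names, and JSON parsing is left out to keep the example dependency-free.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Fork,
    Exec,
    Exit,
}

/// For each PID that is forked and later execs, return (pid, nanoseconds
/// between the fork event and the first exec event), in exec order.
fn fork_to_exec_ns(events: &[(u32, Kind, u64)]) -> Vec<(u32, u64)> {
    let mut forked_at: HashMap<u32, u64> = HashMap::new();
    let mut latencies = Vec::new();
    for &(pid, kind, ts) in events {
        match kind {
            Kind::Fork => {
                forked_at.insert(pid, ts);
            }
            Kind::Exec => {
                // Only the first exec after a fork is paired with it.
                if let Some(t0) = forked_at.remove(&pid) {
                    latencies.push((pid, ts.saturating_sub(t0)));
                }
            }
            Kind::Exit => {
                // Forked but never exec'd: forget the pending entry.
                forked_at.remove(&pid);
            }
        }
    }
    latencies
}

fn main() {
    let events = [
        (4521, Kind::Fork, 128934567890),
        (4521, Kind::Exec, 128934568120),
        (4522, Kind::Fork, 128934572340),
        (4522, Kind::Exec, 128934572890),
        (4522, Kind::Exit, 128934589100),
        (4521, Kind::Exit, 128934589450),
    ];
    for (pid, ns) in fork_to_exec_ns(&events) {
        println!("pid {pid}: {ns} ns from fork to exec");
    }
}
```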

Filtering by PID

On a busy system, process events fire constantly. --pid matches the event’s pid field only. That is useful when you already know the process you want to follow, but it does not automatically include descendants:

sudo tuxscope proc --pid 3890

Note

For fork events, --pid matches the child PID. To study a parent and all of its children, collect JSON and filter on ppid or correlate fork events in post-processing:

sudo tuxscope proc --format json | jq 'select(.ppid == 3890)'
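The jq filter above only catches direct children. To follow grandchildren as well, the fork events can be correlated in order, adding each child whose parent is already tracked. A minimal sketch, assuming fork events arrive in chronological order and PIDs are not reused; descendants is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Return `root` plus every PID transitively forked from it,
/// given (child, parent) fork pairs in the order they occurred.
fn descendants(root: u32, forks: &[(u32, u32)]) -> HashSet<u32> {
    let mut tracked = HashSet::new();
    tracked.insert(root);
    for &(child, parent) in forks {
        // A child joins the set only if its parent is already tracked.
        if tracked.contains(&parent) {
            tracked.insert(child);
        }
    }
    tracked
}

fn main() {
    // (pid, ppid) pairs: two from the sample output, one unrelated.
    let forks = [(4521, 3890), (9001, 1), (4522, 4521)];
    let mut pids: Vec<u32> = descendants(3890, &forks).into_iter().collect();
    pids.sort();
    println!("{pids:?}");
}
```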

Exercises

  1. Trace a pipeline. Run cat /etc/passwd | grep root | wc -l and trace the process events. How many fork/exec pairs do you see? Which processes share a parent PID? Draw the process tree.

  2. Observe a daemon. Start a service (e.g., sudo systemctl restart sshd) and watch the fork/exec sequence. Daemons typically double-fork to detach from the parent. Can you see this pattern in the event stream?

  3. Trace a multithreaded program. Write a small program that starts several threads and then exits. How many EXIT events do you see for the same visible PID? Compare that to what you learned in the PID vs TGID section.

  4. Measure fork-to-exec latency. Using the timestamp_ns field in JSON output, calculate the time between a fork event and the corresponding exec event for the same PID. What is the typical latency on your system? Does it change under load?

What’s next

In Lab 6: Memory Observation, you will attach to the page fault and OOM killer tracepoints to watch how Linux manages virtual memory in real time. You will see the kernel’s demand-paging mechanism at work and learn why page faults are a normal, expected part of program execution.