Every program you run on Linux begins as a clone of another process. Your shell calls fork() to duplicate itself, the duplicate calls exec() to replace its own memory image with the new program, and eventually the new program calls exit() to terminate. This three-step lifecycle (fork, exec, exit) is the foundation of how Linux manages work.
In this lab you will attach eBPF programs to three kernel tracepoints simultaneously and watch the complete lifecycle of processes in real time. Lab 4 loaded two programs (a kprobe and a kretprobe) for the first time; this lab extends that pattern to three tracepoint programs sharing a single event struct and ring buffer.
Note
Prerequisites: This tutorial is part of the Tuxscope series. You need a built tuxscope binary, Linux 5.8+, and root privileges.
The process lifecycle
When you type ls in a terminal, the kernel does not simply “run ls.” Three distinct operations happen in sequence.
fork: duplicating the parent
fork() (or its modern variant clone()) creates a new process by duplicating the calling process. The child gets a copy of the parent’s memory, file descriptors, and execution context. Immediately after fork, parent and child are nearly identical; only their return values from fork() differ.
Parent (bash, PID 1000)
      │
      │ fork()
      │
      ├───────────────┐
      │               │
Parent (1000)    Child (1001)
returns 1001     returns 0
The kernel assigns a new PID to the child but preserves the parent-child relationship through the ppid (parent PID) field. This builds a process tree rooted at PID 1 (init/systemd).
exec: replacing the image
The child process is still running a copy of bash. To become ls, it calls execve("/usr/bin/ls", ...), which replaces the entire memory image (code, data, stack, and heap) with the new program. The PID stays the same. Only the contents change.
Child (PID 1001)             Child (PID 1001)
┌─────────────────┐          ┌─────────────────┐
│ bash code       │  exec()  │ ls code         │
│ bash data       │ ──────►  │ ls data         │
│ bash stack      │          │ ls stack        │
└─────────────────┘          └─────────────────┘
     same PID, completely different program
exit: terminating
When ls finishes, it calls exit(). The kernel reclaims the process’s memory and file descriptors, records the exit code, and notifies the parent. The parent calls wait() to collect the exit status, which removes the process from the process table entirely.
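The whole lifecycle can be observed from userspace with nothing but the standard library. The sketch below is illustrative only: `spawn_and_reap` is a hypothetical helper, and it assumes a POSIX `sh` is on the PATH. Rust's `Command` may create the child with fork/exec or with posix_spawn internally, but either way the kernel creates a task, the task execs `sh`, and the parent collects the exit code by waiting.

```rust
use std::process::Command;

/// Spawns `sh`, which prints its parent PID and then exits with `code`.
/// Returns (parent PID as reported by the child, exit code collected by the parent).
pub fn spawn_and_reap(code: i32) -> std::io::Result<(u32, Option<i32>)> {
    // The child starts as a new task, execs `sh`, runs the script, and exits.
    let script = format!("echo $PPID; exit {code}");
    // output() waits for the child, which reaps it and yields its exit status.
    let out = Command::new("sh").arg("-c").arg(&script).output()?;
    let ppid = String::from_utf8_lossy(&out.stdout)
        .trim()
        .parse::<u32>()
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
    Ok((ppid, out.status.code()))
}
```

Calling `spawn_and_reap(7)` should report the caller's own PID as the child's parent and `Some(7)` as the exit code, which is the same parent-child link and exit hand-off described above.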
PID vs TGID
Linux threads share the same address space but each has a unique task ID. The kernel tracks two values:
- PID (in kernel terms): the unique identifier for each task (thread)
- TGID (thread group ID): the PID of the thread group leader, which is what userspace sees as the “PID”
When you call bpf_get_current_pid_tgid(), it returns a 64-bit value: the upper 32 bits are the TGID and the lower 32 bits are the PID. For single-threaded processes, they are identical. For multithreaded programs, the TGID is the main thread’s PID.
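As a small sketch of that split, the function below takes a raw 64-bit value and separates it into its two halves. The names `split_pid_tgid` and `is_thread_group_leader` are made up for illustration; the code is plain Rust and can be run outside of any eBPF context.

```rust
/// Splits a value shaped like the return of bpf_get_current_pid_tgid()
/// into (tgid, pid).
pub fn split_pid_tgid(value: u64) -> (u32, u32) {
    let tgid = (value >> 32) as u32; // upper 32 bits: what userspace calls the PID
    let pid = value as u32; // the cast truncates, keeping the lower 32 bits (kernel task ID)
    (tgid, pid)
}

/// True when the task is the thread group leader, i.e. its task ID equals the TGID.
pub fn is_thread_group_leader(value: u64) -> bool {
    let (tgid, pid) = split_pid_tgid(value);
    tgid == pid
}
```

For a single-threaded process the two halves are equal, so `is_thread_group_leader` returns true; for a secondary thread of a multithreaded program it returns false.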
bpf_get_current_pid_tgid() returns:
┌──────────────────────────────────────────┐
│ TGID (upper 32 bits) │ PID (lower 32) │
└──────────────────────────────────────────┘
tgid = value >> 32
pid = value & 0xFFFFFFFF
The eBPF programs
This lab attaches to three tracepoints in the sched subsystem:
| Tracepoint | Fires when |
|---|---|
| sched/sched_process_fork | A process calls fork()/clone() |
| sched/sched_process_exec | A process calls execve() |
| sched/sched_process_exit | A task exits |
The event struct
All three programs push events into a shared ring buffer using the same struct:
#[repr(C)]
pub struct ProcEvent {
pub pid: u32,
pub ppid: u32,
pub event_type: u8, // 0 = fork, 1 = exec, 2 = exit
pub _padding: [u8; 3],
pub timestamp_ns: u64,
pub comm: [u8; 16],
}
The event_type field distinguishes which tracepoint generated the event. This is a common eBPF pattern: use a single event struct with a discriminator field rather than separate structs for each probe.
Note that ppid is only populated for fork events, where the tracepoint context provides both parent and child PIDs. The exec and exit handlers run in the context of the current task and do not look up the parent, so their ppid field is zero.
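If you ever decode ring buffer records by hand, it helps to check the struct's layout first. The sketch below copies ProcEvent from above and measures its size and two field offsets; `layout` and `event_name` are illustrative helpers, not part of tuxscope. On targets where u64 is 8-byte aligned (x86_64, aarch64, and the BPF target), repr(C) adds four bytes of implicit padding after _padding, so timestamp_ns lands at offset 16 rather than 12.

```rust
#[repr(C)]
pub struct ProcEvent {
    pub pid: u32,
    pub ppid: u32,
    pub event_type: u8, // 0 = fork, 1 = exec, 2 = exit
    pub _padding: [u8; 3],
    pub timestamp_ns: u64,
    pub comm: [u8; 16],
}

/// Returns (total size, offset of timestamp_ns, offset of comm) in bytes,
/// measured from a zeroed instance rather than assumed.
pub fn layout() -> (usize, usize, usize) {
    let e = ProcEvent {
        pid: 0,
        ppid: 0,
        event_type: 0,
        _padding: [0; 3],
        timestamp_ns: 0,
        comm: [0; 16],
    };
    let base = &e as *const ProcEvent as usize;
    let ts = &e.timestamp_ns as *const u64 as usize - base;
    let comm = &e.comm as *const [u8; 16] as usize - base;
    (std::mem::size_of::<ProcEvent>(), ts, comm)
}

/// Maps the discriminator byte to a display name.
pub fn event_name(event_type: u8) -> &'static str {
    match event_type {
        0 => "fork",
        1 => "exec",
        2 => "exit",
        _ => "unknown",
    }
}
```

On the targets named above, `layout()` reports a 40-byte struct. Both the eBPF side and the userspace side must agree on this layout, which is why the struct is declared repr(C).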
The fork handler
The sched_process_fork tracepoint provides parent and child PIDs at fixed offsets in its context structure. The eBPF program reads them directly:
fn try_proc_fork(ctx: &TracePointContext) -> Result<u32, i64> {
let parent_pid: i32 = unsafe { ctx.read_at(24)? };
let child_pid: i32 = unsafe { ctx.read_at(44)? };
let event = ProcEvent {
pid: child_pid as u32,
ppid: parent_pid as u32,
event_type: 0, // fork
_padding: [0; 3],
timestamp_ns: unsafe { bpf_ktime_get_ns() },
comm: bpf_get_current_comm().map_err(|e| e as i64)?,
};
PROC_EVENTS.output(&event, 0).map_err(|_| 1i64)?;
Ok(0)
}
The offsets 24 and 44 come from the tracepoint format file at /sys/kernel/debug/tracing/events/sched/sched_process_fork/format. You can inspect this file to see the exact field layout:
sudo cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/format
You should see something close to this (only the event-specific fields are shown here; the file lists the common fields, which occupy the first 8 bytes, before them):
field:char parent_comm[16]; offset:8; size:16; signed:1;
field:pid_t parent_pid; offset:24; size:4; signed:1;
field:char child_comm[16]; offset:28; size:16; signed:1;
field:pid_t child_pid; offset:44; size:4; signed:1;
The two offsets the BPF code reads (24 for parent_pid, 44 for child_pid) must match those offset: values exactly. If they don’t, the load will succeed but every event will report nonsense PIDs.
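Rather than comparing offsets by eye, a loader could parse the format text and check them. The following is a minimal sketch, assuming the `field:...; offset:...;` line shape shown above; `field_offset` is a hypothetical helper and not something tuxscope or aya provides.

```rust
/// Looks up the byte offset of `field` in the text of a tracepoint format file.
/// Returns None if the field is not listed or its offset cannot be parsed.
pub fn field_offset(format: &str, field: &str) -> Option<usize> {
    for line in format.lines() {
        // Each field line looks like: "field:pid_t parent_pid; offset:24; size:4; signed:1;"
        let mut parts = line.trim().split(';').map(str::trim);
        let Some(decl) = parts.next().and_then(|p| p.strip_prefix("field:")) else {
            continue; // not a field line (header, blank line, print fmt, ...)
        };
        // The field name is the last token of the declaration, minus any "[16]" suffix.
        let Some(name) = decl.split_whitespace().last().and_then(|t| t.split('[').next()) else {
            continue;
        };
        if name != field {
            continue;
        }
        return parts
            .find_map(|p| p.strip_prefix("offset:"))
            .and_then(|v| v.trim().parse().ok());
    }
    None
}
```

Feeding it the contents of the format file and asking for parent_pid and child_pid gives the two numbers to compare against the constants passed to read_at.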
Warning
Validate before trusting the output
Tracepoint format offsets can shift between kernel versions and between architectures (x86_64 vs arm64 in particular). The fastest way to validate without re-deriving offsets is to compare a known-good event:
# In one terminal, check the format, enable the event, and watch the raw tracefs output:
sudo cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/format
# trace_pipe stays empty unless the event is enabled in tracefs:
echo 1 | sudo tee /sys/kernel/debug/tracing/events/sched/sched_process_fork/enable
sudo cat /sys/kernel/debug/tracing/trace_pipe | head -5
# In another terminal, run something forky:
bash -c 'sleep 1' &
# Compare the parent/child PIDs from trace_pipe with what your BPF program emits.
# When you are done, disable the event again:
echo 0 | sudo tee /sys/kernel/debug/tracing/events/sched/sched_process_fork/enable
If the BPF output doesn’t match trace_pipe, your offsets are wrong for this kernel; re-read the format file before tuning anything else.
The exec handler
Unlike fork, the exec and exit handlers use bpf_get_current_pid_tgid() because the relevant process is the currently executing task:
fn try_proc_exec(_ctx: &TracePointContext) -> Result<u32, i64> {
let tgid = (bpf_get_current_pid_tgid() >> 32) as u32;
let event = ProcEvent {
pid: tgid,
ppid: 0,
event_type: 1, // exec
_padding: [0; 3],
timestamp_ns: unsafe { bpf_ktime_get_ns() },
comm: bpf_get_current_comm().map_err(|e| e as i64)?,
};
PROC_EVENTS.output(&event, 0).map_err(|_| 1i64)?;
Ok(0)
}
The exit handler
The exit handler records that the current task is exiting. The sched_process_exit tracepoint does not expose an exit status directly, so this lab records the PID, timestamp, and command name only:
fn try_proc_exit(_ctx: &TracePointContext) -> Result<u32, i64> {
let tgid = (bpf_get_current_pid_tgid() >> 32) as u32;
let event = ProcEvent {
pid: tgid,
ppid: 0,
event_type: 2, // exit
_padding: [0; 3],
timestamp_ns: unsafe { bpf_ktime_get_ns() },
comm: bpf_get_current_comm().map_err(|e| e as i64)?,
};
PROC_EVENTS.output(&event, 0).map_err(|_| 1i64)?;
Ok(0)
}
This detail matters for interpretation:
- sched_process_exit is task-oriented, not strictly process-oriented
- this lab records the TGID for exec and exit so simple single-process examples line up around one userspace PID
- a multithreaded program can therefore emit multiple EXIT events with the same visible PID
If you need a true exit status, you would have to read additional task state or use a different hook. This lab stays focused on lifecycle transitions rather than return codes.
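To make the multithreaded case concrete, here is a small post-processing sketch that counts EXIT events per visible PID. It assumes events have already been decoded into (pid, event_type) pairs; `exit_counts` and `EVENT_EXIT` are illustrative names, not part of tuxscope.

```rust
use std::collections::HashMap;

/// Discriminator value used for exit events in ProcEvent.
pub const EVENT_EXIT: u8 = 2;

/// Counts exit events per visible PID (TGID).
/// `events` holds (pid, event_type) pairs in the order they were received.
pub fn exit_counts(events: &[(u32, u8)]) -> HashMap<u32, usize> {
    let mut counts = HashMap::new();
    for &(pid, event_type) in events {
        if event_type == EVENT_EXIT {
            // One EXIT per exiting task: threads of one process share the same key.
            *counts.entry(pid).or_insert(0) += 1;
        }
    }
    counts
}
```

A count above one for a PID suggests that several tasks in the same thread group exited, which is the pattern a multithreaded program produces.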
Loading three programs from one ELF
All three programs are compiled into a single ELF object file, just as the kprobe and kretprobe were in Lab 4. The userspace loader (using aya) selectively attaches each program to its corresponding tracepoint:
use aya::programs::TracePoint;

// program_mut() returns a generic Program; convert it to a TracePoint
// before calling load() and attach(category, name).
let program_fork: &mut TracePoint = bpf.program_mut("proc_fork").unwrap().try_into()?;
program_fork.load()?;
program_fork.attach("sched", "sched_process_fork")?;
let program_exec: &mut TracePoint = bpf.program_mut("proc_exec").unwrap().try_into()?;
program_exec.load()?;
program_exec.attach("sched", "sched_process_exec")?;
let program_exit: &mut TracePoint = bpf.program_mut("proc_exit").unwrap().try_into()?;
program_exit.load()?;
program_exit.attach("sched", "sched_process_exit")?;
This cumulative design is intentional. Each lab adds new programs to the same ELF, and the userspace code selects which ones to load based on the subcommand. You are building a single instrumentation binary that grows more capable with each lab.
Running it
Start tuxscope in one terminal:
sudo tuxscope proc
In another terminal, run a simple command:
bash -c "ls /tmp"
This single command produces the complete fork-exec-exit sequence. You will see rows like:
4521 3890 bash FORK
4521 0 bash EXEC
4522 4521 bash FORK
4522 0 ls EXEC
4522 0 ls EXIT
4521 0 bash EXIT
Read this from top to bottom:
- Your shell (PID 3890) forks to create PID 4521
- PID 4521 execs into bash (the -c subshell)
- That subshell forks PID 4522
- PID 4522 execs into ls
- ls exits
- The subshell exits
Your output may be shorter. Some bash versions skip the second fork when -c is given a single simple command and exec ls directly, in which case one PID shows a FORK, an EXEC for bash, an EXEC for ls, and a single EXIT.
JSON output
For structured output suitable for piping into jq or logging systems:
sudo tuxscope proc --format json
{"pid":4521,"ppid":3890,"comm":"bash","event_type":"fork","timestamp_ns":128934567890}
{"pid":4521,"ppid":0,"comm":"bash","event_type":"exec","timestamp_ns":128934568120}
{"pid":4522,"ppid":4521,"comm":"bash","event_type":"fork","timestamp_ns":128934572340}
{"pid":4522,"ppid":0,"comm":"ls","event_type":"exec","timestamp_ns":128934572890}
{"pid":4522,"ppid":0,"comm":"ls","event_type":"exit","timestamp_ns":128934589100}
{"pid":4521,"ppid":0,"comm":"bash","event_type":"exit","timestamp_ns":128934589450}
Filtering by PID
On a busy system, process events fire constantly. --pid matches the event’s pid field only. That is useful when you already know the process you want to follow, but it does not automatically include descendants:
sudo tuxscope proc --pid 3890
Note
For fork events, --pid matches the child PID. To study a parent and all of its children, collect JSON and filter on ppid or correlate fork events in post-processing:
sudo tuxscope proc --format json | jq 'select(.ppid == 3890)'
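Filtering on ppid alone only catches direct children. Correlating fork events, as mentioned above, can be sketched as follows: walk the fork events in arrival order and grow a set of tracked PIDs. This is a rough illustration under two assumptions: events arrive in order, and PIDs are not reused during the capture. `descendants` is a hypothetical helper, not a tuxscope feature.

```rust
use std::collections::HashSet;

/// Collects every PID descended from `root`.
/// `forks` holds (child_pid, parent_pid) pairs from fork events, in arrival order.
pub fn descendants(root: u32, forks: &[(u32, u32)]) -> HashSet<u32> {
    let mut tracked = HashSet::new();
    tracked.insert(root);
    for &(child, parent) in forks {
        // A child joins the set as soon as its parent is already being tracked.
        if tracked.contains(&parent) {
            tracked.insert(child);
        }
    }
    tracked.remove(&root); // report descendants only, not the root itself
    tracked
}
```

With the sample output above, calling it with root 3890 would return 4521 and 4522, since 4522 is a grandchild reached through 4521.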
Exercises
- Trace a pipeline. Run cat /etc/passwd | grep root | wc -l and trace the process events. How many fork/exec pairs do you see? Which processes share a parent PID? Draw the process tree.
- Observe a daemon. Start a service (e.g., sudo systemctl restart sshd) and watch the fork/exec sequence. Daemons typically double-fork to detach from the parent. Can you see this pattern in the event stream?
- Trace a multithreaded program. Write a small program that starts several threads and then exits. How many EXIT events do you see for the same visible PID? Compare that to what you learned in the PID vs TGID section.
- Measure fork-to-exec latency. Using the timestamp_ns field in JSON output, calculate the time between a fork event and the corresponding exec event for the same PID. What is the typical latency on your system? Does it change under load?
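As a starting point for the latency exercise, here is one way the pairing could be done once events are decoded. It is a sketch, not a reference solution: the `Event` struct and `fork_to_exec_latencies` are hypothetical, and the code assumes events arrive in timestamp order with event_type using the 0/1/2 encoding from ProcEvent.

```rust
use std::collections::HashMap;

/// A decoded event, reduced to the fields needed for latency measurement.
pub struct Event {
    pub pid: u32,
    pub event_type: u8, // 0 = fork, 1 = exec, 2 = exit
    pub timestamp_ns: u64,
}

/// Pairs each fork with the first later exec of the same PID.
/// Returns (pid, latency in nanoseconds) in the order the execs were seen.
pub fn fork_to_exec_latencies(events: &[Event]) -> Vec<(u32, u64)> {
    let mut pending: HashMap<u32, u64> = HashMap::new(); // pid -> fork timestamp
    let mut latencies = Vec::new();
    for e in events {
        match e.event_type {
            0 => {
                pending.insert(e.pid, e.timestamp_ns);
            }
            1 => {
                // Only the first exec after a fork is paired; later execs are ignored.
                if let Some(forked_at) = pending.remove(&e.pid) {
                    latencies.push((e.pid, e.timestamp_ns.saturating_sub(forked_at)));
                }
            }
            2 => {
                // The task exited without exec'ing; forget its fork.
                pending.remove(&e.pid);
            }
            _ => {}
        }
    }
    latencies
}
```

Applied to the sample JSON output shown earlier, this pairing gives 230 ns for PID 4521 and 550 ns for PID 4522.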
What’s next
In Lab 6: Memory Observation, you will attach to the page fault and OOM killer tracepoints to watch how Linux manages virtual memory in real time. You will see the kernel’s demand-paging mechanism at work and learn why page faults are a normal, expected part of program execution.