Docker
This chapter covers the basic theory behind Docker container security and how to escape containers.
Theory
Docker containers are not a security technology by themselves, but rather a packaging and deployment mechanism that leverages existing Linux kernel security features. Docker security relies on three main mechanisms that provide process isolation and resource control: namespaces, cgroups, and capabilities.
Docker's security model assumes proper configuration of these mechanisms. Misconfigurations in any layer can lead to container escape and host compromise.
Namespaces: Process-Level Isolation
Namespaces are a Linux kernel feature that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set of resources. Docker leverages six types of namespaces:
PID Namespace: Isolates process IDs. Processes in a container see only processes within the same container, with the main process appearing as PID 1.
Network Namespace: Isolates network interfaces, routing tables, and firewall rules. Each container gets its own network stack.
Mount Namespace: Isolates filesystem mount points. Containers see only their designated filesystem view.
User Namespace: Maps container user IDs to different host user IDs (not enabled by default in Docker).
UTS Namespace: Isolates hostname and domain name. Each container can have its own hostname.
IPC Namespace: Isolates inter-process communication resources like shared memory and semaphores.
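To make the namespace list concrete, here is a minimal sketch that reads the namespace identifiers of a process from /proc/<pid>/ns. The function names are illustrative and not part of any Docker tooling. Two processes that share a namespace show the same inode number for that entry.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link):
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for a process, read from /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _ns_type, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry, skip it
        result[name] = inode
    return result
```

Calling list_namespaces() inside a container and again on the host should give different inode numbers for every namespace the container does not share with the host.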
Control Groups (cgroups): Resource Limitation and Monitoring
cgroups are a Linux kernel feature that limits and isolates the resource usage (CPU, memory, disk I/O, network) of a collection of processes. Docker uses cgroups for:
CPU Control: Limit CPU usage and set scheduling priorities
Memory Control: Limit RAM usage and prevent memory bombs
Block I/O Control: Limit disk read/write operations
Network Control: Limit network bandwidth (with additional tools)
Device Control: Control access to devices like /dev/sda
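As an illustration of the memory controller, the sketch below reads the memory limit applied to the current cgroup. It assumes a cgroup v2 host, where the limit is exposed in a memory.max file under /sys/fs/cgroup. The function names are hypothetical.

```python
from pathlib import Path


def parse_memory_max(text):
    """Interpret the content of a cgroup v2 memory.max file.

    Returns None when no limit is set (the file contains 'max'),
    otherwise the limit in bytes.
    """
    value = text.strip()
    return None if value == "max" else int(value)


def read_memory_limit(cgroup_dir="/sys/fs/cgroup"):
    """Read the memory limit of the current cgroup.

    Returns None both when there is no limit and when the file cannot
    be read (for example on a cgroup v1 host).
    """
    try:
        return parse_memory_max(Path(cgroup_dir, "memory.max").read_text())
    except (OSError, ValueError):
        return None
```

In a container started with a 512 MiB memory limit, the file would be expected to contain 536870912.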
Capabilities: Privilege Control
Traditional Unix systems have a binary privilege model: root (UID 0) has all privileges, while non-root users have limited privileges. Linux capabilities break down root privileges into distinct units, allowing fine control over what privileged operations a process can perform.
Docker containers run with a restricted set of capabilities by default. Dangerous capabilities that are dropped include:
CAP_SYS_ADMIN (system administration)
CAP_SYS_MODULE (kernel module loading)
CAP_SYS_PTRACE (process tracing)
CAP_NET_ADMIN (network administration tasks)
Some sensitive capabilities remain in the default set and are worth keeping in mind:
CAP_DAC_OVERRIDE (bypass file permission checks)
CAP_SETUID, CAP_SETGID (change the process's own user or group IDs)
Practice
In real-life situations, a Docker container escape is typically performed after obtaining a foothold in a running container. These escape techniques allow pivoting to the host running the container and thus gaining privileges.
Rapid Security Assessment
When landing on an asset running Docker containers, a quick look at the environment, capabilities, namespaces, mounts, and the Docker socket provides crucial information for privilege escalation.
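A first step is simply confirming that the foothold is inside a container. The sketch below collects a few common hints. These are heuristics, not proof: the /.dockerenv marker and the "docker" string in /proc/1/cgroup are conventions that may be absent (for example on cgroup v2 hosts). The function name and the root parameter are illustrative.

```python
import os
from pathlib import Path


def container_hints(root="/"):
    """Collect heuristic hints that the current process runs in a Docker container."""
    root = Path(root)

    def read(rel):
        # Missing or unreadable files are treated as empty.
        try:
            return (root / rel).read_text(errors="replace")
        except OSError:
            return ""

    return {
        # Marker file that Docker usually creates at the container root.
        "dockerenv_file": (root / ".dockerenv").exists(),
        # On cgroup v1 hosts the cgroup path of PID 1 often mentions docker.
        "docker_in_cgroup": "docker" in read("proc/1/cgroup"),
        # Root inside the container makes most escape techniques easier.
        "running_as_root": os.geteuid() == 0,
    }
```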
Capabilities Abuse
Docker drops many dangerous capabilities by default, but adding back a single high-risk capability can fully break isolation.
High-risk capabilities include:
CAP_SYS_ADMIN (most dangerous)
CAP_SYS_PTRACE
CAP_SYS_MODULE
CAP_DAC_OVERRIDE
CAP_NET_ADMIN
CAP_SETUID, CAP_SETGID
Privileged containers (--privileged) disable most security features, granting nearly all capabilities and device access. They provide multiple escape vectors.
Detection
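If CapEff is close to 0000003fffffffff, the container is effectively privileged. The exact full mask depends on the kernel version: newer kernels define more capabilities, so the full mask can be larger (for example 000001ffffffffff).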
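The CapEff value is a hexadecimal bitmask in which bit N corresponds to capability number N. The following sketch extracts it from the text of /proc/self/status and reports which of the high-risk capabilities discussed here are set. The bit numbers come from the Linux capability definitions. The function names are illustrative.

```python
import re

# Bit positions of the high-risk capabilities in the capability bitmask.
RISKY_CAPS = {
    1: "CAP_DAC_OVERRIDE",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def read_capeff(status_text):
    """Extract the CapEff hex string from the text of /proc/<pid>/status."""
    match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)$", status_text, re.MULTILINE)
    return match.group(1) if match else None


def risky_caps(capeff_hex):
    """List the high-risk capabilities set in a CapEff mask, ordered by bit number."""
    mask = int(capeff_hex, 16)
    return [name for bit, name in sorted(RISKY_CAPS.items()) if mask & (1 << bit)]
```

For example, risky_caps("00000000a80425fb"), a mask commonly seen with Docker's default capability set, returns CAP_DAC_OVERRIDE, CAP_SETGID, and CAP_SETUID, while a mask of 0000003fffffffff returns all seven entries.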
CAP_SYS_ADMIN
CAP_SYS_ADMIN allows mounting filesystems, creating namespaces, and accessing devices. In a Docker escape context, this can allow mounting the host file system directly into the container.
Direct disk mount:
Not guaranteed with CAP_SYS_ADMIN alone.
Mount namespace manipulation:
Always possible with CAP_SYS_ADMIN, but it does not amount to a host escape by itself. It gives filesystem control inside the container, not on the host.
Condition                                Can mount host FS?
CAP_SYS_ADMIN only                       No
CAP_SYS_ADMIN + privileged               Yes
CAP_SYS_ADMIN + /dev mount               Yes
CAP_SYS_ADMIN + docker.sock              Yes
CAP_SYS_ADMIN + correct Docker defaults  No
CAP_SYS_PTRACE
CAP_SYS_PTRACE combined with --pid=host allows attaching to host processes and executing code in their context. The --pid=host flag makes a container share the host's process namespace.
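Whether the PID namespace is shared can be guessed from inside the container by looking at PID 1. The sketch below is a rough heuristic, assuming that a host init process is named systemd or init. A container that runs its own init system would produce a false positive. All names in the code are illustrative.

```python
from pathlib import Path

# Assumption: process names typically used by a host init system.
HOST_INIT_NAMES = {"systemd", "init"}


def pid1_name(proc_root="/proc"):
    """Return the command name of PID 1, or None if it cannot be read."""
    try:
        return Path(proc_root, "1", "comm").read_text().strip()
    except OSError:
        return None


def looks_like_host_pid_namespace(comm):
    """Heuristic: PID 1 being a host init process suggests --pid=host."""
    return comm in HOST_INIT_NAMES
```

A typical use is looks_like_host_pid_namespace(pid1_name()): in a default container, PID 1 is the container's main process, so the check is expected to return False.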
Host-level code execution (requires gdb):
Process injection:
The injector I use:
CAP_SYS_MODULE
CAP_SYS_MODULE allows loading arbitrary kernel modules, which leads to immediate host compromise.
The Kernel module backdoor I use:
CAP_DAC_OVERRIDE
CAP_DAC_OVERRIDE allows bypassing file permission checks (requires a host file system mount).
Namespaces Abuse
If a container shares namespaces with the host or gains the ability to enter them, isolation collapses.
Dangerous namespace configurations:
--pid=host
--net=host
--ipc=host
Writable /proc, /sys
nsenter available
Detection
Look for:
PidMode: "host"
NetworkMode: "host"
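When the Docker daemon is reachable (for example from the host), these values appear in the HostConfig section of the JSON printed by docker inspect. The sketch below flags the dangerous settings in such output. The function name is illustrative, and the sample in the usage note is made up for demonstration.

```python
import json


def risky_host_config(inspect_output):
    """Flag host-namespace sharing and privileged mode in `docker inspect` JSON.

    `inspect_output` is the JSON text printed by `docker inspect`,
    which is a list with one object per inspected container.
    """
    findings = []
    for container in json.loads(inspect_output):
        host_config = container.get("HostConfig", {})
        name = container.get("Name", "?")
        for key in ("PidMode", "NetworkMode", "IpcMode"):
            if host_config.get(key) == "host":
                findings.append((name, f"{key}=host"))
        if host_config.get("Privileged"):
            findings.append((name, "Privileged=true"))
    return findings
```

Given a made-up container named /web whose HostConfig contains PidMode "host" and NetworkMode "bridge", the function returns a single finding, ("/web", "PidMode=host").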
PID Namespace Escape (--pid=host)
Host processes are visible and targetable.
You are now inside the host namespaces.
/proc Abuse
/proc exposes kernel and process internals.
This is one of the simplest real-world escapes.
Mount Namespace Abuse
Writable mount namespaces allow filesystem manipulation.
Docker Socket Namespace Escape
A Docker socket mounted into a container lets that container reach the Docker Engine API. If the Docker CLI is available in the container, you can interact with other containers, spin up new ones, and so on.
Spin a new container with a host bind mount and privileges:
This is the most common container escape in the wild.
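Checking whether the socket is exposed is straightforward. The sketch below assumes the conventional path /var/run/docker.sock, which can differ depending on how the socket was mounted. The function name is illustrative.

```python
import os
import stat


def docker_socket_status(path="/var/run/docker.sock"):
    """Report whether a Docker socket is present and usable at `path`."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return {"present": False, "is_socket": False, "writable": False}
    return {
        "present": True,
        # A regular file or directory at this path is not an API endpoint.
        "is_socket": stat.S_ISSOCK(mode),
        # Write access is needed to send requests to the Docker Engine API.
        "writable": os.access(path, os.W_OK),
    }
```

A result with both is_socket and writable set to True means the current user can talk to the Docker Engine API from inside the container.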
cgroups Abuse
Cgroups limit resources, but misconfigured cgroups can be abused to execute code on the host. This attack is independent of Docker and targets the Linux kernel directly.
Detection
Writable cgroup mounts are a red flag.
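One way to spot them is to parse the mount table. The sketch below takes the text of /proc/mounts and returns the mount points of cgroup filesystems mounted read-write. The function name is illustrative.

```python
def writable_cgroup_mounts(mounts_text):
    """Return mount points of cgroup filesystems mounted read-write.

    `mounts_text` is the content of /proc/mounts, where each line holds
    device, mount point, filesystem type, options, and two numeric fields.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or empty line
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("cgroup", "cgroup2") and "rw" in options.split(","):
            found.append(mount_point)
    return found
```

It can be called as writable_cgroup_mounts(open("/proc/mounts").read()). In a container with default settings the cgroup mounts are normally read-only, so an empty list is the expected result.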
cgroup release_agent Escape
When the last process in a cgroup exits and notify_on_release is enabled for that cgroup, the kernel executes the binary configured in release_agent on the host (cgroup v1).
This results in host command execution.
