Default capabilities in 2023/4 - do they reasonably prevent the container root escape / break out?

Hi Docker Community,

So far, my understanding has been that containers also help a lot with deployment security.

Now, there is a camp that challenges this, arguing that the default user, the container root, has dangerously broad runtime capabilities by default. I understand that the main concern in that camp is the so-called container root escape / break out, i.e. becoming the root user on the runtime host.

Looking at highly popular official images such as apache/httpd, I see they go with the defaults (they use the container root and are meant to be run with the default capabilities). So the most used official images do not reflect such security fears, as if the default capability set prevented the container root escape / break out. Is this still considered a good example, please?

My quick, non-repeatable and somewhat naive search has shown that in the last five years there has been at least one known container root escape / break out CVE for container images being run with the default capabilities.

Best regards
Petr

Without claiming to be a security expert, I can say this. Capabilities are what the kernel allows you to do. You won’t break out simply by having a root user in the container unless some conditions are met or there is a bug in the kernel or in a tool that allows you to execute code outside the container or to access or write something in memory. I’m pretty sure some bugs could allow you to break out even as a non-root user; being non-root just makes it harder, as a non-root user can’t execute commands in another namespace and doesn’t have access to everything on the filesystem.

But the point is not that you don’t have a root user in the container, as with the proper privileges you can always execute commands in the container as root. UID 0 can always be used. The point is to make sure that the application with which users or other services interact doesn’t have root privileges, so even if they hack the application through a web interface, for example, they can’t execute code as root. In the case of the Apache HTTPD server, this means the main HTTP process runs as root, but it forks other processes that run as daemon or www-data.

Of course, you then need to make sure the web server’s user can’t write to the data folders unless it has to. The HTTPD container could run as daemon and PHP as www-data, so you would allow the PHP container to write data but not HTTPD. Then the hacker can hack the PHP container through a PHP site…

Even if you have a non-root user, if you mount some host folders and the owner of the files in them just happens to be the same as the user that runs the application, those files will be writable. If you can write a script from a container and that script runs periodically on the host, you could even create a user and give it SSH access to the host.
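To make the mount point concrete, here is a minimal sketch that lists files under a directory that the current user both owns and can write. The function name `writable_as_me` and the `/data` path in the comment are made up for illustration; the idea is to run it inside the container against a bind-mounted host folder.

```shell
# writable_as_me DIR: list everything under DIR that is owned by the
# current numeric UID and has the owner-write bit set.
# (Hypothetical helper; only find(1) and id(1) are used.)
writable_as_me() {
  find "$1" -user "$(id -u)" -perm -200
}

# Possible use inside a container, assuming a host folder is mounted at /data:
#   writable_as_me /data
```

Anything this prints is writable from the container even though the process is not root, simply because the numeric UIDs match.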

Let’s say you are confident the application is not hackable, or at least that the final process with which the users interact is not running as root. Maybe someone changes the image that you use: they hack the repository of the official httpd image and execute commands as root before they start the actual httpd process. That is why some platforms don’t allow you to run containers as root, regardless of whether that process would immediately fork new processes running as non-root.

So I think normally you can’t break out, but you can do harm if you have more capabilities than you need. For example if you have access to the network devices, you can break the host network. If you have access to disks, you can delete data or access data you wouldn’t share.
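One way to avoid having more capabilities than you need is to drop all of them and add back only a short list, using the `--cap-drop` and `--cap-add` options of `docker run`. The sketch below only prints such a command rather than running it. The helper name `least_priv_cmd` is hypothetical, and the capability list shown for httpd is an assumption that would have to be verified by testing the image.

```shell
# least_priv_cmd IMAGE [CAP...]: print (do not run) a docker command that
# drops every capability and adds back only the listed ones.
least_priv_cmd() {
  image=$1
  shift
  cmd="docker run --rm --cap-drop ALL"
  for cap in "$@"; do
    cmd="$cmd --cap-add $cap"
  done
  printf '%s %s\n' "$cmd" "$image"
}

# Example call; the three capabilities are a guess at what httpd needs
# to bind port 80 and switch to its unprivileged user.
least_priv_cmd httpd:2.4 NET_BIND_SERVICE SETUID SETGID
```

Printing the command first makes it easy to review what would be granted before actually starting the container.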

These are the current default capabilities:

  • CAP_CHOWN
  • CAP_DAC_OVERRIDE
  • CAP_FSETID
  • CAP_FOWNER
  • CAP_MKNOD
  • CAP_NET_RAW
  • CAP_SETGID
  • CAP_SETUID
  • CAP_SETFCAP
  • CAP_SETPCAP
  • CAP_NET_BIND_SERVICE
  • CAP_SYS_CHROOT
  • CAP_KILL
  • CAP_AUDIT_WRITE
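To check which capabilities a process really has, you can decode the hexadecimal CapEff mask from /proc/PID/status. The following is a rough sketch, not an official tool: the helper name `decode_caps` is made up, and the name table follows the bit order in the Linux kernel's capability header (bit 0 is CAP_CHOWN). The mask `00000000a80425fb` used in the comment corresponds to the 14 capabilities listed above.

```shell
# Capability names in kernel bit order (bit 0 = CAP_CHOWN ... bit 40).
CAP_NAMES="CAP_CHOWN CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_FOWNER CAP_FSETID
CAP_KILL CAP_SETGID CAP_SETUID CAP_SETPCAP CAP_LINUX_IMMUTABLE
CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_ADMIN CAP_NET_RAW CAP_IPC_LOCK
CAP_IPC_OWNER CAP_SYS_MODULE CAP_SYS_RAWIO CAP_SYS_CHROOT CAP_SYS_PTRACE
CAP_SYS_PACCT CAP_SYS_ADMIN CAP_SYS_BOOT CAP_SYS_NICE CAP_SYS_RESOURCE
CAP_SYS_TIME CAP_SYS_TTY_CONFIG CAP_MKNOD CAP_LEASE CAP_AUDIT_WRITE
CAP_AUDIT_CONTROL CAP_SETFCAP CAP_MAC_OVERRIDE CAP_MAC_ADMIN CAP_SYSLOG
CAP_WAKE_ALARM CAP_BLOCK_SUSPEND CAP_AUDIT_READ CAP_PERFMON CAP_BPF
CAP_CHECKPOINT_RESTORE"

# decode_caps HEXMASK: print one capability name per set bit in HEXMASK
# (HEXMASK is given without a 0x prefix, as it appears in /proc).
decode_caps() {
  mask=$(( 0x$1 ))
  bit=0
  for name in $CAP_NAMES; do
    if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
      printf '%s\n' "$name"
    fi
    bit=$(( $bit + 1 ))
  done
}

# Possible use inside a container:
#   decode_caps "$(awk '/^CapEff/ {print $2}' /proc/1/status)"
# or with the mask matching the list above:
#   decode_caps 00000000a80425fb
```

Comparing the decoded list with what the application actually needs shows which capabilities could be dropped.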

I would have to read about all of them to tell you what they allow, but the Docker documentation says:

“By default Docker drops all capabilities except those needed”

One easy way to “break out” would be to

  • allow the container to manage namespaces
  • and share the pid namespace of the host with the container
  • and run a command in the container as root:

docker run --rm -it --privileged --pid host ubuntu:22.04 nsenter --all -t 1 sh

I put the “break out” part between quotation marks as this is not really breaking out but specifically allowing the container to do everything and run a shell on the host outside the container’s namespaces.
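Since this kind of access has to be granted explicitly, it can also be spotted before a container is started. Below is a small sketch that scans the arguments of a docker run command for the two options used above. The helper name `risky_flags` is hypothetical, and the list of options it checks is deliberately limited to those two, so it is illustrative rather than exhaustive.

```shell
# risky_flags ARGS...: print each argument (or option/value pair) that
# matches --privileged or a host PID namespace. Not an exhaustive check.
risky_flags() {
  prev=""
  for arg in "$@"; do
    case "$arg" in
      --privileged|--pid=host)
        printf '%s\n' "$arg" ;;
      host)
        # "--pid host" arrives as two separate arguments
        if [ "$prev" = "--pid" ]; then
          printf '%s %s\n' "$prev" "$arg"
        fi ;;
    esac
    prev=$arg
  done
}

# Possible use, passing the arguments of the command shown above:
#   risky_flags run --rm -it --privileged --pid host ubuntu:22.04 nsenter --all -t 1 sh
```

An empty result only means that neither of these two options was found, not that the command is safe.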

This is exactly what can be used to enter the virtual machine of Docker Desktop and how Lens Desktop from Mirantis is able to open a shell on the Kubernetes nodes.

Dear Akos,

Thank you for your elaborate and inspiring answer
(which I have only just discovered, as it had been quarantined).

Best regards
Petr