TL;DR
On Debian 10 with rootless Docker 20.10.0, I’m having the following issues:
- With cgroups v2:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:326: applying cgroup configuration for process caused: could not execute systemctl --user --no-pager show-environment, output="Failed to get environment: Access denied\n": exit status 1: unknown.
- With cgroups v1 and fuse-overlayfs, I am able to build images and run containers okay, but I don’t understand the permissions on bind mounts for non-root users inside the container.
(I also posted this to /r/docker on reddit; I hope that’s not against the guidelines.)
This is going to be a lot of text, but if anybody here can help me pick at the edges of this I’d appreciate any insight. There are a few different issues I’m trying to tackle from different angles, but this is all stemming from my attempts in the last day or so to play with rootless mode in Docker 20.10.0. Everything I’m doing here is being attempted in Debian 10. I’ll put a Vagrantfile at the bottom of the post for reproducibility’s sake.
First Issue: cgroups v2
My first attempts were using cgroups v2, enabled by adding systemd.unified_cgroup_hierarchy=1 to the kernel parameters. From what I can see, I enabled cgroups v2 correctly on the system and have rootless docker running:
✔ vagrant@debian-10 ~ ▶ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
✔ vagrant@debian-10 ~ ▶ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.9.0-0.bpo.2-amd64 root=/dev/mapper/debian--10--vg-root ro net.ifnames=0 biosdevname=0 debian-installer=en_US.UTF-8 random.trust_cpu=on elevator=deadline systemd.unified_cgroup_hierarchy=1
✔ vagrant@debian-10 ~ ▶ docker info
Client:
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 3
Server Version: 20.10.0
Storage Driver: fuse-overlayfs
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: de40ad0
Security Options:
seccomp
Profile: default
rootless
cgroupns
Kernel Version: 5.9.0-0.bpo.2-amd64
Operating System: Debian GNU/Linux 10 (buster)
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 15.64GiB
Name: debian-10
ID: GZUR:W75Y:WQID:BRYZ:TZOZ:NQMP:UXRA:WI5P:VMZB:L7SB:WH6Q:Z22P
Docker Root Dir: /home/vagrant/.local/share/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No kernel memory TCP limit support
WARNING: No oom kill disable support
WARNING: No cpuset support
WARNING: Support for cgroup v2 is experimental
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
✔ vagrant@debian-10 ~ ▶ systemctl --user status docker.service
● docker.service - Docker Application Container Engine (Rootless)
Loaded: loaded (/home/vagrant/.config/systemd/user/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2020-12-10 13:51:52 UTC; 5min ago
Docs: https://docs.docker.com/engine/security/rootless/
Main PID: 574 (rootlesskit)
Tasks: 62
Memory: 354.5M
CPU: 11.694s
CGroup: /user.slice/user-1000.slice/user@1000.service/docker.service
├─574 rootlesskit --net=vpnkit --mtu=1500 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /home/vagrant/bin/dockerd-rootless.sh --storage-d
├─583 /proc/self/exe --net=vpnkit --mtu=1500 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /home/vagrant/bin/dockerd-rootless.sh --storag
├─594 vpnkit --ethernet /tmp/rootlesskit505778783/vpnkit-ethernet.sock --mtu 1500 --host-ip 0.0.0.0
├─615 dockerd --storage-driver=fuse-overlayfs
└─673 containerd --config /run/user/1000/docker/containerd/containerd.toml --log-level info
✔ vagrant@debian-10 ~ ▶ cat /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
However, I hit a wall pretty early with cgroups v2:
✔ vagrant@debian-10 ~ ▶ docker run --rm hello-world:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:326: applying cgroup configuration for process caused: could not execute `systemctl --user --no-pager show-environment`, output="Failed to get environment: Access denied\n": exit status 1: unknown.
I played around with it a little bit more, but Google didn’t really help me out much, so I eventually gave up and switched back to systemd.unified_cgroup_hierarchy=0.
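As a quick sanity check after flipping that kernel parameter, here is a small sketch for confirming which hierarchy is actually mounted. It assumes that stat -fc %T /sys/fs/cgroup reports cgroup2fs under the unified (v2) hierarchy and tmpfs under v1, which is what I have seen on systemd-based distributions. The helper name cgroup_mode is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify the cgroup hierarchy from the filesystem
# type reported for /sys/fs/cgroup.
# Assumption: cgroup2fs means unified v2; tmpfs means v1 (legacy or hybrid).
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2 (unified)" ;;
        tmpfs)     echo "v1 (legacy or hybrid)" ;;
        *)         echo "unknown" ;;
    esac
}

# Report the mode of the running system; prints "unknown" if stat fails.
cgroup_mode "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
```

This only confirms what the mount | grep cgroup output already shows, but it is easier to drop into a provisioning script.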
Second Issue: non-root write access to bind mounts in containers
I followed the instructions in the rootless docker docs for Debian/Ubuntu systems:
✔ vagrant@debian-10 ~ ▶ grep unprivileged /etc/sysctl.conf
kernel.unprivileged_userns_clone=1
✔ vagrant@debian-10 ~ ▶ cat /etc/modprobe.d/docker.conf
options overlay permit_mounts_in_userns=1
I’m trying to use fuse-overlayfs as the storage driver. At first I was having some problems with building images, but then I found this thread where /u/GertVanAntwerpen figured out that the version of fuse-overlayfs available in buster is too old. I installed the version from bullseye instead, and at this point was able to build images and run basic things:
✔ vagrant@debian-10 ~ ▶ docker info
Client:
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 20.10.0
Storage Driver: fuse-overlayfs
Logging Driver: json-file
Cgroup Driver: none
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: de40ad0
Security Options:
seccomp
Profile: default
rootless
Kernel Version: 5.9.0-0.bpo.2-amd64
Operating System: Debian GNU/Linux 10 (buster)
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 15.64GiB
Name: debian-10
ID: AEGL:6UDY:FR3E:CVKX:QOKU:6E54:CAHS:SN3J:ER32:LEWP:BOBB:LAMR
Docker Root Dir: /home/vagrant/.local/share/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: Running in rootless-mode without cgroups. To enable cgroups in rootless-mode, you need to boot the system in cgroup v2 mode.
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
✔ vagrant@debian-10 ~ ▶ docker run --rm hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
I was testing out bind mounts, and things work okay if I’m root inside the container:
✘ vagrant@debian-10 ~ ▶ docker run -Pit --rm --mount type=bind,source=/home/vagrant/tmp,target=/host --user 0 debian:buster-slim /bin/bash
root@b72944c8421d:/# ls -l / | grep host
drwxr-xr-x 2 root root 4096 Dec 10 18:28 host
root@b72944c8421d:/# mount|grep host
/dev/mapper/debian--10--vg-root on /host type ext4 (rw,relatime,errors=remount-ro)
...
root@b72944c8421d:/# echo 'hey' > /host/hey.txt
root@b72944c8421d:/# cat /host/hey.txt
hey
root@b72944c8421d:/# rm /host/hey.txt
root@b72944c8421d:/# touch /host/hereiam.txt
root@b72944c8421d:/# exit
✔ vagrant@debian-10 ~ ▶ ls -l tmp/
total 0
-rw-r--r-- 1 vagrant vagrant 0 Dec 10 18:30 hereiam.txt
So as you can see, I have read/write access to the bind mount directory as root inside the container, and the file that got created outside the container is owned by my user.
What I’m a little bit confused about is how to make this work as a non-root user inside the container. I created this Dockerfile and built an image:
FROM debian:buster-slim
ARG DEFAULT_UID=1000
ARG DEFAULT_GID=1000
ENV DEFAULT_UID $DEFAULT_UID
ENV DEFAULT_GID $DEFAULT_GID
ENV PUSER "user"
ENV PGROUP "user"
ENV DEBIAN_FRONTEND noninteractive
ENV TERM xterm
RUN groupadd --gid ${DEFAULT_GID} ${PGROUP} && \
useradd -M --uid ${DEFAULT_UID} --gid ${DEFAULT_GID} --home /nonexistant ${PUSER}
USER $PUSER
But when I run it:
✔ vagrant@debian-10 docker ▶ docker run -Pit --rm --mount type=bind,source=/home/vagrant/tmp,target=/host --user user debian:buster-user /bin/bash
user@c6ddaa551aef:/$ ls -l / | grep host
drwxr-xr-x 2 root root 4096 Dec 10 18:30 host
user@c6ddaa551aef:/$ mount | grep host
/dev/mapper/debian--10--vg-root on /host type ext4 (rw,relatime,errors=remount-ro)
...
user@c6ddaa551aef:/$ echo 'hey' > ^C
user@c6ddaa551aef:/$ ls -l /host
total 0
-rw-r--r-- 1 root root 0 Dec 10 18:30 hereiam.txt
user@c6ddaa551aef:/$ echo 'hey' > /host/metoo.txt
bash: /host/metoo.txt: Permission denied
user@c6ddaa551aef:/$ rm /host/hereiam.txt
rm: remove write-protected regular empty file '/host/hereiam.txt'? y
rm: cannot remove '/host/hereiam.txt': Permission denied
user@c6ddaa551aef:/$ touch /host/nope.txt
touch: cannot touch '/host/nope.txt': Permission denied
I think that maybe it’s something to do with my /etc/subuid and /etc/subgid files. By default they contain this:
✔ vagrant@debian-10 docker ▶ cat /etc/subuid
vagrant:100000:65536
✔ vagrant@debian-10 docker ▶ cat /etc/subgid
vagrant:100000:65536
I tried a few things like adding vagrant:1000:1 or vagrant:1000:1000 to those files, but that didn’t seem to have an effect.
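For anyone reasoning about this along with me, here is a minimal sketch of how I understand the user-namespace mapping. It assumes rootless Docker maps container UID 0 to my own host UID (1000) and maps container UIDs 1 and up onto the /etc/subuid range starting at 100000. The function name container_uid_to_host is made up for illustration; it is not part of Docker or any other tool.

```shell
#!/bin/sh
# Hypothetical helper: translate a container UID to a host UID.
# Assumed mapping (not confirmed against the daemon):
#   container 0         -> the host user's own UID
#   container 1..count  -> sub_start .. sub_start + count - 1
# Arguments: container_uid host_uid sub_start sub_count
container_uid_to_host() {
    cuid=$1
    host_uid=$2
    sub_start=$3
    sub_count=$4
    if [ "$cuid" -eq 0 ]; then
        echo "$host_uid"
    elif [ "$cuid" -ge 1 ] && [ "$cuid" -le "$sub_count" ]; then
        echo $((sub_start + cuid - 1))
    else
        # Outside the delegated range, so there is no host UID for it.
        echo "unmapped"
        return 1
    fi
}

# Values taken from this post: host UID 1000, subuid entry vagrant:100000:65536.
container_uid_to_host 0 1000 100000 65536      # root inside the container
container_uid_to_host 1000 1000 100000 65536   # "user" from my Dockerfile
```

If that assumption holds, the two calls print 1000 and 100999. That would mean "user" inside the container is host UID 100999, which is neither the owner nor in the group of /home/vagrant/tmp, so it only gets the r-x "other" permissions seen in the ls -l output above.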
For completeness and reproducibility’s sake, here’s the Vagrantfile I’m using to recreate my environment and try these things out. It creates a Debian 10-based VM with the latest 5.9 kernel from buster-backports and sets up docker via curl -fsSL https://get.docker.com/rootless | bash.
unless Vagrant.has_plugin?("vagrant-reload")
raise 'vagrant-reload plugin is not installed!'
end
# hack: https://github.com/hashicorp/vagrant/issues/8878#issuecomment-345112810
class VagrantPlugins::ProviderVirtualBox::Action::Network
def dhcp_server_matches_config?(dhcp_server, config)
true
end
end
Vagrant.configure("2") do |config|
config.vm.box = "bento/debian-10"
config.vm.network "private_network", type: "dhcp"
if Vagrant.has_plugin?("vagrant-vbguest")
config.vbguest.auto_update = false
end
config.vm.provider "virtualbox" do |vb|
vb.memory = "4096"
vb.cpus = 2
end
config.vm.provision "shell", inline: <<-STEP1
export DEBIAN_FRONTEND=noninteractive
echo 'APT::Default-Release "stable";' >> /etc/apt/apt.conf.d/99default-release
sed -i "s/main/main contrib non-free/g" /etc/apt/sources.list
echo "deb http://httpredir.debian.org/debian/ buster-backports main contrib non-free" >> /etc/apt/sources.list
echo "deb-src http://httpredir.debian.org/debian/ buster-backports main contrib non-free" >> /etc/apt/sources.list
apt-get update
apt-get dist-upgrade -y
export KERNEL_VERSION=$(apt-cache search linux-image-5.9 | grep -Pv -- '(-(rt|cloud)-amd64|amd64-(dbg|unsigned))' | sort -r --sort=version | awk '{print $1}' | head -n 1 | sed 's/^linux-image-//' | sed 's/-amd64$//')
apt-get -t buster-backports install -y \
linux-image-$KERNEL_VERSION-amd64 linux-headers-$KERNEL_VERSION-amd64 linux-headers-$KERNEL_VERSION-common \
dkms build-essential linux-kbuild-5.9 linux-compiler-gcc-8-x86 \
firmware-linux firmware-linux-nonfree firmware-misc-nonfree libcap2-bin \
rsync git apt-transport-https ca-certificates curl gnupg2 tmux moreutils
echo "deb http://httpredir.debian.org/debian/ bullseye main contrib non-free" >> /etc/apt/sources.list
echo "deb-src http://httpredir.debian.org/debian/ bullseye main contrib non-free" >> /etc/apt/sources.list
apt-get update
apt-get -t bullseye install -y uidmap fuse-overlayfs
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="random.trust_cpu=on elevator=deadline cgroup_enable=memory swapaccount=1 cgroup.memory=nokmem systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
ls /dev/disk/by-id/ata-* | grep -v '\\-part' | head -n 1 | xargs -r -l grub-install
echo "kernel.unprivileged_userns_clone=1" >> /etc/sysctl.conf
echo "options overlay permit_mounts_in_userns=1" >> /etc/modprobe.d/docker.conf
mkdir -p /etc/systemd/system/user@.service.d
echo -e "[Service]\\nDelegate=cpu cpuset io memory pids" >> /etc/systemd/system/user@.service.d/delegate.conf
loginctl enable-linger vagrant
touch /root/.hushlogin
echo "set nocompatible" > /root/.vimrc
STEP1
config.vm.provision :reload
config.vm.provision "shell", inline: <<-STEP2
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get -y --purge remove *4.19* || true
apt-get -y autoremove
apt-get clean
STEP2
config.vm.provision :reload
config.vm.provision "shell", privileged: false, inline: <<-STEP3
mkdir -p /home/vagrant/.config/systemd/user /home/vagrant/.local/bin /home/vagrant/bin /home/vagrant/tmp
git clone --recursive --single-branch --depth 1 https://github.com/mmguero/config /home/vagrant/.config/mmguero.config
touch /home/vagrant/.hushlogin
echo "set nocompatible" > /home/vagrant/.vimrc
rm -f /home/vagrant/.bashrc
ln -s -f -r /home/vagrant/.config/mmguero.config/bash/rc /home/vagrant/.bashrc
ln -s -f -r /home/vagrant/.config/mmguero.config/bash/rc.d /home/vagrant/.bashrc.d
ln -s -f -r /home/vagrant/.config/mmguero.config/bash/aliases /home/vagrant/.bash_aliases
ln -s -f -r /home/vagrant/.config/mmguero.config/bash/functions /home/vagrant/.bash_functions
ln -s -f -r /home/vagrant/.config/mmguero.config/bash/context-color/context-color /home/vagrant/.local/bin/context-color
ln -s -f -r /home/vagrant/.config/mmguero.config/linux/tmux/tmux.conf /home/vagrant/.tmux.conf
ln -s -f -r /home/vagrant/.config/mmguero.config/git/gitconfig /home/vagrant/.gitconfig
curl -fsSL https://get.docker.com/rootless | bash
curl -fsSL "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /home/vagrant/bin/docker-compose
chmod 755 /home/vagrant/bin/docker-compose
echo -e "\\nexport DOCKER_HOST=unix:///run/user/1000/docker.sock" >> /home/vagrant/.bashrc.d/05_docker.bashrc
STEP3
# allow some elevated privileges (raw sockets, binding to ports <1024, promiscuous capture, mlock, etc.)
config.vm.provision "shell", inline: <<-STEP4
setcap 'CAP_IPC_LOCK+eip CAP_NET_ADMIN+eip CAP_NET_BIND_SERVICE+eip CAP_NET_RAW+eip' /home/vagrant/bin/rootlesskit
STEP4
config.vm.provision "shell", privileged: false, inline: <<-STEP5
sed -i "s@\\(dockerd-rootless\\.sh\\)@\\1 --storage-driver=fuse-overlayfs@" /home/vagrant/.config/systemd/user/docker.service
systemctl --user enable docker
systemctl --user daemon-reload
systemctl --user restart docker
STEP5
end
If you made it this far, thank you, and I appreciate any pointers people smarter than me can throw my way.