A couple of issues I ran into tinkering with docker 20.10.0 rootless mode

TL;DR

In Debian 10, with rootless docker 20.10.0 I’m having the following issues:

  • With cgroups 2:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:326: applying cgroup configuration for process caused: could not execute systemctl --user --no-pager show-environment, output="Failed to get environment: Access denied\n": exit status 1: unknown.

  • With cgroups 1 and fuse-overlayfs I am able to build images and run containers okay, but I don’t understand how permissions work for bind mounts when running as a non-root user inside the container.

(I also posted this to /r/docker on reddit; I hope that’s not against the guidelines.)


This is going to be a lot of text, but if anybody here can help me pick at the edges of this I’d appreciate any insight. There are a few different issues I’m trying to tackle from different angles, but this is all stemming from my attempts in the last day or so to play with rootless mode in Docker 20.10.0. Everything I’m doing here is being attempted in Debian 10. I’ll put a Vagrantfile at the bottom of the post for reproducibility’s sake.


First Issue: cgroups v2

My first attempts were using cgroups v2, enabled by adding systemd.unified_cgroup_hierarchy=1 to the kernel parameters. From what I can see, I enabled cgroups 2 correctly on the system and have rootless docker running:

✔ vagrant@debian-10 ~ ▶ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

✔ vagrant@debian-10 ~ ▶ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.9.0-0.bpo.2-amd64 root=/dev/mapper/debian--10--vg-root ro net.ifnames=0 biosdevname=0 debian-installer=en_US.UTF-8 random.trust_cpu=on elevator=deadline systemd.unified_cgroup_hierarchy=1

✔ vagrant@debian-10 ~ ▶ docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 20.10.0
 Storage Driver: fuse-overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  rootless
  cgroupns
 Kernel Version: 5.9.0-0.bpo.2-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 15.64GiB
 Name: debian-10
 ID: GZUR:W75Y:WQID:BRYZ:TZOZ:NQMP:UXRA:WI5P:VMZB:L7SB:WH6Q:Z22P
 Docker Root Dir: /home/vagrant/.local/share/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No kernel memory TCP limit support
WARNING: No oom kill disable support
WARNING: No cpuset support
WARNING: Support for cgroup v2 is experimental
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

✔ vagrant@debian-10 ~ ▶ systemctl --user status docker.service
● docker.service - Docker Application Container Engine (Rootless)
   Loaded: loaded (/home/vagrant/.config/systemd/user/docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-12-10 13:51:52 UTC; 5min ago
     Docs: https://docs.docker.com/engine/security/rootless/
 Main PID: 574 (rootlesskit)
    Tasks: 62
   Memory: 354.5M
      CPU: 11.694s
   CGroup: /user.slice/user-1000.slice/user@1000.service/docker.service
           ├─574 rootlesskit --net=vpnkit --mtu=1500 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /home/vagrant/bin/dockerd-rootless.sh --storage-d
           ├─583 /proc/self/exe --net=vpnkit --mtu=1500 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /home/vagrant/bin/dockerd-rootless.sh --storag
           ├─594 vpnkit --ethernet /tmp/rootlesskit505778783/vpnkit-ethernet.sock --mtu 1500 --host-ip 0.0.0.0
           ├─615 dockerd --storage-driver=fuse-overlayfs
           └─673 containerd --config /run/user/1000/docker/containerd/containerd.toml --log-level info

✔ vagrant@debian-10 ~ ▶ cat /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids

However, I hit a wall pretty early with cgroups v2:

✔ vagrant@debian-10 ~ ▶ docker run --rm hello-world:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:326: applying cgroup configuration for process caused: could not execute `systemctl --user --no-pager show-environment`, output="Failed to get environment: Access denied\n": exit status 1: unknown.

I played around with it a little bit more, but Google didn’t really help me out much, so I eventually gave up and switched back to systemd.unified_cgroup_hierarchy=0.
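For anyone reproducing this, here is a minimal sketch of a sanity check for which cgroup mode a boot ended up in. It reads the same two things shown in the output above: the systemd.unified_cgroup_hierarchy kernel parameter and the filesystem type mounted at /sys/fs/cgroup. The function names unified_param and cgroup_mode are made up for illustration, and the sketch assumes a GNU-style stat that accepts -fc %T. It only confirms the cgroup mode; it does not diagnose the "Access denied" error itself.

```shell
#!/bin/sh
# Print the value of systemd.unified_cgroup_hierarchy from a kernel command
# line string passed as $1, or "unset" if the parameter is absent.
unified_param() {
    for word in $1; do
        case "$word" in
            systemd.unified_cgroup_hierarchy=*)
                echo "${word#*=}"
                return 0
                ;;
        esac
    done
    echo "unset"
}

# Classify the filesystem type reported for /sys/fs/cgroup.
# cgroup2fs means the unified (v2) hierarchy is mounted there;
# tmpfs is what a v1 or hybrid layout typically reports.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1 or hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# Live checks; these only read state and change nothing.
if [ -r /proc/cmdline ]; then
    echo "kernel parameter: $(unified_param "$(cat /proc/cmdline)")"
fi
if [ -d /sys/fs/cgroup ]; then
    echo "cgroup mode: $(cgroup_mode "$(stat -fc %T /sys/fs/cgroup)")"
fi
```

Running it after each reboot makes it easy to confirm that a change to the kernel parameters actually took effect before testing docker again.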


Second Issue: non-root write access to bind mounts in containers

I followed the instructions in the rootless docker docs for Debian/Ubuntu systems:

✔ vagrant@debian-10 ~ ▶ grep unprivileged /etc/sysctl.conf
kernel.unprivileged_userns_clone=1

✔ vagrant@debian-10 ~ ▶ cat /etc/modprobe.d/docker.conf
options overlay permit_mounts_in_userns=1

I’m trying to use fuse-overlayfs as the filesystem driver. At first I was having some problems with building images, but then I found this thread where /u/GertVanAntwerpen figured out that the version of fuse-overlayfs available in buster is too old. I installed the version from bullseye instead, and at this point was able to build images and run basic things:

✔ vagrant@debian-10 ~ ▶ docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 20.10.0
 Storage Driver: fuse-overlayfs
 Logging Driver: json-file
 Cgroup Driver: none
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  rootless
 Kernel Version: 5.9.0-0.bpo.2-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 15.64GiB
 Name: debian-10
 ID: AEGL:6UDY:FR3E:CVKX:QOKU:6E54:CAHS:SN3J:ER32:LEWP:BOBB:LAMR
 Docker Root Dir: /home/vagrant/.local/share/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: Running in rootless-mode without cgroups. To enable cgroups in rootless-mode, you need to boot the system in cgroup v2 mode.
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
✔ vagrant@debian-10 ~ ▶ docker run --rm hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
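Since the fuse-overlayfs version turned out to matter, here is a rough sketch of a version check that could run before starting the daemon. The helper names version_ge and check_fuse_overlayfs are hypothetical. The MIN_VERSION value of 0.7 is an assumption taken from my reading of the Docker storage driver documentation, not something established in this thread, so verify it before relying on it. The version strings passed in at the bottom are purely illustrative.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# MIN_VERSION is an assumption, not a value from this thread.
MIN_VERSION="0.7"

# Report whether the given fuse-overlayfs version meets MIN_VERSION.
check_fuse_overlayfs() {
    if version_ge "$1" "$MIN_VERSION"; then
        echo "fuse-overlayfs $1: ok"
    else
        echo "fuse-overlayfs $1: older than $MIN_VERSION"
    fi
}

# Illustrative version strings only.
check_fuse_overlayfs "0.3"
check_fuse_overlayfs "1.4.0"
```

On a Debian system the installed version could be fed in with something like check_fuse_overlayfs "$(dpkg-query -W -f='${Version}' fuse-overlayfs)".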

I was testing out bind mounts, and things work okay if I’m root inside the container:

✘ vagrant@debian-10 ~ ▶ docker run -Pit --rm --mount type=bind,source=/home/vagrant/tmp,target=/host --user 0 debian:buster-slim /bin/bash
root@b72944c8421d:/# ls -l / | grep host 
drwxr-xr-x   2 root   root    4096 Dec 10 18:28 host
root@b72944c8421d:/# mount|grep host
/dev/mapper/debian--10--vg-root on /host type ext4 (rw,relatime,errors=remount-ro)
...
root@b72944c8421d:/# echo 'hey' > /host/hey.txt
root@b72944c8421d:/# cat /host/hey.txt
hey
root@b72944c8421d:/# rm /host/hey.txt
root@b72944c8421d:/# touch /host/hereiam.txt
root@b72944c8421d:/# exit
✔ vagrant@debian-10 ~ ▶ ls -l tmp/
total 0
-rw-r--r-- 1 vagrant vagrant 0 Dec 10 18:30 hereiam.txt

So as you can see I have read/write access to the bind mount directory as root inside the container, and the file that got created outside the container is owned by my user.

What I’m a little bit confused about is how to make this work as a non-root user inside the container. I created this Dockerfile and built an image:

FROM debian:buster-slim

ARG DEFAULT_UID=1000
ARG DEFAULT_GID=1000
ENV DEFAULT_UID $DEFAULT_UID
ENV DEFAULT_GID $DEFAULT_GID
ENV PUSER "user"
ENV PGROUP "user"

ENV DEBIAN_FRONTEND noninteractive
ENV TERM xterm

RUN groupadd --gid ${DEFAULT_GID} ${PGROUP} && \
      useradd -M --uid ${DEFAULT_UID} --gid ${DEFAULT_GID} --home /nonexistent ${PUSER}

USER $PUSER

But when I run it:

✔ vagrant@debian-10 docker ▶ docker run -Pit --rm --mount type=bind,source=/home/vagrant/tmp,target=/host --user user debian:buster-user /bin/bash
user@c6ddaa551aef:/$ ls -l / | grep host
drwxr-xr-x   2 root   root    4096 Dec 10 18:30 host
user@c6ddaa551aef:/$ mount | grep host
/dev/mapper/debian--10--vg-root on /host type ext4 (rw,relatime,errors=remount-ro)
...
user@c6ddaa551aef:/$ echo 'hey' > ^C
user@c6ddaa551aef:/$ ls -l /host
total 0
-rw-r--r-- 1 root root 0 Dec 10 18:30 hereiam.txt
user@c6ddaa551aef:/$ echo 'hey' > /host/metoo.txt
bash: /host/metoo.txt: Permission denied
user@c6ddaa551aef:/$ rm /host/hereiam.txt 
rm: remove write-protected regular empty file '/host/hereiam.txt'? y
rm: cannot remove '/host/hereiam.txt': Permission denied
user@c6ddaa551aef:/$ touch /host/nope.txt
touch: cannot touch '/host/nope.txt': Permission denied

I think that maybe it’s something to do with my /etc/subuid and /etc/subgid files. By default they contain this:

✔ vagrant@debian-10 docker ▶ cat /etc/subuid
vagrant:100000:65536
✔ vagrant@debian-10 docker ▶ cat /etc/subgid
vagrant:100000:65536

I tried a few things like adding vagrant:1000:1 or vagrant:1000:1000 to those files, but that didn’t seem to have an effect.
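To make the subuid question concrete, here is a small sketch of how I understand the rootless UID mapping to work. This is an assumption about rootless mode in general, not something confirmed in this thread: container UID 0 maps to the host user running the daemon, and container UIDs 1 and up map into the /etc/subuid range, starting at its first entry. The function name map_uid is made up for illustration; the host UID 1000 and the subuid entry are the ones shown above.

```shell
#!/bin/sh
# Map a container UID to the host UID it would appear as under the assumed
# rootless mapping: 0 -> host user, 1..count -> start..start+count-1.
# Arguments: container_uid host_uid subuid_start subuid_count
map_uid() {
    cuid=$1; host_uid=$2; start=$3; count=$4
    if [ "$cuid" -eq 0 ]; then
        echo "$host_uid"
    elif [ "$cuid" -le "$count" ]; then
        echo $((start + cuid - 1))
    else
        echo "unmapped"
        return 1
    fi
}

# Parse a "name:start:count" line as found in /etc/subuid.
entry="vagrant:100000:65536"
start=$(echo "$entry" | cut -d: -f2)
count=$(echo "$entry" | cut -d: -f3)

echo "container uid 0    -> host uid $(map_uid 0 1000 "$start" "$count")"
echo "container uid 1000 -> host uid $(map_uid 1000 1000 "$start" "$count")"
```

If that mapping is right, the "user" account (UID 1000 in the container) would show up on the host as UID 100999 rather than as vagrant. That would be consistent with what I saw: /home/vagrant/tmp is owned by vagrant with mode drwxr-xr-x, which appears as root:root inside the container, so any other container UID only gets read access.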


For completeness and reproducibility’s sake, here’s the Vagrantfile I’m using to recreate my environment and try these things out. It creates a Debian 10-based VM with the latest 5.9 kernel from buster-backports and sets up docker via curl -fsSL https://get.docker.com/rootless | bash.

unless Vagrant.has_plugin?("vagrant-reload")
  raise 'vagrant-reload plugin is not installed!'
end

# hack: https://github.com/hashicorp/vagrant/issues/8878#issuecomment-345112810
class VagrantPlugins::ProviderVirtualBox::Action::Network
  def dhcp_server_matches_config?(dhcp_server, config)
    true
  end
end

Vagrant.configure("2") do |config|

  config.vm.box = "bento/debian-10"

  config.vm.network "private_network", type: "dhcp"

  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
  end

  config.vm.provider "virtualbox" do |vb|
    vb.memory = "4096"
    vb.cpus = 2
  end

  config.vm.provision "shell", inline: <<-STEP1
    export DEBIAN_FRONTEND=noninteractive

    echo 'APT::Default-Release "stable";' >> /etc/apt/apt.conf.d/99default-release
    sed -i "s/main/main contrib non-free/g" /etc/apt/sources.list
    echo "deb http://httpredir.debian.org/debian/ buster-backports main contrib non-free" >> /etc/apt/sources.list
    echo "deb-src http://httpredir.debian.org/debian/ buster-backports main contrib non-free" >> /etc/apt/sources.list
    apt-get update
    apt-get dist-upgrade -y

    export KERNEL_VERSION=$(apt-cache search linux-image-5.9 | grep -Pv -- '(-(rt|cloud)-amd64|amd64-(dbg|unsigned))' | sort -r --sort=version | awk '{print $1}' | head -n 1 | sed 's/^linux-image-//' | sed 's/-amd64$//')
    apt-get -t buster-backports install -y \
      linux-image-$KERNEL_VERSION-amd64 linux-headers-$KERNEL_VERSION-amd64 linux-headers-$KERNEL_VERSION-common \
      dkms build-essential linux-kbuild-5.9 linux-compiler-gcc-8-x86 \
      firmware-linux firmware-linux-nonfree firmware-misc-nonfree libcap2-bin \
      rsync git apt-transport-https ca-certificates curl gnupg2 tmux moreutils

    echo "deb http://httpredir.debian.org/debian/ bullseye main contrib non-free" >> /etc/apt/sources.list
    echo "deb-src http://httpredir.debian.org/debian/ bullseye main contrib non-free" >> /etc/apt/sources.list
    apt-get update
    apt-get -t bullseye install -y uidmap fuse-overlayfs

    sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="random.trust_cpu=on elevator=deadline cgroup_enable=memory swapaccount=1 cgroup.memory=nokmem systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
    ls /dev/disk/by-id/ata-* | grep -v '\\-part' | head -n 1 | xargs -r -l grub-install

    echo "kernel.unprivileged_userns_clone=1" >> /etc/sysctl.conf
    echo "options overlay permit_mounts_in_userns=1" >> /etc/modprobe.d/docker.conf

    mkdir -p /etc/systemd/system/user@.service.d
    echo -e "[Service]\\nDelegate=cpu cpuset io memory pids" >> /etc/systemd/system/user@.service.d/delegate.conf

    loginctl enable-linger vagrant

    touch /root/.hushlogin
    echo "set nocompatible" > /root/.vimrc
  STEP1
  config.vm.provision :reload

  config.vm.provision "shell", inline: <<-STEP2
    export DEBIAN_FRONTEND=noninteractive
    apt-get update
    # quote the pattern so the shell passes it to apt-get unexpanded
    apt-get -y --purge remove '*4.19*' || true
    apt-get -y autoremove
    apt-get clean
  STEP2
  config.vm.provision :reload

  config.vm.provision "shell", privileged: false, inline: <<-STEP3
    mkdir -p /home/vagrant/.config/systemd/user /home/vagrant/.local/bin /home/vagrant/bin /home/vagrant/tmp

    git clone --recursive --single-branch --depth 1 https://github.com/mmguero/config /home/vagrant/.config/mmguero.config
    touch /home/vagrant/.hushlogin
    echo "set nocompatible" > /home/vagrant/.vimrc
    rm -f /home/vagrant/.bashrc
    ln -s -f -r /home/vagrant/.config/mmguero.config/bash/rc /home/vagrant/.bashrc
    ln -s -f -r /home/vagrant/.config/mmguero.config/bash/rc.d /home/vagrant/.bashrc.d
    ln -s -f -r /home/vagrant/.config/mmguero.config/bash/aliases /home/vagrant/.bash_aliases
    ln -s -f -r /home/vagrant/.config/mmguero.config/bash/functions /home/vagrant/.bash_functions
    ln -s -f -r /home/vagrant/.config/mmguero.config/bash/context-color/context-color /home/vagrant/.local/bin/context-color
    ln -s -f -r /home/vagrant/.config/mmguero.config/linux/tmux/tmux.conf /home/vagrant/.tmux.conf
    ln -s -f -r /home/vagrant/.config/mmguero.config/git/gitconfig /home/vagrant/.gitconfig
    curl -fsSL https://get.docker.com/rootless | bash
    curl -fsSL "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /home/vagrant/bin/docker-compose
    chmod 755 /home/vagrant/bin/docker-compose
    echo -e "\\nexport DOCKER_HOST=unix:///run/user/1000/docker.sock" >> /home/vagrant/.bashrc.d/05_docker.bashrc

  STEP3

  # allow some elevated privileges (raw sockets, binding to ports <1024, promiscuous capture, mlock, etc.)
  config.vm.provision "shell", inline: <<-STEP4
    setcap 'CAP_IPC_LOCK+eip CAP_NET_ADMIN+eip CAP_NET_BIND_SERVICE+eip CAP_NET_RAW+eip' /home/vagrant/bin/rootlesskit
  STEP4

  config.vm.provision "shell", privileged: false, inline: <<-STEP5
    sed -i "s@\\(dockerd-rootless\\.sh\\)@\\1 --storage-driver=fuse-overlayfs@" /home/vagrant/.config/systemd/user/docker.service
    systemctl --user enable docker
    systemctl --user daemon-reload
    systemctl --user restart docker
  STEP5

end

If you made it this far, thank you, and I appreciate any pointers people smarter than me can throw my way.

I’m curious too.

Most of the containers I use run S6 overlay (2.1.0.2), and the user directive doesn’t work when set. Even using PUID/PGID doesn’t work when running docker 20.10 rootless.

If you found the solution, please share. I’m using Ubuntu 20.04, docker 20.10 and just the default overlay2 driver.

I think I’ll have to make some small override files for the apps I use and add a VOLUME /data directive (I use /config and /data, but the images I use only define /config usually).