systemd fails to start in a container with userns-remap set on Oracle Linux 7

We noticed that the cgroup mounts don’t honor the user namespace remap. This
prevents systemd from running inside the container.

With userns-remap set to default:

[root@lnx-01 docker]# docker logs cdh_SR1111
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR
+SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP
+BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization kvm.
Detected architecture x86-64.

Welcome to Oracle Linux Server 7.6!

Set hostname to <cdh_SR1111.us.oracle.com>.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or
directory
Failed to create root cgroup hierarchy: Permission denied
Failed to allocate manager object: Permission denied
[!!!] Failed to allocate manager object, freezing.
[root@lnx-01 docker]#

[root@lnx-01 docker]# docker exec -i cdh_SR1111 ls -l /sys/fs/cgroup
total 0
dr-xr-xr-x. 6 65534 65534 0 Jan 17 06:16 blkio
lrwxrwxrwx. 1 65534 65534 11 Jan 17 04:50 cpu -> cpu,cpuacct
dr-xr-xr-x. 6 65534 65534 0 Jan 17 06:16 cpu,cpuacct
lrwxrwxrwx. 1 65534 65534 11 Jan 17 04:50 cpuacct -> cpu,cpuacct
dr-xr-xr-x. 4 65534 65534 0 Jan 17 05:25 cpuset
dr-xr-xr-x. 6 65534 65534 0 Jan 17 06:16 devices
dr-xr-xr-x. 4 65534 65534 0 Jan 17 05:25 freezer
dr-xr-xr-x. 4 65534 65534 0 Jan 17 05:25 hugetlb
dr-xr-xr-x. 6 65534 65534 0 Jan 17 06:16 memory
lrwxrwxrwx. 1 65534 65534 16 Jan 17 04:50 net_cls -> net_cls,net_prio
dr-xr-xr-x. 4 65534 65534 0 Jan 17 05:25 net_cls,net_prio
lrwxrwxrwx. 1 65534 65534 16 Jan 17 04:50 net_prio -> net_cls,net_prio
dr-xr-xr-x. 4 65534 65534 0 Jan 17 05:25 perf_event
dr-xr-xr-x. 6 65534 65534 0 Jan 17 06:16 pids
dr-xr-xr-x. 2 65534 65534 0 Jan 17 04:50 rdma
dr-xr-xr-x. 6 65534 65534 0 Jan 17 06:16 systemd
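
A note on the 65534 owner in the listing above: the kernel reports any uid that has no mapping in the current user namespace as the overflow uid, which is 65534 by default. The cgroup directories are presumably owned by host root, which is not mapped into the container when userns-remap is on, so root inside the container cannot create anything under them. Below is a minimal sketch of that lookup. It assumes a hypothetical remap range starting at host uid 100000 (the real range comes from /etc/subuid and is not shown in this post), and map_host_uid is an illustrative helper, not a Docker or kernel tool.

```shell
#!/bin/sh
# map_host_uid HOST_UID UID_MAP_TEXT
# Translate a host uid into the uid seen inside a user namespace.
# UID_MAP_TEXT uses the /proc/<pid>/uid_map format:
#   <first uid inside ns> <first uid on host> <length>
# Uids outside every mapped range show up as the overflow uid (65534).
map_host_uid() {
    host_uid=$1
    uid_map=$2
    printf '%s\n' "$uid_map" | awk -v uid="$host_uid" '
        NF == 3 && uid >= $2 && uid < $2 + $3 {
            print $1 + (uid - $2)   # offset into the mapped range
            found = 1
            exit
        }
        END { if (!found) print 65534 }
    '
}

# Hypothetical mapping for illustration only: container uids 0..65535
# are backed by host uids 100000..165535.
EXAMPLE_MAP="0 100000 65536"

map_host_uid 0 "$EXAMPLE_MAP"        # host root is unmapped: prints 65534
map_host_uid 100000 "$EXAMPLE_MAP"   # first remapped uid: prints 0
```

Inside a real container the mapping can be read from /proc/self/uid_map and passed as the second argument.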

With userns-remap disabled:

[root@ci-phx-cdhdevinfra-ad1-lnx-01 docker]# docker logs cdh_SR1111
systemd 219 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR
+SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP
+BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization kvm.
Detected architecture x86-64.

Welcome to Oracle Linux Server 7.6!

Set hostname to <cdh_SR1111.us.oracle.com>.
[ OK ] Reached target Paths.
[ OK ] Reached target Local File Systems.
[ OK ] Created slice Root Slice.
[ OK ] Listening on Delayed Shutdown Socket.
[ OK ] Listening on Journal Socket.
[ OK ] Reached target Swap.
[ OK ] Created slice System Slice.
Starting Create Volatile Files and Directories…
[ OK ] Reached target Slices.
[root@lnx-01 docker]#

The docker run command used:

docker run -itd --tmpfs /tmp --tmpfs /run --cap-add=SYS_ADMIN \
  -v /sys/fs/cgroup --cap-add=SYSLOG --restart=unless-stopped --cpus=2.0 -m 10.0GB \
  --cap-add=NET_RAW --cap-add=NET_ADMIN -p 21000:22 -p 31000:5901 \
  --name=cdh_SR1111 -h cdh_SR1111.us.oracle.com --label CDH_INCIDENT=SR1111 \
  --label CDH_CUSTOMER=bofa --label CDH_BUCKET=nsin08 -e CDH_INCIDENT=SR1111 \
  --label CDH_LOGDIR=/saascdh/download/ -e CDH_UNIX_GROUP=CDHSR1111 \
  --label CDH_REGION=us-phoenix-1 test_docker:oel7-dev

This issue seems similar to the one described in
https://bugzilla.redhat.com/show_bug.cgi?id=1406684, but I couldn’t find the
suggested fix for OEL 7.7.

The host OS is OEL 7.7 (OCI OKE VM silver image for SaaS); the Docker image OS
is OEL 7.6.

[root@ln-01 log]# uname -r
4.14.35-1902.9.2.el7uek.x86_64

% docker -v

Docker version 19.03.1-ol, build ead9442

So far I have tried mounting a separate tmpfs as /sys/fs/cgroup, applying the fix suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1566680, and configuring oci-systemd-hook, but none of these worked. I am out of options here. What else can I try to make this work?

A belated answer, but I hope it helps.

Try Docker with the new Sysbox runtime. It creates containers that always use the Linux user-namespace (for strong isolation) yet support running systemd, docker, and even Kubernetes inside. For example, the following command gives you a container with the Linux user-namespace and systemd in it:

$ docker run --runtime=sysbox-runc -it --rm nestybox/ubuntu-bionic-systemd  
Welcome to Ubuntu 18.04.5 LTS!

[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Reached target Swap.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Reached target Paths.
...
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Ubuntu 18.04.5 LTS c65d2da3189b console

c65d2da3189b login: admin
Password: 
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.3.0-64-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

admin@c65d2da3189b:~$ cat /proc/self/uid_map
         0     165536      65536
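
The three columns of uid_map are the first uid inside the namespace, the first uid it maps to outside, and the length of the range. So here uid 0 in the container is host uid 165536, with 65536 uids mapped. In the initial (host) user namespace the file instead holds the identity mapping `0 0 4294967295`. As a rough sketch, that difference can be used to check whether a process is running under a remapped user namespace; is_remapped is a hypothetical helper name, not part of Docker or Sysbox.

```shell
#!/bin/sh
# is_remapped UID_MAP_TEXT
# Succeeds (exit status 0) when the given uid_map content differs from
# the identity mapping of the initial user namespace.
is_remapped() {
    # Collapse the column padding so the comparison ignores whitespace.
    normalized=$(printf '%s\n' "$1" | awk 'NF == 3 { print $1, $2, $3 }')
    [ "$normalized" != "0 0 4294967295" ]
}

# Usage inside a container (left as a comment so the block has no side effects):
#   if is_remapped "$(cat /proc/self/uid_map)"; then
#       echo "running in a remapped user namespace"
#   fi
```

The sketch only compares against the identity mapping; it does not validate the range itself.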

Hope that helps!