Failed to chroot load docker image with docker ce 18.09.0

Happy new year folks!

I am new to this forum. If this question is in the wrong category, please point me to a better one.

I am Ying from the SONiC community. We have been using a pretty ancient docker-engine (1.11.1) for a long time :blush:. Recently we discovered an issue whose fix requires upgrading to a later docker version, so I tried the latest docker-ce version, 18.09.0. The original issue was fixed, but I encountered 2 other issues that I would like help with from the docker community.

(Apologies for the broken links. As a new user, I am limited in how many links I can put in a post.)

Symptom: unable to chroot docker load container images
Docker engine version not having this issue: 1.11.1 (up to 1.12.2)
Docker engine exhibiting the issue: 18.09.0

Background on the SONiC build process:

  • SONiC encapsulates individual features in docker containers. We have about 10 containers running on the target platform.
  • To build a SONiC image, we first create the required Debian packages, then create the individual feature docker containers. In the end, we create an ONIE or ABOOT image and load it with these feature docker containers.

Steps to reproduce:

  1. Clone sonic-buildimage repo: https_://github.com/Azure/sonic-buildimage
  2. Follow build instructions: “make init; make configure PLATFORM=<platform, e.g. broadcom>; make target/sonic-broadcom.bin”
  3. The above build completes in about 5 hours, depending on the power of the build machine. Apologies for the long wait.
  4. The current default build uses docker engine 1.11.1 and has no build issue. In particular, the feature docker containers are loaded with “sudo chroot docker load < ” in https_://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/sonic_debian_extension.j2#L279.

After the lengthy build is done, we can try upgrading the docker engine.

  1. Edit the docker engine version string: https://github.com/Azure/sonic-buildimage/blob/master/build_debian.sh#L32. It is easy to upgrade to any version lower than 17.5 without editing the download link: https_://github.com/Azure/sonic-buildimage/blob/master/build_debian.sh#L162. It is trivial to change the download link and try versions up to 18.06, since it is still a single Debian package. 18.09.0 takes some more changes, but is not very difficult either.
  2. Now “rm target/sonic-broadcom.bin; make target/sonic-broadcom.bin” will fail.
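
The version bump in step 1 can be scripted. Below is a minimal sketch, assuming the version is set on a single line beginning with DOCKER_VERSION= in build_debian.sh. The helper name set_docker_version is made up for illustration and is not part of sonic-buildimage. It does not touch the download link, which also has to be edited for newer versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of sonic-buildimage): rewrite the
# DOCKER_VERSION= line of a build script, leaving all other lines untouched.
set_docker_version() {
    script=$1    # path to the build script, e.g. build_debian.sh
    version=$2   # new version string, including the platform suffix
    tmp=$(mktemp) || return 1
    # Only lines that start with DOCKER_VERSION= are replaced.
    if ! sed "s|^DOCKER_VERSION=.*|DOCKER_VERSION=${version}|" "$script" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Writing back with cat keeps the script's existing file mode.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}

# Example (commented out so that sourcing this file changes nothing):
#   set_docker_version build_debian.sh '18.06.1~ce~3-0~debian_amd64'
```

The version string must not contain "|" or "&", since both are special in the sed expression used here.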

From my searching and investigation, the method we currently use works up to docker engine 1.12.2. From 1.12.3 up to 18.09.0, docker load fails, seemingly because the docker service is not really running in the chroot target folder.

Currently, I have a workaround: keep loading the docker container images with 1.11.1, then remove docker-engine 1.11.1 after all images are loaded. I had to delete docker-engine.prerm in order to remove the docker engine; otherwise both the removal and the upgrade fail. This is in line with the observation that the docker service is not really running in the chroot target folder.
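
For reference, the prerm step of the workaround could look roughly like this. It is a sketch under assumptions: dpkg keeps maintainer scripts under var/lib/dpkg/info inside the target root, and the helper name drop_prerm is hypothetical. It only deletes the script; removing the package itself is a separate step.

```shell
#!/bin/sh
# Hypothetical helper: delete a package's prerm maintainer script inside a
# chroot target, so that removing the package does not try to run it.
drop_prerm() {
    root=$1   # chroot target folder, e.g. ./fsroot
    pkg=$2    # package name, e.g. docker-engine
    prerm="$root/var/lib/dpkg/info/$pkg.prerm"
    if [ -f "$prerm" ]; then
        rm -f "$prerm" && echo "removed $prerm"
    else
        echo "no prerm script at $prerm"
    fi
}
```

On a real fsroot these files are owned by root, so the function would have to run from a root shell. The removal that follows might be something like "sudo chroot ./fsroot dpkg --remove docker-engine"; that exact command is an assumption, not taken from the build scripts.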

I doubt this issue is specific to SONiC; I just haven’t found a solution through Google searches yet. Community folks, if you know how to docker load images inside a chroot, please let me know. Any help is very much appreciated!

Error messages:

  • sudo chroot ./fsroot docker info
    Containers: 0
    Running: 0
    Paused: 0
    Stopped: 0
    Images: 0
    Server Version: 18.06.1-ce
    Storage Driver: overlay
    Backing Filesystem: extfs
    Supports d_type: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
    Volume: local
    Network: bridge host macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
    runc version: 69663f0bd4b60df09991c08812a60108003fa340
    init version: fec3683
    Security Options:
    seccomp
    Profile: default
    Kernel Version: 4.4.0-134-generic
    Operating System: Debian GNU/Linux 9 (stretch) (containerized)
    OSType: linux
    Architecture: x86_64
    CPUs: 8
    Total Memory: 27.47GiB
    Name: a30accabd4ad
    ID: 3INJ:SDR5:TRHT:TD3I:BWLA:XLVN:5SEO:XM43:ALUI:SDYE:SU3L:UPQU
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https_://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
    127.0.0.0/8
    Live Restore Enabled: false

WARNING: No swap limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpu shares support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

  • sudo LANG=C chroot ./fsroot docker load
    Error processing tar file(exit status 1): invalid argument

If we build with “NOSTRETCH=1 make target/sonic-broadcom.bin KEEP_SLAVE_ON=yes”, we stay in the sonic build slave docker after the build fails. There we can try to load an image manually:

localadmin@ee48089166eb:/sonic$ sudo LANG=C chroot ./fsroot docker load <target/docker-fpm-quagga.gz
Untar error on re-exec cmd: fork/exec /proc/self/exe: no such file or directory
localadmin@ee48089166eb:/sonic$ sudo LANG=C chroot ./fsroot dockerd -H unix:// &
[1] 73168
localadmin@ee48089166eb:/sonic$ WARN[2019-01-02T17:55:44.755243700Z] Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior dir=/var/lib/docker error="error getting daemon root's parent mount: open /proc/self/mountinfo: no such file or directory"
INFO[2019-01-02T17:55:44.755724700Z] libcontainerd: docker-containerd is still running pid=69459
INFO[2019-01-02T17:55:44.755780000Z] parsed scheme: "unix" module=grpc
INFO[2019-01-02T17:55:44.755801000Z] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2019-01-02T17:55:44.755846400Z] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 }] module=grpc
INFO[2019-01-02T17:55:44.755869500Z] ClientConn switching balancer to "pick_first" module=grpc
INFO[2019-01-02T17:55:44.755926800Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, CONNECTING module=grpc

localadmin@ee48089166eb:/sonic$ sudo LANG=C chroot ./fsroot docker load <target/docker-fpm-quagga.gz
WARN[2019-01-02T17:56:04.756426700Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 }. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout". Reconnecting... module=grpc
INFO[2019-01-02T17:56:04.756612300Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, TRANSIENT_FAILURE module=grpc
INFO[2019-01-02T17:56:04.756693700Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, CONNECTING module=grpc
WARN[2019-01-02T17:56:24.756883700Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 }. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout". Reconnecting... module=grpc
INFO[2019-01-02T17:56:24.756949400Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, TRANSIENT_FAILURE module=grpc
INFO[2019-01-02T17:56:24.757136700Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, CONNECTING module=grpc
WARN[2019-01-02T17:56:44.757222200Z] Failed to dial unix:///var/run/docker/containerd/docker-containerd.sock: grpc: the connection is closing; please retry. module=grpc
Failed to connect to containerd: failed to dial "/var/run/docker/containerd/docker-containerd.sock": context deadline exceeded
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[1]+ Exit 1 sudo LANG=C chroot ./fsroot dockerd -H unix://
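
Both the “fork/exec /proc/self/exe” and the “open /proc/self/mountinfo” errors above suggest that /proc is absent inside ./fsroot. A quick way to check that before starting dockerd is sketched below. chroot_has_proc is a hypothetical helper name, and whether mounting /proc alone is enough for docker load to succeed is not something this log establishes.

```shell
#!/bin/sh
# Hypothetical helper: report whether procfs looks mounted inside a chroot
# target by checking for a readable proc/self/mountinfo underneath it.
chroot_has_proc() {
    root=$1   # chroot target folder, e.g. ./fsroot
    [ -r "$root/proc/self/mountinfo" ]
}

# Example (commented out; needs root and a real chroot target):
#   chroot_has_proc ./fsroot || sudo mount -t proc proc ./fsroot/proc
```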

The test was done with the following diff (upgrade to docker-ce 18.06.1):

diff --git a/build_debian.sh b/build_debian.sh
index 0ab8280..6883649 100755
--- a/build_debian.sh
+++ b/build_debian.sh
@@ -29,7 +29,7 @@
 set -x -e

 ## docker engine version (with platform)
-DOCKER_VERSION=1.11.1-0~stretch_amd64
+DOCKER_VERSION=18.06.1~ce~3-0~debian_amd64
 LINUX_KERNEL_VERSION=4.9.0-8

 ## Working directory to prepare the file system
@@ -159,7 +159,7 @@ echo '[INFO] Install docker'
 ## Install apparmor utils since they're missing and apparmor is enabled in the kernel
 ## Otherwise Docker will fail to start
 sudo LANG=C chroot $FILESYSTEM_ROOT apt-get -y install apparmor
-docker_deb_url=https://apt.dockerproject.org/repo/pool/main/d/docker-engine/docker-engine_${DOCKER_VERSION}.deb
+docker_deb_url=https://download.docker.com/linux/debian/dists/stretch/pool/stable/amd64/docker-ce_${DOCKER_VERSION}.deb
 docker_deb_temp=`mktemp`
 trap_push "rm -f $docker_deb_temp"
 wget $docker_deb_url -qO $docker_deb_temp
diff --git a/files/docker/docker.service.conf b/files/docker/docker.service.conf
index b124d94..38895d5 100644
--- a/files/docker/docker.service.conf
+++ b/files/docker/docker.service.conf
@@ -1,3 +1,3 @@
 [Service]
 ExecStart=
-ExecStart=/usr/bin/docker daemon -H fd:// --storage-driver=overlay --bip=240.127.1.1/24 --iptables=false
+ExecStart=/usr/bin/dockerd -H unix:// --storage-driver=overlay --bip=240.127.1.1/24 --iptables=false

https:// github.com/moby/moby/issues/34817 was the answer to the issue. In particular, we hit the mount point issue.

There is an issue with posting the link; please remove the spaces.
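
One reading of the “mount point issue” is that the chroot target folder itself is not a mount point. That is an assumption here, not something spelled out in this thread. The sketch below checks it by looking at the mount point column (field 5) of a mountinfo file. is_mount_point is a hypothetical helper; it expects an absolute path and ignores paths containing spaces, which mountinfo escapes as \040.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given absolute path appears as a mount
# point (field 5) in a mountinfo file. Defaults to the caller's own mountinfo.
is_mount_point() {
    dir=$1
    mountinfo=${2:-/proc/self/mountinfo}
    # Exit status 0 when some line has the path in field 5, 1 otherwise.
    awk -v d="$dir" '$5 == d { found = 1 } END { exit !found }' "$mountinfo"
}

# Example (commented out; reads the live mount table):
#   is_mount_point "$(pwd)/fsroot" || echo "fsroot is not a mount point"
```

If ./fsroot turns out not to be a mount point, one thing to try is bind-mounting the directory onto itself ("sudo mount --bind ./fsroot ./fsroot") before running dockerd in the chroot. Treat that as an experiment, not as the confirmed fix.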