Happy new year folks!
I am new to this forum. If I have posted this question in the wrong category, please point me to a better one.
I am Ying from the SONiC community. We have been using a pretty ancient docker-engine (1.11.1) for a long time. Recently we hit an issue whose fix requires upgrading to a later Docker version, so I tried the latest docker-ce version, 18.09.0. The original issue was fixed, but I ran into 2 other issues that I would like help with from the Docker community.
(Apologies for the broken links. As a new user, I am limited in the number of links I can put in a post.)
Symptom: unable to run docker load on container images inside a chroot
Docker engine version not having this issue: 1.11.1 (up to 1.12.2)
Docker engine exhibiting the issue: 18.09.0
Background on the SONiC build process:
- SONiC encapsulates individual features in docker containers. We have about 10 containers running on the target platform.
- To build a SONiC image, we first create the required Debian packages, then create the individual feature docker containers. In the end, we create an ONIE or ABOOT image and load these feature docker containers into it.
Steps to reproduce:
- Clone sonic-buildimage repo: https_://github.com/Azure/sonic-buildimage
- Follow build instructions: “make init; make configure PLATFORM=<platform, e.g. broadcom>; make target/sonic-broadcom.bin”
- The build above takes about 5 hours, depending on the power of the build machine. Apologies for the long wait.
- The current default build is done with docker engine 1.11.1 and has no build issue. In particular, the feature docker containers are loaded with “sudo chroot docker load < ” in https_://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/sonic_debian_extension.j2#L279.
After the lengthy build is done, we can try upgrading the docker engine.
- Edit the docker engine version string: https://github.com/Azure/sonic-buildimage/blob/master/build_debian.sh#L32. It is easy to upgrade to any version lower than 17.5 without editing the download link: https_://github.com/Azure/sonic-buildimage/blob/master/build_debian.sh#L162. It is trivial to change the download link and try versions up to 18.06, since it is still a single Debian package. 18.09.0 takes some more changes, but is not super difficult either.
- Now “rm target/sonic-broadcom.bin; make target/sonic-broadcom.bin” will fail.
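The version edit in the first step above can be scripted. This is a minimal sketch, assuming build_debian.sh keeps a single line that starts with DOCKER_VERSION=. The helper name set_docker_version is made up for illustration and is not part of sonic-buildimage, and the download link still has to be changed separately for docker-ce versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of sonic-buildimage):
#   set_docker_version FILE VERSION
# Rewrites the DOCKER_VERSION=... line of FILE and leaves every other line alone.
set_docker_version() {
    file=$1
    version=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy the contents back with
    # cat so the original file keeps its mode (build_debian.sh is executable).
    sed "s|^DOCKER_VERSION=.*|DOCKER_VERSION=${version}|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example call, run from the top of the sonic-buildimage checkout:
# set_docker_version build_debian.sh '18.06.1~ce~3-0~debian_amd64'
```

The version string in the example call is the one used in the diff at the end of this post.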
I did some searching and investigation. It appears that the method we currently use works up to docker engine 1.12.2. From 1.12.3 up to 18.09.0, docker load fails, seemingly because the docker service is not really running in the chroot target folder.
For now I have found a workaround: continue loading the docker container images with 1.11.1, and after all docker images are loaded, remove docker-engine 1.11.1. I had to delete docker-engine.prerm in order to remove the docker engine; otherwise the removal fails and so does the upgrade. This is in line with the observation that the docker service is not really running in the chroot target folder.
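The order of the workaround can be sketched as a small script. This is an illustration under stated assumptions, not the actual SONiC build code: the ./fsroot path and the image name are borrowed from the commands later in this post, the prerm location is the standard dpkg maintainer-script directory, and the use of apt-get for the removal is a guess. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the workaround order described above, not the actual SONiC build code.
# FSROOT and the image list are assumptions; DRY_RUN=1 (the default) only prints.
FSROOT=${FSROOT:-./fsroot}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Load one image archive through the docker client inside the chroot.
load_image() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ sudo LANG=C chroot %s docker load < %s\n' "$FSROOT" "$1"
    else
        sudo LANG=C chroot "$FSROOT" docker load < "$1"
    fi
}

workaround() {
    # 1. Load every image while docker-engine 1.11.1 is still installed.
    for image in "$@"; do
        load_image "$image" || return 1
    done
    # 2. Delete the prerm maintainer script so package removal does not run it.
    #    /var/lib/dpkg/info is where dpkg keeps maintainer scripts.
    run sudo rm -f "$FSROOT/var/lib/dpkg/info/docker-engine.prerm" || return 1
    # 3. Remove the old engine; a newer docker package can be installed afterwards.
    run sudo LANG=C chroot "$FSROOT" apt-get -y remove docker-engine
}

workaround target/docker-fpm-quagga.gz
```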
I don't think this issue is specific to SONiC; I just haven't found a solution through Google searches yet. Community folks, if you know how to docker load images in a chroot, please let me know. Any help is very much appreciated!
Error messages:
- sudo chroot ./fsroot docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.06.1-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.4.0-134-generic
Operating System: Debian GNU/Linux 9 (stretch) (containerized)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 27.47GiB
Name: a30accabd4ad
ID: 3INJ:SDR5:TRHT:TD3I:BWLA:XLVN:5SEO:XM43:ALUI:SDYE:SU3L:UPQU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https_://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpu shares support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
- sudo LANG=C chroot ./fsroot docker load
Error processing tar file(exit status 1): invalid argument
If we build with "NOSTRETCH=1 make target/sonic-broadcom.bin KEEP_SLAVE_ON=yes", we stay in the SONiC build slave docker after the build fails. There we can try to load an image manually:
localadmin@ee48089166eb:/sonic$ sudo LANG=C chroot ./fsroot docker load <target/docker-fpm-quagga.gz
Untar error on re-exec cmd: fork/exec /proc/self/exe: no such file or directory
localadmin@ee48089166eb:/sonic$ sudo LANG=C chroot ./fsroot dockerd -H unix:// &
[1] 73168
localadmin@ee48089166eb:/sonic$ WARN[2019-01-02T17:55:44.755243700Z] Error while setting daemon root propagation, this is not generally critical but may cause some functionality to not work or fallback to less desirable behavior dir=/var/lib/docker error=“error getting daemon root’s parent mount: open /proc/self/mountinfo: no such file or directory”
INFO[2019-01-02T17:55:44.755724700Z] libcontainerd: docker-containerd is still running pid=69459
INFO[2019-01-02T17:55:44.755780000Z] parsed scheme: “unix” module=grpc
INFO[2019-01-02T17:55:44.755801000Z] scheme “unix” not registered, fallback to default scheme module=grpc
INFO[2019-01-02T17:55:44.755846400Z] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 }] module=grpc
INFO[2019-01-02T17:55:44.755869500Z] ClientConn switching balancer to “pick_first” module=grpc
INFO[2019-01-02T17:55:44.755926800Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, CONNECTING module=grpc
localadmin@ee48089166eb:/sonic$ sudo LANG=C chroot ./fsroot docker load <target/docker-fpm-quagga.gz
WARN[2019-01-02T17:56:04.756426700Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 }. Err :connection error: desc = “transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout”. Reconnecting… module=grpc
INFO[2019-01-02T17:56:04.756612300Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, TRANSIENT_FAILURE module=grpc
INFO[2019-01-02T17:56:04.756693700Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, CONNECTING module=grpc
WARN[2019-01-02T17:56:24.756883700Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 }. Err :connection error: desc = “transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout”. Reconnecting… module=grpc
INFO[2019-01-02T17:56:24.756949400Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, TRANSIENT_FAILURE module=grpc
INFO[2019-01-02T17:56:24.757136700Z] pickfirstBalancer: HandleSubConnStateChange: 0xc420081f40, CONNECTING module=grpc
WARN[2019-01-02T17:56:44.757222200Z] Failed to dial unix:///var/run/docker/containerd/docker-containerd.sock: grpc: the connection is closing; please retry. module=grpc
Failed to connect to containerd: failed to dial “/var/run/docker/containerd/docker-containerd.sock”: context deadline exceeded
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[1]+ Exit 1 sudo LANG=C chroot ./fsroot dockerd -H unix://
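Two of the messages above fail on paths under /proc (/proc/self/exe and /proc/self/mountinfo), which suggests that /proc may not be visible inside ./fsroot. Below is a minimal sketch of a check for that, run from outside the chroot. The function name proc_visible_in is hypothetical, and passing the check does not by itself mean docker load will work.

```shell
#!/bin/sh
# Hypothetical helper: proc_visible_in ROOT
# Succeeds if ROOT/proc/self/mountinfo exists, which is the case when a procfs
# is mounted at ROOT/proc. Run it from outside the chroot.
proc_visible_in() {
    root=$1
    [ -e "$root/proc/self/mountinfo" ]
}

# ./fsroot is the chroot directory used in the commands above.
if proc_visible_in ./fsroot; then
    echo "/proc is visible inside ./fsroot"
else
    echo "/proc is NOT visible inside ./fsroot"
fi
```

If the check fails, mounting proc into the chroot before starting dockerd (for example with `sudo mount -t proc proc ./fsroot/proc`) might be worth trying. That is a guess based on the error text, not something verified in this build.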
The test was done with the following diff (upgrade to docker-ce 18.06.1):
diff --git a/build_debian.sh b/build_debian.sh
index 0ab8280..6883649 100755
--- a/build_debian.sh
+++ b/build_debian.sh
@@ -29,7 +29,7 @@
 set -x -e
 
 ## docker engine version (with platform)
-DOCKER_VERSION=1.11.1-0~stretch_amd64
+DOCKER_VERSION=18.06.1~ce~3-0~debian_amd64
 LINUX_KERNEL_VERSION=4.9.0-8
 
 ## Working directory to prepare the file system
@@ -159,7 +159,7 @@ echo '[INFO] Install docker'
 ## Install apparmor utils since they're missing and apparmor is enabled in the kernel
 ## Otherwise Docker will fail to start
 sudo LANG=C chroot $FILESYSTEM_ROOT apt-get -y install apparmor
-docker_deb_url=https://apt.dockerproject.org/repo/pool/main/d/docker-engine/docker-engine_${DOCKER_VERSION}.deb
+docker_deb_url=https://download.docker.com/linux/debian/dists/stretch/pool/stable/amd64/docker-ce_${DOCKER_VERSION}.deb
 docker_deb_temp=`mktemp`
 trap_push "rm -f $docker_deb_temp"
 wget $docker_deb_url -qO $docker_deb_temp
diff --git a/files/docker/docker.service.conf b/files/docker/docker.service.conf
index b124d94..38895d5 100644
--- a/files/docker/docker.service.conf
+++ b/files/docker/docker.service.conf
@@ -1,3 +1,3 @@
 [Service]
 ExecStart=
-ExecStart=/usr/bin/docker daemon -H fd:// --storage-driver=overlay --bip=240.127.1.1/24 --iptables=false
+ExecStart=/usr/bin/dockerd -H unix:// --storage-driver=overlay --bip=240.127.1.1/24 --iptables=false