Docker v1.12 unable to live-restore - SOLVED


I’m running Docker version 1.12.2, build bb80604 on Centos 7.2.1511 (so with systemd) with --live-restore and either I’m holding it wrong or it doesn’t work as expected. I started a container with docker run and in the process tree I see:

root     40320  1.8  0.0 1333304 40088 ?       Ssl  15:50   0:00 dockerd -H tcp:// -H unix:///var/run/docker.sock --storage-driver=devicemapper --storage-opt dm.thinpooldev=/dev/mapper/sysvm-docker --bridge=zdocker0 --live-restore
root     40335  0.0  0.0 366776  9520 ?        Ssl  15:50   0:00  \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
root     41465  0.0  0.0 333308  4432 ?        Sl   15:50   0:00      \_ docker-containerd-shim 51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 /var/run/docker/libcontainerd/51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 docker-runc
root     41480  0.1  0.0   4156   360 pts/6    Ss+  15:50   0:00          \_ tail -f /dev/null

If I service docker stop, everything is stopped. If i kill -9 the dockerd process, docker-containerd-shim and the process from the container are still there:

root     41465  0.0  0.0 333308  4432 ?        Sl   15:50   0:00 docker-containerd-shim 51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 /var/run/docker/libcontainerd/51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 docker-runc
root     41480  0.0  0.0   4156   360 pts/6    Ss+  15:50   0:00  \_ tail -f /dev/null

but when I start again docker they get cleaned up. And of course, docker ps doesn’t show any running containers anymore.
There is no daemon.json, all the parameters are taken from the systemd’s docker.service (and visible on the command line).

This is the container I started:

CONTAINER ID        IMAGE                                                     COMMAND               CREATED             STATUS              PORTS               NAMES
f249cdd2108e   "tail -f /dev/null"   34 seconds ago      Up 32 seconds       80/tcp              lonely_hodgkin

And here are the logs from journalctl -u docker -f.
This is when I killed -9 docker daemon:

Oct 19 17:12:35 systemd[1]: docker.service: main process exited, code=killed, status=9/KILL

This is when I started docker service again:

Oct 19 17:14:05 systemd[1]: docker.service stop-sigterm timed out. Killing.                                                                           
Oct 19 17:14:05 systemd[1]: Unit docker.service entered failed state.                                                                                 
Oct 19 17:14:05 systemd[1]: docker.service failed.                                                                                                    
Oct 19 17:14:05 systemd[1]: Started docker.service.                                                                                                   
Oct 19 17:14:05 systemd[1]: Starting docker.service...
Oct 19 17:14:05 docker[8766]: time="2016-10-19T17:14:05.933167311+02:00" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
Oct 19 17:14:05 docker[8766]: time="2016-10-19T17:14:05.934378664+02:00" level=info msg="libcontainerd: new containerd process, pid: 8779"
Oct 19 17:14:05 docker[8766]: time="2016-10-19T17:14:05.94755205+02:00" level=warning msg="containerd: low RLIMIT_NOFILE changing to max" current=1024 max=4096
Oct 19 17:14:05 docker[8766]: time="2016-10-19T17:14:05.948744112+02:00" level=warning msg="containerd: f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f:init shim died, killing associated process"
Oct 19 17:14:05 docker[8766]: time="2016-10-19T17:14:05.954167315+02:00" level=error msg="containerd: notify OOM events" error="open /proc/5674/cgroup: no such file or directory"
Oct 19 17:14:05 docker[8766]: time="2016-10-19T17:14:05.954551343+02:00" level=warning msg="containerd: f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f:init (pid 5674) does not exist"
Oct 19 17:14:06 docker[8766]: time="2016-10-19T17:14:06.937936422+02:00" level=warning msg="libcontainerd: unknown container f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f"
Oct 19 17:14:06 docker[8766]: time="2016-10-19T17:14:06.990397385+02:00" level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem  will be ignored."
Oct 19 17:14:07 docker[8766]: time="2016-10-19T17:14:07.021469299+02:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 19 17:14:07 docker[8766]: time="2016-10-19T17:14:07.021877131+02:00" level=warning msg="mountpoint for pids not found"
Oct 19 17:14:07 docker[8766]: time="2016-10-19T17:14:07.022386080+02:00" level=info msg="Loading containers: start."
Oct 19 17:14:27 docker[8766]: ........time="2016-10-19T17:14:27.166070265+02:00" level=error msg="devmapper: Error unmounting device 896b0f24e4cdeb88060993d18a969b41b917d0ee0150439697bf705e0c4bcd9a: Device is Busy"                                                                                                                             
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.166113620+02:00" level=error msg="Error unmounting container f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f: Device is Busy"
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.175839408+02:00" level=info msg="Firewalld running: false"
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.428016613+02:00" level=info msg="Removing stale sandbox 0d4925931306a5107094cc480f8c6606a07d4e5a3d7a50f61512ac2515c35572 (f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f)"
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.509705891+02:00" level=info msg="Loading containers: done."
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.509785435+02:00" level=info msg="Daemon has completed initialization"
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.509814778+02:00" level=info msg="Docker daemon" commit=bb80604 graphdriver=devicemapper version=1.12.2
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.516559646+02:00" level=info msg="API listen on /var/run/docker.sock"
Oct 19 17:14:27 docker[8766]: time="2016-10-19T17:14:27.516659621+02:00" level=info msg="API listen on [::]:2375"

Any help debugging this further is highly appreciated. :slight_smile: Thanks

Case closed: use the provided systemd files