Docker v1.12 unable to live-restore - SOLVED

Hi,

I’m running Docker version 1.12.2, build bb80604 on Centos 7.2.1511 (so with systemd) with --live-restore and either I’m holding it wrong or it doesn’t work as expected. I started a container with docker run and in the process tree I see:

root     40320  1.8  0.0 1333304 40088 ?       Ssl  15:50   0:00 dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --storage-driver=devicemapper --storage-opt dm.thinpooldev=/dev/mapper/sysvm-docker --bridge=zdocker0 --live-restore
root     40335  0.0  0.0 366776  9520 ?        Ssl  15:50   0:00  \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
root     41465  0.0  0.0 333308  4432 ?        Sl   15:50   0:00      \_ docker-containerd-shim 51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 /var/run/docker/libcontainerd/51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 docker-runc
root     41480  0.1  0.0   4156   360 pts/6    Ss+  15:50   0:00          \_ tail -f /dev/null

If I service docker stop, everything is stopped. If i kill -9 the dockerd process, docker-containerd-shim and the process from the container are still there:

root     41465  0.0  0.0 333308  4432 ?        Sl   15:50   0:00 docker-containerd-shim 51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 /var/run/docker/libcontainerd/51819240830c3c7bc350746d9a6b8564b2f76ed719358771688bde5bf745bb51 docker-runc
root     41480  0.0  0.0   4156   360 pts/6    Ss+  15:50   0:00  \_ tail -f /dev/null

but when I start again docker they get cleaned up. And of course, docker ps doesn’t show any running containers anymore.
There is no daemon.json, all the parameters are taken from the systemd’s docker.service (and visible on the command line).

This is the container I started:

CONTAINER ID        IMAGE                                                     COMMAND               CREATED             STATUS              PORTS               NAMES
f249cdd2108e        docker-registry.example.com/myrepo/testlistenport80   "tail -f /dev/null"   34 seconds ago      Up 32 seconds       80/tcp              lonely_hodgkin

And here are the logs from journalctl -u docker -f.
This is when I killed -9 docker daemon:

Oct 19 17:12:35 pcdockernode-103.example.com systemd[1]: docker.service: main process exited, code=killed, status=9/KILL

This is when I started docker service again:

Oct 19 17:14:05 pcdockernode-103.example.com systemd[1]: docker.service stop-sigterm timed out. Killing.                                                                           
Oct 19 17:14:05 pcdockernode-103.example.com systemd[1]: Unit docker.service entered failed state.                                                                                 
Oct 19 17:14:05 pcdockernode-103.example.com systemd[1]: docker.service failed.                                                                                                    
Oct 19 17:14:05 pcdockernode-103.example.com systemd[1]: Started docker.service.                                                                                                   
Oct 19 17:14:05 pcdockernode-103.example.com systemd[1]: Starting docker.service...
Oct 19 17:14:05 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:05.933167311+02:00" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
Oct 19 17:14:05 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:05.934378664+02:00" level=info msg="libcontainerd: new containerd process, pid: 8779"
Oct 19 17:14:05 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:05.94755205+02:00" level=warning msg="containerd: low RLIMIT_NOFILE changing to max" current=1024 max=4096
Oct 19 17:14:05 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:05.948744112+02:00" level=warning msg="containerd: f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f:init shim died, killing associated process"
Oct 19 17:14:05 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:05.954167315+02:00" level=error msg="containerd: notify OOM events" error="open /proc/5674/cgroup: no such file or directory"
Oct 19 17:14:05 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:05.954551343+02:00" level=warning msg="containerd: f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f:init (pid 5674) does not exist"
Oct 19 17:14:06 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:06.937936422+02:00" level=warning msg="libcontainerd: unknown container f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f"
Oct 19 17:14:06 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:06.990397385+02:00" level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem  will be ignored."
Oct 19 17:14:07 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:07.021469299+02:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 19 17:14:07 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:07.021877131+02:00" level=warning msg="mountpoint for pids not found"
Oct 19 17:14:07 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:07.022386080+02:00" level=info msg="Loading containers: start."
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: ........time="2016-10-19T17:14:27.166070265+02:00" level=error msg="devmapper: Error unmounting device 896b0f24e4cdeb88060993d18a969b41b917d0ee0150439697bf705e0c4bcd9a: Device is Busy"                                                                                                                             
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.166113620+02:00" level=error msg="Error unmounting container f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f: Device is Busy"
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.175839408+02:00" level=info msg="Firewalld running: false"
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.428016613+02:00" level=info msg="Removing stale sandbox 0d4925931306a5107094cc480f8c6606a07d4e5a3d7a50f61512ac2515c35572 (f249cdd2108e574c577ee99d2d3b007dbab55af87e07e3497f4920971922e36f)"
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.509705891+02:00" level=info msg="Loading containers: done."
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.509785435+02:00" level=info msg="Daemon has completed initialization"
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.509814778+02:00" level=info msg="Docker daemon" commit=bb80604 graphdriver=devicemapper version=1.12.2
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.516559646+02:00" level=info msg="API listen on /var/run/docker.sock"
Oct 19 17:14:27 pcdockernode-103.example.com docker[8766]: time="2016-10-19T17:14:27.516659621+02:00" level=info msg="API listen on [::]:2375"

Any help debugging this further is highly appreciated. :slight_smile: Thanks

Case closed: use the provided systemd files