Hi Everyone:
I’m a bit flummoxed, and am hoping someone can help.
I administrate a CentOS Linux release 7.5.1804 (Core) system running Docker 17.05.0-ce. This environment is using device mapper as the storage driver. When it was first setup and configured around a year ago everything worked as expected, but at some point the system had an unexpected reboot.
Here’s the issue: when attempting to startup the Docker Engine via systemd, I get the following:
=====
[root@{system} system]# systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─docker.conf
Active: failed (Result: start-limit) since Mon 2018-07-16 14:15:24 EDT; 8min ago
Docs:
Process: 14519 ExecStart=/usr/bin/dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 --data-root="/var/lib/docker" --storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/centos_{system}-thinpool --storage-opt dm.use_deferred_removal=true (code=exited, status=1/FAILURE)
Main PID: 14519 (code=exited, status=1/FAILURE)
Jul 16 14:15:24 {system} systemd[1]: Failed to start Docker Application Container Engine.
Jul 16 14:15:24 {system} systemd[1]: Unit docker.service entered failed state.
Jul 16 14:15:24 {system} systemd[1]: docker.service failed.
Jul 16 14:15:24 {system} systemd[1]: docker.service holdoff time over, scheduling restart.
Jul 16 14:15:24 {system} systemd[1]: start request repeated too quickly for docker.service
Jul 16 14:15:24 {system} systemd[1]: Failed to start Docker Application Container Engine.
Jul 16 14:15:24 {system} systemd[1]: Unit docker.service entered failed state.
Jul 16 14:15:24 {system} systemd[1]: docker.service failed.
Jul 16 13:49:07 {system} systemd[1]: Starting Docker Application Container Engine…
– Subject: Unit docker.service has begun start-up
– Defined-By: systemd
– Support:
– Unit docker.service has begun starting up.
Jul 16 13:49:07 {system} dockerd[13594]: time=“2018-07-16T13:49:07-04:00” level=warning msg=“the “-g / --graph” flag is deprecated. Please use “–data-root” instead”
Jul 16 13:49:07 {system} dockerd[13594]: time=“2018-07-16T13:49:07.283859711-04:00” level=warning msg="[!] DON’T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON’T KNOW WHAT YOU’RE DOING [!]"
Jul 16 13:49:07 {system} dockerd[13594]: time=“2018-07-16T13:49:07.284782021-04:00” level=info msg=“libcontainerd: new containerd process, pid: 13600”
Jul 16 13:49:08 {system} dockerd[13594]: time=“2018-07-16T13:49:08.288373051-04:00” level=warning msg=“failed to rename /”/var/lib/docker"/tmp for background deletion: %!s(). Deleting synchronously"
Jul 16 13:49:08 {system} dockerd[13594]: Error starting daemon: error initializing graphdriver: devmapper: Unable to take ownership of thin-pool (centos_{system}-thinpool) that already has used data blocks
Jul 16 13:49:08 {system} systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Jul 16 13:49:08 {system} systemd[1]: Failed to start Docker Application Container Engine.
– Subject: Unit docker.service has failed
– Defined-By: systemd
– Support:
– Unit docker.service has failed.
– The result is failed.
=====
I know that this often happens because of an issue with /var/lib/docker, usually when it’s been deleted, or when there is a problem with the device mapper thin pool.
But here’s the kicker – if I run the same command as in the ExecStart manually as root, it works just fine:
=====
[root@{system} ~]# /usr/bin/dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 --data-root="/var/lib/docker" --storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/centos_{system}-thinpool --storage-opt dm.use_deferred_removal=true
WARN[0000] [!] DON’T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON’T KNOW WHAT YOU’RE DOING [!]
INFO[0000] libcontainerd: new containerd process, pid: 14605
WARN[0000] containerd: low RLIMIT_NOFILE changing to max current=1024 max=4096
WARN[0011] devmapper: Base device already exists and has filesystem xfs on it. User specified filesystem will be ignored.
INFO[0063] Graph migration to content-addressability took 0.00 seconds
INFO[0063] Loading containers: start.
WARN[0063] libcontainerd: client is out of sync, restore was called on a fully synced container (a7e40003fd70351e344ae75eabfd61246e29cf53dbef0c5eebac41c371b1fd04).
WARN[0063] libcontainerd: failed to retrieve container a7e40003fd70351e344ae75eabfd61246e29cf53dbef0c5eebac41c371b1fd04 state: rpc error: code = 2 desc = containerd: container not found
WARN[0063] failed to cleanup ipc mounts:
failed to umount /var/lib/docker/containers/a7e40003fd70351e344ae75eabfd61246e29cf53dbef0c5eebac41c371b1fd04/shm: invalid argument
ERRO[0063] devmapper: Error unmounting device 31cb0384e72d4918346a992adef4955e8eaab5b7bbeecadea7bf89e29d62c1c5: invalid argument
ERRO[0063] Error unmounting container a7e40003fd70351e344ae75eabfd61246e29cf53dbef0c5eebac41c371b1fd04: invalid argument
WARN[0063] libcontainerd: client is out of sync, restore was called on a fully synced container (96a02f04e111a84c375c4b4f115bce02deedd27c2389f51de2b02020a9752424).
WARN[0063] libcontainerd: failed to retrieve container 96a02f04e111a84c375c4b4f115bce02deedd27c2389f51de2b02020a9752424 state: rpc error: code = 2 desc = containerd: container not found
WARN[0063] failed to cleanup ipc mounts:
failed to umount /var/lib/docker/containers/96a02f04e111a84c375c4b4f115bce02deedd27c2389f51de2b02020a9752424/shm: invalid argument
ERRO[0063] devmapper: Error unmounting device c1839311b086f2abac8d21f8a10441edfedb2bf14bf925f5a11407909ad67831: invalid argument
ERRO[0063] Error unmounting container 96a02f04e111a84c375c4b4f115bce02deedd27c2389f51de2b02020a9752424: invalid argument
INFO[0064] Removing stale sandbox c080a2f0e10a148faced93946fa42c0b0cb6ea8482a3ce9c155b7055655e3a6a (a7e40003fd70351e344ae75eabfd61246e29cf53dbef0c5eebac41c371b1fd04)
INFO[0064] Removing stale sandbox 68f55b30cb7bd4f96e3cb9544ec443d0dc23d66eee6f9ae04b0367ca130dc0e1 (96a02f04e111a84c375c4b4f115bce02deedd27c2389f51de2b02020a9752424)
INFO[0064] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address
ERRO[0065] Failed to start container 96a02f04e111a84c375c4b4f115bce02deedd27c2389f51de2b02020a9752424: error while mounting volume ‘/var/lib/docker/volumes/Jenkins_Backups/_data’: error while mounting volume with options: type=‘nfs’ device=’:/opt/{system}_backups’ o=‘addr={system-2}’: connection refused
INFO[0065] Loading containers: done.
INFO[0067] Daemon has completed initialization
INFO[0067] Docker daemon commit=89658be graphdriver=devicemapper version=17.05.0-ce
INFO[0067] API listen on [::]:2375
INFO[0067] API listen on /var/run/docker.sock
=====
(Note: I did a grep to replace the system hostname with {system} in the above log snippets).
Is there some way to fix this without rebuilding the system? This system is used as an environment to build Docker containers, and it currently has 1Tb of storage, so I’d prefer to fix whatever issues the Docker Engine is having at startup when run under systemd rather than having to rebuild everything. My current work-around is to just run the Docker Engine directly inside screen, but that’s not a very robust work-around.
Thanks in advance for any advice!