Docker hangs on start and does not respond. Full output, stacktrace, config and environment inside

Hello, I’m having an issue with my Docker install. When I start the daemon, either through systemd or manually, it hangs and does not respond to any commands. I have gone through every thread I could find and I’m still unable to fix this issue.

I have narrowed it down to the storage driver. When I configure Docker to use overlay, it starts just fine (except that it does not see my containers), but with overlay2 it hangs.

It used to work on this machine, but after a reboot it quit working. The reboot breaking it, combined with the problem being related to the storage driver, makes me think this is a kernel issue. One thread I found recommended installing linux-modules-extra-... and linux-image-extra-virtual, but that made no difference. I have also let it sit overnight to see if it would finally load, and it does not. The only way to get it to stop is kill -9.
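
For anyone checking the same kernel theory, here is a minimal sketch of a test for whether the kernel currently has the overlay filesystem registered. The function name `overlay_registered` is made up for illustration, and the optional path argument exists only so the check can be tried against a sample file. Note that this reports what is registered right now; a module that is installed but not yet loaded will not show up.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if "overlay" is listed among the
# filesystems the running kernel has registered.
# Reads /proc/filesystems unless another file is given as $1.
overlay_registered() {
    grep -qw overlay "${1:-/proc/filesystems}"
}

# Example use (kept as a comment so that sourcing this file has no side effects):
#   overlay_registered && echo "overlay is registered" || echo "overlay is not registered"
```

If it is not registered, `sudo modprobe overlay` followed by a second check would show whether the module can be loaded at all.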

I’m using docker-ce from https://download.docker.com/linux/ubuntu bionic/stable amd64

I have not run a dist-upgrade or made any major changes. The disk is not full (10% used). There is plenty of RAM and CPU available; running Docker is the only purpose of this machine. It is also important to me that I can recover my volume data, so I don’t want to clear /var/lib/docker.
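
Since the volume data matters, one precaution before experimenting further is to archive the volumes directory. The sketch below assumes the default data root, where named volumes live under /var/lib/docker/volumes; the function name `backup_docker_volumes` and the archive path are made up for illustration. Bind mounts stored elsewhere are not covered.

```shell
#!/bin/sh
# Hypothetical helper: archive the "volumes" directory found under a Docker
# data root into a gzip-compressed tarball.
#   $1 = data root (assumed default: /var/lib/docker)
#   $2 = path of the archive to create
backup_docker_volumes() {
    data_root=$1
    archive=$2
    if [ ! -d "$data_root/volumes" ]; then
        echo "no volumes directory under $data_root" >&2
        return 1
    fi
    # -C keeps the paths inside the archive relative to the data root.
    tar -czf "$archive" -C "$data_root" volumes
}

# Example use, as root and with the daemon stopped (comment only):
#   backup_docker_volumes /var/lib/docker /root/docker-volumes-backup.tar.gz
```

Listing the archive with `tar -tzf` afterwards is a quick way to confirm that the expected `_data` directories are in it.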

Any help would be much appreciated, thank you in advance!

Environment:

OS: Ubuntu Server 18.04.2
Kernel: 4.15.0-55-generic
Filesystem: ext4 on lvm
Docker: 18.09.1
Containerd: 1.2.2

Also tried:
Docker: 19.03.1
Containerd: 1.2.6

Filesystem                  Type      Size  Used Avail Use% Mounted on
udev                        devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs                       tmpfs     798M 1020K  797M   1% /run
/dev/mapper/host--vg-root   ext4      125G   12G  107G  10% /
tmpfs                       tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs                       tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs                       tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/loop0                  squashfs   89M   89M     0 100% /snap/core/7270
/dev/loop2                  squashfs   11M   11M     0 100% /snap/kubectl/1014
/dev/loop1                  squashfs   89M   89M     0 100% /snap/core/7169
/dev/loop3                  squashfs   11M   11M     0 100% /snap/kubectl/1034
tmpfs                       tmpfs     798M     0  798M   0% /run/user/1001

My /etc/docker/daemon.json is usually empty but I have this in it for testing:

{
  "storage-driver": "overlay2"
}

Here is what it looks like starting dockerd with debug mode:

$ sudo /usr/bin/dockerd --containerd=/run/containerd/containerd.sock -D -l "debug"
INFO[2019-07-26T22:11:23.838982560-07:00] Starting up
DEBU[2019-07-26T22:11:23.839532697-07:00] Listener created for HTTP on unix (/var/run/docker.sock)
DEBU[2019-07-26T22:11:23.840298306-07:00] Golang's threads limit set to 56970
INFO[2019-07-26T22:11:23.840878724-07:00] parsed scheme: "unix"                         module=grpc
INFO[2019-07-26T22:11:23.840914311-07:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-07-26T22:11:23.840939245-07:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] }  module=grpc
INFO[2019-07-26T22:11:23.840963140-07:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-07-26T22:11:23.841201833-07:00] pickfirstBalancer: HandleSubConnStateChange: 0xc00014c910, CONNECTING  module=grpc
INFO[2019-07-26T22:11:23.841231668-07:00] blockingPicker: the picked transport is not ready, loop back to repick  module=grpc
INFO[2019-07-26T22:11:23.841878383-07:00] pickfirstBalancer: HandleSubConnStateChange: 0xc00014c910, READY  module=grpc
INFO[2019-07-26T22:11:23.842936134-07:00] parsed scheme: "unix"                         module=grpc
INFO[2019-07-26T22:11:23.842969892-07:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-07-26T22:11:23.842988017-07:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0  <nil>}] }  module=grpc
INFO[2019-07-26T22:11:23.843004117-07:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-07-26T22:11:23.843087387-07:00] pickfirstBalancer: HandleSubConnStateChange: 0xc0008b2200, CONNECTING  module=grpc
INFO[2019-07-26T22:11:23.843131593-07:00] blockingPicker: the picked transport is not ready, loop back to repick  module=grpc
INFO[2019-07-26T22:11:23.843655723-07:00] pickfirstBalancer: HandleSubConnStateChange: 0xc0008b2200, READY  module=grpc
DEBU[2019-07-26T22:11:23.844395838-07:00] Using default logging driver json-file
DEBU[2019-07-26T22:11:23.844437011-07:00] [graphdriver] trying provided driver: overlay2
DEBU[2019-07-26T22:11:23.844552115-07:00] processing event stream                       module=libcontainerd namespace=plugins.moby
DEBU[2019-07-26T22:11:23.847493678-07:00] backingFs=extfs, projectQuotaSupported=false, indexOff="index=off,"  storage-driver=overlay2
DEBU[2019-07-26T22:11:23.847586895-07:00] Initialized graph driver overlay2
WARN[2019-07-26T22:11:23.862607744-07:00] Your kernel does not support cgroup rt period
WARN[2019-07-26T22:11:23.862646492-07:00] Your kernel does not support cgroup rt runtime
DEBU[2019-07-26T22:11:23.862849606-07:00] Max Concurrent Downloads: 3
DEBU[2019-07-26T22:11:23.862873116-07:00] Max Concurrent Uploads: 5
INFO[2019-07-26T22:11:23.862904180-07:00] Loading containers: start.
DEBU[2019-07-26T22:11:23.863056712-07:00] processing event stream                       module=libcontainerd namespace=moby
DEBU[2019-07-26T22:11:23.864709218-07:00] Loaded container e98fbec425363fcff163384a3bbb1587c423b2076cf9313b06d1e0e3a91343a5, isRunning: false
DEBU[2019-07-26T22:11:23.864732607-07:00] Loaded container 296303298f919606588c762dff121c667d98eae035970e668e492e7063553070, isRunning: false
DEBU[2019-07-26T22:11:23.864734745-07:00] Loaded container 7f94c91d80c605b51e0eeb0a080359008074a7bfd77d690a5c1aa181d2b738b2, isRunning: false
DEBU[2019-07-26T22:11:23.864796772-07:00] Loaded container 92e4f3855a60df74a378425fc4e920d0473d9617f4e2f74c2303b720d1eb6610, isRunning: false
DEBU[2019-07-26T22:11:23.864932555-07:00] Loaded container 1ac354cfbc1c76165138be8aff3e7d82f48392d53ce25718a9ca498ba4e260b9, isRunning: false
DEBU[2019-07-26T22:11:23.864945033-07:00] Loaded container c5e09e4f9de2337c244221b0fc0e40bed364962eb4cf40efb66ccaa8eaf3f086, isRunning: false
DEBU[2019-07-26T22:11:23.864996059-07:00] Loaded container cebe07694e7857ac4ae4f6bacfde2831ed538ee2dd9ee7248381b315626e7019, isRunning: false
^CINFO[2019-07-26T22:16:14.389013997-07:00] Processing signal 'interrupt'
^CINFO[2019-07-26T22:16:14.757979442-07:00] Processing signal 'interrupt'
^CINFO[2019-07-26T22:16:14.929377508-07:00] Processing signal 'interrupt'
^CINFO[2019-07-26T22:16:15.323824421-07:00] Processing signal 'interrupt'
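
To pinpoint where startup stalls in a long debug log, the timestamps can be compared line by line. This is a minimal sketch, assuming the log was saved to a file and uses the timestamp format shown above; the function name `largest_log_gap` is made up for illustration. It only looks at the time of day, so a log that crosses midnight would need extra handling.

```shell
#!/bin/sh
# Hypothetical helper: print the largest pause between two consecutive
# timestamped lines, together with the line that preceded the pause.
# Reads the files given as arguments, or stdin if there are none.
largest_log_gap() {
    awk '
        # Pick out the THH:MM:SS.fraction part of the timestamp.
        match($0, /T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+/) {
            ts = substr($0, RSTART + 1, RLENGTH - 1)
            split(ts, p, ":")
            now = p[1] * 3600 + p[2] * 60 + p[3]
            if (seen && now - prev > gap) {
                gap = now - prev
                before = prevline
            }
            prev = now
            prevline = $0
            seen = 1
        }
        END {
            if (seen) printf "%.3f seconds after: %s\n", gap, before
        }
    ' "$@"
}

# Example use (comment only; dockerd-debug.log is a placeholder file name):
#   largest_log_gap dockerd-debug.log
```

Run on the output above, the largest pause is the roughly 290 seconds between the last "Loaded container" line and the first interrupt, which suggests the daemon stalls during or right after container loading.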

Here is a stack trace when it is hung:
https://pastebin.com/xFyyTkch

I’m facing a similar problem. Does anyone know a fix?