Docker containers 'frozen' - can't stop, start, run

Hi - hopefully this is the right place for this kind of question. I've been running docker-ce on my Ubuntu Server 18.04 for approximately six months with little to no trouble.

Just this evening, I got a notification from another service that runs on my Windows server that one of my Docker containers was offline and unreachable. I ran 'docker ps' to see if the container was running, and it was in fact still running. I restarted the container but still could not reach it. I then tried to access all of my other Docker containers via their IP and port, and got timeouts on all of them.

I figured something was messed up and just restarted my ubuntu server. I noticed on reboot however, that when I ran docker ps, it still said that all of my containers were running, and the status time had not changed at all (saying up 23 hours, even after I rebooted my machine).

I then tried removing the container and re-adding it; however, when I tried to run the new container, I got the following error:

kevin@linuxserver:~$ docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:c3b4ada4687bbaa170745b3e4dd8ac3f194ca95b2d0518b417fb47e5879d9b5f
Status: Downloaded newer image for hello-world:latest
docker: Error response from daemon: all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: permission denied": unavailable.

Obviously the above is just the standard hello world container, but the error message is the same no matter which container I force remove and re-add.

I figured something in my install got messed up, so I removed docker-ce altogether, removed /var/lib/docker, and rebooted my machine.

I'm not sure if maybe I just don't understand Docker, but even with docker-ce uninstalled, I was still able to run 'docker ps' and see my containers still running (although the status time was still unchanged, stuck at 23 hours).

I then re-installed docker-ce and was able to run the hello-world container, but after a reboot, all of my previous containers returned, still in the same frozen state as before.

I did a lot of googling, and the only thing I could find relating to the above error is the link below, in which the syslog also grows absolutely huge, with Docker being the culprit:

I don't know if it is related, but my syslog is also rapidly increasing in size (it was up to 50 GB, and even after truncating it, it was back up to 19 GB an hour later). It is so big that I can't figure out how to read it.
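For what it's worth, a syslog that large can be inspected without ever opening it in an editor. A small sketch of finding what is flooding it: this assumes the standard Ubuntu syslog line format, where the program tag is the fifth whitespace-separated field.

```shell
# Count lines per program name in a syslog-format file and print the
# noisiest ten. Field 5 of a standard syslog line
# ("Mon DD HH:MM:SS host prog[pid]: message") is the program tag;
# adjust the field number if your log format differs.
top_talkers() {
  awk '{print $5}' "$1" | sort | uniq -c | sort -rn | head -10
}

# Example (may need sudo to read the file):
#   top_talkers /var/log/syslog
```

You can also just look at the most recent entries with tail -n 200 /var/log/syslog instead of loading the whole file.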

I’m not sure where to go at this point, so any guidance would be fantastic. Here is the info on my system:

Docker Info:

kevin@linuxserver:~$ docker info
Client:
Debug Mode: false

Server:
Containers: 13
Running: 11
Paused: 0
Stopped: 2
Images: 17
Server Version: 18.09.9
Storage Driver: aufs
Root Dir: /var/snap/docker/common/var-lib-docker/aufs
Backing Filesystem: extfs
Dirs: 124
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: N/A
runc version: N/A
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-70-generic
Operating System: Ubuntu Core 16
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.605GiB
Name: linuxserver
ID: EHR3:23QF:LXM3:L6P6:ZC6C:YZJA:QLYR:X6WN:MUID:2ULP:XDNL:P5YL
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Docker Version:

kevin@linuxserver:~$ docker version
Client: Docker Engine - Community
Version: 19.03.5
API version: 1.39 (downgraded from 1.40)
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:29:52 2019
OS/Arch: linux/amd64
Experimental: false

Server:
Engine:
Version: 18.09.9
API version: 1.39 (minimum version 1.12)
Go version: go1.12.10
Git commit: 9552f2b
Built: Fri Sep 27 20:36:26 2019
OS/Arch: linux/amd64
Experimental: false

Same problem experienced here. From web searching, it seems Ubuntu automatically installed a snap version of Docker around 15 November 2019, and when you are running two versions of Docker at once - one from snap and one from docker.io/docker-ce - the resulting resource conflicts cause exactly the problem we are experiencing.
Try:
sudo killall dockerd
sudo snap remove docker --purge
sudo truncate -s 0 /var/log/syslog
sudo systemctl restart docker.service

If that works, reboot your server and make sure it remains working.
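One way to confirm afterwards which daemon you are talking to: the snap build keeps its data under /var/snap (as in the docker info output above, where the root dir is /var/snap/docker/common/var-lib-docker), while the docker-ce package uses /var/lib/docker. A small helper along those lines - it assumes only that docker info prints a "Docker Root Dir:" line:

```shell
# Return success if a `docker info` dump looks like the snap-packaged
# daemon, i.e. its data root lives under /var/snap.
is_snap_docker() {
  grep -q 'Docker Root Dir: /var/snap' <<<"$1"
}

# Example:
#   if is_snap_docker "$(docker info 2>/dev/null)"; then
#     echo "still talking to the snap daemon"
#   fi
```

After the snap is removed and docker.service restarted, this check should fail and docker info should report /var/lib/docker again.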

Personally, I don’t mind the SNAP concept but I am concerned about SNAP installing things I didn’t know about.


Thanks - that was exactly it.

Caution: back up your existing storage volumes before you execute the commands!

Hi, I was having the same issue as you and was able to get Docker working again, but now I don't know how I am supposed to get my containers back. I have a backup of my /var/lib/docker folder. How were you able to accomplish it?

After spending all day and night pulling my hair out trying to figure out this problem, I couldn't be bothered to try to recover my old containers. All of the data for my containers is saved to volumes that are backed up regularly outside each container, and I keep the commands used to create and run each container.

It took me about 15 minutes to re-run the commands to re-create each container pointing at my previously saved data. The biggest challenge was getting Bitwarden back up and running; otherwise it would have taken about 3 minutes in total, since I only needed to re-run the commands for my other containers.
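For anyone repeating this, the "back up your volumes first" step can be done with a throwaway container that tars up the volume contents. A sketch - the volume name and destination path in the example are placeholders, and the DOCKER variable exists only so the command can be dry-run:

```shell
# Archive a named Docker volume to a .tgz on the host before removing
# anything. Uses a throwaway alpine container that mounts the volume
# read-only. Set DOCKER=echo to print the command instead of running it.
backup_volume() {
  local vol="$1" dest="$2"
  ${DOCKER:-docker} run --rm \
    -v "${vol}:/data:ro" \
    -v "$(cd "$(dirname "$dest")" && pwd):/backup" \
    alpine tar czf "/backup/$(basename "$dest")" -C /data .
}

# Example (hypothetical volume name):
#   backup_volume bitwarden_data ./bitwarden_data.tgz
```

Restoring is the same idea in reverse: mount a fresh volume and the tarball into a throwaway container and run tar xzf into /data.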