Docker crashes often

Hi,
Docker version 24.0.2, build cb74dfc, running on an Ubuntu server, crashes very often. I’ve been working on it for a long time, but I can’t solve it.

The crash appears after a series of errors that seem to be related to health checks.
The log leading up to the crash is:

Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:06.795291246+01:00" level=error msg="stream copy error: reading from a closed fifo"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:07.015017411+01:00" level=error msg="stream copy error: reading from a closed fifo"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:07.024134254+01:00" level=error msg="stream copy error: reading from a closed fifo"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:07.034889500+01:00" level=error msg="stream copy error: reading from a closed fifo"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:13.722838183+01:00" level=error msg="stream copy error: reading from a closed fifo"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:19.288732457+01:00" level=warning msg="Health check for container f5c0409ebb86d8df8ec46fd0ed0607e5e27b8c8c353da3b66ffad9c992a0682e error: timed out starting health check for container f5c0409ebb86d8df8ec46fd0ed0607e5e27b8c8c353da3b66ffad9c992a0682e"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:19.370781485+01:00" level=warning msg="Health check for container dcaffe2a3c8fc7e230d74a747341077d85a695b9752a5ebedb70b0deabdf971f error: timed out starting health check for container dcaffe2a3c8fc7e230d74a747341077d85a695b9752a5ebedb70b0deabdf971f"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:19.757063494+01:00" level=warning msg="Health check for container d755311084b9b9802aa5e6bdade1bb372b3331b02a28f2438cd80d0dbf00f3e9 error: timed out starting health check for container d755311084b9b9802aa5e6bdade1bb372b3331b02a28f2438cd80d0dbf00f3e9"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:19.446989942+01:00" level=warning msg="Health check for container d48614d981ae72fa8f0668f78926fcc0263308ab7adfa000bf8f09a4417e0988 error: timed out starting health check for container d48614d981ae72fa8f0668f78926fcc0263308ab7adfa000bf8f09a4417e0988"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:19.748475772+01:00" level=error msg="Could not send KILL signal to container process" container=484a21405f7f81c10fa8a4a1617a0332e1da5df069cacbb93a93a2ed8408a21a error="context deadline exceeded" exec=3eb01c99136d8a4e6f3b56a71fa8af87465a84f49bf7b34e8ad8ccd4da3982cc
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:20.645095688+01:00" level=error msg="stream copy error: reading from a closed fifo"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:20.645133189+01:00" level=error msg="stream copy error: reading from a closed fifo"
Nov 24 06:37:22 wp dockerd[32733]: time="2024-11-24T06:37:22.297498079+01:00" level=error msg="Could not send KILL signal to container process" container=f3105968b94bc41d41ed34b85c98d969df700f92e307bd196b07d8d951db125c error="context deadline exceeded" exec=0935f565df4275856eaa951ef7943aa62db40195193ba535ada465fba70efcda
Nov 24 06:37:24 wp dockerd[32733]: panic: runtime error: invalid memory address or nil pointer dereference
Nov 24 06:37:24 wp dockerd[32733]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x55909fa04a04]
Nov 24 06:37:24 wp dockerd[32733]: goroutine 51114976 [running]:
Nov 24 06:37:24 wp dockerd[32733]: github.com/docker/docker/daemon.(*Daemon).ProcessEvent.func1()
Nov 24 06:37:24 wp dockerd[32733]: /go/src/github.com/docker/docker/daemon/monitor.go:190 +0x44
Nov 24 06:37:24 wp dockerd[32733]: created by github.com/docker/docker/daemon.(*Daemon).ProcessEvent
Nov 24 06:37:24 wp dockerd[32733]: /go/src/github.com/docker/docker/daemon/monitor.go:189 +0x55d
Nov 24 06:37:24 wp systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 24 06:37:24 wp systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 24 06:37:26 wp systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 24 06:37:26 wp systemd[1]: docker.service: Scheduled restart job, restart counter is at 6.
Nov 24 06:37:26 wp systemd[1]: Stopped Docker Application Container Engine.
Nov 24 06:37:26 wp systemd[1]: Starting Docker Application Container Engine...
Nov 24 06:37:27 wp dockerd[29460]: time="2024-11-24T06:37:27.173781002+01:00" level=info msg="Starting up"
Nov 24 06:37:27 wp dockerd[29460]: time="2024-11-24T06:37:27.176126831+01:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: dockeruser"
Nov 24 06:37:27 wp dockerd[29460]: time="2024-11-24T06:37:27.184594397+01:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: dockeruser"
Nov 24 06:37:27 wp dockerd[29460]: time="2024-11-24T06:37:27.434093496+01:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Nov 24 06:37:27 wp dockerd[29460]: time="2024-11-24T06:37:27.908101367+01:00" level=info msg="Loading containers: start."
Nov 24 06:37:29 wp dockerd[29460]: time="2024-11-24T06:37:29.148532803+01:00" level=info msg="ignoring event" container=e7e4d74144095bacac02274d2935d5725d2695616c8b8a6681232f5ce698cdda module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

After Docker restarts automatically, many containers remain stopped.

docker info:

Client: Docker Engine - Community
 Version:    24.0.2
 Context:    default
 Debug Mode: false
Server:
 Containers: 81
  Running: 78
  Paused: 0
  Stopped: 3
 Images: 31
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  userns
 Kernel Version: 4.15.0-213-generic
 Operating System: Ubuntu 18.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.29GiB
 Name: wp
 ID: ceb2f912-4b74-4d22-86b4-3f60afe3f7e5
 Docker Root Dir: /var/lib/docker/165536.165536
 Debug Mode: false
 Username: xxxxxxxx
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Any suggestions are appreciated,
Maurizio

You are using a Docker version that has reached end of life, on an OS that reached end of life 1.5 years ago.
You won’t be able to install the latest docker-ce version, which is 27.3.1 at the time of writing.

A failed health check of a container should not bring down the Docker engine. Depending on whether you run plain containers or swarm services, the container should just be marked unhealthy or, in the case of a swarm service, be restarted (which replaces the original container).
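
To see how the engine currently marks a container, you can read the health state from `docker inspect`. Below is a minimal sketch, assuming the `docker` CLI is on the PATH and the container defines a health check; the helper names `health_status` and `inspect_container` are made up for illustration.

```python
import json
import subprocess


def health_status(inspect_json: str) -> str:
    """Return the health status found in `docker inspect` JSON for one container.

    Returns "none" when the container reports no health check state.
    """
    data = json.loads(inspect_json)
    # `docker inspect` prints a JSON array with one object per inspected item.
    state = data[0].get("State", {})
    health = state.get("Health")
    if not health:
        return "none"
    return health.get("Status", "unknown")


def inspect_container(name: str) -> str:
    """Run `docker inspect <name>` and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "inspect", name],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    # Usage: python health.py <container> [<container> ...]
    for name in sys.argv[1:]:
        print(name, health_status(inspect_container(name)))
```

This only reports what the engine has recorded; it does not explain why a health check timed out.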

If you are lucky, someone knows the situation and can share an answer. Everyone else is less likely to follow up on a scenario with an unsupported Docker version on an unsupported OS version. You are pretty much on your own.

I don’t think the health checks caused anything. To me it looks like starting the health checks failed for the same reason the other errors appeared.
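
One way to check this against a longer log is to tally the dockerd messages by level and text, so you can see which errors dominate before the panic. The following is a rough sketch, assuming the log lines have the same `level=... msg="..."` layout as the ones posted above; `summarize` is a hypothetical helper, not part of Docker.

```python
import re
from collections import Counter

# Matches the level=... and msg="..." fields of a dockerd log line.
FIELDS = re.compile(r'level=(\w+) msg="((?:[^"\\]|\\.)*)"')
# Full container IDs are 64 hex characters; collapse them so that
# otherwise identical messages are counted together.
HEX_ID = re.compile(r"\b[0-9a-f]{64}\b")


def summarize(lines):
    """Count (level, message) pairs, with container IDs replaced by <id>.

    Lines without level/msg fields (for example the Go panic trace)
    are skipped.
    """
    counts = Counter()
    for line in lines:
        match = FIELDS.search(line)
        if match:
            level, msg = match.groups()
            counts[(level, HEX_ID.sub("<id>", msg))] += 1
    return counts


def report(lines):
    """Return the tally as printable text, most frequent message first."""
    rows = summarize(lines).most_common()
    return "\n".join(f"{n:5d}  {level:8s} {msg}" for (level, msg), n in rows)
```

You could feed it a saved log, for example `print(report(open("docker.log")))`, where `docker.log` is an assumed file holding the journal output for the Docker service. A tally like this only shows which messages occur and how often; it cannot tell you the root cause.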

I suspect memory or filesystem issues, possibly caused by containers or something else on the host, led to the timeouts and the invalid memory address error. Docker then restarted, which also restarted the containers, so an issue caused by containers could have resolved itself.

It is all speculation of course.

OK, thanks everyone for the suggestions.
I will try to update both the system and Docker as soon as possible and see if the problem is resolved.