Docker won't start stacks after server shutdown

I had a power failure that resulted in a server shutdown, and although I was on a power backup, something happened and the server did not shut Docker down gracefully. Now I cannot run our stack.

So, let's remove all stacks, run docker system prune, and restart docker and containerd. First, the kernel version:

# uname -r
5.13.6-gentoo-x86_64

Verify nothing is running:

# docker stack ls
NAME      SERVICES

# docker service ls 
ID        NAME      MODE      REPLICAS   IMAGE     PORTS

# docker container ls -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
34f50fa71946   bridge            bridge    local
ee3b279aafb2   docker_gwbridge   bridge    local
85dcf88719e0   host              host      local
hzbseotexo3a   ingress           overlay   swarm
71f5f693f0bb   none              null      local

Prune, and confirm there is nothing left to reclaim:

# docker system prune
WARNING! This will remove:
  - all stopped containers
  - all networks not used by at least one container
  - all dangling images
  - all dangling build cache

Are you sure you want to continue? [y/N] y
Total reclaimed space: 0B

Stop containerd and docker, clear the old logs, and start them again:

# /etc/init.d/docker stop
 * Caching service dependencies ...                    [ ok ]
 * Stopping docker ...                                 [ ok ]
# /etc/init.d/containerd stop
 * Stopping containerd ...                             [ ok ]
# rm /var/log/docker.log
# rm /var/log/containerd/containerd.log
# /etc/init.d/containerd start
 * Starting containerd ...                             [ ok ]
# /etc/init.d/docker start
 * /var/log/docker.log: creating file
 * /var/log/docker.log: correcting owner
 * Starting docker ...

Check the containerd log before we bring up a stack:

# tail -f -n 50 /var/log/containerd/containerd.log 
time="2024-01-10T14:44:27.755615651-05:00" level=info msg="loading plugin \"io.containerd.sandbox.store.v1.local\"..." type=io.containerd.sandbox.store.v1
time="2024-01-10T14:44:27.755630604-05:00" level=info msg="loading plugin \"io.containerd.sandbox.controller.v1.local\"..." type=io.containerd.sandbox.controller.v1
time="2024-01-10T14:44:27.755641960-05:00" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755650075-05:00" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755657379-05:00" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755665680-05:00" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755673297-05:00" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755681110-05:00" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755691232-05:00" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755699428-05:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
time="2024-01-10T14:44:27.755711548-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755720139-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755727584-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755735024-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755743875-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755752419-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755760479-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755768012-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755775319-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.sandbox-controllers\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755784421-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.sandboxes\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755792618-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755800102-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.streaming\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755807168-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755818453-05:00" level=info msg="loading plugin \"io.containerd.transfer.v1.local\"..." type=io.containerd.transfer.v1
time="2024-01-10T14:44:27.755830225-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.transfer\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755837696-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755844203-05:00" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
time="2024-01-10T14:44:27.755852498-05:00" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
time="2024-01-10T14:44:27.755861868-05:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
time="2024-01-10T14:44:27.755868335-05:00" level=info msg="skipping tracing processor initialization (no tracing plugin)" error="no OpenTelemetry endpoint: skip plugin"
time="2024-01-10T14:44:27.755906103-05:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
time="2024-01-10T14:44:27.755934065-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.755943093-05:00" level=info msg="loading plugin \"io.containerd.nri.v1.nri\"..." type=io.containerd.nri.v1
time="2024-01-10T14:44:27.755953274-05:00" level=info msg="NRI interface is disabled by configuration."
time="2024-01-10T14:44:27.755963080-05:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
time="2024-01-10T14:44:27.756018157-05:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:} UntrustedWorkloadRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:} Runtimes:map[runc:{Type:io.containerd.runc.v2 Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[BinaryName: CriuImagePath: CriuPath: CriuWorkPath: IoGid:0 IoUid:0 NoNewKeyring:false NoPivotRoot:false Root: ShimCgroup: SystemdCgroup:false] PrivilegedWithoutHostDevices:false PrivilegedWithoutHostDevicesAllDevicesAllowed:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0 Snapshotter: SandboxMode:podsandbox}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreBlockIONotEnabledErrors:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginSetupSerially:false NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:registry.k8s.io/pause:3.8 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false 
RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false EnableCDI:false CDISpecDirs:[/etc/cdi /var/run/cdi] ImagePullProgressTimeout:1m0s DrainExecSyncIOTimeout:0s} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
time="2024-01-10T14:44:27.756053132-05:00" level=info msg="Connect containerd service"
time="2024-01-10T14:44:27.756067042-05:00" level=info msg="using legacy CRI server"
time="2024-01-10T14:44:27.756071636-05:00" level=info msg="using experimental NRI integration - disable nri plugin to prevent this"
time="2024-01-10T14:44:27.756278041-05:00" level=info msg="Get image filesystem path \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\""
time="2024-01-10T14:44:27.756773958-05:00" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
time="2024-01-10T14:44:27.756879031-05:00" level=info msg="Start subscribing containerd event"
time="2024-01-10T14:44:27.756909863-05:00" level=info msg="Start recovering state"
time="2024-01-10T14:44:27.756947693-05:00" level=info msg="Start event monitor"
time="2024-01-10T14:44:27.756961465-05:00" level=info msg="Start snapshots syncer"
time="2024-01-10T14:44:27.756968072-05:00" level=info msg="Start cni network conf syncer for default"
time="2024-01-10T14:44:27.756974821-05:00" level=info msg="Start streaming server"
time="2024-01-10T14:44:27.756999072-05:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
time="2024-01-10T14:44:27.757040449-05:00" level=info msg=serving... address=/run/containerd/containerd.sock
time="2024-01-10T14:44:27.757058609-05:00" level=info msg="containerd successfully booted in 0.017116s"

Now /var/log/docker.log:

# tail -f -n 50 /var/log/docker.log
time="2024-01-10T14:44:36.144943822-05:00" level=error msg="failed to load container" container=4ac42e141f04eb1063f4ec2f8fee299d7899e2add733ed5b005f033dae67fd51 error="open /var/lib/docker/containers/4ac42e141f04eb1063f4ec2f8fee299d7899e2add733ed5b005f033dae67fd51/config.v2.json: no such file or directory"
time="2024-01-10T14:44:36.145207216-05:00" level=error msg="failed to load container" container=1ded1c3cbb00b0043bbc5e790b2b44fc9b796507cf130a52b67a10d31c1afb5d error="open /var/lib/docker/containers/1ded1c3cbb00b0043bbc5e790b2b44fc9b796507cf130a52b67a10d31c1afb5d/config.v2.json: no such file or directory"
time="2024-01-10T14:44:36.145242222-05:00" level=error msg="failed to load container mount" container=5b2af293e0956dc1a83c1ddcf1c75b8c16aa3557ca6b6fb2d0519b603cf3629c error="mount does not exist"
time="2024-01-10T14:44:36.145372289-05:00" level=error msg="failed to load container" container=ac756c0067a0bdcbb01ad4b43ecab9db97a66f4008287acb97a8369a14208a4a error="open /var/lib/docker/containers/ac756c0067a0bdcbb01ad4b43ecab9db97a66f4008287acb97a8369a14208a4a/config.v2.json: no such file or directory"
time="2024-01-10T14:44:36.145481216-05:00" level=error msg="failed to load container mount" container=a8d6ae65770b9215dee42d5a97ceeb014293deb52bbe3cdda5f0ebf3aac275c0 error="mount does not exist"
time="2024-01-10T14:44:36.145571233-05:00" level=error msg="failed to load container mount" container=a9b3f2d71637329479435a02a1fabed38ea06613a984087eee0fe9700e2d4406 error="mount does not exist"
time="2024-01-10T14:44:36.145723410-05:00" level=error msg="failed to load container" container=5b694f967f018e784dd29839715956c6dea65ce20ecc29511124546749b45546 error="open /var/lib/docker/containers/5b694f967f018e784dd29839715956c6dea65ce20ecc29511124546749b45546/config.v2.json: no such file or directory"
time="2024-01-10T14:44:36.150060025-05:00" level=error msg="failed to load container" container=7d62b200c5d4436b615e04c514759f8d4051a1e70f641e6762583da3af14b969 error="open /var/lib/docker/containers/7d62b200c5d4436b615e04c514759f8d4051a1e70f641e6762583da3af14b969/config.v2.json: no such file or directory"
time="2024-01-10T14:44:36.150210822-05:00" level=error msg="failed to load container" container=d1c358b767d787efb9bebe0367caaa25a1f5eadcdb6b9fb7a6ecb67e52859436 error="open /var/lib/docker/containers/d1c358b767d787efb9bebe0367caaa25a1f5eadcdb6b9fb7a6ecb67e52859436/config.v2.json: no such file or directory"
time="2024-01-10T14:44:36.150862261-05:00" level=error msg="failed to load container mount" container=d66305b96812373e9956e68fae7ec36566daaba0ae0912ea37d675b82b3ba5a0 error="mount does not exist"
time="2024-01-10T14:44:36.282597282-05:00" level=info msg="Fixing inconsistent endpoint_cnt for network docker_gwbridge. Expected=0, Actual=2"
time="2024-01-10T14:44:36.500075131-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2024-01-10T14:44:36.636277534-05:00" level=info msg="Loading containers: done."
time="2024-01-10T14:44:36.822765017-05:00" level=info msg="Docker daemon" commit=311b9ff0aa93aa55880e1e5f8871c4fb69583426 graphdriver=overlay2 version=24.0.7
time="2024-01-10T14:44:36.823039532-05:00" level=info msg="metrics API listening on 192.168.1.145:9323"
time="2024-01-10T14:44:36.838920048-05:00" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=xyvuvw9yaohqzivpynjs2tmlz proto=tcp
time="2024-01-10T14:44:36.838977051-05:00" level=info msg="Listening for local connections" addr=/var/run/docker/swarm/control.sock module=node node.id=xyvuvw9yaohqzivpynjs2tmlz proto=unix
time="2024-01-10T14:44:36.839751774-05:00" level=info msg="manager selected by agent for new session: {xyvuvw9yaohqzivpynjs2tmlz 192.168.1.145:2377}" module=node/agent node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.847452002-05:00" level=info msg="waiting 0s before registering session" module=node/agent node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.953971187-05:00" level=info msg="5d0b3d8eaf5aa84e switched to configuration voters=(6704520153307719758)" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.954005030-05:00" level=info msg="5d0b3d8eaf5aa84e became follower at term 3018" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.954013461-05:00" level=info msg="newRaft 5d0b3d8eaf5aa84e [peers: [5d0b3d8eaf5aa84e], term: 3018, commit: 10135240, applied: 10128018, lastindex: 10135240, lastterm: 3018]" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.956564652-05:00" level=info msg="5d0b3d8eaf5aa84e is starting a new election at term 3018" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.956589605-05:00" level=info msg="5d0b3d8eaf5aa84e became candidate at term 3019" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.956613869-05:00" level=info msg="5d0b3d8eaf5aa84e received MsgVoteResp from 5d0b3d8eaf5aa84e at term 3019" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.956622242-05:00" level=info msg="5d0b3d8eaf5aa84e became leader at term 3019" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:36.956633136-05:00" level=info msg="raft.node: 5d0b3d8eaf5aa84e elected leader 5d0b3d8eaf5aa84e at term 3019" module=raft node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:37.004327521-05:00" level=warning msg="election tick value (10s) is different from the one defined in the cluster config (3s), the cluster may be unstable" module=node node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:37.557591940-05:00" level=error msg="error creating cluster object" error="name conflicts with an existing object" module=node node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:37.557663018-05:00" level=info msg="leadership changed from not yet part of a raft cluster to xyvuvw9yaohqzivpynjs2tmlz" module=node node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:37.557727738-05:00" level=info msg="dispatcher starting" module=dispatcher node.id=xyvuvw9yaohqzivpynjs2tmlz
time="2024-01-10T14:44:37.948805174-05:00" level=info msg="worker xyvuvw9yaohqzivpynjs2tmlz was successfully registered" method="(*Dispatcher).register"
time="2024-01-10T14:44:37.990673259-05:00" level=info msg="Initializing Libnetwork Agent Listen-Addr=0.0.0.0 Local-addr=192.168.1.145 Adv-addr=192.168.1.145 Data-addr= Remote-addr-list=[] MTU=1500"
time="2024-01-10T14:44:37.990749247-05:00" level=info msg="New memberlist node - Node:svn will use memberlist nodeID:f3afbab56df5 with config:&{NodeID:f3afbab56df5 Hostname:svn BindAddr:0.0.0.0 AdvertiseAddr:192.168.1.145 BindPort:0 Keys:[[239 216 105 228 220 199 136 112 145 237 90 16 62 88 48 70] [142 206 176 106 78 200 177 187 97 43 111 75 59 67 73 200] [17 65 138 253 176 111 216 34 104 49 139 136 236 63 180 151]] PacketBufferSize:1400 reapEntryInterval:1800000000000 reapNetworkInterval:1825000000000 rejoinClusterDuration:10000000000 rejoinClusterInterval:60000000000 StatsPrintPeriod:5m0s HealthPrintPeriod:1m0s}"
time="2024-01-10T14:44:37.991842643-05:00" level=info msg="Node f3afbab56df5/192.168.1.145, joined gossip cluster"
time="2024-01-10T14:44:37.991929858-05:00" level=info msg="Node f3afbab56df5/192.168.1.145, added to nodes list"
time="2024-01-10T14:44:37.996463583-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_quiescent_template" error="open /proc/sys/net/ipv4/vs/expire_quiescent_template: no such file or directory"
time="2024-01-10T14:44:37.996503847-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.conn_reuse_mode" error="open /proc/sys/net/ipv4/vs/conn_reuse_mode: no such file or directory"
time="2024-01-10T14:44:37.996522122-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"
time="2024-01-10T14:44:37.996539692-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"
time="2024-01-10T14:44:37.996556431-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_quiescent_template" error="open /proc/sys/net/ipv4/vs/expire_quiescent_template: no such file or directory"
time="2024-01-10T14:44:37.996572369-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.conn_reuse_mode" error="open /proc/sys/net/ipv4/vs/conn_reuse_mode: no such file or directory"
time="2024-01-10T14:44:37.996606629-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.conn_reuse_mode" error="open /proc/sys/net/ipv4/vs/conn_reuse_mode: no such file or directory"
time="2024-01-10T14:44:37.996625267-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"
time="2024-01-10T14:44:37.996635088-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_quiescent_template" error="open /proc/sys/net/ipv4/vs/expire_quiescent_template: no such file or directory"
time="2024-01-10T14:44:37.996649022-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.conn_reuse_mode" error="open /proc/sys/net/ipv4/vs/conn_reuse_mode: no such file or directory"
time="2024-01-10T14:44:37.996658106-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"
time="2024-01-10T14:44:37.996666375-05:00" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_quiescent_template" error="open /proc/sys/net/ipv4/vs/expire_quiescent_template: no such file or directory"
time="2024-01-10T14:44:38.099352015-05:00" level=info msg="Daemon has completed initialization"
time="2024-01-10T14:44:38.133064071-05:00" level=info msg="API listen on /var/run/docker.sock"
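The "failed to load container" errors above point at directories under /var/lib/docker/containers whose config.v2.json was lost in the unclean shutdown. The daemon no longer tracks those containers, which is also why the prune reclaimed nothing. A read-only way to see what is left over (a sketch; the paths are the stock Docker locations):

```shell
# List leftover container state directories; each name is a container ID
# matching the "failed to load container" errors in docker.log.
if [ -d /var/lib/docker/containers ]; then
    ls -1 /var/lib/docker/containers
else
    echo "/var/lib/docker/containers not present on this host"
fi
```

If those directories belong only to the containers that failed to load, stopping docker and removing them is one way people clear the stale state.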

After deploying the stack:

# tail -f /var/log/docker.log
time="2024-01-10T14:51:44.100712879-05:00" level=warning msg="rmServiceBinding deleteServiceInfoFromCluster dev-stack_sonarserver 9e5342a3ba9dedc59564fe6c4acfe9fd600b76b4346339ac4dd52b05119638d2 aborted c.serviceBindings[skey] !ok"
time="2024-01-10T14:51:44.549370035-05:00" level=error msg="stream copy error: reading from a closed fifo"
time="2024-01-10T14:51:44.549371532-05:00" level=error msg="stream copy error: reading from a closed fifo"
time="2024-01-10T14:51:48.139228760-05:00" level=error msg="fatal task error" error="starting container failed: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument: unknown" module=node/agent/taskmanager node.id=xyvuvw9yaohqzivpynjs2tmlz service.id=ig25xyd1z1ogq24298afbbc23 task.id=g00vg92lcer58uo3sxgqsqoyu
time="2024-01-10T14:51:48.151281568-05:00" level=warning msg="deleteServiceInfoFromCluster NetworkDB DeleteEntry failed for 0b93fd32a18062bbfdc34d055da0e549771bb13bcb4996023b626f913a6575d2 ud0rsrhzkttu6ybkzcazvvpdf err:cannot delete entry endpoint_table with network id ud0rsrhzkttu6ybkzcazvvpdf and key 0b93fd32a18062bbfdc34d055da0e549771bb13bcb4996023b626f913a6575d2 does not exist or is already being deleted"
time="2024-01-10T14:51:48.151302794-05:00" level=warning msg="rmServiceBinding deleteServiceInfoFromCluster dev-stack_proxy 0b93fd32a18062bbfdc34d055da0e549771bb13bcb4996023b626f913a6575d2 aborted c.serviceBindings[skey] !ok"
time="2024-01-10T14:51:48.949371663-05:00" level=error msg="stream copy error: reading from a closed fifo"
time="2024-01-10T14:51:48.949392985-05:00" level=error msg="stream copy error: reading from a closed fifo"
time="2024-01-10T14:51:49.170103540-05:00" level=warning msg="deleteServiceInfoFromCluster NetworkDB DeleteEntry failed for 6ff531725412b73152b1c6a89782ae23e0b764d788f84bac7720a146ab849d18 hzbseotexo3aqqybmnf5od166 err:cannot delete entry endpoint_table with network id hzbseotexo3aqqybmnf5od166 and key 6ff531725412b73152b1c6a89782ae23e0b764d788f84bac7720a146ab849d18 does not exist or is already being deleted"
time="2024-01-10T14:51:49.329864849-05:00" level=warning msg="rmServiceBinding deleteServiceInfoFromCluster dev-stack_proxy 6ff531725412b73152b1c6a89782ae23e0b764d788f84bac7720a146ab849d18 aborted c.serviceBindings[skey] !ok"
time="2024-01-10T14:51:50.092913189-05:00" level=warning msg="deleteServiceInfoFromCluster NetworkDB DeleteEntry failed for c301d500750ac99c87b8af14e747c2068036958d2ec0ffa1d704d5bacd5dbbc3 ud0rsrhzkttu6ybkzcazvvpdf err:cannot delete entry endpoint_table with network id ud0rsrhzkttu6ybkzcazvvpdf and key c301d500750ac99c87b8af14e747c2068036958d2ec0ffa1d704d5bacd5dbbc3 does not exist or is already being deleted"
time="2024-01-10T14:51:50.092938687-05:00" level=warning msg="rmServiceBinding deleteServiceInfoFromCluster dev-stack_pgres c301d500750ac99c87b8af14e747c2068036958d2ec0ffa1d704d5bacd5dbbc3 aborted c.serviceBindings[skey] !ok"
time="2024-01-10T14:51:50.138282992-05:00" level=error msg="stream copy error: reading from a closed fifo"
time="2024-01-10T14:51:50.138301612-05:00" level=error msg="stream copy error: reading from a closed fifo"
time="2024-01-10T14:51:50.772805418-05:00" level=error msg="fatal task error" error="starting container failed: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument: unknown" module=node/agent/taskmanager node.id=xyvuvw9yaohqzivpynjs2tmlz service.id=yveys1hetnbk6b1ii40hzy85k task.id=yhvpr7yy73wo20vh07r71m6ii
time="2024-01-10T14:51:50.927126510-05:00" level=error msg="stream copy error: reading from a closed fifo"
time="2024-01-10T14:51:50.927132754-05:00" level=error msg="stream copy error: reading from a closed fifo"
# tail -f /var/log/containerd/containerd.log 
time="2024-01-10T14:53:42.610115580-05:00" level=error msg="copy shim log" error="read /proc/self/fd/13: file already closed" namespace=moby
time="2024-01-10T14:53:45.174931704-05:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2024-01-10T14:53:45.174968699-05:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2024-01-10T14:53:45.174976115-05:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-01-10T14:53:45.175021799-05:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-01-10T14:53:45.224467680-05:00" level=info msg="shim disconnected" id=76a225f04f2777dbfe00e7e9b32d26e868fbc395fae3e8e4398047917842ab63 namespace=moby
time="2024-01-10T14:53:45.224506604-05:00" level=warning msg="cleaning up after shim disconnected" id=76a225f04f2777dbfe00e7e9b32d26e868fbc395fae3e8e4398047917842ab63 namespace=moby
time="2024-01-10T14:53:45.224517269-05:00" level=info msg="cleaning up dead shim" namespace=moby
time="2024-01-10T14:53:45.230727219-05:00" level=warning msg="cleanup warnings time=\"2024-01-10T14:53:45-05:00\" level=warning msg=\"failed to read init pid file\" error=\"open /run/containerd/io.containerd.runtime.v2.task/moby/76a225f04f2777dbfe00e7e9b32d26e868fbc395fae3e8e4398047917842ab63/init.pid: no such file or directory\" runtime=io.containerd.runc.v2\n" namespace=moby

Try to disconnect the network endpoint referenced in the docker logs above:

# docker network disconnect -f 0b93fd32a18062bbfdc34d055da0e549771bb13bcb4996023b626f913a6575d2 ud0rsrhzkttu6ybkzcazvvpdf
Error response from daemon: network 0b93fd32a18062bbfdc34d055da0e549771bb13bcb4996023b626f913a6575d2 not found
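The argument order may be the issue here: the CLI is docker network disconnect [OPTIONS] NETWORK CONTAINER, and in the warnings above ud0rsrhzkttu6ybkzcazvvpdf is the network ID while the long hex string is the endpoint. A sketch with the IDs in the documented order (whether -f can evict a stale endpoint this way still depends on the daemon state):

```shell
# docker network disconnect [OPTIONS] NETWORK CONTAINER
# Network ID first, then the stale endpoint/container ID from the logs.
docker network disconnect -f \
    ud0rsrhzkttu6ybkzcazvvpdf \
    0b93fd32a18062bbfdc34d055da0e549771bb13bcb4996023b626f913a6575d2
```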

I think this might be a problem with either the overlay network or a corrupted network DB, but I don't know how to fix it. Any help would be appreciated!

Why do you mess with containerd? I have been using Docker for 10 years and never touched it; Docker usually handles everything itself.

docker stack is usually used with Swarm. Do you have a Swarm? How are the nodes doing?

Can you just delete and re-create the networks?
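Something along these lines, maybe (a sketch — it assumes default ingress settings and that no stacks or services are deployed; the names are from your docker network ls output):

```shell
# Remove the swarm-managed networks so they can be recreated cleanly.
# WARNING: only safe while no stacks/services are running.
docker network rm docker_gwbridge
docker network rm ingress        # prompts before removing the routing-mesh network

# Recreate ingress explicitly; docker_gwbridge is recreated automatically
# the next time a container needs to attach to it.
docker network create --driver overlay --ingress ingress
```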

I'm not messing with containerd. I just included its logs.

I ended up deleting overlay, which probably did the trick. I also updated the kernel and docker/containerd to take advantage of the downtime.
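For anyone who lands here with the same bpf_prog_query(BPF_CGROUP_DEVICE) failed: invalid argument error: one known cause on custom (e.g. Gentoo) kernels is that the BPF cgroup options runc relies on for the device cgroup were not enabled, which would explain why a kernel update helped. A quick check (the paths are the usual places an exposed kernel config lives; adjust as needed):

```shell
# runc's device-cgroup handling uses BPF programs attached to the cgroup;
# CONFIG_BPF_SYSCALL and CONFIG_CGROUP_BPF are the options to look for.
if [ -r /proc/config.gz ]; then
    zgrep -E 'CONFIG_BPF_SYSCALL|CONFIG_CGROUP_BPF' /proc/config.gz \
        || echo "options not set"
elif [ -r "/boot/config-$(uname -r)" ]; then
    grep -E 'CONFIG_BPF_SYSCALL|CONFIG_CGROUP_BPF' "/boot/config-$(uname -r)" \
        || echo "options not set"
else
    echo "kernel config not exposed on this host"
fi
```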

Everything appears to be running now.