Hi, I have been trying to get some software running that uses Docker: a Nextflow pipeline to analyze PacBio HiFi full-length 16S data. Unfortunately, it’s important that we get this running ASAP.
I have the software (`Docker version 20.10.21, build baeda1f`) installed in rootless mode on our CentOS 8 server and compute nodes, and am just trying to complete the testing phase, at the paragraph starting with “To test the pipeline, run the example below”. You can see the issues I’m having starting at this comment.
What I’ve noticed is that after I execute the `nextflow` command I always get a “Cannot connect to the Docker daemon” error, and after that I can’t even run `docker run hello-world` without restarting the Docker service (`systemctl --user restart docker`); otherwise it throws the “Key not found in store” error mentioned in the subject. Once the Docker service is restarted, `hello-world` runs fine, but I still receive the “Cannot connect to the Docker daemon” error when running the `nextflow` command. Then `hello-world` stops working until Docker is restarted again. Any idea what keeps causing the daemon to become unreachable and require constant restarts, and how to stop that?
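For reference, the cycle I keep repeating can be sketched as the shell function below. This is only a rough sketch of the steps described above: the function name `docker_cycle` is made up for illustration, and the pipeline command is whatever the test instructions give, passed through unchanged as arguments.

```shell
#!/usr/bin/env bash
# Sketch of the restart/test cycle described above.
# docker_cycle is a hypothetical helper name, not part of Docker or Nextflow.
docker_cycle() {
    # Restart the rootless (per-user) Docker service.
    systemctl --user restart docker || return 1

    # Sanity check: this works right after a restart.
    docker run hello-world || return 1

    # Run the pipeline test command supplied by the caller.
    local pipeline_status=0
    "$@" || pipeline_status=$?

    # Re-check Docker: in my case this fails until the next restart.
    local docker_status=0
    docker run hello-world || docker_status=$?

    echo "pipeline exit status: ${pipeline_status}, hello-world exit status: ${docker_status}"
    return "${docker_status}"
}
```

It would be called with the full pipeline test command as its arguments, for example `docker_cycle nextflow run` followed by the arguments from the test instructions. A non-zero return means `hello-world` no longer runs after the pipeline attempt.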
Further, I’ve also learned that `dockerd` is having problems on some of the nodes in our cluster. We have a head node and 4 compute nodes. I’m attempting to troubleshoot further by starting `dockerd` in the foreground, first stopping the background service (`systemctl --user stop docker`) and then running `dockerd-rootless.sh`. This works fine on the head node and compute node n010, but for some reason `dockerd-rootless.sh` fails to start on the remaining compute nodes n011-n013; the last lines of the output are shown below. This is especially puzzling to me because all 4 compute nodes boot from the same image, and so are essentially identical systems. Unfortunately, I couldn’t find anyone else out there with similar problems and a solution…
WARN[2022-12-12T17:18:10.897937785-07:00] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2022-12-12T17:18:10.897945545-07:00] metadata content store policy set policy=shared
WARN[2022-12-12T17:18:11.856348550-07:00] grpc: addrConn.createTransport failed to connect to {unix:///run/user/10063/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///run/user/10063/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-12-12T17:18:14.723191667-07:00] grpc: addrConn.createTransport failed to connect to {unix:///run/user/10063/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///run/user/10063/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-12-12T17:18:19.111939509-07:00] grpc: addrConn.createTransport failed to connect to {unix:///run/user/10063/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///run/user/10063/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-12-12T17:18:20.898041903-07:00] waiting for response from boltdb open plugin=bolt
WARN[2022-12-12T17:18:24.874256415-07:00] grpc: addrConn.createTransport failed to connect to {unix:///run/user/10063/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///run/user/10063/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start containerd: timeout waiting for containerd to start
[rootlesskit:child ] error: command [/usr/bin/dockerd-rootless.sh] exited: exit status 1
[rootlesskit:parent] error: child exited: exit status 1