I have several stacks running on my CentOS server. Although I deploy everything with Swarm, it is a single-node deployment on one machine, not a multi-node cluster. However, on some days all the containers go down and come back up (as you can see in the image). This interrupts some of my processes for no apparent reason.
Considerations:
- These containers run different images with different behaviors, so the problem is not an error inside any one image.
- None of the containers show any errors in their logs.
- Some containers that I started with Docker Compose (`docker compose up -d`) did not shut down, so I deduce that the problem is specific to Docker Swarm.
- Here are the logs I got with the command `journalctl -u docker.service`:
jun 19 03:52:03 myuser dockerd[27439]: time="2023-06-19T03:52:02.867842622-03:00" level=error msg="heartbeat to manager { } failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session).heartbeat" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv session.id=notogbay6a58h88xhkt6ddvl8 sessionID=notogbay6a58h88xhkt6ddvl8
jun 19 03:52:14 myuser dockerd[27439]: time="2023-06-19T03:52:14.307251296-03:00" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:52:14 myuser dockerd[27439]: time="2023-06-19T03:52:14.319406267-03:00" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:52:14 myuser dockerd[27439]: time="2023-06-19T03:52:14.339781506-03:00" level=info msg="waiting 56.40044ms before registering session" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:52:16 myuser dockerd[27439]: time="2023-06-19T03:52:16.376226102-03:00" level=info msg="worker mftlnmohlt2asl8f982id3vyv was successfully registered" method="(*Dispatcher).register"
jun 19 03:52:33 myuser dockerd[27439]: time="2023-06-19T03:52:32.165209518-03:00" level=error msg="heartbeat to manager { } failed" error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" method="(*session).heartbeat" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv session.id=q7zz8xladr450zu54jnef8pfv sessionID=q7zz8xladr450zu54jnef8pfv
jun 19 03:52:33 myuser dockerd[27439]: time="2023-06-19T03:52:32.622792956-03:00" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = DeadlineExceeded desc = context deadline exceeded" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:52:39 myuser dockerd[27439]: time="2023-06-19T03:52:37.141884729-03:00" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:53:08 myuser dockerd[27439]: time="2023-06-19T03:52:50.646468486-03:00" level=info msg="waiting 91.824135ms before registering session" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:53:25 myuser dockerd[27439]: time="2023-06-19T03:53:13.968378401-03:00" level=error msg="failed deregistering node after heartbeat expiration" error="node mftlnmohlt2asl8f982id3vyv is not found in local storage"
jun 19 03:53:29 myuser dockerd[27439]: time="2023-06-19T03:53:26.702734867-03:00" level=error msg="agent: session failed" backoff=300ms error="session initiation timed out" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:53:52 myuser dockerd[27439]: time="2023-06-19T03:53:44.234405701-03:00" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:53:53 myuser dockerd[27439]: time="2023-06-19T03:53:52.716848728-03:00" level=info msg="waiting 125.136975ms before registering session" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:53:56 myuser dockerd[27439]: time="2023-06-19T03:53:55.505274123-03:00" level=error msg="Attempting to transfer leadership" raft_id=7ed6a6a73ad485b0
jun 19 03:53:59 myuser dockerd[27439]: time="2023-06-19T03:53:59.104766225-03:00" level=error msg="agent: session failed" backoff=700ms error="session initiation timed out" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:53:59 myuser dockerd[27439]: time="2023-06-19T03:53:59.105003983-03:00" level=info msg="manager selected by agent for new session: { }" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
jun 19 03:53:59 myuser dockerd[27439]: time="2023-06-19T03:53:59.105058043-03:00" level=info msg="waiting 151.692502ms before registering session" module=node/agent node.id=mftlnmohlt2asl8f982id3vyv
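To check whether these agent errors cluster around the times the containers restart, a small filter can tally the dockerd error messages by their `msg=` text. This is only a minimal sketch: the function name `count_dockerd_errors` is my own, and it assumes plain `journalctl` output in the format above, with straight double quotes around `msg`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count dockerd error-level messages read from stdin.
# Assumes journalctl lines of the form: ... level=error msg="..." ...
count_dockerd_errors() {
  grep 'level=error' \
    | sed -n 's/.*msg="\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn   # most frequent message first
}

# Example usage against the live journal (the time window is only an example):
# journalctl -u docker.service --since "2023-06-19 03:50" --until "2023-06-19 04:00" | count_dockerd_errors
```

Running it over a wider window (for example, a whole day) makes it easier to see whether the heartbeat and session failures occur only at the moments the stacks restart.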
If anyone can shed some light on this issue, I would appreciate it.