Docker kills all processes after 5 min and then restarts again automatically

Docker version 27.1.1, build 6312585

For many years, I have been running two websites without any problems, using several Docker containers on a virtual server that was once set up with CoreOS 8. In all that time, I never encountered a situation that I did not understand.

Until now. Since last week, I have been struggling with phenomena that I can neither understand nor get under control.

Prerequisite

For some reason, my domains did not show as usual and produced an error. So I restarted the CoreOS machine. But my automatic process to start the containers failed this time. I hadn't changed anything on the machine, so this was unexpected and I had no clue why.

I therefore suspended the automatic process to be able to investigate the phenomenon. I ran into several incomprehensible issues, so I suspected that CoreOS had installed some update which caused all the trouble.

As CoreOS is outdated, I ordered a new virtual server with Ubuntu 24.04, took a local backup of the CoreOS machine, and made an identical copy of my data on the new server from that local copy. Next, I changed the IP addresses on my nameserver and expected everything to run as before.

Installation on Ubuntu

Unfortunately, this did not end my troubles. I even sacrificed the old CoreOS installation and put Ubuntu and the data on the old server as well. I expected both machines to at least behave identically, but they do not.

I ran many tests and searched widely online. By chance, the new server even worked for a whole day, with both domains looking fine. But after extending my installation for Let's Encrypt, I ran into the same troubles as before. After nearly two weeks of testing and experimenting, I am desperate.

Setup

I have a stack of 4 containers and one container acting as a proxy to the stack.

I have 2 different phenomena which can be reproduced consistently:

  • On the old machine, I can start the stack and it will run indefinitely, but the remote console shows out of memory errors, and when I start the proxy the OOM errors will kill the server.

  • On the new machine I can start both the stack and the proxy. They will run for 5 minutes, then get killed by docker. After 30 seconds, they will be restarted, and the whole process repeats indefinitely. I ran journalctl -u docker but could not get any insight other than the repetitive process.

journalctl -u docker

This is the result on the new machine, spanning a full cycle:


Jul 28 19:10:15 ubuntu systemd[1]: Started docker.service - Docker Application Container Engine.
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.735474701Z" level=error msg="fatal task error" error="No such container: wp_adm.1.igg4amyf0nvf6kopcqktrk9w5" module=node/agent/taskmanager>
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.735869241Z" level=error msg="fatal task error" error="No such container: wp_master.1.ewjdsws7afsrebnow2gwhop60" module=node/agent/taskmana>
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.736189431Z" level=error msg="fatal task error" error="No such container: wp_wp.1.ol73v2y126q1i6rob0cpsaxrt" module=node/agent/taskmanager >
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.740042026Z" level=error msg="fatal task error" error="No such container: wp_joe.1.shg71id0y3g6t9etodh4823ui" module=node/agent/taskmanager>
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.824136481Z" level=error msg="Handler for POST /v1.46/swarm/init returned error: This node is already part of a swarm. Use \"docker swarm l>
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.313827205Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.721743716Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.728146283Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.819269139Z" level=warning msg="failed to deactivate service binding for container wp_wp.1.kwmdqnjjpqqgmp9x8b6v98drk" error="No such contai>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001076595Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001161865Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001223090Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001283623Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001342243Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003257574Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003307919Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003349166Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003386276Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199365944Z" level=warning msg="failed to deactivate service binding for container wp_wp.1.ol73v2y126q1i6rob0cpsaxrt" error="No such contai>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199452476Z" level=warning msg="failed to deactivate service binding for container wp_adm.1.igg4amyf0nvf6kopcqktrk9w5" error="No such conta>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199482342Z" level=warning msg="failed to deactivate service binding for container wp_master.1.ewjdsws7afsrebnow2gwhop60" error="No such co>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199507078Z" level=warning msg="failed to deactivate service binding for container wp_joe.1.shg71id0y3g6t9etodh4823ui" error="No such conta>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.206705127Z" level=error msg="fatal task error" error="failed to find a load balancer IP to use for network: 38hfvhjsk2ihdh1axvd2j2y47" mod>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.300908123Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.599625491Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.700934939Z" level=warning msg="failed to deactivate service binding for container wp_adm.1.62pesi1txwhlm6prhu5nxjtnf" error="No such conta>
Jul 28 19:10:19 ubuntu dockerd[1367173]: time="2024-07-28T19:10:19.002571262Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint vbe8ctfi4yi871tu56mkhz1>
Jul 28 19:10:19 ubuntu dockerd[1367173]: time="2024-07-28T19:10:19.019551776Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:10:19 ubuntu dockerd[1367173]: time="2024-07-28T19:10:19.401921389Z" level=warning msg="failed to deactivate service binding for container wp_wp.1.kwmdqnjjpqqgmp9x8b6v98drk" error="No such contai>
Jul 28 19:10:41 ubuntu dockerd[1367173]: time="2024-07-28T19:10:41.817556797Z" level=error msg="Handler for POST /v1.46/swarm/init returned error: This node is already part of a swarm. Use \"docker swarm l>
Jul 28 19:10:43 ubuntu dockerd[1367173]: time="2024-07-28T19:10:43.470450185Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:11:12 ubuntu dockerd[1367173]: time="2024-07-28T19:11:12.105856992Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers"
Jul 28 19:11:12 ubuntu dockerd[1367173]: time="2024-07-28T19:11:12.819739476Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers"
Jul 28 19:15:01 ubuntu systemd[1]: Stopping docker.service - Docker Application Container Engine...
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.448564460Z" level=info msg="Processing signal 'terminated'"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.514703099Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.533645023Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.535576849Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.537536939Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.600712064Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.601773116Z" level=info msg="Stopping manager" module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.605042112Z" level=info msg="dispatcher stopping" method="(*Dispatcher).Stop" module=dispatcher node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.606187872Z" level=info msg="dispatcher session dropped, marking node oz91fbyuvziw7719aua9rqaju down" method="(*Dispatcher).Session" node.i>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.606213390Z" level=error msg="failed to remove node" error="rpc error: code = Aborted desc = dispatcher is stopped" method="(*Dispatcher).S>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.607056804Z" level=info msg="shutting down certificate renewal routine" module=node/tls node.id=oz91fbyuvziw7719aua9rqaju node.role=swarm-m>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.617981460Z" level=info msg="Manager shut down" module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.622599648Z" level=info msg="Node 84c124bb4b87/213.165.82.33, left gossip cluster"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.622647789Z" level=info msg="Node 84c124bb4b87 change state NodeActive --> NodeFailed"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.623240241Z" level=info msg="Node 84c124bb4b87/213.165.82.33, added to failed nodes list"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.623692199Z" level=warning msg="rmServiceBinding ae737dab3e03abb8b4247f0da2956947d569e95a885f9f3eb16d6347df41840e possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.718377515Z" level=warning msg="rmServiceBinding 2a9f172a1ff27ed9356b380e92a1bbf8e8e5d1130f34b8c79b6f4cb449973ac3 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.722648522Z" level=warning msg="rmServiceBinding fe380c5c516f7e4834e5e8a1f2ced55af1952300735622193acdf6424d85f1a6 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.726458973Z" level=warning msg="rmServiceBinding b40e5887461cde133867cc6fea1f18498755b03a7a275c3e5ae12e4fb722ca92 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.730238267Z" level=warning msg="rmServiceBinding b6c8762498653f23cadddb5631dadb508b07be4ba821b11d021c4102d2dca2a2 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.733946267Z" level=warning msg="rmServiceBinding 377f042202ba6071c6dd5be4b94e5a34f37d3026e1c7c988119f525141d86e29 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.737409737Z" level=warning msg="rmServiceBinding c8fe12f8e6d550dfb1d4556356f0e6a02096c89ade91c42d88cf0bfc2343a889 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.741122405Z" level=warning msg="rmServiceBinding bf69d01da1147444d086acec1d98bcd441fa5ad2c31a3bf2c3bde08e20e202b9 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.745041452Z" level=warning msg="rmServiceBinding b6bfa19c4a39d62b00054e279e86194af17253b3b00c6d2c999cee2d03353a4e possible transient state >
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.026480554Z" level=info msg="ignoring event" container=bbe1f427af40167531bee07e72e8589fa3b86c344be3efe378f2ac256a90296b module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.033467569Z" level=info msg="ignoring event" container=b394ca1075ccf0143c3b2216a0a82a902106bf3bd477b4855590430b78d3808e module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.100265712Z" level=info msg="ignoring event" container=670abb9642e7bba6e80be6999c716cc7a00cade6d0767cea15ca0d5a6ffe5f26 module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.116164964Z" level=info msg="ignoring event" container=b9ac1ee24e90fbde8a9deb9b774f2b7b6a69810b8506d77211c866a0d8b10006 module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.729462473Z" level=warning msg="error detaching from network" error="could not find network attachment for container b394ca1075ccf0143c3b22>
Jul 28 19:15:11 ubuntu dockerd[1367173]: time="2024-07-28T19:15:11.823001943Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=9ae63edaf42a2b4c1c1bb8dbfa936d04>
Jul 28 19:15:11 ubuntu dockerd[1367173]: time="2024-07-28T19:15:11.863829765Z" level=info msg="ignoring event" container=9ae63edaf42a2b4c1c1bb8dbfa936d042e569d7cde8dfd31852d4cd809db5d01 module=libcontainer>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.099509982Z" level=warning msg="Failed to disconnect container lb-proxy from swarm network proxy on cluster leave: endpoint lb-proxy not fo>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.165816467Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint wncaosrlxn8eo5zd8tbwpz9>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.225978428Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint vbe8ctfi4yi871tu56mkhz1>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.230436295Z" level=error msg="network proxy remove failed: error while removing network: unknown network proxy id vbe8ctfi4yi871tu56mkhz1ag>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.500139493Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint shrw6q9q1ojo3tne5l5fg5p>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.510307379Z" level=info msg="Daemon shutdown complete"
Jul 28 19:15:12 ubuntu systemd[1]: docker.service: Deactivated successfully.
Jul 28 19:15:12 ubuntu systemd[1]: Stopped docker.service - Docker Application Container Engine.
Jul 28 19:15:12 ubuntu systemd[1]: docker.service: Consumed 6.480s CPU time.
Jul 28 19:15:12 ubuntu systemd[1]: Starting docker.service - Docker Application Container Engine...
Jul 28 19:15:12 ubuntu dockerd[1368815]: time="2024-07-28T19:15:12.838574028Z" level=info msg="Starting up"
Jul 28 19:15:12 ubuntu dockerd[1368815]: time="2024-07-28T19:15:12.840017717Z" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.c>
Jul 28 19:15:12 ubuntu dockerd[1368815]: time="2024-07-28T19:15:12.924013111Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.003910585Z" level=info msg="Loading containers: start."
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.005994947Z" level=error msg="failed to load container mount" container=09df3934f55dffd47cabf57dce24e6d989e6da61a5bdbb0809b95e5c8ff641ae er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.006903794Z" level=error msg="failed to load container mount" container=15451ca598b6ff8155741621bbfa0d120381a309ae62417f16372a355d7afd93 er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.007284658Z" level=error msg="failed to load container mount" container=09e2f638323da1a521aa5c2567662ba56aaf3226f2356833296eec33d5923af2 er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.008516721Z" level=error msg="failed to load container" container=873a35b16d35304fa676a1abd7775be5f5281b0b61c36c1eb2c72a625eee5efd error="o>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.009547805Z" level=error msg="failed to load container mount" container=1c2f8accd0e7e2e4bf0bf8d5dcd38e4d9304872e7e99b2f5a79561a644820abb er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.009725329Z" level=error msg="failed to load container mount" container=07d83583e9174e170839bd6f10da6708e0a9dd815460f738ef5d4ee1ff20f19c er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010046802Z" level=error msg="failed to load container mount" container=e57883634adfc173eaedfb5068cd4d23e3eb483550def84953cfc78d3edfb31d er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010497609Z" level=error msg="failed to load container mount" container=7bcdcb8ecdd47830be88f3eeec8fd983385092c502878f19d732ea5da6ae13da er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010608707Z" level=error msg="failed to load container mount" container=caa70a7fd2a2b2f515f1889a3399f9f256702273c52ac16b97b5fc6a5684f8bc er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010711500Z" level=error msg="failed to load container mount" container=b5a9054dd8054da925f37567519795f2d5f864bb1aa0ec78129e63ecafeb087b er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010889263Z" level=error msg="failed to load container mount" container=4681cb173b6703a86469add54734916a292eebf6a4a1c5a87b6fa0505ccf9015 er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.522375394Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set >
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.632546614Z" level=info msg="Loading containers: done."
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.700270126Z" level=info msg="Docker daemon" commit=cc13f95 containerd-snapshotter=false storage-driver=overlay2 version=27.1.1
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.720513822Z" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=oz91fbyuvziw7719aua9rqaju proto=tcp
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.720975830Z" level=info msg="Listening for local connections" addr=/var/run/docker/swarm/control.sock module=node node.id=oz91fbyuvziw7719a>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.843644239Z" level=info msg="manager selected by agent for new session: {oz91fbyuvziw7719aua9rqaju 213.165.82.33:2377}" module=node/agent n>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.845365540Z" level=info msg="waiting 0s before registering session" module=node/agent node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.117594296Z" level=info msg="19f52fc2024a2dc2 switched to configuration voters=(1870453730550885826)" module=raft node.id=oz91fbyuvziw7719a>
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.117924938Z" level=info msg="19f52fc2024a2dc2 became follower at term 256" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.118048569Z" level=info msg="newRaft 19f52fc2024a2dc2 [peers: [19f52fc2024a2dc2], term: 256, commit: 18998, applied: 10000, lastindex: 1899>
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122044640Z" level=info msg="19f52fc2024a2dc2 is starting a new election at term 256" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122138396Z" level=info msg="19f52fc2024a2dc2 became candidate at term 257" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122175826Z" level=info msg="19f52fc2024a2dc2 received MsgVoteResp from 19f52fc2024a2dc2 at term 257" module=raft node.id=oz91fbyuvziw7719a>
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122188600Z" level=info msg="19f52fc2024a2dc2 became leader at term 257" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122196976Z" level=info msg="raft.node: 19f52fc2024a2dc2 elected leader 19f52fc2024a2dc2 at term 257" module=raft node.id=oz91fbyuvziw7719a>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.109388526Z" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = Aborted desc = dispatcher is stopped" module=no>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.112447196Z" level=info msg="manager selected by agent for new session: {oz91fbyuvziw7719aua9rqaju 213.165.82.33:2377}" module=node/agent n>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.112574646Z" level=info msg="waiting 55.46744ms before registering session" module=node/agent node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.710195814Z" level=error msg="error creating cluster object" error="name conflicts with an existing object" module=node node.id=oz91fbyuvzi>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.710555850Z" level=info msg="leadership changed from not yet part of a raft cluster to oz91fbyuvziw7719aua9rqaju" module=node node.id=oz91f>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.710608038Z" level=info msg="dispatcher starting" module=dispatcher node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.813408077Z" level=info msg="worker oz91fbyuvziw7719aua9rqaju was successfully registered" method="(*Dispatcher).register"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815225528Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815269440Z" level=info msg="Initializing Libnetwork Agent" advertise-addr=213.165.82.33 data-path-addr= listen-addr=0.0.0.0 local-addr=213>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815314845Z" level=info msg="New memberlist node - Node:ubuntu will use memberlist nodeID:90829a6424b2 with config:&{NodeID:90829a6424b2 Ho>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815566558Z" level=info msg="Node 90829a6424b2/213.165.82.33, joined gossip cluster"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815977018Z" level=info msg="Daemon has completed initialization"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.833090409Z" level=info msg="Node 90829a6424b2/213.165.82.33, added to nodes list"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.905438993Z" level=info msg="API listen on /run/docker.sock"
Jul 28 19:15:15 ubuntu systemd[1]: Started docker.service - Docker Application Container Engine.

I searched for individual error messages but found no clue as to what was wrong; most of the messages confused me.

As I understand it docker tries to pull the images which are already there. But I cannot see why the containers are killed (and why restarted automatically).

Edit

The containers exit with 0:

Sun Jul 28 22:56 root@VPS-X ~$ docker ps -a
CONTAINER ID   IMAGE                                              COMMAND                  CREATED         STATUS                      PORTS     NAMES
4dd3ae4a47ae   kklepper/nginx-php7-mysqli-graphicsmagick:alpine   "/bin/sh -c 'php-fpm…"   4 minutes ago   Exited (0) 13 seconds ago             1proxy-nginx-1

Well-known test apps

docker run -d --name loop-demo alpine sh -c "while true; do sleep 1; done"
docker run -d --name sleep-demo alpine sleep infinity
docker run -d --name tail-demo alpine tail -f /dev/null
docker run -dt --name tty-demo alpine

get killed as well with exit code 137, but not restarted:

Sun Jul 28 22:40 root@VPS-X ~$ docker ps -a
CONTAINER ID   IMAGE                                              COMMAND                  CREATED          STATUS                            PORTS                                                                      NAMES
d0b7c153c59c   kklepper/nginx-php7-mysqli-graphicsmagick:alpine   "/bin/sh -c 'php-fpm…"   12 seconds ago   Up 11 seconds                     0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   1proxy-nginx-1
941818f99f4f   kklepper/mariadb33:alpine                          "/start-v3.sh"           32 seconds ago   Up 31 seconds                     3306/tcp                                                                   wp_master.1.oglrvm0juf0wgkdhwwchppmi9
04c5001624f8   kklepper/nginx-php7-mysqli-memcached:alpine        "/bin/sh -c 'php-fpm…"   35 seconds ago   Up 35 seconds                     80/tcp, 443/tcp                                                            wp_wp.1.pw86bnqpmlc6fwwcxubznce8k
393969e34944   kklepper/nginx-php7-mysqli-graphicsmagick:alpine   "/bin/sh -c 'php-fpm…"   38 seconds ago   Up 37 seconds                     80/tcp, 443/tcp                                                            wp_joe.1.56fcmuid1u00h0dcmh9e8hbqp
46d306d5a9c1   adminer:latest                                     "entrypoint.sh php -…"   40 seconds ago   Up 39 seconds                     8080/tcp                                                                   wp_adm.1.65m8hp8ohc97fuhnbracbaxp3
611def38be7a   alpine                                             "/bin/sh"                5 minutes ago    Exited (137) About a minute ago                                                                              tty-demo
492fbea9a9dd   alpine                                             "tail -f /dev/null"      5 minutes ago    Exited (137) About a minute ago                                                                              tail-demo
7d0d8ff2008b   alpine                                             "sleep infinity"         5 minutes ago    Exited (137) About a minute ago                                                                              sleep-demo
3f804ec52991   alpine                                             "sh -c 'while true; …"   5 minutes ago    Exited (137) About a minute ago                                                                   

and later removed:

Sun Jul 28 22:41 root@VPS-X ~$ docker ps -a
CONTAINER ID   IMAGE                                              COMMAND                  CREATED          STATUS          PORTS                                                                      NAMES
cd83ed440068   kklepper/nginx-php7-mysqli-graphicsmagick:alpine   "/bin/sh -c 'php-fpm…"   3 seconds ago    Up 2 seconds    0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   1proxy-nginx-1
f7fee2ca92c1   adminer:latest                                     "entrypoint.sh php -…"   22 seconds ago   Up 21 seconds   8080/tcp                                                                   wp_adm.1.xu3yffjhxeoo0rx9gaybdljui
7fe8ecd73e32   kklepper/mariadb33:alpine                          "/start-v3.sh"           26 seconds ago   Up 25 seconds   3306/tcp                                                                   wp_master.1.idtekxjaz7pxemolr5ifif7oj
ae147633e078   kklepper/nginx-php7-mysqli-memcached:alpine        "/bin/sh -c 'php-fpm…"   28 seconds ago   Up 27 seconds   80/tcp, 443/tcp                                                            wp_wp.1.wz4iof40if1rsnikvza48v6d5
78891f9b9f85   kklepper/nginx-php7-mysqli-graphicsmagick:alpine   "/bin/sh -c 'php-fpm…"   30 seconds ago   Up 29 seconds   80/tcp, 443/tcp                                                            wp_joe.1.cjgg12e5yf5cd4exu8cifknfz
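
For reference, the exit code 137 shown for the demo containers above follows the usual convention of 128 plus a signal number: 137 - 128 = 9, which is SIGKILL. That is consistent with the journal line "failed to exit within 10s of signal 15 - using the force". A minimal sketch of that arithmetic (the function name `decode_exit` is made up for illustration):

```shell
#!/bin/sh
# Translate a container exit code into the signal that ended it, if any.
# Codes above 128 conventionally mean "terminated by signal (code - 128)".
decode_exit() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # kill -l N prints the name of signal number N (without the SIG prefix)
        echo "exit $code: signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exit $code: no signal implied"
    fi
}
```

Calling `decode_exit 137` reports signal 9, while `decode_exit 0` reports that no signal is implied, as for the proxy container that exited with 0.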

Any ideas or insights?

Now my questions are:

  • Did anybody ever experience this kind of behavior?
  • What am I doing wrong?
  • What can I learn from this setup?
  • How can I further investigate this scenario?
  • How can I make the whole thing run as reliably as before?
  • And lastly, how could this happen in the first place?

I put a lot of effort into solving the problem and finally managed it: it was simply and solely my own fault, and a very stupid mistake at that.

I should have taken the regular execution every 5 minutes as a hint to look at my cron job right away. How come?

On this machine, I had increasing problems with disk space shortages, and the machine became increasingly cluttered. I diagnosed Docker as the cause, so I took several measures to reclaim disk space.

As a result of these measures, I deleted the containers myself every 5 minutes. Bingo! Congratulations!
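
A quick way to spot such a job is to filter the crontab for entries that touch Docker. The sketch below is only an illustration: the function name `docker_cron_entries` is made up, and it assumes the job names docker directly on the crontab line. A job that calls a wrapper script would not match and would have to be checked by hand.

```shell
#!/bin/sh
# Print cron entries that mention docker, reading crontab text on stdin.
# Comment lines are dropped first so that disabled jobs do not show up.
docker_cron_entries() {
    grep -v '^[[:space:]]*#' | grep -i 'docker'
}
```

Typical use would be `crontab -l | docker_cron_entries` for the current user, or `cat /etc/crontab /etc/cron.d/* 2>/dev/null | docker_cron_entries` for system-wide jobs.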

However, by reinstalling I have gained a lot of free space, so this problem should not occur again in the future.

Many thanks to everyone who has tried to solve my problem. I take this story as a lesson to look in the right place first.
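
Looking in the right place could have started with the journal above: systemd stopped docker.service at 19:15:01, one second after a five-minute mark, which is typical of a scheduled job. As a rough sketch (the function name `stops_on_schedule` and the 5-second tolerance are my own assumptions), the stop events can be pulled out of the journal and checked against five-minute boundaries:

```shell
#!/bin/sh
# Print the time of each docker.service stop and flag those that fall
# within the first 5 seconds after a 5-minute boundary.
# Reads `journalctl -u docker` short-format lines on stdin.
stops_on_schedule() {
    grep 'Stopping docker.service' | awk '{
        split($3, t, ":")    # $3 is HH:MM:SS in the short journal format
        aligned = (t[2] % 5 == 0 && t[3] + 0 < 5) ? "aligned" : "unaligned"
        print $3, aligned
    }'
}
```

Running `journalctl -u docker --no-pager | stops_on_schedule` and seeing every stop marked as aligned points towards cron or a timer rather than towards Docker itself.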

Thank you for sharing your insights! I hope it will help others in the same situation.