Docker version 27.1.1, build 6312585
For many years I have been running two websites without any problems, using several Docker containers on a virtual server that was originally set up with CoreOS 8. In all that time I never encountered a situation I did not understand.
Until now. Since last week, I have been struggling with phenomena that I can neither understand nor get under control.
Prerequisite
For some reason, my domains stopped loading as usual and returned an error, so I restarted the CoreOS machine. This time, my automatic process for starting the containers failed. I hadn't changed anything on the machine, so this was unexpected and I had no clue why.
I therefore suspended the automatic process so that I could investigate. I ran into several incomprehensible issues and began to suspect that CoreOS had installed some update that caused all the trouble.
As CoreOS is outdated, I ordered a new virtual server with Ubuntu 24.04, took a local backup of the CoreOS machine, and copied my data from that backup to the new server. Next I changed the IP addresses on my nameserver and expected everything to run as before.
Installation on Ubuntu
Unfortunately, this did not end my troubles. I even sacrificed the old CoreOS installation and put Ubuntu and the data on the old server as well. I expected both machines to at least behave identically, but they do not.
I ran a lot of tests and searched the whole internet. By chance, I was even successful on the new server for a whole day, when everything looked fine and worked on both domains. But after extending my installation to handle Let's Encrypt, I ran into the same troubles as before. After nearly two weeks of testing and experimenting, I am desperate.
Setup
I have a stack of 4 containers and one container acting as a proxy to the stack.
I have 2 different phenomena which can be reproduced consistently:
- On the old machine, I can start the stack and it will run indefinitely, but the remote console shows out of memory errors, and when I start the proxy the OOM errors will kill the server.
- On the new machine I can start both the stack and the proxy. They will run for 5 minutes, then get killed by docker. After 30 seconds, they will be restarted, and the whole process repeats indefinitely. I ran journalctl -u docker but could not get any insight other than the repetitive process.
journalctl -u docker
This is the result on the new machine, spanning a full cycle:
Jul 28 19:10:15 ubuntu systemd[1]: Started docker.service - Docker Application Container Engine.
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.735474701Z" level=error msg="fatal task error" error="No such container: wp_adm.1.igg4amyf0nvf6kopcqktrk9w5" module=node/agent/taskmanager>
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.735869241Z" level=error msg="fatal task error" error="No such container: wp_master.1.ewjdsws7afsrebnow2gwhop60" module=node/agent/taskmana>
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.736189431Z" level=error msg="fatal task error" error="No such container: wp_wp.1.ol73v2y126q1i6rob0cpsaxrt" module=node/agent/taskmanager >
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.740042026Z" level=error msg="fatal task error" error="No such container: wp_joe.1.shg71id0y3g6t9etodh4823ui" module=node/agent/taskmanager>
Jul 28 19:10:15 ubuntu dockerd[1367173]: time="2024-07-28T19:10:15.824136481Z" level=error msg="Handler for POST /v1.46/swarm/init returned error: This node is already part of a swarm. Use \"docker swarm l>
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.313827205Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.721743716Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.728146283Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:10:16 ubuntu dockerd[1367173]: time="2024-07-28T19:10:16.819269139Z" level=warning msg="failed to deactivate service binding for container wp_wp.1.kwmdqnjjpqqgmp9x8b6v98drk" error="No such contai>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001076595Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001161865Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001223090Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001283623Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.001342243Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003257574Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003307919Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003349166Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.003386276Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199365944Z" level=warning msg="failed to deactivate service binding for container wp_wp.1.ol73v2y126q1i6rob0cpsaxrt" error="No such contai>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199452476Z" level=warning msg="failed to deactivate service binding for container wp_adm.1.igg4amyf0nvf6kopcqktrk9w5" error="No such conta>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199482342Z" level=warning msg="failed to deactivate service binding for container wp_master.1.ewjdsws7afsrebnow2gwhop60" error="No such co>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.199507078Z" level=warning msg="failed to deactivate service binding for container wp_joe.1.shg71id0y3g6t9etodh4823ui" error="No such conta>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.206705127Z" level=error msg="fatal task error" error="failed to find a load balancer IP to use for network: 38hfvhjsk2ihdh1axvd2j2y47" mod>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.300908123Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.599625491Z" level=error msg="Failed to allocate network resources for node oz91fbyuvziw7719aua9rqaju" error="could not find network alloca>
Jul 28 19:10:17 ubuntu dockerd[1367173]: time="2024-07-28T19:10:17.700934939Z" level=warning msg="failed to deactivate service binding for container wp_adm.1.62pesi1txwhlm6prhu5nxjtnf" error="No such conta>
Jul 28 19:10:19 ubuntu dockerd[1367173]: time="2024-07-28T19:10:19.002571262Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint vbe8ctfi4yi871tu56mkhz1>
Jul 28 19:10:19 ubuntu dockerd[1367173]: time="2024-07-28T19:10:19.019551776Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:10:19 ubuntu dockerd[1367173]: time="2024-07-28T19:10:19.401921389Z" level=warning msg="failed to deactivate service binding for container wp_wp.1.kwmdqnjjpqqgmp9x8b6v98drk" error="No such contai>
Jul 28 19:10:41 ubuntu dockerd[1367173]: time="2024-07-28T19:10:41.817556797Z" level=error msg="Handler for POST /v1.46/swarm/init returned error: This node is already part of a swarm. Use \"docker swarm l>
Jul 28 19:10:43 ubuntu dockerd[1367173]: time="2024-07-28T19:10:43.470450185Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:11:12 ubuntu dockerd[1367173]: time="2024-07-28T19:11:12.105856992Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers"
Jul 28 19:11:12 ubuntu dockerd[1367173]: time="2024-07-28T19:11:12.819739476Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers"
Jul 28 19:15:01 ubuntu systemd[1]: Stopping docker.service - Docker Application Container Engine...
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.448564460Z" level=info msg="Processing signal 'terminated'"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.514703099Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.533645023Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.535576849Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.537536939Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.600712064Z" level=info msg="attempted to update status for a task that has been removed" module=node/agent/taskmanager node.id=oz91fbyuvzi>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.601773116Z" level=info msg="Stopping manager" module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.605042112Z" level=info msg="dispatcher stopping" method="(*Dispatcher).Stop" module=dispatcher node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.606187872Z" level=info msg="dispatcher session dropped, marking node oz91fbyuvziw7719aua9rqaju down" method="(*Dispatcher).Session" node.i>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.606213390Z" level=error msg="failed to remove node" error="rpc error: code = Aborted desc = dispatcher is stopped" method="(*Dispatcher).S>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.607056804Z" level=info msg="shutting down certificate renewal routine" module=node/tls node.id=oz91fbyuvziw7719aua9rqaju node.role=swarm-m>
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.617981460Z" level=info msg="Manager shut down" module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.622599648Z" level=info msg="Node 84c124bb4b87/213.165.82.33, left gossip cluster"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.622647789Z" level=info msg="Node 84c124bb4b87 change state NodeActive --> NodeFailed"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.623240241Z" level=info msg="Node 84c124bb4b87/213.165.82.33, added to failed nodes list"
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.623692199Z" level=warning msg="rmServiceBinding ae737dab3e03abb8b4247f0da2956947d569e95a885f9f3eb16d6347df41840e possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.718377515Z" level=warning msg="rmServiceBinding 2a9f172a1ff27ed9356b380e92a1bbf8e8e5d1130f34b8c79b6f4cb449973ac3 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.722648522Z" level=warning msg="rmServiceBinding fe380c5c516f7e4834e5e8a1f2ced55af1952300735622193acdf6424d85f1a6 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.726458973Z" level=warning msg="rmServiceBinding b40e5887461cde133867cc6fea1f18498755b03a7a275c3e5ae12e4fb722ca92 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.730238267Z" level=warning msg="rmServiceBinding b6c8762498653f23cadddb5631dadb508b07be4ba821b11d021c4102d2dca2a2 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.733946267Z" level=warning msg="rmServiceBinding 377f042202ba6071c6dd5be4b94e5a34f37d3026e1c7c988119f525141d86e29 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.737409737Z" level=warning msg="rmServiceBinding c8fe12f8e6d550dfb1d4556356f0e6a02096c89ade91c42d88cf0bfc2343a889 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.741122405Z" level=warning msg="rmServiceBinding bf69d01da1147444d086acec1d98bcd441fa5ad2c31a3bf2c3bde08e20e202b9 possible transient state >
Jul 28 19:15:01 ubuntu dockerd[1367173]: time="2024-07-28T19:15:01.745041452Z" level=warning msg="rmServiceBinding b6bfa19c4a39d62b00054e279e86194af17253b3b00c6d2c999cee2d03353a4e possible transient state >
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.026480554Z" level=info msg="ignoring event" container=bbe1f427af40167531bee07e72e8589fa3b86c344be3efe378f2ac256a90296b module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.033467569Z" level=info msg="ignoring event" container=b394ca1075ccf0143c3b2216a0a82a902106bf3bd477b4855590430b78d3808e module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.100265712Z" level=info msg="ignoring event" container=670abb9642e7bba6e80be6999c716cc7a00cade6d0767cea15ca0d5a6ffe5f26 module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.116164964Z" level=info msg="ignoring event" container=b9ac1ee24e90fbde8a9deb9b774f2b7b6a69810b8506d77211c866a0d8b10006 module=libcontainer>
Jul 28 19:15:02 ubuntu dockerd[1367173]: time="2024-07-28T19:15:02.729462473Z" level=warning msg="error detaching from network" error="could not find network attachment for container b394ca1075ccf0143c3b22>
Jul 28 19:15:11 ubuntu dockerd[1367173]: time="2024-07-28T19:15:11.823001943Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=9ae63edaf42a2b4c1c1bb8dbfa936d04>
Jul 28 19:15:11 ubuntu dockerd[1367173]: time="2024-07-28T19:15:11.863829765Z" level=info msg="ignoring event" container=9ae63edaf42a2b4c1c1bb8dbfa936d042e569d7cde8dfd31852d4cd809db5d01 module=libcontainer>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.099509982Z" level=warning msg="Failed to disconnect container lb-proxy from swarm network proxy on cluster leave: endpoint lb-proxy not fo>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.165816467Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint wncaosrlxn8eo5zd8tbwpz9>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.225978428Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint vbe8ctfi4yi871tu56mkhz1>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.230436295Z" level=error msg="network proxy remove failed: error while removing network: unknown network proxy id vbe8ctfi4yi871tu56mkhz1ag>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.500139493Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint shrw6q9q1ojo3tne5l5fg5p>
Jul 28 19:15:12 ubuntu dockerd[1367173]: time="2024-07-28T19:15:12.510307379Z" level=info msg="Daemon shutdown complete"
Jul 28 19:15:12 ubuntu systemd[1]: docker.service: Deactivated successfully.
Jul 28 19:15:12 ubuntu systemd[1]: Stopped docker.service - Docker Application Container Engine.
Jul 28 19:15:12 ubuntu systemd[1]: docker.service: Consumed 6.480s CPU time.
Jul 28 19:15:12 ubuntu systemd[1]: Starting docker.service - Docker Application Container Engine...
Jul 28 19:15:12 ubuntu dockerd[1368815]: time="2024-07-28T19:15:12.838574028Z" level=info msg="Starting up"
Jul 28 19:15:12 ubuntu dockerd[1368815]: time="2024-07-28T19:15:12.840017717Z" level=info msg="detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.c>
Jul 28 19:15:12 ubuntu dockerd[1368815]: time="2024-07-28T19:15:12.924013111Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.003910585Z" level=info msg="Loading containers: start."
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.005994947Z" level=error msg="failed to load container mount" container=09df3934f55dffd47cabf57dce24e6d989e6da61a5bdbb0809b95e5c8ff641ae er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.006903794Z" level=error msg="failed to load container mount" container=15451ca598b6ff8155741621bbfa0d120381a309ae62417f16372a355d7afd93 er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.007284658Z" level=error msg="failed to load container mount" container=09e2f638323da1a521aa5c2567662ba56aaf3226f2356833296eec33d5923af2 er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.008516721Z" level=error msg="failed to load container" container=873a35b16d35304fa676a1abd7775be5f5281b0b61c36c1eb2c72a625eee5efd error="o>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.009547805Z" level=error msg="failed to load container mount" container=1c2f8accd0e7e2e4bf0bf8d5dcd38e4d9304872e7e99b2f5a79561a644820abb er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.009725329Z" level=error msg="failed to load container mount" container=07d83583e9174e170839bd6f10da6708e0a9dd815460f738ef5d4ee1ff20f19c er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010046802Z" level=error msg="failed to load container mount" container=e57883634adfc173eaedfb5068cd4d23e3eb483550def84953cfc78d3edfb31d er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010497609Z" level=error msg="failed to load container mount" container=7bcdcb8ecdd47830be88f3eeec8fd983385092c502878f19d732ea5da6ae13da er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010608707Z" level=error msg="failed to load container mount" container=caa70a7fd2a2b2f515f1889a3399f9f256702273c52ac16b97b5fc6a5684f8bc er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010711500Z" level=error msg="failed to load container mount" container=b5a9054dd8054da925f37567519795f2d5f864bb1aa0ec78129e63ecafeb087b er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.010889263Z" level=error msg="failed to load container mount" container=4681cb173b6703a86469add54734916a292eebf6a4a1c5a87b6fa0505ccf9015 er>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.522375394Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set >
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.632546614Z" level=info msg="Loading containers: done."
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.700270126Z" level=info msg="Docker daemon" commit=cc13f95 containerd-snapshotter=false storage-driver=overlay2 version=27.1.1
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.720513822Z" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=oz91fbyuvziw7719aua9rqaju proto=tcp
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.720975830Z" level=info msg="Listening for local connections" addr=/var/run/docker/swarm/control.sock module=node node.id=oz91fbyuvziw7719a>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.843644239Z" level=info msg="manager selected by agent for new session: {oz91fbyuvziw7719aua9rqaju 213.165.82.33:2377}" module=node/agent n>
Jul 28 19:15:13 ubuntu dockerd[1368815]: time="2024-07-28T19:15:13.845365540Z" level=info msg="waiting 0s before registering session" module=node/agent node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.117594296Z" level=info msg="19f52fc2024a2dc2 switched to configuration voters=(1870453730550885826)" module=raft node.id=oz91fbyuvziw7719a>
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.117924938Z" level=info msg="19f52fc2024a2dc2 became follower at term 256" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.118048569Z" level=info msg="newRaft 19f52fc2024a2dc2 [peers: [19f52fc2024a2dc2], term: 256, commit: 18998, applied: 10000, lastindex: 1899>
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122044640Z" level=info msg="19f52fc2024a2dc2 is starting a new election at term 256" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122138396Z" level=info msg="19f52fc2024a2dc2 became candidate at term 257" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122175826Z" level=info msg="19f52fc2024a2dc2 received MsgVoteResp from 19f52fc2024a2dc2 at term 257" module=raft node.id=oz91fbyuvziw7719a>
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122188600Z" level=info msg="19f52fc2024a2dc2 became leader at term 257" module=raft node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:14 ubuntu dockerd[1368815]: time="2024-07-28T19:15:14.122196976Z" level=info msg="raft.node: 19f52fc2024a2dc2 elected leader 19f52fc2024a2dc2 at term 257" module=raft node.id=oz91fbyuvziw7719a>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.109388526Z" level=error msg="agent: session failed" backoff=100ms error="rpc error: code = Aborted desc = dispatcher is stopped" module=no>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.112447196Z" level=info msg="manager selected by agent for new session: {oz91fbyuvziw7719aua9rqaju 213.165.82.33:2377}" module=node/agent n>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.112574646Z" level=info msg="waiting 55.46744ms before registering session" module=node/agent node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.710195814Z" level=error msg="error creating cluster object" error="name conflicts with an existing object" module=node node.id=oz91fbyuvzi>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.710555850Z" level=info msg="leadership changed from not yet part of a raft cluster to oz91fbyuvziw7719aua9rqaju" module=node node.id=oz91f>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.710608038Z" level=info msg="dispatcher starting" module=dispatcher node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.813408077Z" level=info msg="worker oz91fbyuvziw7719aua9rqaju was successfully registered" method="(*Dispatcher).register"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815225528Z" level=info msg="initialized VXLAN UDP port to 4789 " module=node node.id=oz91fbyuvziw7719aua9rqaju
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815269440Z" level=info msg="Initializing Libnetwork Agent" advertise-addr=213.165.82.33 data-path-addr= listen-addr=0.0.0.0 local-addr=213>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815314845Z" level=info msg="New memberlist node - Node:ubuntu will use memberlist nodeID:90829a6424b2 with config:&{NodeID:90829a6424b2 Ho>
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815566558Z" level=info msg="Node 90829a6424b2/213.165.82.33, joined gossip cluster"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.815977018Z" level=info msg="Daemon has completed initialization"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.833090409Z" level=info msg="Node 90829a6424b2/213.165.82.33, added to nodes list"
Jul 28 19:15:15 ubuntu dockerd[1368815]: time="2024-07-28T19:15:15.905438993Z" level=info msg="API listen on /run/docker.sock"
Jul 28 19:15:15 ubuntu systemd[1]: Started docker.service - Docker Application Container Engine.
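Most lines in the excerpt above end in ">", which means the pager cut them off at the terminal width and the rest of each message is missing. A minimal sketch of how the same window could be captured in full and reduced to the lines systemd itself wrote about the unit, assuming a systemd journal and a grep that supports -E; the helper name unit_transitions is hypothetical and not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines written by systemd (PID 1) that
# record a unit being started or stopped. These show when the daemon itself
# is restarted, independent of the dockerd messages around them.
unit_transitions() {
    grep -E 'systemd\[1\]: (Starting|Started|Stopping|Stopped) '
}

# Usage (not executed here). --no-pager prints complete lines instead of
# truncating them; --since/--until limit the output to one cycle.
# journalctl -u docker --no-pager \
#     --since "2024-07-28 19:10" --until "2024-07-28 19:16" | unit_transitions
```

Applied to the excerpt above, this would leave only the Started, Stopping, Stopped and Starting lines for docker.service, which makes the timing of each cycle easier to read.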
I searched for individual error messages but found no clue as to what was wrong; most of the messages just confused me.
As I understand it, Docker tries to pull the images, which are already there. But I cannot see why the containers are killed (and why they are restarted automatically).
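Instead of looking up messages one by one, it may help to get an overview of the log first. The following is only a sketch, assuming the level=... field format seen in the dockerd lines above; the helper name count_by_level is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read dockerd log lines on stdin and count how many
# lines were logged at each level (info, warning, error), most frequent first.
count_by_level() {
    grep -o 'level=[a-z]*' | sort | uniq -c | sort -rn
}

# Usage (not executed here):
# journalctl -u docker --no-pager --since "2024-07-28 19:10" | count_by_level
```

A tally like this does not explain anything by itself, but it shows which kinds of messages dominate a cycle and which ones appear only once.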
Edit
The containers exit with 0:
Sun Jul 28 22:56 root@VPS-X ~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4dd3ae4a47ae kklepper/nginx-php7-mysqli-graphicsmagick:alpine "/bin/sh -c 'php-fpm…" 4 minutes ago Exited (0) 13 seconds ago 1proxy-nginx-1
Well-known test apps
docker run -d --name loop-demo alpine sh -c "while true; do sleep 1; done"
docker run -d --name sleep-demo alpine sleep infinity
docker run -d --name tail-demo alpine tail -f /dev/null
docker run -dt --name tty-demo alpine
get killed as well with exit code 137, but not restarted:
Sun Jul 28 22:40 root@VPS-X ~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d0b7c153c59c kklepper/nginx-php7-mysqli-graphicsmagick:alpine "/bin/sh -c 'php-fpm…" 12 seconds ago Up 11 seconds 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp 1proxy-nginx-1
941818f99f4f kklepper/mariadb33:alpine "/start-v3.sh" 32 seconds ago Up 31 seconds 3306/tcp wp_master.1.oglrvm0juf0wgkdhwwchppmi9
04c5001624f8 kklepper/nginx-php7-mysqli-memcached:alpine "/bin/sh -c 'php-fpm…" 35 seconds ago Up 35 seconds 80/tcp, 443/tcp wp_wp.1.pw86bnqpmlc6fwwcxubznce8k
393969e34944 kklepper/nginx-php7-mysqli-graphicsmagick:alpine "/bin/sh -c 'php-fpm…" 38 seconds ago Up 37 seconds 80/tcp, 443/tcp wp_joe.1.56fcmuid1u00h0dcmh9e8hbqp
46d306d5a9c1 adminer:latest "entrypoint.sh php -…" 40 seconds ago Up 39 seconds 8080/tcp wp_adm.1.65m8hp8ohc97fuhnbracbaxp3
611def38be7a alpine "/bin/sh" 5 minutes ago Exited (137) About a minute ago tty-demo
492fbea9a9dd alpine "tail -f /dev/null" 5 minutes ago Exited (137) About a minute ago tail-demo
7d0d8ff2008b alpine "sleep infinity" 5 minutes ago Exited (137) About a minute ago sleep-demo
3f804ec52991 alpine "sh -c 'while true; …" 5 minutes ago Exited (137) About a minute ago
and later removed:
Sun Jul 28 22:41 root@VPS-X ~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cd83ed440068 kklepper/nginx-php7-mysqli-graphicsmagick:alpine "/bin/sh -c 'php-fpm…" 3 seconds ago Up 2 seconds 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp 1proxy-nginx-1
f7fee2ca92c1 adminer:latest "entrypoint.sh php -…" 22 seconds ago Up 21 seconds 8080/tcp wp_adm.1.xu3yffjhxeoo0rx9gaybdljui
7fe8ecd73e32 kklepper/mariadb33:alpine "/start-v3.sh" 26 seconds ago Up 25 seconds 3306/tcp wp_master.1.idtekxjaz7pxemolr5ifif7oj
ae147633e078 kklepper/nginx-php7-mysqli-memcached:alpine "/bin/sh -c 'php-fpm…" 28 seconds ago Up 27 seconds 80/tcp, 443/tcp wp_wp.1.wz4iof40if1rsnikvza48v6d5
78891f9b9f85 kklepper/nginx-php7-mysqli-graphicsmagick:alpine "/bin/sh -c 'php-fpm…" 30 seconds ago Up 29 seconds 80/tcp, 443/tcp wp_joe.1.cjgg12e5yf5cd4exu8cifknfz
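For reference, exit codes above 128 conventionally mean that the process was terminated by a signal, with the signal number being the exit code minus 128. A minimal sketch of that arithmetic; the function name signal_for_exit_code is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: translate an exit code into the signal behind it.
# By shell convention, a code above 128 means "terminated by signal (code - 128)".
signal_for_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo "signal $((code - 128))"
    else
        echo "no signal (plain exit status $code)"
    fi
}

signal_for_exit_code 137   # prints: signal 9
signal_for_exit_code 0     # prints: no signal (plain exit status 0)
```

Signal 9 is SIGKILL on Linux, so the test containers were killed forcibly rather than exiting on their own. That may be related to the log line "Container failed to exit within 10s of signal 15 - using the force", but this is only an inference. To check whether the kernel's OOM killer was involved instead, docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' sleep-demo prints the relevant state for one of the stopped containers.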
Any ideas or insights?
Now my questions are:
- Has anybody ever experienced this kind of behavior?
- What am I doing wrong?
- What can I learn from this setup?
- How can I investigate this scenario further?
- How can I make the whole thing run as reliably as before?
- And lastly, how could this happen in the first place?