I haven’t found any other posts about this, but I would like to know if there’s a solution. I’ve been running a 4-node Docker swarm in dev continuously for some time and have run into a few problems.
Firstly, if there is a network outage, something in Docker seems to get stuck in an endless loop, creating over 100 tasks on each node and effectively overloading the node’s CPU.
Secondly, when I try to reinstall Docker, I run into problems with locked files in /overlay2/, which require a reboot of the server (if I try to stop Docker, it just goes into an endless loop and never stops running).
Thirdly, I had a fully working connection between node 1 and node 3 (application to MariaDB database), and after the network outage the app no longer finds the database, even with exactly the same settings I was using previously. No firewall settings were changed.
Edit: After a reinstall, the application finds the database immediately.
I’ve run into this same problem twice, and both times it meant a complete refresh of the servers and basically starting again, which would not be an ideal solution for production.
Does anybody know how to get the Docker installation fully working again after a network outage, node overload, and locked Docker files/surplus files left behind after a docker system prune?