I have 1 Swarm manager node (Intel Xeon, 6 cores, 16 GB RAM) and 3 Swarm worker nodes (Intel Xeon, 12 cores, 32 GB RAM each).
I’m trying to spin up about 200-600 containers in the swarm, each running a small Python script. Each container is 1 GB in size.
My problem is that the manager doesn’t distribute the load evenly. It takes on almost 90% of the work itself as a node and only pushes a little to the other 3 nodes. The result is that the manager is running all cores at 100% CPU and I can’t even type docker node or service commands on the command line to check status; it’s practically dead. Meanwhile the 3 worker nodes sit idle with maybe 10 containers each running at the same time.
Am I doing something wrong? I can’t find anything in the docs about giving weight to the nodes so that the manager only ever gets 30% of the load and the other 3 worker nodes get the remaining 70%.
Please share the compose file used to deploy your stack, so we can get an idea of what you’re actually doing.
Swarm service tasks should be placed round-robin on nodes that fulfill the resource and placement constraints.
You could use a placement constraint to deploy a service only to nodes with specific attributes or node labels.
E.g. use a placement constraint to deploy the payload only to the worker nodes:
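A minimal sketch of what that could look like in a compose file; the service name payload, the image my-python-script and the replica count are just placeholders for whatever you actually use:

```yaml
version: "3.8"

services:
  payload:                            # placeholder service name
    image: my-python-script:latest    # placeholder image, replace with yours
    deploy:
      mode: replicated
      replicas: 200
      placement:
        constraints:
          - node.role == worker       # never schedule these tasks on manager nodes
          # alternatively, constrain on a custom node label:
          # - node.labels.workload == payload
      resources:
        limits:
          memory: 1G                  # optional: cap per-task memory
```

Deploy it with docker stack deploy -c docker-compose.yml yourstackname and with that constraint in place none of the tasks will land on the manager. If you want to use custom labels instead, you can add one to a node with docker node update --label-add workload=payload nodename (label key and value here are just examples).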
It then spins up, but at around 200 containers it dies because the manager host is at 100% CPU on all cores: 150 are running on the manager itself and only 50 got sent to the other nodes.
BUT, I did notice something now. I left it running for a few hours and, for some reason I don’t understand, after about 2 hours it redistributed all the load and has now spread it evenly across the workers. It’s weird that it doesn’t do it upfront but does it much later.
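If there’s a way to trigger that redistribution on demand instead of waiting, I’d like to know. I did come across docker service update --force, which apparently re-deploys the tasks and lets the scheduler spread them again:

```
# force a rolling re-deployment so the scheduler re-spreads the tasks
# (payload is a placeholder service name)
docker service update --force payload
```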