Docker swarm overloads Manager but not nodes?

nonoti · October 24, 2023, 4:23pm

Hi

I have 1 Swarm manager node (Intel Xeon, 6 cores, 16GB RAM) and 3 Swarm worker nodes (Intel Xeon, 12 cores, 32GB RAM).

I’m trying to spin up about 200-600 containers in the swarm that run a small Python script. Each container is 1GB in size.

My problem is that the Manager doesn’t seem to distribute load evenly. It takes on almost 90% of the work itself as a node, then pushes a little bit to the other 3 nodes. The result is that the Manager is running all cores at 100% CPU and I can’t even type docker node or service commands on the command line to check status; it’s practically dead. Meanwhile the 3 worker nodes sit idle, with maybe 10 containers each running at the same time.

Am I doing something wrong? I can’t find in the docs how to give weighting to the nodes so that Manager always only gets 30% of load, and the other 3 worker nodes get the remaining 70%.

Thx.

meyay · October 24, 2023, 5:41pm

Please share the compose file used to deploy your stack, so we can get an idea of what you actually do.

Swarm service tasks should be placed round-robin on nodes that fulfill the resource and deployment constraints.
You could use a placement constraint to deploy a service only to nodes with specific attributes or node labels.

E.g. use a deployment constraint to only deploy payload to the worker nodes:

version: "3.8"
services:
  myservice:
    ...
    deploy:
      placement:
        constraints:
          - "node.role==worker"
  ...

See https://docs.docker.com/engine/reference/commandline/service_create/#constraint for available constraints.
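For a service created directly with docker service create rather than from a compose file, the same constraint can be passed on the command line. This is only a rough sketch: the service name, image and replica count are hypothetical placeholders, and the small run helper just prints each command unless RUN=1 is set, so it can be tried safely outside a swarm.

```shell
# Dry-run helper: prints the command; set RUN=1 on a swarm manager to execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

# Hypothetical placeholders; substitute your own service name and image.
SERVICE=myservice
IMAGE=registry.example.com/myimage

# Restrict every task of a new service to worker nodes.
run docker service create \
  --name "$SERVICE" \
  --replicas 3 \
  --constraint node.role==worker \
  "$IMAGE"

# An existing service can be changed in place with --constraint-add.
run docker service update --constraint-add node.role==worker "$SERVICE"
```

With the constraint in place the scheduler should only consider worker nodes for this service, so the manager is left free for management work.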

nonoti · October 24, 2023, 8:30pm

Um, I’m not using a compose file?

I pulled an image on the manager node and I’m running:

docker service create --replicas 600 --update-parallelism 5 --with-registry-auth --name mytestscript registry.blahblah.com/myregistry/myimage

It then spins up, but at around 200 containers it dies: the Manager host is at 100% CPU on all cores, because 150 are running on the Manager itself and only 50 got sent to the other nodes.

BUT, I did notice something now. I left it to run for a few hours and, for some reason I don’t understand, after about 2 hours it redistributed all the load and has now spread it equally across the workers. It’s weird that it doesn’t do this upfront but only much later.

bluepuma77 · October 27, 2023, 7:37am

We run only small loads on Swarm managers, all heavy load is explicitly limited to workers.
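One way to keep heavy load off a manager entirely, as a rough sketch, is to set the manager's availability to drain, so the scheduler stops assigning service tasks to it while it keeps doing its management work. The node name below is a hypothetical placeholder (the real one is shown by docker node ls), and the run helper only prints the commands unless RUN=1 is set.

```shell
# Dry-run helper: prints the command; set RUN=1 on a swarm manager to execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

# Hypothetical node name; look up the real one with `docker node ls`.
MANAGER=manager1

# List nodes with their role and availability.
run docker node ls

# Stop scheduling service tasks on the manager; tasks already there are moved away.
run docker node update --availability drain "$MANAGER"

# Revert later if the manager should accept tasks again.
run docker node update --availability active "$MANAGER"
```

Draining affects all services at once, whereas a placement constraint such as node.role==worker is set per service.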

Docker Swarm likes to have 3+ manager nodes for redundancy.

You can set CPU and memory limits and reservations per service container; maybe try that.
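As a sketch of what that could look like on the command line: reservations tell the scheduler how much CPU and memory each task needs, so a node is only chosen while it still has that much unreserved, and limits cap what a running task may use. All numbers and the service name here are assumptions to be tuned for the real workload, and the run helper only prints the command unless RUN=1 is set.

```shell
# Dry-run helper: prints the command; set RUN=1 on a swarm manager to execute it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

# Hypothetical service name; the values below are illustrative only.
SERVICE=myservice

# Reserve resources per task (used for scheduling) and cap per-task usage.
run docker service update \
  --reserve-cpu 0.25 \
  --reserve-memory 512m \
  --limit-cpu 0.5 \
  --limit-memory 1g \
  "$SERVICE"
```

The same four flags are also accepted by docker service create, so they can be set when the service is first created.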

Also check your network: are there any VLANs, vSwitches or VPNs in between? A wrong MTU can be an issue, since requests larger than about 1400 bytes then fail.
