Hello,
I currently have a Docker swarm with 1 manager and 4 workers (each with 4 cores, for a total of 16 cores), built using Docker version 18.03.1. More info is given further below. I am trying to run many short, one-time jobs on the swarm: that is, I create 100 two-minute jobs, each with 1 replica and restart-condition set to none. An example of one such job (which just sleeps for 120 seconds for now) is shown below:
$ docker service create --name test_job0 --detach --restart-condition none --replicas 1 --reserve-cpu 1 --limit-cpu 1 --reserve-memory 1GB manager:5000/code-docker-image sleep 120
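For context, the 100 jobs could be submitted with a loop along the following lines. This is only a minimal sketch built around the single-job command above: the names NUM_JOBS, DRY_RUN and submit_job are illustrative and not part of my actual setup, and the test_jobN naming simply continues the test_job0 pattern. By default it only prints the commands, so it can be checked before anything is submitted.

```shell
#!/bin/sh
# Sketch: submit N one-shot jobs as separate swarm services.
# NUM_JOBS, DRY_RUN and submit_job are hypothetical names used for illustration.
NUM_JOBS="${NUM_JOBS:-100}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = actually run them

submit_job() {
    # $1 is the job index; the flags are the same as in the single-job example.
    set -- docker service create --name "test_job$1" --detach \
        --restart-condition none --replicas 1 \
        --reserve-cpu 1 --limit-cpu 1 --reserve-memory 1GB \
        manager:5000/code-docker-image sleep 120
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"      # dry run: show the command instead of executing it
    else
        "$@"           # real run: needs to be executed on the manager node
    fi
}

i=0
while [ "$i" -lt "$NUM_JOBS" ]; do
    submit_job "$i"
    i=$((i + 1))
done
```

Running it with DRY_RUN=0 on the manager would create services test_job0 through test_job99, each reserving 1 CPU and 1GB of memory.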
PROBLEM: Only 4 services run simultaneously, even though 16 cores are available. In fact, I noticed that jobs are scheduled only on the last worker node in the swarm and never on the other workers (hadoop4's state goes “active”, but the other workers' state remains “draining”). I would like to have 16 services run simultaneously.
I have tried several different settings, such as specifying constraints, but none of them worked. The service’s --no-trunc output reports: “no suitable node ( nodes not available for new tasks; insufficient resources on 1 node)”
Any suggestions for getting 16 services to run simultaneously are appreciated.
Output showing pending services not running due to “no suitable node” error:
ID            NAME         IMAGE                                  NODE     DESIRED STATE  CURRENT STATE                    ERROR                             PORTS
lwxadszrv344  test_job8.1  manager:5000/code-docker-image:latest           Running        Pending 5 seconds ago            "no suitable node (4 nodes not…"
n1erpzp05zb8  test_job7.1  manager:5000/code-docker-image:latest           Running        Pending 5 seconds ago            "no suitable node (4 nodes not…"
49hk53l67yxv  test_job6.1  manager:5000/code-docker-image:latest           Running        Pending 5 seconds ago            "no suitable node (4 nodes not…"
pyhtgvigu8dx  test_job5.1  manager:5000/code-docker-image:latest           Running        Pending 6 seconds ago            "no suitable node (4 nodes not…"
l54a5yaxkzej  test_job4.1  manager:5000/code-docker-image:latest           Running        Pending 6 seconds ago            "no suitable node (4 nodes not…"
idxhyxamf43l  test_job3.1  manager:5000/code-docker-image:latest  hadoop4  Running        Accepted less than a second ago
0aeecexapcex  test_job2.1  manager:5000/code-docker-image:latest           Running        Pending 7 seconds ago            "no suitable node (4 nodes not…"
39a9fmm654n6  test_job1.1  manager:5000/code-docker-image:latest  hadoop4  Running        Assigned less than a second ago
h9g259hbqwj1  test_job0.1  manager:5000/code-docker-image:latest  hadoop4  Running        Assigned less than a second ago
Output showing list of nodes (while some of the jobs were running):
$ docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS  ENGINE VERSION
n48mil4rvzj3k2eix38v3eph4    hadoop1   Ready   Drain                         18.03.1-ce
wqihfh77so5k1q6hej1lkneaa    hadoop2   Ready   Drain                         18.03.1-ce
ep8rl3yrwmv3pqlb96j75ol5z    hadoop3   Ready   Drain                         18.03.1-ce
gqlgvflky2li104d4st2i10jc    hadoop4   Ready   Active                        18.03.1-ce
znwmm4cmrz7a4b0yby11yncw1 *  manager   Ready   Drain         Leader          18.03.1-ce
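To make the node listing above easier to check, the nodes that are not accepting new tasks could be pulled out with something like the following. It is a rough sketch: non_active_nodes is a hypothetical helper name, and the docker call is guarded so the snippet does nothing on a machine without the docker CLI. It relies on the --format option of docker node ls with the .Hostname and .Availability placeholders.

```shell
# Print hostnames whose availability is anything other than "Active".
# Reads "HOSTNAME AVAILABILITY" pairs, one per line, on stdin.
# (non_active_nodes is a hypothetical helper name used for illustration.)
non_active_nodes() {
    awk '$2 != "Active" { print $1 }'
}

# Intended to be run on the manager node; skipped if docker is not installed.
if command -v docker >/dev/null 2>&1; then
    docker node ls --format '{{.Hostname}} {{.Availability}}' | non_active_nodes
fi
```

For the listing above, this would report hadoop1, hadoop2, hadoop3 and manager. As far as I understand, nodes with Drain availability are not assigned new tasks, which seems consistent with the “nodes not available for new tasks” part of the error; if the drain on the workers is unintended, docker node update --availability active <hostname> is the command that would return a node to scheduling.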
Any suggestions for getting 16 services running on this swarm are appreciated.