I am working on a newly set up swarm with 3 manager nodes on CentOS 7. Services run fine: dockersamples/visualizer is running and can see services on any of the 3 nodes.
The nodes look healthy:
:~/stacktest$ docker node ls
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
m2gwj719n04n4mlqopoi37x2f *   swarm-1    Ready    Active         Reachable        20.10.6
mj5wgc9yml9nhmetdc6q3s5co     swarm-2    Ready    Active         Leader           20.10.6
fp2ewnf7ywjn7kuan9dif7pdk     swarm-3    Ready    Active         Reachable        20.10.6
I followed the example in "Deploy a stack to a swarm" (Docker Docs), and everything worked fine up to the step ‘Deploy the stack to the swarm’. When I run the command 'docker stack deploy -c docker-compose.yml stackdemo', I get:
:~/stacktest$ docker stack deploy -c docker-compose.yml stackdemo
Creating network stackdemo_default
Creating service stackdemo_web
Creating service stackdemo_redis
However, the services never get to the running state even though no error is given. The current state remains ‘New’ indefinitely:
:~/stacktest$ docker service ls
ID             NAME              MODE         REPLICAS   IMAGE                             PORTS
mhspwv44uljg   registry          replicated   1/1        registry:2                        *:5000->5000/tcp
3suwj8vmoi33   stackdemo_redis   replicated   0/1        redis:alpine
tiyihji32p0o   stackdemo_web     replicated   0/1        127.0.0.1:5000/stackdemo:latest
6wslf6nds61p   viz               replicated   1/1        dockersamples/visualizer:latest   *:8080->8080/tcp
:~/stacktest$ docker service ps stackdemo_redis
ID             NAME                IMAGE          NODE   DESIRED STATE   CURRENT STATE        ERROR   PORTS
xcp6v1vq70nz   stackdemo_redis.1   redis:alpine          Running         New 13 minutes ago
:~/stacktest$ docker service ps stackdemo_web
ID             NAME              IMAGE                             NODE   DESIRED STATE   CURRENT STATE        ERROR   PORTS
ilfhstti03cz   stackdemo_web.1   127.0.0.1:5000/stackdemo:latest          Running         New 14 minutes ago
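In case it helps anyone reproduce this, here is a rough sketch of how the stalled services can be picked out of the `docker service ls` listing. The function name `stalled_services` is my own invention, and it assumes the default column order (ID, NAME, MODE, REPLICAS, ...) shown above.

```shell
#!/bin/sh
# Sketch only: reads `docker service ls` output on stdin and prints the NAME
# of every service whose running replica count is below the desired count.
# Assumes the default column layout: ID NAME MODE REPLICAS IMAGE PORTS.
stalled_services() {
  awk 'NR > 1 {
    # REPLICAS looks like "0/1": running/desired
    n = split($4, r, "/")
    if (n == 2 && r[1] + 0 < r[2] + 0) print $2
  }'
}

# Intended usage (not run here):
#   docker service ls | stalled_services
```

On my swarm this would list stackdemo_redis and stackdemo_web, which are the two services stuck at 0/1.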
If I run dockerd -D and retry the stack deploy, the debug log shows an rpc error saying the stackdemo_default network is not found:
DEBU[2021-05-05T14:19:52.704978621-04:00] swarm-1: Initiating bulk sync for networks [bgp2p8e98hcml6dn1oj9ohia2] with node 97d5a9f34ab2
DEBU[2021-05-05T14:19:52.904793793-04:00] Calling HEAD /_ping
DEBU[2021-05-05T14:19:52.912946386-04:00] Calling HEAD /_ping
DEBU[2021-05-05T14:19:52.977910738-04:00] Calling GET /v1.41/info
DEBU[2021-05-05T14:19:53.044689994-04:00] Calling GET /v1.41/networks?filters=%7B%22label%22%3A%7B%22com.docker.stack.namespace%3Dstackdemo%22%3Atrue%7D%7D
DEBU[2021-05-05T14:19:53.105499434-04:00] Calling POST /v1.41/networks/create
DEBU[2021-05-05T14:19:53.105694071-04:00] form data: {"Attachable":false,"CheckDuplicate":false,"ConfigFrom":null,"ConfigOnly":false,"Driver":"overlay","EnableIPv6":false,"IPAM":null,"Ingress":false,"Internal":false,"Labels":{"com.docker.stack.namespace":"stackdemo"},"Name":"stackdemo_default","Options":null,"Scope":""}
DEBU[2021-05-05T14:19:53.182405347-04:00] Calling GET /v1.41/services?filters=%7B%22label%22%3A%7B%22com.docker.stack.namespace%3Dstackdemo%22%3Atrue%7D%7D
DEBU[2021-05-05T14:19:53.240936740-04:00] Calling GET /v1.41/distribution/127.0.0.1:5000/stackdemo:latest/json
DEBU[2021-05-05T14:19:53.305659117-04:00] Calling POST /v1.41/services/create
DEBU[2021-05-05T14:19:53.305841802-04:00] form data: {"EndpointSpec":{"Ports":[{"Protocol":"tcp","PublishMode":"ingress","PublishedPort":8000,"TargetPort":8000}]},"Labels":{"com.docker.stack.image":"127.0.0.1:5000/stackdemo","com.docker.stack.namespace":"stackdemo"},"Mode":{"Replicated":{}},"Name":"stackdemo_web","TaskTemplate":{"ContainerSpec":{"Image":"127.0.0.1:5000/stackdemo:latest","Labels":{"com.docker.stack.namespace":"stackdemo"},"Privileges":{"CredentialSpec":null,"SELinuxContext":null}},"ForceUpdate":0,"Networks":[{"Aliases":["web"],"Target":"stackdemo_default"}],"Placement":{},"Resources":{}}}
DEBU[2021-05-05T14:19:53.306914737-04:00] error handling rpc error="rpc error: code = NotFound desc = network stackdemo_default not found" rpc=/docker.swarmkit.v1.Control/GetNetwork
Yet the network does show up when I list networks:
:~/stacktest$ docker network ls
NETWORK ID     NAME                DRIVER    SCOPE
a4ff3b97d648   bridge              bridge    local
aa463181d7a1   docker_gwbridge     bridge    local
bacf509c8c32   host                host      local
bgp2p8e98hcm   ingress             overlay   swarm
9bcd7049fa90   none                null      local
wrf071cu1k0i   stackdemo_default             swarm
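Note that the stackdemo_default row has nothing in the DRIVER column. A small sketch for spotting rows like that is below. The helper name `networks_without_driver` is made up, and I am assuming the driver also comes back empty when the listing is produced with `--format` using the `.Name`, `.Driver` and `.Scope` placeholders.

```shell
#!/bin/sh
# Sketch only: reads "name|driver|scope" lines on stdin and prints the name
# of every network whose driver field is empty.
networks_without_driver() {
  awk -F '|' '$2 == "" { print $1 }'
}

# Intended usage (not run here):
#   docker network ls --format '{{.Name}}|{{.Driver}}|{{.Scope}}' | networks_without_driver
```

Any name it prints could then be looked at more closely with docker network inspect.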
The same thing happens when I deploy a different custom stack. I’m stumped. Any ideas for troubleshooting this?