We have an existing Docker Swarm cluster with about 20 services spread across 10-15 VMs.
When setting up the services, we deploy them with the “--with-registry-auth” option. Our Docker registry is on AWS ECR.
Everything works fine until we add a new node. When a new node is added with an existing label, containers do not come up on it. When I check the service status, I see the message “No image found”.
Example: I have the nginx image running on 2 VMs labeled proxy. I add one more VM and apply the proxy label.
Docker Swarm tries to deploy one more copy of nginx on this new third VM, but it fails because the image is not present.
Twist/pointer: If the new VM is added within 4 hours of service creation, nginx comes up on it. But if the VM is added more than 4 hours after the service was deployed (4 hours is the credential expiry time for AWS ECR), the image fails to come up on the new VM. (The image is still present on the manager VM.)
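One way to picture this pattern, assuming (our guess, not confirmed) that the swarm uses the registry token the manager captured when the service was last created or updated with --with-registry-auth, rather than each node's own login: whether a new node can pull then depends only on the age of that captured token. A minimal Python model, with the 4 hour lifetime taken from the numbers above and made-up timestamps:

```python
from datetime import datetime, timedelta

# Assumed token lifetime, taken from the 4 hour figure observed above.
TOKEN_LIFETIME = timedelta(hours=4)


def new_node_can_pull(service_auth_time: datetime, node_join_time: datetime,
                      lifetime: timedelta = TOKEN_LIFETIME) -> bool:
    """Hypothetical model: True if the credentials captured at service_auth_time
    (last create/update with --with-registry-auth) are still valid when the
    node joins and tries to pull."""
    return node_join_time - service_auth_time < lifetime


created = datetime(2024, 1, 1, 9, 0)
print(new_node_can_pull(created, created + timedelta(hours=2)))  # True
print(new_node_can_pull(created, created + timedelta(hours=5)))  # False

# In this model an update with --with-registry-auth resets the clock.
updated = created + timedelta(hours=5)
print(new_node_can_pull(updated, updated + timedelta(minutes=10)))  # True
```

In this model only a create or update with --with-registry-auth resets the clock; logging in again on individual nodes does not.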
To work around this, we added a cron job on each VM that logs in to AWS ECR every 2 hours so that the credentials are always fresh. This did not fix the issue either.
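For reference, a per-node refresh of this kind could look roughly like the Python sketch below. It is a hypothetical reconstruction, not our exact job: it assumes the AWS CLI (with "aws ecr get-login-password") and the Docker CLI are installed on the node, and that the registry is a private ECR registry in a standard commercial AWS region. The account ID and region are placeholders passed as arguments.

```python
import subprocess
import sys


def registry_host(account_id: str, region: str) -> str:
    """Return the private ECR registry hostname for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_login(account_id: str, region: str) -> None:
    """Fetch a fresh ECR token with the AWS CLI and log this node's Docker in."""
    # Same as: aws ecr get-login-password --region REGION | docker login ...
    token = subprocess.run(
        ["aws", "ecr", "get-login-password", "--region", region],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    subprocess.run(
        ["docker", "login", "--username", "AWS", "--password-stdin",
         registry_host(account_id, region)],
        input=token, check=True, text=True,
    )


if __name__ == "__main__":
    # Expects: ecr_login.py ACCOUNT_ID REGION (placeholders you supply).
    if len(sys.argv) != 3:
        print("usage: ecr_login.py ACCOUNT_ID REGION", file=sys.stderr)
    else:
        ecr_login(sys.argv[1], sys.argv[2])
```

Scheduled every 2 hours with a crontab entry such as the following (script path, account ID, and region are hypothetical):
0 */2 * * * python3 /opt/scripts/ecr_login.py 123456789012 us-east-1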
If I update the nginx service from the manager node with “--with-registry-auth”, nginx comes up on the third VM even if that VM was added more than 4 hours later. But this is not the behavior we expected.
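The manual step above could be scripted on a manager roughly as follows. This is a hypothetical sketch: it assumes it runs on a manager whose own ECR login is fresh, and the list of service names is yours to fill in (only nginx comes from the example). "docker service update --with-registry-auth" is the same command described above.

```python
import subprocess
from typing import List


def update_commands(services: List[str]) -> List[List[str]]:
    """Build one 'docker service update --with-registry-auth' command per service."""
    return [["docker", "service", "update", "--with-registry-auth", name]
            for name in services]


def refresh_registry_auth(services: List[str]) -> None:
    """Run each update so the manager's current registry credentials are sent to the swarm."""
    for cmd in update_commands(services):
        subprocess.run(cmd, check=True)
```

For example, refresh_registry_auth(["nginx"]) run on the manager right after its own ECR login.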
Has anyone else faced this issue?
Any guidance would be helpful.