This works fine when I deploy, but when I add a worker node to the swarm later on, the new worker can't pull the image required to run the task. The system logs report this:
level=error msg="Not continuing with pull after error: denied: Permission denied for \"123\" from request \"/v2/my-gcloud-project/my/image/manifests/123\". "
level=info msg="Translating \"denied: Permission denied for \\\"123\\\" from request \\\"/v2/my-gcloud-project/my/image/manifests/123\\\". \" to \"repository us.gcr.io/my-gcloud-project/my/image not found: does not exist or no pull access\""
level=error msg="pulling image failed" error="repository us.gcr.io/my-gcloud-project/my/image not found: does not exist or no pull access" module="node/agent/taskmanager" node.id=... service.id=... task.id=...
level=error msg="fatal task error" error="No such image: us.gcr.io/my-gcloud-project/my/image:123@sha256:..." module="node/agent/taskmanager" node.id=... service.id=... task.id=...
However, when I manually run docker pull on that machine, it works fine, since every machine in the cluster is authenticated to my private Google Registry, thanks to docker login.
Thus my questions are:
Why can't the added worker pull from the private registry?
What does --with-registry-auth do exactly?
Thanks a lot
EDIT: the nodes are running Ubuntu 16.04.2 LTS and the Docker version is:
Server:
Version: 17.04.0-ce
API version: 1.28 (minimum version 1.12)
Go version: go1.7.5
Git commit: 4845c56
Built: Mon Apr 3 18:07:42 2017
OS/Arch: linux/amd64
Experimental: false
level=error msg="Attempting next endpoint for pull after error: unknown: Authentication is required"
level=error msg="pulling image failed" error="Get: unknown: Authentication is required" module=taskmanager
level=error msg="fatal task error" error="No such image:
Is it not possible to use swarm with a private registry?
I don't manually run my private registry, I use Google Container Registry, so it's transparently managed by Google.
I doubt it's a problem with the registry itself, since when I manually run docker pull on the new worker node, it successfully pulls the image from the GCR.
What I don't understand is: why can't my service pull that private image itself?
Am I doing something wrong with my authentication configuration? Or is there a bug in docker swarm or the registry-auth?
Are you adding the node without rerunning the docker stack deploy command?
If you rerun docker stack deploy after adding a new node, do things work as you expected?
It could be that the --with-registry-auth flag pushes credentials only to the nodes that exist in the swarm at the time you run the command, and that the auth info does not propagate to new nodes as they join. I am not sure whether this is expected behavior, but if rerunning docker stack deploy after adding a new node fixes the problem, then this is likely what is going on in the background.
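If you want to try that, the redeploy could be wrapped in a small helper like the one below. This is only a sketch: the function name redeploy_with_auth, the stack name and the compose file name are placeholders I made up, not anything from your setup. It also assumes you have already run docker login against your registry on the machine where you run the deploy. Setting DRY_RUN=1 prints the command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are illustrative only):
# redeploy a stack so the current client's registry credentials are
# sent along with the service definition again.
redeploy_with_auth() {
    stack="$1"          # stack name, e.g. my-stack (placeholder)
    compose_file="$2"   # compose file path, e.g. docker-compose.yml (placeholder)

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: only print the command that would be executed.
        echo "docker stack deploy --with-registry-auth --compose-file $compose_file $stack"
    else
        docker stack deploy --with-registry-auth --compose-file "$compose_file" "$stack"
    fi
}
```

After adding the new worker, you would call something like `redeploy_with_auth my-stack docker-compose.yml` on a manager node and then check whether the task on the new worker gets past the pull step. If it does, that supports the guess that the credentials were only handed out at deploy time.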