Able to join existing swarm as worker, but unable as manager

When executing a docker swarm join command (as manager), I face the following error:

Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = "transport: x509: certificate is not valid for any names, but wanted to match swarm-manager"

Joining the same swarm, but as a worker, works flawlessly.
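
For reference, this is roughly what I run (the token is redacted here; the manager address is the one that appears in the logs below):

# on an existing manager, print the join command for managers
docker swarm join-token manager

# on the new node: this is the variant that fails
docker swarm join --token SWMTKN-1-<redacted-manager-token> 10.130.223.107:2377

# the same node joins without problems when using the worker token
docker swarm join --token SWMTKN-1-<redacted-worker-token> 10.130.223.107:2377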

The log files show the following:

kmo@GETSTdock-app01 ~ $ sudo tail -f /var/log/upstart/docker.log
time="2018-07-06T09:18:17.890620199+02:00" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=7j75bmugpf8k2o0onta1yp4zy proto=tcp
time="2018-07-06T09:18:17.892234469+02:00" level=info msg="manager selected by agent for new session: { 10.130.223.107:2377}" module=node/agent node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:17.892364019+02:00" level=info msg="waiting 0s before registering session" module=node/agent node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.161362606+02:00" level=error msg="fatal task error" error="cannot create a swarm scoped network when swarm is not active" module=node/agent/taskmanager node.id=7j75bmugpf8k2o0onta1yp4zy service.id=p3ng4om2m8rl7ygoef18ayohp task.id=weaubf3qj5goctlh2039sjvdg
time="2018-07-06T09:18:18.162182077+02:00" level=error msg="fatal task error" error="cannot create a swarm scoped network when swarm is not active" module=node/agent/taskmanager node.id=7j75bmugpf8k2o0onta1yp4zy service.id=6sl9y5rcov6htwbyvm504ewh2 task.id=j3foc6rjszuqszj41qyqb6mpe
time="2018-07-06T09:18:18.184847516+02:00" level=info msg="Stopping manager" module=node node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.184993569+02:00" level=info msg="Manager shut down" module=node node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.185020917+02:00" level=info msg="shutting down certificate renewal routine" module=node/tls node.id=7j75bmugpf8k2o0onta1yp4zy node.role=swarm-manager
time="2018-07-06T09:18:18.185163663+02:00" level=error msg="cluster exited with error: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate is not valid for any names, but wanted to match swarm-manager\""
time="2018-07-06T09:18:18.185492995+02:00" level=error msg="Handler for POST /v1.37/swarm/join returned error: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate is not valid for any names, but wanted to match swarm-manager\""

I face a similar problem when I join as a worker and then attempt to promote the node to a manager.
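
The promotion attempt itself is nothing special; <new-node-hostname> is a placeholder for the name shown by docker node ls:

# on an existing manager, after the new node has joined as a worker
docker node ls
docker node promote <new-node-hostname>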

Docker version = 18.03.1

OS = Ubuntu 14.04 LTS

Does anybody have an idea how to resolve this?

@kmoens did you by any chance find a solution for this? I have the exact same problem on CentOS (7.5.1804) and Docker (18.03.1-ce). This was working fine before: the same node was able to join as a manager, but after some yum updates and a reboot it can only join as a worker.

Hello Arul,

Sorry for late response.

In our case, after some investigation, the cause turned out to be the proxy server we use in our corporate environment. With the corporate proxy configured for Docker, joining as a manager node runs into problems that joining as a worker does not.

We added the IP addresses of all our Docker nodes to the no_proxy variable and then rebooted every node. Once that was done, we were able to join as a manager again.
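
Concretely, a sketch of what we did: on our Ubuntu 14.04 hosts the upstart job sources /etc/default/docker, so the proxy settings live there (the proxy host and the two extra node IPs below are examples; 10.130.223.107 is the manager from the logs above).

# /etc/default/docker
export http_proxy="http://proxy.example.com:3128"
export https_proxy="http://proxy.example.com:3128"
# every swarm node IP must be excluded from the proxy
export no_proxy="localhost,127.0.0.1,10.130.223.107,10.130.223.108,10.130.223.109"

After editing the file, restart Docker (sudo service docker restart) or reboot the node, as we did.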

Kind regards,
Kenny

@kmoens
Thanks for the reply. For most people with this specific issue, the proxy was indeed the cause. Unfortunately, in my case there was no proxy involved in our environment. I ended up wiping everything and moving to 18.06, which I needed to do anyway, and the problem went away after that.

I continue to have this same problem even with 18.06. Any input please…
I have an Ubuntu server serving as the manager node and am trying to join my Mac as a second manager node. This is my local network at home (I don't have a proxy server).

Have you tried making all nodes leave the cluster, deleting the cluster, and recreating it? It is very likely that this is what fixed my problem. As I mentioned in my post, the same version worked fine for months, but something changed that I was not able to figure out; I wanted to move to the next version anyway for a different reason.
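
What I have in mind is roughly this sequence (the advertise address is just an example for a home LAN):

# on every worker
docker swarm leave
# on every manager (use --force on the last one)
docker swarm leave --force
# re-create the swarm on the node that should be the first manager
docker swarm init --advertise-addr 192.168.1.10
# print the manager join command and run it on the other nodes
docker swarm join-token manager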