Able to join existing swarm as worker, but unable as manager

When executing a docker swarm join command (as manager), I face the following error:

Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = "transport: x509: certificate is not valid for any names, but wanted to match swarm-manager"

Joining the same swarm, but as a worker, works flawlessly.
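
For reference, this is roughly what I run (the token is redacted here; the manager address is the one that appears in the logs below):

# on an existing manager, print the join command for managers
docker swarm join-token manager

# on the new node: this is the variant that fails
docker swarm join --token SWMTKN-1-<redacted-manager-token> 10.130.223.107:2377

# the same node joins without problems when using the worker token
docker swarm join --token SWMTKN-1-<redacted-worker-token> 10.130.223.107:2377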

The log files show the following:

kmo@GETSTdock-app01 ~ $ sudo tail -f /var/log/upstart/docker.log
time="2018-07-06T09:18:17.890620199+02:00" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=7j75bmugpf8k2o0onta1yp4zy proto=tcp
time="2018-07-06T09:18:17.892234469+02:00" level=info msg="manager selected by agent for new session: { 10.130.223.107:2377}" module=node/agent node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:17.892364019+02:00" level=info msg="waiting 0s before registering session" module=node/agent node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.161362606+02:00" level=error msg="fatal task error" error="cannot create a swarm scoped network when swarm is not active" module=node/agent/taskmanager node.id=7j75bmugpf8k2o0onta1yp4zy service.id=p3ng4om2m8rl7ygoef18ayohp task.id=weaubf3qj5goctlh2039sjvdg
time="2018-07-06T09:18:18.162182077+02:00" level=error msg="fatal task error" error="cannot create a swarm scoped network when swarm is not active" module=node/agent/taskmanager node.id=7j75bmugpf8k2o0onta1yp4zy service.id=6sl9y5rcov6htwbyvm504ewh2 task.id=j3foc6rjszuqszj41qyqb6mpe
time="2018-07-06T09:18:18.184847516+02:00" level=info msg="Stopping manager" module=node node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.184993569+02:00" level=info msg="Manager shut down" module=node node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.185020917+02:00" level=info msg="shutting down certificate renewal routine" module=node/tls node.id=7j75bmugpf8k2o0onta1yp4zy node.role=swarm-manager
time="2018-07-06T09:18:18.185163663+02:00" level=error msg="cluster exited with error: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate is not valid for any names, but wanted to match swarm-manager\""
time="2018-07-06T09:18:18.185492995+02:00" level=error msg="Handler for POST /v1.37/swarm/join returned error: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate is not valid for any names, but wanted to match swarm-manager\""

I face a similar problem when I join as a worker and then attempt to promote the node to a manager.
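
The promotion attempt itself is nothing special; <new-node-hostname> is a placeholder for the name shown by docker node ls:

# on an existing manager, after the new node has joined as a worker
docker node ls
docker node promote <new-node-hostname>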

Docker version = 18.03.1

OS = Ubuntu 14.04 LTS

Does anybody have an idea how to resolve this?

@kmoens did you by any chance find a solution for this? I have the exact same problem on CentOS (7.5.1804) and Docker (18.03.1-ce). This was working fine before: the same node was able to join as a manager, but after some yum updates and a reboot it can only join as a worker.

Hello Arul,

Sorry for late response.

In our case, after some investigation, the cause turned out to be the proxy server we use in our corporate environment. With the corporate proxy configured for Docker, joining as a manager node runs into problems that joining as a worker does not.

We added the IP addresses of all our Docker nodes to the no_proxy variable and then rebooted every node. Once that was done, we were able to join as a manager again.
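
Concretely, a sketch of what we did: on our Ubuntu 14.04 hosts the upstart job sources /etc/default/docker, so the proxy settings live there (the proxy host and the two extra node IPs below are examples; 10.130.223.107 is the manager from the logs above).

# /etc/default/docker
export http_proxy="http://proxy.example.com:3128"
export https_proxy="http://proxy.example.com:3128"
# every swarm node IP must be excluded from the proxy
export no_proxy="localhost,127.0.0.1,10.130.223.107,10.130.223.108,10.130.223.109"

After editing the file, restart Docker (sudo service docker restart) or reboot the node, as we did.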

Kind regards,
Kenny

@kmoens
Thanks for the reply. For most people with this specific issue, the proxy was indeed the cause. Unfortunately, in my case there was no proxy involved in our environment. I ended up wiping everything and moving to 18.06, which I needed to do anyway, and the problem went away after that.

I continue to have this same problem even with 18.06. Any input please…
I have an Ubuntu server serving as the manager node and am trying to join my Mac as a second manager node. This is my local network at home (I don't have a proxy server).

Have you tried making all nodes leave the cluster, deleting the cluster, and recreating it? It is very likely that this is what fixed my problem. As I mentioned in my post, the same version worked fine for months, but something changed that I was not able to figure out; I wanted to move to the next version anyway for a different reason.
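
What I have in mind is roughly this sequence (the advertise address is just an example for a home LAN):

# on every worker
docker swarm leave
# on every manager (use --force on the last one)
docker swarm leave --force
# re-create the swarm on the node that should be the first manager
docker swarm init --advertise-addr 192.168.1.10
# print the manager join command and run it on the other nodes
docker swarm join-token manager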