Node fails to join swarm as a worker

Hello everyone, I am trying to form a Docker Swarm with a manager node and 2 workers. The problem is that the worker nodes do not join the swarm, and they show me the following error:

docker swarm join \
--token SWMTKN-1-40dp45dybgfah9wcunovtz9vy4dorbv3migjf45hrf8fu4w50l-72ild9aigjadqtsz0s159qbj7 \
192.168.15.61:2377

Error response from daemon: Timeout was reached before node was joined. The attempt to join the swarm will continue in the background. Use the "docker info" command to see the current swarm status of your node.

docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 5
Server Version: 17.05.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 23
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: error
NodeID:
Error: rpc error: code = 4 desc = context deadline exceeded
Is Manager: false
Node Address: 192.168.15.62
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-103-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 5.828GiB
Name: node-2
ID: MJ6T:7H5D:D75I:2MBD:55WM:RNZS:7GXS:VZ3H:VK2C:TUAV:IGED:RBBQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: j4vier
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

In the logs, I can see this:

Mar 23 06:13:15 localhost dockerd[30120]: time="2018-03-23T06:13:15.955019189Z" level=error msg="failed to retrieve remote root CA certificate" error="rpc error: code = 4 desc = context deadline exceeded" module=node
Mar 23 06:13:20 localhost dockerd[30120]: time="2018-03-23T06:13:20.955462686Z" level=error msg="failed to retrieve remote root CA certificate" error="rpc error: code = 4 desc = context deadline exceeded" module=node
Mar 23 06:13:25 localhost dockerd[30120]: time="2018-03-23T06:13:25.956006091Z" level=error msg="failed to retrieve remote root CA certificate" error="rpc error: code = 4 desc = context deadline exceeded" module=node
Mar 23 06:13:30 localhost dockerd[30120]: time="2018-03-23T06:13:30.957090086Z" level=error msg="failed to retrieve remote root CA certificate" error="rpc error: code = 4 desc = context deadline exceeded" module=node
Mar 23 06:13:31 localhost dockerd[30120]: time="2018-03-23T06:13:31.034087808Z" level=error msg="Handler for POST /v1.29/swarm/join returned error: Timeout was reached before node was joined. The attempt to join the swarm will continue in the background. Use the \"docker info\" command to see the current swarm status of your node."
Mar 23 06:13:35 localhost dockerd[30120]: time="2018-03-23T06:13:35.958060130Z" level=error msg="failed to retrieve remote root CA certificate" error="rpc error: code = 4 desc = context deadline exceeded" module=node
Mar 23 06:13:35 localhost dockerd[30120]: time="2018-03-23T06:13:35.959574057Z" level=error msg="cluster exited with error: rpc error: code = 4 desc = context deadline exceeded"

You need to ensure that the requisite network ports are open between the swarm nodes.

  • TCP port 2377 for cluster management communications
  • TCP and UDP port 7946 for communication among nodes
  • UDP port 4789 for overlay network traffic

See Open protocols and ports between the hosts.
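
If the hosts run a local firewall, the ports listed above can be opened with a few commands. A minimal sketch, assuming the firewall is ufw (the thread does not say which one is in use, so that is a guess); swarm_port_rules and open_swarm_ports are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Ports taken from the list above. ufw is an ASSUMPTION; adapt for iptables/firewalld.
swarm_port_rules() {
  printf '%s\n' 2377/tcp 7946/tcp 7946/udp 4789/udp
}

# Allow each swarm port through ufw (hypothetical helper, run on every node).
open_swarm_ports() {
  local rule
  for rule in $(swarm_port_rules); do
    sudo ufw allow "$rule"
  done
}

# Usage:
#   open_swarm_ports
#   sudo ufw status
```

Strictly speaking only the managers need 2377/tcp open for inbound traffic, but 7946 and 4789 are needed between all nodes.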

Also, Docker 17.05.0 is pretty old. You should consider installing something more recent.

Thanks for the reply.

sysadmin@ubuntu-1:~$ sudo netstat -tulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 *:ssh                   *:*                     LISTEN      1246/sshd
tcp        0      0 *:24007                 *:*                     LISTEN      22229/glusterd
tcp        0      0 *:sunrpc                *:*                     LISTEN      22380/rpcbind
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      1246/sshd
tcp6       0      0 [::]:2377               [::]:*                  LISTEN      14117/dockerd
tcp6       0      0 [::]:7946               [::]:*                  LISTEN      14117/dockerd
tcp6       0      0 [::]:sunrpc             [::]:*                  LISTEN      22380/rpcbind
udp        0      0 *:4789                  *:*                                 -
udp        0      0 *:932                   *:*                                 22380/rpcbind
udp        0      0 *:sunrpc                *:*                                 22380/rpcbind
udp6       0      0 [::]:932                [::]:*                              22380/rpcbind
udp6       0      0 [::]:7946               [::]:*                              14117/dockerd
udp6       0      0 [::]:sunrpc             [::]:*                              22380/rpcbind

I have these ports open, but the problem continues…
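
Note that a listening socket in netstat only shows that dockerd is bound on the manager; it does not show whether a firewall or security group in between drops the traffic. A rough way to test that from the worker, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; check_tcp is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a TCP connection to host:port succeeds
# within 3 seconds, non-zero if it is refused, filtered, or times out.
check_tcp() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage, run on the worker (192.168.15.61 is the manager address from the
# join command earlier in the thread):
#   check_tcp 192.168.15.61 2377 && echo "2377/tcp reachable" || echo "2377/tcp blocked"
#   check_tcp 192.168.15.61 7946 && echo "7946/tcp reachable" || echo "7946/tcp blocked"
```

This only covers TCP. The UDP ports (7946 and 4789) cannot be verified with a simple connect test.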

Hi! Did you figure out what the issue was? I am facing the same issue!

Hi, is there a solution for this issue? I’m also facing the same issue.

  1. installed the latest version
  2. ports are open

I’m facing the same issue!
And if you try to join the node again it says that the node is already part of another swarm.
And what is more frustrating is that if you create new VMs the problem repeats.
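
The "already part of another swarm" message usually means the timed-out join is still pending in the background, as the original error text says. One way to reset the worker before retrying is sketched below; rejoin_swarm and swarm_state are hypothetical helper names, and the token and manager address are whatever your manager printed:

```shell
#!/usr/bin/env bash
# Print the node's swarm state (for example inactive, pending, active, error).
swarm_state() {
  docker info --format '{{.Swarm.LocalNodeState}}'
}

# Hypothetical helper: drop the half-finished membership, then join again.
rejoin_swarm() {
  local token="$1" manager="$2"
  # Ignore the error if the node is not currently in any swarm.
  docker swarm leave --force || true
  docker swarm join --token "$token" "$manager"
}

# Usage, run on the worker:
#   swarm_state
#   rejoin_swarm "<worker token>" 192.168.15.61:2377
```

If the manager's port 2377 is still unreachable, the second join will time out the same way, so this only clears the stale state and does not fix the underlying cause.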

Solved it (On my Windows machine):
Just force remove the nodes using --force or -f:

docker-machine rm myvm1 -f
docker-machine rm myvm2 -f

Remove the nodes in Hyper-V Manager as well, just in case the system hasn’t refreshed yet.

And restart from the step:

docker-machine create -d hyperv --hyperv-virtual-switch "myswitch" myvm1 

I believe my mistake was not being extremely careful about the command:

docker-machine swarm init 

Is the node that cannot join the swarm a Windows node? Have you tried joining a Linux node to the swarm?

@sanunes thanks, worked for me, but what a brittle piece of software docker-machine is, silently hangs with no error message!

I only opened up port 2377 on the manager node and it started working. Thanks!
Not sure what other errors I may encounter, because I have not opened up the other ports.

BTW, I am doing this on AWS and running CentOS (CentOS 7, Linux 3.10.0-1062.4.1.el7.x86_64, Docker-CE version 19.03.4). Ports are opened and closed via AWS Security Groups, not through firewalld configuration.

@seantshen's solution worked for me on an AWS instance.

If you are using AWS to create instances, I have a solution for this error: go to your EC2 instances in AWS, select the instance, go to Security, edit the inbound rules, add a Custom TCP rule for port 2377, save, and try again…
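
The same inbound rules can also be added from the command line. A sketch assuming the AWS CLI is installed and configured; the security group ID and CIDR are placeholders you must supply, and allow_swarm_ingress is a hypothetical helper name. It adds all four swarm ports mentioned earlier in the thread, not just 2377:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add inbound rules for the swarm ports to one security group.
# $1 = security group ID (placeholder), $2 = source CIDR of the other swarm nodes.
allow_swarm_ingress() {
  local group_id="$1" cidr="$2" proto port
  while read -r proto port; do
    # </dev/null keeps the command from consuming the port list on stdin.
    aws ec2 authorize-security-group-ingress \
      --group-id "$group_id" --protocol "$proto" --port "$port" --cidr "$cidr" </dev/null
  done <<'EOF'
tcp 2377
tcp 7946
udp 7946
udp 4789
EOF
}

# Usage (both values are examples, not real resources):
#   allow_swarm_ingress sg-0123456789abcdef0 10.0.0.0/16
```

Restricting the CIDR to the subnet the nodes live in is safer than opening these ports to 0.0.0.0/0.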

I got the same error when trying to join a swarm cluster as a worker. I used 2 VMs from Google Cloud for this…

The manager node was working fine; docker info showed no swarm errors there. But when I tried to join the worker nodes with the token, I got this error: "Error response from daemon: Timeout was reached before node joined. The attempt to join the swarm will continue in the background. Use the "docker info" command to see the current swarm status of your node." Meanwhile, docker info on the worker showed this in the swarm error field:
"rpc error: code = DeadlineExceeded desc = context deadline exceeded"

I tried a lot of different things, and finally the solution below worked.

Solution: I ran "docker swarm init --force-new-cluster" on one of the VMs I had tried to join as a worker, then ran "docker swarm leave --force" on the existing manager node and joined that node as a worker to the newly created cluster. The other VM also joined the new cluster as a worker without problems.

Ubuntu 18.04
Docker version 20.10.17

I also got the same issue while trying this on AWS. After that I allowed:

  • Port 2377 TCP for communication with and between manager nodes
  • Port 7946 TCP/UDP for overlay network node discovery
  • Port 4789 UDP (configurable) for overlay network traffic

Then I ran docker swarm leave, tried the join again, and it worked. The ports were not allowed in the security group, which is why I got the error; once I allowed them, the join succeeded.

After docker swarm init, I got the following.

docker swarm join --token SWMTKN-1-1doee9kyse6xmvedqt9x0peedkilbxb4brnsascsnmepd23vj0-8arz43emxx389hkm5p62s3h5a 192.168.0.73:2377

Replacing 192.168.0.73 with my public IP solved the problem.
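
Instead of editing the printed join command by hand, the manager can be told which address to advertise when the swarm is created. A minimal sketch using the --advertise-addr flag of docker swarm init; init_swarm is a hypothetical helper name and the address in the usage line is a documentation placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the swarm advertising an address the workers can
# actually reach, then print the matching worker join command.
init_swarm() {
  local advertise_addr="$1"
  docker swarm init --advertise-addr "$advertise_addr" &&
    docker swarm join-token worker
}

# Usage, run on the manager (replace with an address reachable from the workers):
#   init_swarm 203.0.113.10
```

Whether the public or the private address is the right one depends on how the workers reach the manager. For nodes in the same private network, the private address is usually enough.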