Node does not rejoin swarm after restart

nickspicer93 · December 20, 2017, 4:01pm

Description
When we restart a server, we expect the node to rejoin the swarm automatically once the Docker service has started. However, any swarm command then fails with “The swarm does not have a leader. It’s possible that too few managers are online.”

Steps to reproduce the issue:

Create a swarm with 3 manager nodes
Restart one of the non-leader nodes
Run any swarm command, e.g. docker node ls

Describe the results you received:
Status: Error response from daemon: rpc error: code = 2 desc = The swarm does not have a leader. It’s possible that too few managers are online. Make sure more than half of the managers are online., Code: 1
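The “more than half of the managers” rule in this error is the Raft majority quorum that swarm managers use to elect a leader. A minimal Python sketch of that arithmetic follows; the function names are illustrative and not part of Docker.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that is 'more than half' of the total."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be unreachable while a majority is still available."""
    return managers - quorum(managers)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} managers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} down")
```

With 3 managers the quorum is 2, so the swarm should tolerate exactly one manager being down, which is why restarting a single node was expected to leave docker node ls working.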

Describe the results you expected:
A list of the 3 nodes in the swarm. If the leader itself was restarted, we expected leadership to move to one of the other nodes.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:
Client:
Version: 17.06.1-ee-2
API version: 1.30
Go version: go1.8.3
Git commit: 8e43158
Built: Wed Aug 23 21:16:53 2017
OS/Arch: windows/amd64

Server:
Version: 17.06.1-ee-2
API version: 1.30 (minimum version 1.24)
Go version: go1.8.3
Git commit: 8e43158
Built: Wed Aug 23 21:25:53 2017
OS/Arch: windows/amd64
Experimental: false

Output of docker info:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 12
Server Version: 17.06.1-ee-2
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: pending
NodeID: uanql6iga429gt177fqth9f56
Error: rpc error: code = 2 desc = The swarm does not have a leader. It’s possible that too few managers are online. Make sure more than half of the managers are online.
Is Manager: true
Node Address: 192.168.1.87
Manager Addresses:
192.168.1.85:2377
192.168.1.86:2377
192.168.1.87:2377
Default Isolation: process
Kernel Version: 10.0 14393 (14393.1770.amd64fre.rs1_release.170917-1700)
Operating System: Windows Server 2016 Standard
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 8GiB
Name: mwdockwin-dev3
ID: 6FSY:FJC4:VGJJ:24HR:L32D:NVKR:JYO5:XZL3:HY47:O4YM:FGAI:PNFQ
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
192.168.1.120:5000
127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
All 3 boxes are on the same version.

nickspicer93 · December 21, 2017, 12:00pm

It turns out one of the nodes did not have its NAT network configured properly. We had three managers, but only two of them could talk to each other, so (I presume) whenever one of those two went down, only one reachable manager was left, short of the two needed for a majority, and the swarm lost its leader. This appears to be solved by cleaning up the network on the faulty node like so:

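# Stop the Host Networking Service (HNS) and Docker
Stop-Service hns
Stop-Service docker
# Delete the HNS state file so HNS starts with a clean network configuration
Remove-Item 'C:\ProgramData\Microsoft\Windows\hns\hns.data'
# Start both services again
Start-Service hns
Start-Service docker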
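The presumed explanation above can be checked with a small model. This is a sketch under simplifying assumptions: it treats any group of online managers connected by working links as able to elect a leader if it is a majority of all managers, and the node names A, B, C and the link list are hypothetical, since the post does not say which node had the broken NAT network.

```python
from collections import deque


def largest_connected_group(online, links):
    """Size of the largest group of online managers connected by working links
    (a simplification of Raft's requirement that a leader reach a majority)."""
    adjacency = {node: set() for node in online}
    for a, b in links:
        # A link only counts if both ends are online.
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    best, seen = 0, set()
    for start in online:
        if start in seen:
            continue
        # Breadth-first search over one connected group.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for peer in adjacency[node]:
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        best = max(best, size)
    return best


def has_leader(all_managers, online, links):
    """True if some connected group of online managers is a majority of all managers."""
    quorum = len(all_managers) // 2 + 1
    return largest_connected_group(online, links) >= quorum


if __name__ == "__main__":
    managers = {"A", "B", "C"}
    broken = [("A", "B")]  # hypothetical: C's network is broken, so C has no links
    healthy = [("A", "B"), ("B", "C"), ("A", "C")]
    print(has_leader(managers, managers, broken))     # True: A and B form a majority
    print(has_leader(managers, {"B", "C"}, broken))   # False: A restarted, B and C are isolated
    print(has_leader(managers, {"B", "C"}, healthy))  # True: with working networking, B and C suffice
```

In this model, a single restart only loses the leader when another node is already cut off, which matches the behaviour described before the network cleanup.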
