Node does not rejoin swarm after restart


(Nickspicer93) #1

Description
When we restart a server, we expect the node to rejoin the swarm automatically once the service has started. Instead, any swarm command fails with “The swarm does not have a leader. It’s possible that too few managers are online.”

Steps to reproduce the issue:

  1. Create a swarm with 3 manager nodes
  2. Restart one of the non-leader nodes
  3. Run any swarm command, e.g. docker node ls

Describe the results you received:
Status: Error response from daemon: rpc error: code = 2 desc = The swarm does not have a leader. It’s possible that too few managers are online. Make sure more than half of the managers are online., Code: 1

Describe the results you expected:
A list of the 3 nodes in the swarm. If it was the leader that was restarted, leadership should have switched to another node.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:
Client:
Version: 17.06.1-ee-2
API version: 1.30
Go version: go1.8.3
Git commit: 8e43158
Built: Wed Aug 23 21:16:53 2017
OS/Arch: windows/amd64

Server:
Version: 17.06.1-ee-2
API version: 1.30 (minimum version 1.24)
Go version: go1.8.3
Git commit: 8e43158
Built: Wed Aug 23 21:25:53 2017
OS/Arch: windows/amd64
Experimental: false

Output of docker info:
PS C:\Windows\system32> docker info
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 12
Server Version: 17.06.1-ee-2
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: pending
NodeID: uanql6iga429gt177fqth9f56
Error: rpc error: code = 2 desc = The swarm does not have a leader. It’s possible that too few managers are online. Make sure more than half of the managers are online.
Is Manager: true
Node Address: 192.168.1.87
Manager Addresses:
192.168.1.85:2377
192.168.1.86:2377
192.168.1.87:2377
Default Isolation: process
Kernel Version: 10.0 14393 (14393.1770.amd64fre.rs1_release.170917-1700)
Operating System: Windows Server 2016 Standard
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 8GiB
Name: mwdockwin-dev3
ID: 6FSY:FJC4:VGJJ:24HR:L32D:NVKR:JYO5:XZL3:HY47:O4YM:FGAI:PNFQ
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
192.168.1.120:5000
127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
All 3 boxes are on the same version.
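
For reference, the "more than half of the managers" condition in the error message can be made concrete with a little arithmetic. The following is only a minimal sketch; the function names `quorum` and `has_leader_majority` are made up for illustration and are not part of Docker.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that is more than half of the total."""
    return managers // 2 + 1


def has_leader_majority(managers: int, reachable: int) -> bool:
    """True if enough managers can reach each other to elect a leader."""
    return reachable >= quorum(managers)


# Print the majority size and the number of tolerated failures
# for a few common manager counts.
for total in (1, 3, 5, 7):
    needed = quorum(total)
    print(f"{total} managers: quorum {needed}, tolerates {total - needed} failure(s)")
```

With 3 managers the majority is 2, so a single restarted manager should not cost the swarm its leader as long as the other two can still reach each other.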


(Nickspicer93) #2

It turns out one of the nodes did not have its NAT network configured properly. We had three manager nodes, but only two of them could talk to each other, so each time one of those two went down the remaining managers could not form a majority and the swarm lost its leader (I presume). This appears to be solved by cleaning up the network on the faulty node like so:

stop-service hns
stop-service docker
del 'C:\ProgramData\Microsoft\Windows\hns\hns.data'
start-service hns
start-service docker
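
To spot a manager that cannot talk to the others before it causes trouble, one option is a rough TCP reachability check against the manager addresses shown in the docker info output above. This is only a sketch under assumptions: the helper names `can_connect` and `reachable_managers` are hypothetical, and a successful TCP connection to port 2377 only shows that the port is reachable, not that the swarm itself is healthy.

```python
import socket

# Manager addresses copied from the docker info output earlier in this thread.
MANAGERS = ["192.168.1.85:2377", "192.168.1.86:2377", "192.168.1.87:2377"]


def can_connect(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within the timeout."""
    host, port = address.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def reachable_managers(addresses, timeout: float = 2.0):
    """Return the subset of addresses that accepted a TCP connection."""
    return [a for a in addresses if can_connect(a, timeout)]
```

Running `reachable_managers(MANAGERS)` on each of the three nodes in turn should return all three addresses. A node that sees fewer is a candidate for the network cleanup above.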