I have a swarm made up of three manager nodes.
Node 3 recently got into a bad state, so it was removed from the swarm and rejoined. To get the Docker daemon to start again, I first had to delete the contents of /var/lib/docker on that node.
Rejoining seemed to work, but a few days later node 3 appeared to drop out of the cluster: docker node ls showed it as reachable but down.
Repeating the same remove-and-rejoin procedure no longer works, and node 3 now goes into a pending state.
Looking in /var/lib/docker/swarm on the other two nodes, state.json still contains an entry for node 3 even after the node has been removed from the swarm.
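For anyone wanting to reproduce the check, here is a minimal read-only sketch of how the leftover entry can be spotted. It assumes state.json is a JSON list of objects with "node_id" and "addr" keys, which is what the file looks like on my managers but is not something I have seen documented. The function names are my own and nothing here modifies the swarm.

```python
import json


def load_state(path="/var/lib/docker/swarm/state.json"):
    # Read the file as plain JSON; reading it normally requires root.
    with open(path) as f:
        return json.load(f)


def stale_entries(state_entries, active_node_ids):
    # Assumption: each entry is a dict with "node_id" and "addr" keys.
    # Return the entries whose node_id is not in the active set.
    active = set(active_node_ids)
    return [e for e in state_entries if e.get("node_id") not in active]
```

The active IDs can be taken from the output of docker node ls -q on a healthy manager and passed in as a list, for example stale_entries(load_state(), ids). Anything returned is an entry that state.json still lists but the swarm no longer reports.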
Questions:
Does state.json have any actual function, i.e. would editing out the deleted node have any effect on the swarm?
I assume that the state of the swarm is maintained in /var/lib/docker/swarm/raft…, which is not human readable. Is there a tool or method to consistency-check this state?
The swarm runs a lot of live production containers, so I am reluctant to destroy it and start again.