Host not responding on Swarm leave

We have a 3-node Elasticsearch cluster set up via docker-compose in a swarm, with node 01 as the manager and nodes 02 and 03 as workers.

I left the worker nodes 02 & 03 using docker swarm leave. On node 01 it gave me an alert:

"you are currently attempting to leave the swarm on a node as a manager. Removing the last manager erases all current state of the swarm. Use --force to ignore this message."

and then I ran ‘docker swarm leave --force’ on the manager node.

The application URL https://:/_cluster/health was accessible before, but after leaving the swarm it became unreachable. Please assist.
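
For context, the check was being done against the Elasticsearch cluster health API, roughly like this (host, port, and credentials are placeholders, not the real values from our setup):

```bash
# Hypothetical example; substitute your actual host, port and credentials
curl -k -u elastic:<password> "https://<host>:<port>/_cluster/health?pretty"
```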

I don’t understand. Why would you remove the only manager node from the cluster? Now you have workers without a manager, and you can’t just add another manager. And you have a node which is not part of any cluster, so it is not a manager anymore. I’m not a Swarm user, but it seems you have lost your cluster and you won’t be able to restore it.

I know, but it has been done now. We restored the server to a previous snapshot, but the issue still persists. Please share any possible solutions.

P.S. I can still get into the container by running docker exec -it es01 sh and can see the containers with docker ps -a,
so I’m unsure whether this is the expected behavior, but the URL is inaccessible.

ChatGPT suggests the steps below. Please share your input on whether they should be followed.
When you run docker swarm leave --force on the manager node and leave the swarm, it effectively destroys the swarm configuration and all the services managed by that swarm. This is why your Elasticsearch cluster became inaccessible. Here’s how you can address this issue:

  1. Re-initialize the swarm on the manager: docker swarm init
  2. Add the worker nodes back to the swarm.
  3. Recreate the Elasticsearch services:
  • Ensure you have your docker-compose.yml file which defines the Elasticsearch services.
  • Deploy the Elasticsearch stack again from the manager node using docker stack deploy -c docker-compose.yml <stack_name> (see the sketch below).

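A rough command sketch of those steps, assuming a stack named elastic (the token and IP are placeholders, not values from our environment):

```bash
# On node 01 (manager): create a new swarm and print the worker join command
docker swarm init --advertise-addr <node01-ip>
docker swarm join-token worker

# On nodes 02 and 03 (workers): join using the token printed above
docker swarm join --token <worker-token> <node01-ip>:2377

# Back on node 01: redeploy the Elasticsearch stack
# ("elastic" is an assumed stack name; use whatever name was deployed before)
docker stack deploy -c docker-compose.yml elastic
```
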
@rimelek

It basically says create a new cluster, and yes, that is what you will need. When a cluster loses all of its manager nodes, the existing containers can still run, as there is nothing to stop them, but if any service depends on running in a Swarm cluster and it crashes because basic Swarm features don’t work anymore, that could lead to unexpected behavior. Unless something deletes the containers, even if you can’t execute commands in them, you can still try to identify where the data is for each container. Hopefully it is on volumes, but you can also find the filesystem of a container using docker inspect. You can then use docker export to export a container’s filesystem to a tar file, or use docker cp to copy files out, even if the container is not running but still exists.

If you backup all data and you can remount the same data in a new cluster, you can restore the system, but that will be a new cluster.
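
A small sketch of what that could look like, using es01 as the example container (the data path is illustrative only):

```bash
# Show where the container's volumes are mounted on the host
docker inspect -f '{{ range .Mounts }}{{ .Source }} -> {{ .Destination }}{{ "\n" }}{{ end }}' es01

# Export the whole container filesystem to a tar archive
docker export -o es01-filesystem.tar es01

# Or copy a specific path out of the container (works for stopped containers too)
docker cp es01:/usr/share/elasticsearch/data ./es01-data-backup
```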

Ok, so restoring the server to a previous snapshot wouldn’t have helped?

I can get into the container with docker exec -it es01 sh, can see/start the container nodes, and the Elasticsearch config path /usr/share/elasticsearch/config/ is mounted as a local Docker volume on each node, allowing me to modify its contents from the host. I can run docker inspect <container-ID> to view its details.

Sorry, I’m entirely new to Docker. If you can, please share the steps to follow to mount and restore as before. @rimelek

You might need to restore all servers to the same snapshot timestamp, not sure.

To create a new cluster, I would run docker swarm leave on the nodes, then create a new cluster on the manager and join the nodes again. This is untested; your current containers might continue to run, in which case you need to remove them manually. Then re-create your services on Docker Swarm. Make a backup first if you handle important data.
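
If the old containers are still around after leaving the swarm, the manual cleanup could look roughly like this (es01/es02/es03 are assumed container names based on the naming above; back up first as noted):

```bash
# List everything that is still there, including stopped containers
docker ps -a

# Remove the leftover Elasticsearch containers before redeploying
# (adjust the names to whatever docker ps -a actually shows)
docker rm -f es01 es02 es03
```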

@rimelek @bluepuma77 We noticed the URL for node 01 came up, but no containers are displayed for 02 & 03 when running docker ps -a.

Steps that were performed

  1. On node 01: docker swarm init
  2. On node 01: docker network create -d overlay --attachable elastic
  3. On node 01: sudo systemctl restart docker, and then restarted the nodes.
     But when I tried step 3 on nodes 02/03, the URLs for 02 & 03 still didn’t work.

You need to run docker swarm join on the workers, using a new/current join token from the manager.
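
Something along these lines (the token and manager IP are placeholders; the manager prints the exact command for you):

```bash
# On node 01 (manager): print the current worker join command, including the token
docker swarm join-token worker

# On nodes 02 and 03: run the command printed above, e.g.
docker swarm join --token <worker-token> <node01-ip>:2377
```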