I thought I’d be clever and finally connect my three hosts, which all run the Docker daemon with roughly 30 containers in total… It turned out to be a big mistake.
Once I ran the command `docker swarm init` everything seemed fine: I got a confirmation and the text explaining how to connect workers etc. And then… my SSH connection dropped. From that point onward it has been impossible to connect to the host. I have physical access to the server, so I have managed to restart it; no luck. I’ve spent this evening trying to find information on this issue.
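For anyone in the same spot: since `docker swarm init` creates the ingress overlay network and adds iptables rules, one recovery path is to undo the swarm setup entirely from the physical console. A rough sketch, assuming a standard systemd-based install:

```shell
# Leave the swarm; --force is required because this node is a manager.
# This removes the ingress overlay network and the swarm-related state.
docker swarm leave --force

# Restart Docker so it rebuilds its iptables chains from scratch
sudo systemctl restart docker

# Check that sshd is still listening on port 22
sudo ss -tlnp | grep ':22'
```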
The host machine runs Raspberry Pi OS (Raspbian) based on Debian 12 (Bookworm). It’s running both IPv4 and IPv6. The host listens on the container ports and the host port (SSH, 22). The host receives the static IP address 192.168.xx.xx from my router.
Host IP address: 192.168.xx.xx
ingress: 172.16.xx.xx
second created network: 172.16.xx.xx
I have tried disabling IPv6 on the host, no luck.
What have I done to mess things up like this, and even more importantly, is there a way to restore my previously working SSH connection? I would be tremendously thankful for ANY guidance, commands, or hints on what to do or troubleshoot.
Initial thoughts: the server should not be overloaded, and Docker should not interfere with the SSH daemon, so it’s most probably a firewall issue. Docker v28.0.0 set wrong firewall rules; that should be fixed with v28.0.1.
Thank you for the reply! I had a look at the Docker version; it was v27.5.1. I ran an update and now it’s 28.0.1. Unfortunately, still no luck.
However, when running the apt-get upgrade I also received an SSH update. I don’t know if this might be the culprit, but there was an error message related to ssh.socket during the update. I have attached a photo of the error.
I found some guidance suggesting to disable sshd.service and then enable it again. I was able to disable the service, though not able to enable it again.
OK, I managed to enable ssh.service again. This did not resolve my issue connecting to my host via SSH. I also reinstalled the whole openssh-server package; no luck.
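For reference: on Debian 12, OpenSSH can be started either through socket activation (ssh.socket) or through the classic ssh.service, and the two conflict over port 22, which the ssh.socket error during the upgrade hints at. A sketch of forcing the classic service, assuming the stock systemd units:

```shell
# See which unit currently claims port 22
systemctl status ssh.socket ssh.service

# Disable socket activation and run the regular daemon instead
sudo systemctl disable --now ssh.socket
sudo systemctl enable --now ssh.service
```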
According to the server, port 22 should be open (I don’t have any firewalls installed on the server). Yet when running nmap it does not detect port 22 as open.
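To narrow down whether this is a bind problem or a filtering problem, it can help to compare the view from the host with the view from outside. A sketch (192.168.xx.xx is the placeholder host address from above):

```shell
# On the host: is sshd actually bound to port 22?
sudo ss -tlnp | grep ':22'

# On the host: any DROP policies or rules touching port 22?
sudo iptables -S | grep -Ei 'drop|dport 22'

# From another machine on the LAN: probe the port
nmap -p 22 192.168.xx.xx
```

If `ss` shows sshd listening but nmap from outside shows the port filtered, the packets are being dropped in between, which points at iptables rather than at sshd.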
My distribution uses NetworkManager. I have multiple default bridge networks for the containers, plus docker0 and docker_gwbridge. I also had the overlay network ingress at first, though I removed it in favor of my other swarm network, “proxynet”.
When running `docker network inspect` on proxynet I can see the physical Ethernet IP address, received from my router, listed under peers. This also applied to the previous “ingress” network.
I should also mention that I only have one physical network card (I don’t use Wi-Fi or anything fancy, just plain Ethernet). I read somewhere that when you initiate a swarm manager it will take control of the physical NIC if you only have one NIC. If that’s the case, then how do you connect to this host/swarm manager?
For sure you have a firewall: Debian installs one by default. And Docker opens some ports in the firewall automatically, for user convenience. Check `iptables -S`.
I just tested with two cloud VMs: installed Debian 12, updated Debian, installed Docker, created a swarm, added a worker. All works fine for me.
root@t1:~# uname -a
Linux t1 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) x86_64 GNU/Linux
root@t1:~# docker --version
Docker version 28.0.1, build 068a01e
root@t1:~# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 865/sshd: /usr/sbin
tcp6 0 0 :::22 :::* LISTEN 865/sshd: /usr/sbin
tcp6 0 0 :::2377 :::* LISTEN 2902/dockerd
tcp6 0 0 :::7946 :::* LISTEN 2902/dockerd
udp 0 0 0.0.0.0:68 0.0.0.0:* 607/dhclient
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
udp6 0 0 :::7946 :::* 2902/dockerd
root@t1:~# iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-BRIDGE
-N DOCKER-CT
-N DOCKER-FORWARD
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-FORWARD
-A DOCKER ! -i docker0 -o docker0 -j DROP
-A DOCKER ! -i docker_gwbridge -o docker_gwbridge -j DROP
-A DOCKER-BRIDGE -o docker0 -j DOCKER
-A DOCKER-BRIDGE -o docker_gwbridge -j DOCKER
-A DOCKER-CT -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-CT -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A DOCKER-FORWARD -j DOCKER-CT
-A DOCKER-FORWARD -j DOCKER-ISOLATION-STAGE-1
-A DOCKER-FORWARD -j DOCKER-BRIDGE
-A DOCKER-FORWARD -i docker0 -j ACCEPT
-A DOCKER-FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A DOCKER-FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-USER -j RETURN
Wow, this I didn’t know, thank you! I always assumed there was no firewall since everything always worked. Checking iptables now, I can confirm you are absolutely right.
Prior to your response, I decided to remove all of the Docker networks that I was able to remove. I then recreated a swarm overlay network and, strangely enough, everything worked as intended. I now have two managers and one worker talking to each other.
Since it started working, I continued on my path to migrate from NPM to Traefik. There have been quite a few hiccups. I’ve tried various setups and read various Swarm and Traefik instructions, and to my surprise your name appears in literally 90% of the various forum threads.
I have made most of the beginner mistakes, though the indirect guidance from you has been tremendous. I’m at a point where I’m able to log in to the Traefik dashboard, and I have several containers that I have transformed into services behind the reverse proxy.
However, I find it extremely flaky. My AdGuard on host 2 (a swarm manager) works great: it has received a certificate and I’m able to access it as intended. AdGuard on host 1 (also a swarm manager) doesn’t work as intended. I had to change its listening ports on the host so they wouldn’t conflict with AdGuard on host 2. When trying to reach it I receive a “Bad gateway”.
I thought this might have to do with the fact that it’s on a different host, so I started up Vaultwarden on host 2 (which had worked great for AdGuard earlier). Unfortunately, I receive a “Bad gateway” for Vaultwarden as well.
To make it even stranger, I also set up a service for Actual Budget on host 2. This works great, with SSL enabled and all.
I spent most of yesterday trying to figure out what’s going on. I’ve re-read everything multiple times. Frankly, I’ve reviewed the various Traefik setups so often that I just can’t find what’s wrong. It’s literally driving me insane.
Considering your great expertise in this matter, would you by any chance be kind enough to help me identify what’s missing or incorrect in my setup, please?
When using Docker Swarm, you should usually have three manager nodes to get HA and a quorum. Every manager can also perform regular worker tasks. You can promote the worker.
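The promotion is a one-liner; the node name here (“worker1”) is a placeholder for whatever `docker node ls` shows on your swarm:

```shell
# List nodes with their current roles and availability
docker node ls

# Promote the worker to a manager ("worker1" is a placeholder hostname)
docker node promote worker1
```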
Traefik and “Bad gateway” is mostly associated with using multiple Docker networks, with not all containers sharing the same one, and with not using Traefik’s swarm.network setting to specify which one to use.
Check the simple Traefik Swarm example. Note that easy Let’s Encrypt (httpChallenge, tlsChallenge) only works with a single Traefik instance running, so you would need to constrain Traefik to a single host (not HA) or use replicas: 1. When using multiple Traefik instances in parallel, you need to use dnsChallenge (example).
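To illustrate the network point, a minimal stack-file sketch: every proxied service joins the same overlay network, and Traefik is told to use it. The “proxynet” name mirrors this thread; the service, hostname, and port are placeholders:

```yml
# Traefik static config (CLI flag or traefik.yml), Traefik v3:
#   --providers.swarm.network=proxynet
services:
  whoami:
    image: traefik/whoami
    networks:
      - proxynet
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
        # container port, not the published host port
        - traefik.http.services.whoami.loadbalancer.server.port=80

networks:
  proxynet:
    external: true
```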
Thanks for the explanation. I believe I need to read up a bit on Swarm’s capabilities and how to think when utilizing it. Currently I’m unable to make use of my second DNS server (AdGuard Home on the second node), since it runs on the same ports as my first DNS server (first node). When trying to deploy it, it won’t work due to this.
Your GitHub is most excellent! I had actually already found it earlier and followed your example configuration, though I managed to overlook the dnsChallenge setup you had lying there. I verified my setup and compared it to yours and, well, it looked good… strangely enough!
A couple of hours later I thought I’d try one last thing: changing loadbalancer.server.port. I was amazed to see that this did the trick! I was under the impression that the loadbalancer.server.port value ought to be equivalent to the host port; after changing it to the container port, it worked. This fixed every single service I had issues with, apart from Duplicati. As I also experience a couple of hiccups with the Vaultwarden container, I believe that has to do with my .env files not being read.
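In other words, with a port published as host:container, Traefik needs the right-hand side. A hypothetical AdGuard fragment (the port numbers are illustrative, not from the actual setup):

```yml
    ports:
      - "8053:3000"   # host port : container port
    deploy:
      labels:
        # Traefik reaches the container directly over the overlay
        # network, so it needs the container port (3000), not 8053
        - traefik.http.services.adguard.loadbalancer.server.port=3000
```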
I’ve literally just relaunched Vaultwarden using secrets instead, to hopefully solve the .env issues I’m experiencing. This appears to have solved it!
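For anyone following along, the switch from .env files to Swarm secrets can be sketched like this. The secret name is made up, and whether Vaultwarden honors the *_FILE variant of this variable is an assumption to verify against its documentation:

```shell
# Store the value as a Swarm secret (never written to an .env file)
printf '%s' 'changeme' | docker secret create vw_admin_token -

# Attach it to the service; the app then reads the value from the
# file mounted at /run/secrets/<name>, if it supports *_FILE variables
docker service update \
  --secret-add vw_admin_token \
  --env-add ADMIN_TOKEN_FILE=/run/secrets/vw_admin_token \
  vaultwarden
```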
Many thanks for all your help and previous outstanding work!!