Swarm: some ports are exposed, others are not

Hi guys,

I’m quite new to Docker, so sorry if the answer is obvious. I’ve been reading a lot without finding a proper answer, so I’m posting here in the hope that someone can help me understand.

I have a swarm of 3 managers (RasPi 4B) + 1 worker (RasPi 5), all freshly built on Raspberry Pi OS Bookworm. I’m running the latest Docker version. I must say that I already have a lot of stacks working perfectly well. This one is the last one I need to finalize my home infrastructure.

When I deploy the following stack via Portainer, I can send DNS requests to each node individually using its own host IP. Even DoT answers properly. But the web interface (18006:80 or 18007:443) is unavailable and, as the data is already populated in the volumes, I believe 18008:3000 is of no use (it only serves for the first setup).

I’ve also carefully chosen port numbers in my swarm so that they are never already used by any other service, node, or host OS.

Here is the stack:

---
version: "3.8"

services:
   agh1:
     image: adguard/adguardhome
     ports:
     - target: 80
       published: 18006
       protocol: tcp
       mode: host
     - target: 443
       published: 18007
       protocol: tcp
       mode: host
     - target: 3000
       published: 18008
       protocol: tcp
       mode: host
     - target: 853
       published: 853
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: udp
       mode: host
     deploy:
       placement:
         constraints: [node.hostname == RasPi1]
     volumes:
       - agh1-work:/opt/adguardhome/work
       - agh1-conf:/opt/adguardhome/conf
       - certs:/certs

   agh2:
     image: adguard/adguardhome
     ports:
     - target: 80
       published: 18006
       protocol: tcp
       mode: host
     - target: 443
       published: 18007
       protocol: tcp
       mode: host
     - target: 3000
       published: 18008
       protocol: tcp
       mode: host
     - target: 853
       published: 853
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: udp
       mode: host
     deploy:   
       mode: replicated
       replicas: 1
       placement:
         constraints: [node.hostname == RasPi2]
     volumes:
       - agh2-work:/opt/adguardhome/work
       - agh2-conf:/opt/adguardhome/conf
       - certs:/certs

   agh3:
     image: adguard/adguardhome
     ports:
     - target: 80
       published: 18006
       protocol: tcp
       mode: host
     - target: 443
       published: 18007
       protocol: tcp
       mode: host
     - target: 3000
       published: 18008
       protocol: tcp
       mode: host
     - target: 853
       published: 853
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: udp
       mode: host
     deploy:   
       mode: replicated
       replicas: 1
       placement:
         constraints: [node.hostname == RasPi3]
     volumes:
       - agh3-work:/opt/adguardhome/work
       - agh3-conf:/opt/adguardhome/conf
       - certs:/certs    

   agh4:
     image: adguard/adguardhome
     ports:
     - target: 80
       published: 18006
       protocol: tcp
       mode: host
     - target: 443
       published: 18007
       protocol: tcp
       mode: host
     - target: 3000
       published: 18008
       protocol: tcp
       mode: host
     - target: 853
       published: 853
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: tcp
       mode: host
     - target: 53
       published: 53
       protocol: udp
       mode: host
     deploy:   
       mode: replicated
       replicas: 1
       placement:
         constraints: [node.hostname == RasPi4]
     volumes:
       - agh4-work:/opt/adguardhome/work
       - agh4-conf:/opt/adguardhome/conf
       - certs:/certs       
       
volumes:
  agh1-conf:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/1/conf"
  agh1-work:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/1/work"
  agh2-conf:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/2/conf"
  agh2-work:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/2/work"   
  agh3-conf:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/3/conf"
  agh3-work:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/3/work"     
  agh4-conf:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/4/conf"
  agh4-work:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/docker/agh/4/work"       
  certs:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=192.168.2.1,soft,nolock,noatime,rsize=8192,wsize=8192,tcp,timeo=14,nfsvers=4"
      device: ":/volume1/nfs/certs"

So it seems that 53 and 853 are properly exposed, while 18006, 18007 and 18008 are not, or at least do not work as I’d expect. How is this possible? I don’t even know how to troubleshoot this…

Here is the output of

sudo ss -lntp
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       4096    0.0.0.0:18008       0.0.0.0:*          users:(("docker-proxy",pid=1161802,fd=4))
LISTEN  0       4096    0.0.0.0:18006       0.0.0.0:*          users:(("docker-proxy",pid=1161885,fd=4))
LISTEN  0       4096    0.0.0.0:18007       0.0.0.0:*          users:(("docker-proxy",pid=1161853,fd=4))
LISTEN  0       4096    0.0.0.0:18001       0.0.0.0:*          users:(("docker-proxy",pid=2740,fd=4))
LISTEN  0       4096    0.0.0.0:18002       0.0.0.0:*          users:(("docker-proxy",pid=2703,fd=4))
LISTEN  0       4096    0.0.0.0:18003       0.0.0.0:*          users:(("docker-proxy",pid=2573,fd=4))
LISTEN  0       128     0.0.0.0:22000       0.0.0.0:*          users:(("sshd",pid=815,fd=3))
LISTEN  0       4096    0.0.0.0:853         0.0.0.0:*          users:(("docker-proxy",pid=1161833,fd=4))
LISTEN  0       4096    0.0.0.0:53          0.0.0.0:*          users:(("docker-proxy",pid=1161906,fd=4))
LISTEN  0       64      0.0.0.0:43465       0.0.0.0:*
LISTEN  0       4096    *:18012             *:*                users:(("dockerd",pid=845,fd=126))
LISTEN  0       4096    *:18013             *:*                users:(("dockerd",pid=845,fd=135))
LISTEN  0       4096    *:18014             *:*                users:(("dockerd",pid=845,fd=105))
LISTEN  0       4096    *:18015             *:*                users:(("dockerd",pid=845,fd=106))
LISTEN  0       4096    [::]:18008          [::]:*             users:(("docker-proxy",pid=1161812,fd=4))
LISTEN  1       4096    *:18010             *:*                users:(("dockerd",pid=845,fd=61))
LISTEN  0       4096    *:18011             *:*                users:(("dockerd",pid=845,fd=103))
LISTEN  0       4096    *:18005             *:*                users:(("dockerd",pid=845,fd=32))
LISTEN  0       4096    [::]:18006          [::]:*             users:(("docker-proxy",pid=1161893,fd=4))
LISTEN  0       4096    [::]:18007          [::]:*             users:(("docker-proxy",pid=1161861,fd=4))
LISTEN  0       4096    *:18000             *:*                users:(("dockerd",pid=845,fd=59))
LISTEN  0       4096    [::]:18001          [::]:*             users:(("docker-proxy",pid=2748,fd=4))
LISTEN  0       4096    [::]:18002          [::]:*             users:(("docker-proxy",pid=2712,fd=4))
LISTEN  0       4096    [::]:18003          [::]:*             users:(("docker-proxy",pid=2656,fd=4))
LISTEN  0       4096    *:18017             *:*                users:(("dockerd",pid=845,fd=144))
LISTEN  0       4096    *:18018             *:*                users:(("dockerd",pid=845,fd=85))
LISTEN  0       128     [::]:22000          [::]:*             users:(("sshd",pid=815,fd=4))
LISTEN  0       64      [::]:45977          [::]:*
LISTEN  0       4096    [::]:853            [::]:*             users:(("docker-proxy",pid=1161840,fd=4))
LISTEN  0       4096    [::]:53             [::]:*             users:(("docker-proxy",pid=1161913,fd=4))
LISTEN  0       4096    *:7946              *:*                users:(("dockerd",pid=845,fd=33))
LISTEN  0       4096    *:2377              *:*                users:(("dockerd",pid=845,fd=23))

In the end, my objective is to have 4 separate AdGuard Home instances behind a VIP so they back each other up in case the master goes down. I had it working with 4 containers deployed independently on each node, but that is a pain to manage, whereas with the stack I only need one file to do everything.

Thanks in advance for your help !!

I am surprised this part of the compose yaml is even valid:

services:
   agh1:
     image: adguard/adguardhome
     network_mode: host

Even if it throws no error, it makes no sense: either you use the host network, or you publish ports.

When you publish ports in mode: host, they are only bound on the host where the container is running. Since you used a placement constraint, each binding only applies to that host. Thus, publishing the same ports on each node should work without any issues.

Sorry…

     network_mode: host

…that was a typo. I’m doing many, many tests and I forgot to remove this one…

My post is fixed and no longer mentions it.

When you publish ports in mode: host, they are only bound on the host where the container is running. Since you used a placement constraint, each binding only applies to that host. Thus, publishing the same ports on each node should work without any issues.

That’s exactly my intention, and it works for DNS and DoT but not for the HTTP ports, hence my need for help.

It is odd that it works for some ports but not others. It should either work for all of them, or none.

I am using Traefik as a layer 7 HTTP reverse proxy with deploy.mode: global and ports[].mode: host. It works without any flaws. So what you experience is indeed unexpected behavior, though I doubt that Swarm is what prevents HTTP traffic from reaching the container.

Is there no trace of a problem inside the logs? Did you try to access the published port using curl http://localhost:18006, curl https://localhost:18007 and curl http://localhost:18008 on the respective node?

The logs look fine from what I can see:

2024/01/12 23:00:53.137297 [info] AdGuard Home, version v0.107.43
2024/01/12 23:00:53.170888 [info] tls: using default ciphers
2024/01/12 23:00:53.197669 [info] safesearch default: reset 253 rules
2024/01/12 23:00:53.413397 [info] Initializing auth module: /opt/adguardhome/work/data/sessions.db
2024/01/12 23:00:53.417291 [info] auth: initialized.  users:1  sessions:11
2024/01/12 23:00:53.425792 [info] tls: number of certs: 4
2024/01/12 23:00:53.425918 [info] tls: got an intermediate cert
2024/01/12 23:00:53.425981 [info] tls: got an intermediate cert
2024/01/12 23:00:53.426046 [info] tls: got an intermediate cert
2024/01/12 23:00:53.777541 [info] AdGuard Home updates are disabled
2024/01/12 23:00:53.777576 [info] web: initializing
2024/01/12 23:00:53.826002 [info] dnsproxy: cache: enabled, size 4096 b
2024/01/12 23:00:53.826049 [info] dnsproxy: max goroutines is set to 300
2024/01/12 23:00:53.830034 [info] clients: processing addresses
2024/01/12 23:00:53.830798 [info] AdGuard Home is available at the following addresses:
2024/01/12 23:00:53.832074 [info] go to http://127.0.0.1:18006
2024/01/12 23:00:53.832171 [info] go to http://10.0.31.26:18006
2024/01/12 23:00:53.832182 [info] go to http://172.18.0.7:18006
2024/01/12 23:00:53.834708 [info] go to https://[my_domain]:18007
2024/01/12 23:00:54.402032 [info] dnsproxy: starting dns proxy server
2024/01/12 23:00:54.402267 [info] Ratelimit is enabled and set to 20 rps, IPv4 subnet mask len 24, IPv6 subnet mask len 56
2024/01/12 23:00:54.402352 [info] The server is configured to refuse ANY requests
2024/01/12 23:00:54.402415 [info] dnsproxy: cache: enabled, size 16194304 b
2024/01/12 23:00:54.402521 [info] dnsproxy: max goroutines is set to 300
2024/01/12 23:00:54.402732 [info] dnsproxy: creating udp server socket 0.0.0.0:53
2024/01/12 23:00:54.403081 [info] dnsproxy: listening to udp://[::]:53
2024/01/12 23:00:54.403176 [info] dnsproxy: creating tcp server socket 0.0.0.0:53
2024/01/12 23:00:54.403331 [info] dnsproxy: listening to tcp://[::]:53
2024/01/12 23:00:54.403398 [info] dnsproxy: creating tls server socket 0.0.0.0:853
2024/01/12 23:00:54.403510 [info] dnsproxy: listening to tls://[::]:853
2024/01/12 23:00:54.403576 [info] Creating a QUIC listener
2024/01/12 23:00:54.413265 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
2024/01/12 23:00:54.413891 [info] Listening to quic://[::]:853
2024/01/12 23:00:54.414260 [info] dnsproxy: entering tcp listener loop on [::]:53
2024/01/12 23:00:54.414538 [info] Entering the DNS-over-QUIC listener loop on [::]:853
2024/01/12 23:00:54.414544 [info] dnsproxy: entering tls listener loop on [::]:853
2024/01/12 23:00:54.415048 [info] dnsproxy: entering udp listener loop on [::]:53

The only difference: when I ran those AdGuard Home instances as local containers, the logs showed one more line, saying

[info] AdGuard Home is available at the following addresses:
[info] go to http://192.168.0.201:18006

Here is the output of the curl commands:

pi@RasPi1:~ $ curl http://localhost:18006/
curl: (7) Failed to connect to localhost port 18006 after 0 ms: Couldn't connect to server
pi@RasPi1:~ $ curl https://localhost:18007/
curl: (7) Failed to connect to localhost port 18007 after 1 ms: Couldn't connect to server
pi@RasPi1:~ $ curl http://localhost:18008
curl: (7) Failed to connect to localhost port 18008 after 0 ms: Couldn't connect to server

I just rebooted my 4 RasPis and the behavior is still the same.

curl http://172.18.0.7:18006 does answer:

pi@RasPi1:~ $ curl http://172.18.0.7:18006
<a href="/login.html">Found</a>.

All the other URLs fail:

[info] go to http://127.0.0.1:18006
[info] go to http://10.0.31.26:18006
[info] go to https://[my_domain]:18007

To be sure, I changed all the 1800x ports to something else, like 3800x, but it didn’t change a thing: my instances still cannot be configured through the web interface, although they still answer DNS and DoT requests…
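One possible reading of the log and curl output above, offered as an assumption rather than a confirmed fix: the "go to http://...:18006" lines and the successful curl against the container IP on port 18006 suggest that AdGuard Home listens on 18006 (and 18007 for HTTPS) inside the container, not on 80 and 443. If so, a mapping with target: 80 forwards to a container port where nothing listens, while 53 and 853 work because their target and published ports are identical. A minimal sketch of a mapping that matches the ports reported in the log:

```yaml
services:
  agh1:
    image: adguard/adguardhome
    ports:
      # Assumption: the web UI listens on 18006/18007 inside the container,
      # as the "go to http://127.0.0.1:18006" log lines suggest.
      - target: 18006      # port inside the container
        published: 18006   # port on the node
        protocol: tcp
        mode: host
      - target: 18007
        published: 18007
        protocol: tcp
        mode: host
    deploy:
      placement:
        constraints: [node.hostname == RasPi1]
```

The other direction would be to keep target: 80 and target: 443 and set the web interface back to those ports in AdGuard Home's own configuration. Either way, the target value has to be the port the application actually listens on inside the container.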

Are your containers all up and running? You are missing some replicas:.

In general I would say it is not best practice to use Swarm and pin all the services to nodes and expose the ports locally. You lose any failover capabilities.

Have you thought about placing a proxy in front of your services? Then only the proxy needs to be pinned to a node and the other services can roam freely. You can set mode: global to have one instance on every node, or use replicas: to scale even beyond your number of nodes. Constraints can still be applied. See simple Traefik Swarm example.

Yes, my containers are all up and running with my solution.

I’ve come to that solution because I can’t get AdGuard Home to work properly in swarm mode (I started with Docker and Docker Swarm a week ago, so bear with me; I’m sure I’m missing a lot of things…).

In fact, I have the same issue with containers that have a database. Homarr, uptime, overseer and vaultwarden are all not working in swarm mode… As soon as I have more than one instance, the second one can’t access the database and either cannot start or starts but does not work. I thought having NFS volumes would help, but it doesn’t.

On the other hand, I’m running Nginx Proxy Manager, and this one works properly in swarm mode as a global service and benefits from the redundancy provided by Docker.

Above all that, I have a Keepalived service running on each node, so every (working) service is responding behind one global IP.

I’ll try to adapt the traefik template you gave me, thanks for your help.

Of course, you can access the container using the container ip on the node it’s running, as the host has an interface to the target networks as well.

With published ports, you should be able to access the published port on every ip the host has, unless a firewall or bug prevents it.

If no firewall is enabled, and it still does not work, I would suggest raising an issue in the upstream moby project: https://github.com/moby/moby/issues.

Yeah, high-availability is a complex beast.

What do you want to achieve with your multiple adguards, is it for different environments?

The challenge for HA lies on different levels:

  1. Your domain must resolve to an IP. We use a managed load balancer for that; you can also use keepalived. Multiple IPs in an A record will not solve this, as the (browser) client picks a random one and will not update if it’s unavailable.

  2. The request to your domain needs to be forwarded to the appropriate application. Traefik can handle that: it has automatic Configuration Discovery for Docker services with labels, so you don’t need to update the Traefik config when you launch a new target service. For this, Traefik must run on a Swarm manager node, or you use a Docker Socket Proxy on a Swarm manager.

     Traefik can create LetsEncrypt certs. But only the paid version can handle clustered LE (multiple instances); for open source you need workarounds.

  3. The applications are connected to the same internal Docker network and do not need to expose ports on the nodes. Traefik will just forward requests to app instances round-robin.

  4. Storage is the next problem. NFS to a NAS has the same single point of failure. For files you should use some kind of replicated storage. Minio can run in a cluster, as can an underlying GlusterFS or CephFS. Swarm now supports the CSI API for k8s storage providers, but I would call the whole thing rather experimental.

  5. For databases it gets more complicated, as they should not use network storage but rather local storage. In particular, do not let multiple instances run on the same shared folder, as that might corrupt your DB. Instead, the database needs to run in cluster or mirror mode. Kind of easy for MongoDB, harder for Postgres.
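As a rough illustration of points 2 and 3, a service behind Traefik in Swarm might look like the sketch below. The service name, the overlay network name proxy, and the host name whoami.example.com are all hypothetical, and the exact provider settings on the Traefik side depend on the Traefik version in use.

```yaml
# Sketch only: service, network, and host names are hypothetical.
services:
  whoami:
    image: traefik/whoami            # small demo web server listening on port 80
    networks:
      - proxy                        # overlay network shared with Traefik
    # No "ports:" section: nothing is published on the nodes themselves.
    deploy:
      replicas: 2                    # instances can be scheduled on any node
      labels:                        # in Swarm, Traefik reads labels from deploy.labels
        - traefik.enable=true
        - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
        - traefik.http.services.whoami.loadbalancer.server.port=80

networks:
  proxy:
    external: true                   # assumed to exist already as an overlay network
```

With this layout only Traefik publishes ports, and adding another service means adding labels to that service rather than editing the proxy configuration.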

To make this mostly right we have been fighting for 2 years with Docker Swarm. For us devs this still seems easier than running k8s on our bare metal. Because of our customers we can’t use an arbitrary cloud with managed services.

OK, I already have a bunch of stacks working properly with Docker redundancy and NFS volumes. All of this is behind a keepalived VIP. This works fine (even from outside), so I can say that I understand port forwarding and network basics. I must say I also have my own wildcard certificate; when I started, Let’s Encrypt was a pain with plenty of hosts…

On each host I’ve installed an AdGuard Home instance (as standalone containers), and I have another stack called adguardhome-sync that copies the configuration of instance 1 to the others.

So everything, including my DNS servers, my Nginx Proxy Manager and Docker Swarm, responds on my VIP’s IP address.

When one of the hosts goes down, keepalived gives the VIP to another host and everything keeps working fine.

My only objective here is to try to replicate my 4 different standalone containers in one stack template to simplify the deployment.

Everything you mentioned is too complex for my home usage. But thank you for taking the time.

So your goal is to deploy 4 different AdGuard instances with different settings?

No, my 4 instances share the same configuration.

I configure the first one, and adguardhome-sync copies it to the other instances.

Can we get back to helicopter view: what do you want to achieve? Why have 4 adguards?

Hi,

First, I need redundancy for AdGuard Home because I use it as the DNS server for my LAN, but also as a DoT server for my mobile phones while away from home (including my wife’s smartphone).

With that said, I don’t want or need 4 instances. But my cluster is made of 4 machines, my VIP can roam from one machine to another, and I didn’t manage to get it working as a swarm service, so I ended up installing one instance per physical machine as a workaround.

This way, even if one of the nodes goes down, I don’t lose DNS. It works as it is; I asked you guys for help because I wanted a “cleaner” setup with everything running as a swarm service instead of standalone containers…

If you use deploy mode: global it will create a single instance on every swarm node. You can limit this with deploy placement constraints.
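A minimal sketch of that layout, assuming a hypothetical node label dns=true that would have been added beforehand (for example with docker node update --label-add dns=true on each chosen node). Volumes are left out on purpose, since each instance would need its own storage:

```yaml
services:
  agh:
    image: adguard/adguardhome
    ports:
      - target: 53
        published: 53
        protocol: udp
        mode: host                   # bound only on the node running the task
      - target: 53
        published: 53
        protocol: tcp
        mode: host
    deploy:
      mode: global                   # one task on every eligible node
      placement:
        constraints:
          - node.labels.dns == true  # hypothetical label limiting eligible nodes
```

Because a global service runs at most one task per node, the host-mode ports never collide on a single machine.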

But then there is the new question: how to sync them.

How do you manage your VIP?

Out of curiosity: why use DoT (“DNS-over-TLS”) and not a full WireGuard VPN?

I tried deploy mode: global. One instance starts on every node, but only one instance can access the database; the other instances state that they can’t access the DB. And if the working instance goes down, the others still won’t connect to the DB…

There is adguardhome-sync, which is another piece of software that can copy the configuration from the main instance to many replicas. This works great as long as every instance is working (accessing its DB). With AGH in global mode, instances not connected to the DB obviously won’t be able to replicate the main configuration…

For the VIP, I’m using keepalived on each node. They all share a common IP and I check the Docker service status. As soon as a service goes down, I give the shared IP to another host.

I’m using DoT because my wife can’t bear technology, so using a VPN is a no-go. I just activate “private DNS” on our Android phones and we get rid of ads without any fuss.