Docker Swarm - Issue with Routing Mesh not routing Published Port on all Nodes

Background

I have a 3-node Docker Swarm with one manager node, using overlay networking. The ingress overlay network uses the default configuration.

(Manager) Node 1: 192.168.7.236
Node 2: 192.168.7.237
Node 3: 192.168.7.238

I have deployed a service for a Minecraft server that runs one replica/container on a single node. It publishes TCP port 25565 through the ingress overlay network and also uses its own overlay network: minecraft_network.

Overlay networking itself works: I can deploy containers on all three nodes in the minecraft_network overlay network, and they can ping each other and the external world.

Issue

From my understanding, when I publish a port on a service, that service should be reachable externally from all 3 Docker nodes via the routing mesh. This is not the case: the service is only reachable externally on the Docker node that is running the container. Output below from running a test via nc:

➜  ~ nc -z -v 192.168.7.236 25565
Connection to 192.168.7.236 port 25565 [tcp/*] succeeded!
➜  ~ nc -z -v 192.168.7.237 25565
nc: connectx to 192.168.7.237 port 25565 (tcp) failed: Operation timed out
➜  ~ nc -z -v 192.168.7.238 25565
nc: connectx to 192.168.7.238 port 25565 (tcp) failed: Operation timed out

Everything looks correct when I inspect the networks and the service, and iptables shows the port is permitted on each node (Docker adds the rule automatically).
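
For example, this one-liner confirms the publish mode on the service is ingress (minecraft_server is my service name):

docker service inspect --format '{{json .Endpoint.Spec.Ports}}' minecraft_server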

I could really use some help understanding what I am doing wrong for this to fail.

Environment:

Application Details:

  • Operating System: VMware Photon 4.0 Revision 2
  • Docker version 20.10.11, build dea9396.

Cluster:

root@docker01 [ ~ ]# docker node ls
ID                            HOSTNAME                  STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
txjvdmo2s1nj69u6gig01gdeo *   docker01.<redacted>   Ready     Active         Leader           20.10.11
g8uw56gqr7a9eqxhxvwp415ug     docker02.<redacted>   Ready     Active                          20.10.11
5wcwrkjbhz8t4rqxujca4c0xo     docker03.<redacted>   Ready     Active                          20.10.11

iptables Output:

Node 1:

root@docker01 [ ~ ]# iptables -S
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT DROP
-N DOCKER
-N DOCKER-INGRESS
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2222 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2377 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 7946 -j ACCEPT
-A INPUT -p udp -m udp --dport 7946 -j ACCEPT
-A INPUT -p udp -m udp --dport 4789 -j ACCEPT
-A INPUT -d 224.0.0.0/8 -i eth0 -j ACCEPT
-A INPUT -i eth0 -p vrrp -j ACCEPT
-A INPUT -i eth0 -p icmp -j ACCEPT
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-INGRESS
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A OUTPUT -j ACCEPT
-A DOCKER-INGRESS -p tcp -m tcp --dport 25565 -j ACCEPT
-A DOCKER-INGRESS -p tcp -m state --state RELATED,ESTABLISHED -m tcp --sport 25565 -j ACCEPT
-A DOCKER-INGRESS -j RETURN
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

Node 2:

root@docker02 [ ~ ]# iptables -S
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT DROP
-N DOCKER
-N DOCKER-INGRESS
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2222 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2377 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 7946 -j ACCEPT
-A INPUT -p udp -m udp --dport 7946 -j ACCEPT
-A INPUT -p udp -m udp --dport 4789 -j ACCEPT
-A INPUT -d 224.0.0.0/8 -i eth0 -j ACCEPT
-A INPUT -i eth0 -p vrrp -j ACCEPT
-A INPUT -i eth0 -p icmp -j ACCEPT
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-INGRESS
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A OUTPUT -j ACCEPT
-A DOCKER-INGRESS -p tcp -m tcp --dport 25565 -j ACCEPT
-A DOCKER-INGRESS -p tcp -m state --state RELATED,ESTABLISHED -m tcp --sport 25565 -j ACCEPT
-A DOCKER-INGRESS -j RETURN
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

Node 3:

root@docker03 [ ~ ]# iptables -S
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT DROP
-N DOCKER
-N DOCKER-INGRESS
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2222 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2377 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 7946 -j ACCEPT
-A INPUT -p udp -m udp --dport 7946 -j ACCEPT
-A INPUT -p udp -m udp --dport 4789 -j ACCEPT
-A INPUT -d 224.0.0.0/8 -i eth0 -j ACCEPT
-A INPUT -i eth0 -p vrrp -j ACCEPT
-A INPUT -i eth0 -p icmp -j ACCEPT
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-INGRESS
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A OUTPUT -j ACCEPT
-A DOCKER-INGRESS -p tcp -m tcp --dport 25565 -j ACCEPT
-A DOCKER-INGRESS -p tcp -m state --state RELATED,ESTABLISHED -m tcp --sport 25565 -j ACCEPT
-A DOCKER-INGRESS -j RETURN
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

Proof that overlay networking is working:

I attached an individual Alpine container on each node to the minecraft_network overlay network and used ping to test DNS name resolution, communication between the hosts, and connectivity to the external world.

Node 1:

root@docker01 [ ~ ]# docker run -it --name alpine1 --network minecraft_network alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
df9b9388f04a: Pull complete 
Digest: sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454
Status: Downloaded newer image for alpine:latest
/ # ping server
PING server (10.0.6.2): 56 data bytes
64 bytes from 10.0.6.2: seq=0 ttl=64 time=0.096 ms
64 bytes from 10.0.6.2: seq=1 ttl=64 time=0.048 ms
--- server ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.048/0.072/0.096 ms
/ # ping alpine2
PING alpine2 (10.0.6.6): 56 data bytes
64 bytes from 10.0.6.6: seq=0 ttl=64 time=0.197 ms
64 bytes from 10.0.6.6: seq=1 ttl=64 time=0.215 ms
64 bytes from 10.0.6.6: seq=2 ttl=64 time=0.175 ms
--- alpine2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.175/0.195/0.215 ms
/ # ping alpine3
PING alpine3 (10.0.6.8): 56 data bytes
64 bytes from 10.0.6.8: seq=0 ttl=64 time=0.974 ms
64 bytes from 10.0.6.8: seq=1 ttl=64 time=0.890 ms
64 bytes from 10.0.6.8: seq=2 ttl=64 time=0.854 ms
--- alpine3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.854/0.906/0.974 ms
/ # ping google.com
PING google.com (142.250.64.238): 56 data bytes
64 bytes from 142.250.64.238: seq=0 ttl=116 time=30.473 ms
64 bytes from 142.250.64.238: seq=1 ttl=116 time=64.490 ms
64 bytes from 142.250.64.238: seq=2 ttl=116 time=27.212 ms
--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 27.212/40.725/64.490 ms
/ # 

Node 2:

root@docker02 [ ~ ]# docker run -it --name alpine2 --network minecraft_network alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
df9b9388f04a: Pull complete 
Digest: sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454
Status: Downloaded newer image for alpine:latest
/ # ping server
PING server (10.0.6.2): 56 data bytes
64 bytes from 10.0.6.2: seq=0 ttl=64 time=0.086 ms
64 bytes from 10.0.6.2: seq=1 ttl=64 time=0.048 ms
64 bytes from 10.0.6.2: seq=2 ttl=64 time=0.093 ms
--- server ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.048/0.075/0.093 ms
/ # ping alpine1
PING alpine1 (10.0.6.5): 56 data bytes
64 bytes from 10.0.6.5: seq=0 ttl=64 time=0.134 ms
64 bytes from 10.0.6.5: seq=1 ttl=64 time=0.219 ms
64 bytes from 10.0.6.5: seq=2 ttl=64 time=0.285 ms
--- alpine1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.134/0.212/0.285 ms
/ # ping alpine3
PING alpine3 (10.0.6.8): 56 data bytes
64 bytes from 10.0.6.8: seq=0 ttl=64 time=1.107 ms
64 bytes from 10.0.6.8: seq=1 ttl=64 time=1.282 ms
64 bytes from 10.0.6.8: seq=2 ttl=64 time=1.219 ms
--- alpine3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.107/1.202/1.282 ms
/ # ping google.com
PING google.com (142.250.64.238): 56 data bytes
64 bytes from 142.250.64.238: seq=0 ttl=116 time=35.483 ms
64 bytes from 142.250.64.238: seq=1 ttl=116 time=27.942 ms
64 bytes from 142.250.64.238: seq=2 ttl=116 time=25.849 ms
--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 25.849/29.758/35.483 ms
/ # 

Node 3:

root@docker03 [ ~ ]# docker run -it --name alpine3 --network minecraft_network alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
df9b9388f04a: Pull complete 
Digest: sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454
Status: Downloaded newer image for alpine:latest
/ # ping server
PING server (10.0.6.2): 56 data bytes
64 bytes from 10.0.6.2: seq=0 ttl=64 time=0.071 ms
64 bytes from 10.0.6.2: seq=1 ttl=64 time=0.077 ms
64 bytes from 10.0.6.2: seq=2 ttl=64 time=0.093 ms
--- server ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.071/0.080/0.093 ms
/ # ping alpine1
PING alpine1 (10.0.6.5): 56 data bytes
64 bytes from 10.0.6.5: seq=0 ttl=64 time=1.039 ms
64 bytes from 10.0.6.5: seq=1 ttl=64 time=1.229 ms
64 bytes from 10.0.6.5: seq=2 ttl=64 time=1.228 ms
--- alpine1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.039/1.165/1.229 ms
/ # ping alpine2
PING alpine2 (10.0.6.6): 56 data bytes
64 bytes from 10.0.6.6: seq=0 ttl=64 time=0.613 ms
64 bytes from 10.0.6.6: seq=1 ttl=64 time=1.193 ms
64 bytes from 10.0.6.6: seq=2 ttl=64 time=1.225 ms
--- alpine2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.613/1.010/1.225 ms
/ # ping google.com
PING google.com (142.250.64.238): 56 data bytes
64 bytes from 142.250.64.238: seq=0 ttl=116 time=28.863 ms
64 bytes from 142.250.64.238: seq=1 ttl=116 time=28.646 ms
64 bytes from 142.250.64.238: seq=2 ttl=116 time=27.834 ms
--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 27.834/28.447/28.863 ms

Configuration and Status:

Docker Service YAML:

root@docker01 [ ~ ]# more minecraft-server.yml 
version: '3.8'

services:
  server:
    image: itzg/minecraft-server
    ports:
      - target: 25565
        published: 25565
        protocol: tcp
    volumes:
      - "datastore:/data"
    networks:
      - network
    environment:
      EULA: "TRUE"
    restart: unless-stopped
    deploy:
      mode: replicated
      replicas: 1

networks:
  network:
    driver: overlay
    attachable: true

volumes:
  datastore:
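
Note: as the deploy output below shows, swarm ignores the restart: option; the swarm-mode equivalent would be a restart_policy under deploy, something like:

    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: any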

Docker Stack/Service Information:

root@docker01 [ ~ ]# docker stack deploy -c minecraft-server.yml minecraft
Ignoring unsupported options: restart

Creating network minecraft_network
Creating service minecraft_server

root@docker01 [ ~ ]# docker service ls
ID             NAME               MODE         REPLICAS   IMAGE                          PORTS
128zzthqb5pv   minecraft_server   replicated   1/1        itzg/minecraft-server:latest   *:25565->25565/tcp

root@docker01 [ ~ ]# docker service ps minecraft_server
ID             NAME                 IMAGE                          NODE                      DESIRED STATE   CURRENT STATE            ERROR     PORTS
gddad3afa0xl   minecraft_server.1   itzg/minecraft-server:latest   docker01.<redacted>   Running         Running 13 seconds ago  

Inspections:

root@docker01 [ ~ ]# docker service inspect minecraft_server
[
    {
        ...
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 25565,
                        "PublishedPort": 25565,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 25565,
                    "PublishedPort": 25565,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "s3vwk04tgjgy27dvk2coj72ku",
                    "Addr": "10.0.0.15/24"
                },
                {
                    "NetworkID": "xnlj3dttk4y3kb1nu22d5v9sl",
                    "Addr": "10.0.6.2/24"
                }
            ]
        }
    }
]

root@docker01 [ ~ ]# docker network inspect ingress
[
    {
        "Name": "ingress",
        "Id": "s3vwk04tgjgy27dvk2coj72ku",
        "Created": "2022-04-17T22:05:48.77504595Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": true,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "c2067bdcec04c5f327072200b5bc720ddcb79933bc621e72f248912e4466bbed": {
                "Name": "minecraft_server.1.gddad3afa0xly8zc8ge00ssqb",
                "EndpointID": "6fbd05792c0a9be832e63af57dd5f5a2ce9e6219317e3b707eefc54e895987d5",
                "MacAddress": "02:42:0a:00:00:10",
                "IPv4Address": "10.0.0.16/24",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "ingress-endpoint",
                "EndpointID": "5c382c781bb17d93a33789f03f8162ad3c0a28cc170d8607beda2212747835c1",
                "MacAddress": "02:42:0a:00:00:02",
                "IPv4Address": "10.0.0.2/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "a8b91c57730c",
                "IP": "192.168.7.236"
            },
            {
                "Name": "c1cb3607cdf7",
                "IP": "192.168.7.237"
            },
            {
                "Name": "048d71f586ca",
                "IP": "192.168.7.238"
            }
        ]
    }
]

I appreciate any and all help. Thank you!

Not necessarily. A service (I am not talking about Docker here) could listen on a LAN IP even when the host does not allow any traffic from the outside world to the IP on that port.

The iptables rules look good to me, but I don’t read iptables very well.

I can see “vrrp” in a rule. Do you have Keepalived on the nodes? I am not sure if that is relevant though.
Is it possible that your request can’t even reach the node because of a firewall in front of the nodes and not on them?

I think you could also use tools like Wireshark, tshark, or tcpdump to trace your network traffic and see whether the requests reach the nodes or not. https://www.redhat.com/sysadmin/troubleshoot-tcpdump
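
For example, something like this on each node would show whether the connection attempt arrives on the published port, and whether cross-node traffic shows up on the VXLAN port the mesh uses (eth0 is just the interface name from your rules):

# watch for the inbound connection attempt on the published port
tcpdump -ni eth0 'tcp port 25565'
# watch for cross-node ingress traffic on the VXLAN port
tcpdump -ni eth0 'udp port 4789'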

You could also check the service logs to see whether the service even got the request but could not respond. To determine if that is the case, I would also try to change the -P OUTPUT DROP to -P OUTPUT ACCEPT in the third line of the iptables rules. If that doesn’t help, I would do the same with INPUT temporarily. If that doesn’t help either, then the packets are caught at another level.
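
For example, temporarily (run as root; this is not persistent and resets when the rules are reloaded):

iptables -P OUTPUT ACCEPT
# and, if that alone doesn't change anything:
iptables -P INPUT ACCEPT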

I do have keepalived running. If it matters, I can’t reach the open port on the Docker nodes themselves either, except on the one the container is running on.
I’ll look at the iptables logs to see if anything is being blocked and run a packet capture. It’s my home network, so it is a flat network with no firewalls between my laptop and the Docker nodes. They are running inside ESXi on an Intel NUC cluster (no VMware NSX deployed).

I had an ESXi cluster once and had to set the vswitch on all nodes to promiscuous mode - but I don’t remember if this was a necessity for swarm or if I did it for something else.

[off-topic]

I replaced ESXi with Proxmox something like a year ago and must say I don’t miss ESXi - Proxmox uses KVM under the hood, which is the hypervisor most cloud providers use for their compute nodes. If it’s good enough for cloud providers like AWS, it should be good enough for a homelab :)

[/off-topic]

Keepalived shouldn’t matter. In homelab scenarios it is usually “just” a stable target IP for port forwarding. Indeed, it makes the most sense with the routing mesh in place.

Does VMware Photon 4.0 Revision 2 officially support Docker Swarm? Usually, lightweight container OS vendors aim for Kubernetes and provide their own Docker redistribution. You might want to raise an issue in their GitHub project: Issues · vmware/photon · GitHub

Update: it looks like iptables is the culprit, but now I am trying to find out why. I’m a novice when it comes to iptables, so I’m struggling to debug where iptables is blocking or not NAT’ing the traffic. I cross-posted this issue in case they’re causing it: Photon OS 4 R2 - Docker Swarm - Issue with iptables Causing Routing Mesh not routing Published Port on all Nodes · Issue #1321 · vmware/photon · GitHub
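
For instance, to see whether packets reach the ingress chain at all, a LOG rule like this can be added temporarily (watch the result with dmesg or journalctl -k; the log prefix is arbitrary):

iptables -I DOCKER-INGRESS 1 -p tcp --dport 25565 -j LOG --log-prefix "ingress-25565: "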

I would use Proxmox but I have a few use cases that require me to have vCenter and vSphere unrelated to this.

Not sure if it helps, but on a swarm node without deployments, the iptables rules look like this:

root@swarm1:/mnt/deployment# iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-INGRESS
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-INGRESS
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A DOCKER-INGRESS -j RETURN
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker_gwbridge ! -o docker_gwbridge -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker_gwbridge -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN

Here is a comparison of one of your iptables -S outputs and mine:

I never tried to create iptables rules manually. I prefer to do it on the hypervisor/security-group level outside the VM.

@meyay

The additional rules are from me moving SSH to TCP 2222, opening the ports required for Docker Swarm and overlay networking, and enabling keepalived for a VIP. The rest of the rules are managed by Docker.

I wonder if it’s a NAT issue inside iptables? It’s not getting a “connection refused”; it times out when connecting via the other Docker nodes. If I disable iptables, the issue goes away.
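
To check the NAT side, I have been looking at things like this (the conntrack command comes from the conntrack-tools package, assuming Photon ships it):

# show Docker's DNAT rules for the ingress network
iptables -t nat -S DOCKER-INGRESS
# look for connection-tracking entries for the published port
conntrack -L -p tcp --dport 25565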

I guess mine works because the policy for INPUT and OUTPUT is set to ACCEPT (see line 2/4 of the comparison) for traffic that does not match any filters.

Your policy for INPUT and OUTPUT is set to DROP and requires you to add filters for each and everything that should be allowed.

That is the default in Photon OS’ iptables policy. I did change INPUT and OUTPUT to ACCEPT and it didn’t change the behavior. It definitely seems to be related to iptables; I just don’t know iptables well enough to figure out how to debug why it’s failing. It’s not getting connection refused (dropped), the traffic just never seems to come back.

This issue is resolved. More information on how to resolve it is here: Photon OS 4 R2 - Docker Swarm - Issue with iptables Causing Routing Mesh not routing Published Port on all Nodes · Issue #1321 · vmware/photon · GitHub


Awesome! I ran into the same issue using Ubuntu, due to Upgrade to 20.10 breaks swarm network · Issue #41775 · moby/moby · GitHub

A lot of work to track down when you don’t know what to look for.


ethtool -K ens224 tx-checksum-ip-generic off

Everywhere else I looked, it was people who hadn’t opened the right ports or had made some other sort of setup error.


Hello everyone,

I was having the exact same issue for a swarm cluster, built from Ubuntu 20.04.4 LTS VMs running on ESXi 6.7. I spent countless hours troubleshooting it. My main focus was iptables, since that made the most sense to me.

However, in my case, running the command below on all cluster nodes immediately fixed my problem. Now, ingress publishing works like a charm!

sudo ethtool -K <interface> tx-checksum-ip-generic off
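
To verify the setting took effect, lowercase -k lists the current offload settings:

ethtool -k <interface> | grep tx-checksum-ip-generic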

It’s worth trying!

Best regards,
Ivan Spasov


Hi Ivan

After more than a week of struggling with the same case at a customer, your suggestion worked perfectly - you saved my life.
Thank you, thank you!

Regards
Sahram

Hello Sahram, I’m glad I was able to help. Keep in mind that this solution is not persistent across reboots; you will have to find some way to make it permanent. In my case I added the code below to the /etc/rc.local file:

#!/bin/bash
# disable TX checksum offload so the Swarm routing mesh keeps working after a reboot
ethtool -K ens160 tx-checksum-ip-generic off
exit 0

The rc.local file has to be executable. In case it isn’t, run the command below:

chmod +x /etc/rc.local
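
Alternatively, on distros where rc.local is not enabled, a small systemd oneshot unit does the same job (the unit name, interface, and ethtool path here are just examples):

# /etc/systemd/system/tx-offload-off.service
[Unit]
Description=Disable TX checksum offload for the Swarm routing mesh
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K ens160 tx-checksum-ip-generic off

[Install]
WantedBy=multi-user.target

Enable it with: systemctl enable --now tx-offload-off.service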

Hello Ivan,

thank you so much,

regards,
Sahram