Docker Swarm Ingress network is not forwarding packets to the containers

Hello.

There have been a number of posts relating to Docker Swarm mode “not working”. Unfortunately, none of these have worked for me.

Scenario:

Two Proxmox hosts, p1 and c2, each run one privileged LXC: architect on p1 and librarian on c2. The LXCs have Docker configured in Swarm mode, with architect as the swarm leader. Spread across the two nodes are a number of network containers that listen on various ports.

When I use a specialised network-diagnostics image, I can open an nc instance on a certain port and connect to it from another instance on the other host.

> ssh p1
> docker run diagnostics ..
    ~# ip addr
    eth0 10.0.1.5 # ingress network
    eth1 10.0.0.6 # mesh network for the containers
    ~# nc -l -p 1920
    Hello

> ssh c2
> docker run diagnostics ..
    ~# ip addr
    eth0 10.0.1.12 # ingress network
    eth1 10.0.0.13 # mesh network for containers
    ~# nc 10.0.0.6 1920
    Hello

This works if I’m using IP addresses from the mesh network. However, if I replace the IP addresses in the nc call with IPs from the ingress network, netcat stops receiving data.

The same failure affects other services, such as the containers I briefly mentioned above.

Things I have tried:

  1. Adjusting the MTU on the mesh network to something like 1200
  2. Fiddling with various IP forwarding settings on the Proxmox hosts as well as the LXCs
  3. Checking/adjusting the kernel modules loaded at boot (and rebooting the host, of course). Currently loaded are: overlay br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack.
  4. Host-side networking. (Not ideal for my situation because the number of containers will eventually grow and potentially require true load balancing)
  5. DNSRR - Fails because it is not compatible with the Ingress network
  6. Disabling checksum offloading on the host’s and LXC’s bridges
  7. Network diagnostics using familiar tools such as tcpdump, traceroute, ping and similar.
  8. Googling for help. Neither LLMs nor forums could answer my question.
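
For item 3, a quick way to compare the expected modules against what the kernel reports might look like the sketch below. The function name `missing_modules` is made up for illustration. Modules compiled directly into the kernel never show up in /proc/modules, so a "missing" result is only a hint, not proof.

```shell
#!/bin/sh
# Sketch: print every expected module that is absent from a /proc/modules-style file.
# Usage: missing_modules FILE MODULE...
missing_modules() {
    modules_file=$1
    shift
    for mod in "$@"; do
        # The module name is the first whitespace-separated field on each line.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}
```

Running `missing_modules /proc/modules overlay br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack` inside each LXC should print nothing if all of them are loaded.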

Other info

  • DNS works. I can getent hosts nettest1 and getent hosts tasks.nettest1 and get different but correct IPs
  • I do have my own DNS server running that the docker daemons are configured to use. It also runs in swarm mode but uses host-mode networking. (Ideally I’d like it to use the ingress network like everything else, but this just seems to mask the problem). The docker daemons have since been reconfigured without DNS preferences. The above symptoms exist despite the initial docker DNS settings.
  • Please let me know if I can provide any other information that may be diagnostically relevant. I sincerely appreciate any and all help.

In writing this post, it occurred to me that since the issue is related to the ingress network, adjusting the MTU settings of the mesh network cannot influence the problem. Shutting the swarm down and rebuilding the ingress network using a lower MTU did the trick.

> ssh architect
> docker stack down ...
> docker network rm ingress
> docker network create \
      --driver overlay \
      --ingress \
      --opt com.docker.network.driver.mtu=1450 \
      --subnet=10.0.0.0/24 \
      --gateway=10.0.0.1 \
      ingress

I wonder if a proper MTU diagnostic would have caught this issue earlier as I’ve spent a week on this now.
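
For anyone landing here with the same symptoms, a diagnostic along these lines might have surfaced the problem sooner. It is only a sketch: it sends pings with the don't-fragment bit set across the underlay (LXC to LXC) to find the largest packet that gets through, then subtracts the encapsulation overhead to estimate a safe overlay MTU. The 28-byte IPv4+ICMP header and 50-byte VXLAN-over-IPv4 overhead are the usual figures. The function names are hypothetical, and `ping -M do` assumes the Linux iputils version of ping.

```shell
#!/bin/sh
# Sketch: MTU arithmetic plus a don't-fragment ping probe (Linux iputils ping assumed).

# ICMP payload size that yields an IPv4 packet of exactly $1 bytes
# (20-byte IPv4 header + 8-byte ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Largest overlay MTU that fits inside an underlay MTU of $1,
# assuming 50 bytes of VXLAN-over-IPv4 encapsulation overhead.
overlay_mtu_for_underlay() {
    echo $(( $1 - 50 ))
}

# Probe target $1 with the don't-fragment bit set, stepping down from
# $2 (default 1500) to $3 (default 1200) in steps of 10.
# Prints the first (largest) packet size that gets a reply.
probe_path_mtu() {
    target=$1
    mtu=${2:-1500}
    floor=${3:-1200}
    while [ "$mtu" -ge "$floor" ]; do
        if ping -c 1 -W 1 -M do -s "$(icmp_payload_for_mtu "$mtu")" "$target" >/dev/null 2>&1; then
            echo "$mtu"
            return 0
        fi
        mtu=$(( mtu - 10 ))
    done
    return 1
}
```

For example, `probe_path_mtu 192.0.2.10` (a placeholder standing in for the other LXC's underlay address) prints the largest size that passed, to within the 10-byte step. Feeding that number to `overlay_mtu_for_underlay` gives an upper bound for the MTU of the overlay networks, ingress included.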

Glad you found the solution! Thanks for keeping us updated with your solution!
