Hello.
There have been a number of posts relating to Docker Swarm mode “not working”. Unfortunately, none of these have worked for me.
Scenario:
Two Proxmox hosts: p1 and c2 run one privileged LXC each - architect on p1 and librarian on c2. The LXCs have Docker configured in Swarm mode, where architect is the swarm leader. Across the two nodes are a bunch of network containers that listen on various ports.
When I use a specialised network-diagnostics image, I can open an nc instance on a certain port and connect to it from another instance on the other host.
> ssh p1
> docker run diagnostics ..
~# ip addr
eth0 10.0.1.5 # ingress network
eth1 10.0.0.6 # mesh network for the containers
~# nc -l -p 1920
Hello
> ssh c2
> docker run diagnostics ..
~# ip addr
eth0 10.0.1.12 # ingress network
eth1 10.0.0.13 # mesh network for containers
~# nc 10.0.0.6 1920
Hello
This works if I’m using IP addresses from the mesh network. However, if I replace the IP addresses in the nc call with IPs from the ingress network, netcat stops receiving data.
This extends to other services like the containers I briefly mentioned.
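For anyone who wants to repeat the comparison, here is a minimal sketch of the same test as a helper function. It assumes the nc in the diagnostics image supports -z (connect without sending data) and -w (timeout), which not every netcat variant does; probe_all is a hypothetical name, not part of any tool.

```shell
# Probe one TCP port on each given address and report the result per address.
# Assumption: nc supports -z (connect only) and -w (timeout in seconds).
probe_all() {
    port=$1
    shift
    for ip in "$@"; do
        if nc -z -w 2 "$ip" "$port" >/dev/null 2>&1; then
            printf '%s:%s reachable\n' "$ip" "$port"
        else
            printf '%s:%s unreachable\n' "$ip" "$port"
        fi
    done
}
```

From the diagnostics container on c2, with a listener on port 1920 in the container on p1, this could be called as probe_all 1920 10.0.0.6 for the mesh address and probe_all 1920 10.0.1.5 for the ingress address. A plain nc -l listener usually exits after the first connection, so it needs restarting between the two probes.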
Things I have tried:
- Adjusting the MTU on the mesh network to something like 1200
- Fiddling with various IP forwarding settings on the Proxmox hosts as well as the LXCs
- Checking/adjusting the kernel modules loaded at boot (and rebooting the host, of course). Currently loaded are:
overlay br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
- Host-side networking. (Not ideal for my situation because the number of containers will eventually grow and potentially require true load balancing)
- DNSRR - Fails because it is not compatible with the Ingress network
- Disabling checksum offloading on the host’s and LXC’s bridges
- Network diagnostics using familiar tools such as tcpdump, traceroute, ping and similar.
- Googling for help. No LLMs or forums can answer my question.
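The kernel module check from the list above can be made repeatable with a small sketch like the following. It reads lsmod-style output on stdin and prints any of the named modules that are not listed. missing_modules is a hypothetical helper name. Two caveats: modules compiled into the kernel never appear in lsmod, and LXCs share the Proxmox host's kernel, so the check is most meaningful when run on the host.

```shell
# Print each required module (given as arguments) that is absent from
# lsmod-style input on stdin. Only the first column of each line is used,
# so the "Module Size Used by" header line is harmless.
missing_modules() {
    loaded=$(awk '{ print $1 }')
    for mod in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$loaded" | grep -qxF -- "$mod" || printf '%s\n' "$mod"
    done
}
```

A possible invocation on each Proxmox host is lsmod | missing_modules overlay br_netfilter ip_vs nf_conntrack vxlan. The vxlan entry is an assumption added here, not something from the list above: overlay networks use VXLAN encapsulation, so it seems worth confirming, but its absence from lsmod would not by itself prove anything.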
Other info
- DNS works. I can run getent hosts nettest1 and getent hosts tasks.nettest1 and get different but correct IPs.
- I do have my own DNS server running that the docker daemons were configured to use. It also runs in swarm mode but uses host-mode networking. (Ideally I’d like it to use the ingress network like everything else, but this just seems to mask the problem). The docker daemons have since been reconfigured without DNS preferences. The symptoms above are the same with and without the initial docker DNS settings.
- Please let me know if I can provide any other information that may be diagnostically relevant. I sincerely appreciate any and all help.