Docker Swarm Mesh Networking not working as expected on Ubuntu 22.04 Server

davyd92 · October 13, 2023, 10:24am

I should preface this by saying that I’m really not an expert on Docker or Docker Swarm.

Issue type
Docker Swarm Mesh Networking not working as expected.
OS Version/build
Ubuntu 22.04.3 - Kernel 5.15.0-86-generic
App version
24.0.6
Steps to reproduce

5 Docker hosts (3 managers, 2 workers) - Ubuntu 22.04 Server VMs running on the same subnet, with nothing in between that would block traffic on Swarm-relevant ports.
Fresh, up-to-date VMs (put the relevant hosts in each node's /etc/hosts), install Docker per the official documentation for Ubuntu, add a non-root user to the docker group.
Make sure ufw is not blocking anything, then create Swarm per the official documentation. Join managers and workers.
Then I create an attachable overlay network.
Create an nginx service for testing mesh networking
docker service create --name my-web --network testnet --publish published=8080,target=80 --replicas 2 nginx
Try to curl <http://any node IP:8080>. Only works if I curl the specific node that a replica is running on. Curling other nodes results in connection timed out.
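
The check in the last step can be scripted so that every node is probed the same way. This is only a minimal sketch: the helper name probe_nodes is made up for illustration, and the addresses in the usage line are placeholders from the 192.0.2.0/24 documentation range, not the real node IPs. With a working routing mesh, every node should report ok no matter where the replicas run.

```shell
# probe_nodes PORT HOST...
# Prints "HOST ok" or "HOST FAILED" for each host (hypothetical helper).
probe_nodes() {
  port="$1"
  shift
  for host in "$@"; do
    # --max-time bounds the wait, so a silently dropped packet
    # shows up as a failure after 5 seconds instead of hanging.
    if curl --silent --output /dev/null --max-time 5 "http://${host}:${port}/"; then
      echo "${host} ok"
    else
      echo "${host} FAILED"
    fi
  done
}
```

Usage, with placeholder addresses: probe_nodes 8080 192.0.2.11 192.0.2.12 192.0.2.13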

I’ve been bashing my head against this issue for two days now. It seems like mesh networking is not working properly. Am I wrong in thinking that I should see vxlan interfaces when I run ip a after creating an overlay network and attaching services to it? Because it is empty. I also checked with ipvsadm -L -n, but it is empty. I’ve made sure that IPVS, overlay, and vxlan kernel modules are loaded. Tried reinitializing the Swarm.
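
The module check described above could look roughly like the sketch below. The helper name check_modules is hypothetical, and the module names are the ones mentioned in the post. Note that lsmod only lists loadable modules, so a feature compiled into the kernel would be reported as missing here even though it is available. As far as I know, the vxlan interfaces of an overlay network normally live in Docker's own network namespaces rather than in the host namespace, so an empty result from plain ip a or ipvsadm on the host is not necessarily a sign of a problem.

```shell
# check_modules MODULE...
# Reports whether each module name appears in the lsmod output (hypothetical helper).
check_modules() {
  # Skip the header line and keep only the module name column.
  loaded="$(lsmod | awk 'NR > 1 { print $1 }')"
  for mod in "$@"; do
    if printf '%s\n' "$loaded" | grep -qx "$mod"; then
      echo "${mod} loaded"
    else
      echo "${mod} missing"
    fi
  done
}
```

Usage: check_modules ip_vs overlay vxlan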

Scaling services works as expected. If I scale to 5 replicas, I can obviously curl any node IP:8080. I’ve also tested inter-node/inter-container communication by running an alpine debug service from which I try to curl the VirtualIP:8080, and that doesn’t work. I’ve verified that DNS in the Swarm works as expected: running nslookup my-web gives me the address of one of the containers.

I’ve checked nftables and iptables configuration to see if anything looks out of order, but seeing that I haven’t modified anything there I can’t see why it would be broken. Checking Docker service logs on one of the managers I could see this ...level=info msg="initialized VXLAN UDP port to 4789 " which leads me to believe that VXLAN interfaces are created as expected.

One thing I haven’t tried yet is to set everything up on an older Ubuntu liveserver version (20.04).

I’m starting to think that I’m missing something extremely obvious, or am I crazy and just interpreting Docker's documentation wrong in that you should be able to access a service published on port 8080 (target 80) with any node IP:8080?

Super thankful for any help!

gaetandelaunay · October 19, 2023, 2:48pm

Hi, exact same issue here!
For the moment, I have no idea what the problem might be!
On another cluster on Ubuntu 20.04.5 / kernel 5.4.0-149-generic, it works like a charm… Hoping it’s not an OS/kernel issue…
Still investigating.
Thanks for any guess.

meyay · October 19, 2023, 3:02pm

You can never be sure about that. For instance, if you use ESXi with NSX, then traffic on port 4789 will not reach any of the VMs.

That expectation is correct, as long as the service task containers actually bind a service on port 80.

meyay · October 19, 2023, 3:42pm

Have you checked it like this?

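# Name of the overlay network to inspect
network_name=testnet
# The load-balancer namespace is named lb_ plus the first 9 characters of the network ID
short_id=$(docker network ls --format '{{ slice .ID 0 9 }}' --filter "name=${network_name}")
# List the IPVS table inside that namespace
sudo nsenter --net="/var/run/docker/netns/lb_${short_id}" ipvsadm -l -n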

davyd92 · October 20, 2023, 6:41am

Thank you for your reply Metin! These VMs run on VMware vSphere on a VxRail. Could it be a similar issue to the one with ESXi?

Regarding your last reply: I ran the commands, which gave me this:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  262 rr
  -> 10.0.2.8:0                   Masq    1      0          0
FWM  263 rr
  -> 10.0.2.3:0                   Masq    1      0          0
  -> 10.0.2.4:0                   Masq    1      0          0
  -> 10.0.2.10:0                  Masq    1      0          0
  -> 10.0.2.11:0                  Masq    1      0          0
  -> 10.0.2.12:0                  Masq    1      0          0

I’m running the nginx service on published port 8080 and target port 80.

meyay · October 20, 2023, 8:06am

I am not strong when it comes to knowing the VMware products. But isn’t ESXi the hypervisor used in their offering?

The output of ipvsadm only shows the FWM groups (whatever the exact terminology might be) and lists the IPs of the target containers.
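
To tie an FWM entry to the rule that sets it, it helps to know that ipvsadm prints firewall marks in decimal while iptables prints them in hexadecimal. A small sketch for the conversion; the helper name fwm_to_hex is made up for illustration:

```shell
# fwm_to_hex MARK
# Converts a decimal firewall mark (as shown by ipvsadm) to the
# hexadecimal form used in iptables listings (hypothetical helper).
fwm_to_hex() {
  printf '0x%x\n' "$1"
}
```

For the output above, fwm_to_hex 262 gives 0x106 and fwm_to_hex 263 gives 0x107. Assuming the marks are set in the mangle table of the same namespace, they can then be searched for in the output of iptables -t mangle -L -n, run through the same nsenter call.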

You might want to take a look at this topic:

davyd92 · October 20, 2023, 12:56pm

Hi Metin,

It was totally the data-path-port!! I changed it to 7789 and mesh networking is now working as expected!
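
For readers with the same symptom: as far as the CLI exposes it, the data path port is chosen with the --data-path-port flag of docker swarm init, so changing it on an existing cluster means leaving and re-creating the swarm. The sketch below is an assumption about how that could be done, not a record of the exact commands used here. The helper names and the address in the usage line are hypothetical, and every other node would need to leave and rejoin with a fresh join token afterwards.

```shell
# current_data_path_port
# Prints the data path port reported by "docker info" (hypothetical helper).
current_data_path_port() {
  docker info 2>/dev/null | awk -F': ' '/Data Path Port/ { print $2 }'
}

# reinit_swarm ADVERTISE_ADDR [PORT]
# Destroys the local swarm state and creates a new swarm that uses PORT
# (default 7789, the value from this thread) for VXLAN traffic.
# WARNING: this removes all services and swarm configuration on this node.
reinit_swarm() {
  advertise_addr="$1"
  data_port="${2:-7789}"
  docker swarm leave --force
  docker swarm init --advertise-addr "$advertise_addr" --data-path-port "$data_port"
}
```

Usage on the first manager, with a placeholder address: reinit_swarm 192.0.2.11 7789, then current_data_path_port to confirm the new value.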

Thank you so much for your help,

/David

chuanma · November 13, 2023, 2:40pm

Thanks @davyd92. I had the same issue and using a customized data-path-port also fixed it.

However I went a bit further to investigate.

I created 2 clusters: cluster A with the default data-path-port and cluster B with a customized one. I found that cluster A is using ipv6:7946 and cluster B is using ipv4:7946. Maybe it’s a bug in Docker Swarm? Note that 7946 is not configurable. I wonder why a different data-path-port would affect 7946. @meyay

Here are the outputs from cluster A:

$ netstat -tpln |grep 7946
…
tcp6 0 0 :::7946 :::* LISTEN
…

$ sudo netstat -anu
…
udp6 0 0 :::7946 :::*
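
One thing that may be worth ruling out when reading this output: on Linux, a socket bound to the IPv6 wildcard address (:::7946) usually accepts IPv4 connections too, and netstat then shows only the tcp6/udp6 line. That behaviour is controlled system-wide by the net.ipv6.bindv6only sysctl, although an application can still override it per socket. The sketch below only reads the system default; the helper name is hypothetical, and it does not explain why the two clusters differ.

```shell
# dual_stack_status
# Reports the system-wide default for sockets bound to the IPv6 wildcard
# address (hypothetical helper; applications can override this per socket).
dual_stack_status() {
  value="$(sysctl -n net.ipv6.bindv6only)"
  if [ "$value" = "0" ]; then
    echo "wildcard IPv6 sockets also accept IPv4 by default"
  else
    echo "wildcard IPv6 sockets are IPv6-only by default"
  fi
}
```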
