I should preface this by saying that I’m really not an expert on Docker or Docker Swarm.
Docker Swarm Mesh Networking not working as expected.
Ubuntu 22.04.3 - Kernel 5.15.0-86-generic
Steps to reproduce
- 5 Docker hosts (3 managers, 2 workers): Ubuntu 22.04 Server VMs on the same subnet, with nothing in between that would block traffic on the Swarm ports (TCP 2377, TCP/UDP 7946, UDP 4789).
- Fresh, fully updated VMs: add the relevant hosts to each node’s /etc/hosts, install Docker per the official documentation for Ubuntu, and add a non-root user to the docker group.
- Make sure ufw is not blocking anything, then create the Swarm per the official documentation and join the managers and workers.
- Create an attachable overlay network (`testnet`).
- Create an nginx service to test mesh networking:
docker service create --name my-web --network testnet --publish published=8080,target=80 --replicas 2 nginx
- Try to curl `http://<any node IP>:8080`. This only works if I curl the specific node that a replica is running on; curling any other node results in a connection timeout.
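For reference, the steps above correspond roughly to the commands below, wrapped in two small shell functions. This is a sketch: the function names are made up for illustration, `testnet` and `my-web` are the names used in this post, and it assumes the Swarm is already initialized and all nodes have joined. `probe_nodes` counts any HTTP response as reachable and a timeout or refused connection as a failure.

```shell
# Recreate the test setup from a manager node.
# Assumes the Swarm is initialized and all five nodes have joined.
create_test_service() {
  # Attachable overlay network, so standalone debug containers can join it too
  docker network create --driver overlay --attachable testnet
  # Two nginx replicas, published on port 8080 through the ingress routing mesh
  docker service create --name my-web --network testnet \
    --publish published=8080,target=80 --replicas 2 nginx
}

# Curl every given node on the published port and report which ones answer.
# Usage: probe_nodes PORT IP [IP...]; returns non-zero if any node fails.
probe_nodes() {
  port="$1"
  shift
  failed=0
  for ip in "$@"; do
    # Any HTTP response counts as reachable; -s hides curl's own error messages,
    # --noproxy '*' talks to the node directly even if http_proxy is set
    if curl -s -o /dev/null --noproxy '*' --max-time 5 "http://$ip:$port/"; then
      echo "$ip:$port OK"
    else
      echo "$ip:$port FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `probe_nodes 8080 192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14 192.0.2.15` (placeholder addresses). If the routing mesh behaves as documented, every node should print `OK`, not just the two running a replica.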
I’ve been bashing my head against this issue for two days now. It seems like mesh networking is not working properly. Am I wrong in thinking that I should see VXLAN interfaces when I run `ip a` after creating an overlay network and attaching services to it? Because there are none. I also checked `ipvsadm -L -n`, but its output is empty too. I’ve made sure that the IPVS (`ip_vs`), overlay, and vxlan kernel modules are loaded, and I’ve tried reinitializing the Swarm.
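Worth noting when reading `ip a` and `ipvsadm` output: Docker creates the overlay VXLAN devices and the routing-mesh IPVS rules inside its own network namespaces (under `/var/run/docker/netns`, with the routing mesh in one named `ingress_sbox`), not in the host’s default namespace, so empty output on the host is not necessarily a symptom. A sketch for looking inside those namespaces, assuming root and that `nsenter` and `ipvsadm` are installed; the function name is hypothetical.

```shell
# Show interfaces and IPVS rules inside every Docker-managed network namespace.
# Assumes root, the default namespace directory /var/run/docker/netns,
# and that nsenter (util-linux) and ipvsadm are installed.
inspect_docker_netns() {
  netns_dir="${1:-/var/run/docker/netns}"
  for ns in "$netns_dir"/*; do
    [ -e "$ns" ] || continue   # glob matched nothing: directory empty or missing
    echo "=== $(basename "$ns") ==="
    # -d prints link details, e.g. "vxlan id ..." for an overlay's VXLAN device
    nsenter --net="$ns" ip -d link show
    # ingress_sbox should hold the IPVS virtual services for published ports
    nsenter --net="$ns" ipvsadm -L -n 2>/dev/null
  done
}
```

Running it as root on a node without a replica, and comparing with a node that has one, may show where the two differ.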
Scaling works as expected. If I scale to 5 replicas (which presumably puts one on every node), I can obviously curl any node IP:8080. I’ve also tested inter-node and inter-container communication by running an alpine debug service and curling the service’s VirtualIP:8080 from it, which doesn’t work. I’ve verified that DNS inside the Swarm works as expected: running `nslookup my-web` gives me the address of one of the containers.
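A sketch of that in-network check, meant to run inside a container attached to `testnet` (for example `docker run --rm -it --network testnet alpine sh`, which works because the network is attachable). One detail that may matter: inside the overlay, the service VIP forwards the target port (80), while 8080 is only the published port on the ingress routing mesh, so curling VIP:8080 would likely fail even on a healthy Swarm. Also, in the default VIP endpoint mode, `nslookup my-web` returns the VIP, while `tasks.my-web` returns one address per task. The function name is hypothetical; it uses BusyBox `nslookup` and `wget` because they ship with alpine.

```shell
# Run inside a container attached to testnet, e.g. an alpine debug container.
# Defaults: service my-web, target port 80 (the container port, not the published 8080).
check_service_from_overlay() {
  svc="${1:-my-web}"
  port="${2:-80}"
  # In the default VIP endpoint mode this should return the service's virtual IP
  nslookup "$svc"
  # One A record per running task (container)
  nslookup "tasks.$svc"
  # BusyBox wget: -q quiet, -O /dev/null discards the body, -T timeout in seconds
  if wget -q -O /dev/null -T 5 "http://$svc:$port/"; then
    echo "reached $svc:$port"
  else
    echo "failed to reach $svc:$port"
  fi
}
```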
I’ve checked the nftables and iptables configuration for anything out of order, but since I haven’t modified either, I can’t see why it would be broken. In the Docker service logs on one of the managers I found `...level=info msg="initialized VXLAN UDP port to 4789 "`, which leads me to believe that the VXLAN interfaces are being created as expected.
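As far as I can tell, that log line only records which UDP port the overlay driver will use for VXLAN; it doesn’t show that encapsulated traffic actually makes it between nodes. A sketch of two further checks, assuming root, `iptables` and `tcpdump` installed, and a hypothetical interface name `eth0`: list the `DOCKER-INGRESS` chains Docker creates for published ports, then capture UDP 4789 on both the node being curled and a node running a replica while repeating the curl.

```shell
# Inspect the ingress NAT/filter rules and watch for VXLAN traffic on this node.
# Assumes root; pass the real uplink interface name instead of the eth0 placeholder.
check_ingress_path() {
  iface="${1:-eth0}"
  # Published ports should show up as DNAT rules in the nat table...
  iptables -t nat -L DOCKER-INGRESS -n -v
  # ...and as ACCEPT rules in the filter table
  iptables -L DOCKER-INGRESS -n -v
  # Capture up to 20 VXLAN packets (UDP 4789), giving up after 30 seconds
  timeout 30 tcpdump -ni "$iface" -c 20 udp port 4789
}
```

If packets leave the curled node but never arrive at the replica’s node (or vice versa), the problem would more likely sit between the VMs than inside Docker.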
One thing I haven’t tried yet is setting everything up on an older Ubuntu Live Server release (20.04).
I’m starting to think I’m missing something extremely obvious. Or am I crazy, and just misreading Docker’s documentation, which I understand to say that a service published on port 8080 (target 80) should be reachable at any node IP:8080?
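For what it’s worth, the routing mesh corresponds to the default `ingress` publish mode, and a quick way to confirm what the service actually got is sketched below. It assumes it runs on a manager and uses the `my-web` service from above; `PublishMode` should read `ingress`, and each node that has joined the mesh ought to appear among the peers of the `ingress` network. The function name is hypothetical.

```shell
# Confirm the publish mode of my-web and list the ingress network's peers.
# Run on a manager; the service name defaults to my-web from the steps above.
show_publish_mode() {
  svc="${1:-my-web}"
  # PublishMode should be "ingress" for the routing mesh ("host" bypasses it)
  docker service inspect --format '{{json .Endpoint.Ports}}' "$svc"
  # Nodes participating in the ingress overlay network
  docker network inspect --format '{{json .Peers}}' ingress
}
```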
Super thankful for any help!