Swarm Mesh Troubleshoot help

Hello,
I tried to figure this out on my own and read a few posts that touch on this same subject, but the posts I found just don’t seem to reach a resolution (they stop at certain places). So I’m asking for help to get past the next step.
I am using Docker version 27.3.1, build ce12230, on 3 managers and 4 workers running RHEL 9, each with 2 CPUs and 16 GB RAM.
I followed this walkthrough: Use Swarm mode routing mesh | Docker Docs
As part of that, I ran:

docker service create --name my-web --publish published=8080,target=80 --replicas 2 nginx

I can see which nodes are running my-web. When I curl http://<nodename>:8080, it seems to reply only every other time. Because there are two instances on different nodes, I tried alternating between the nodes, thinking that maybe the first node was “busy”, but I get the same result.

Am I missing something here? In my mind, it should reply every single time. I even went so far as to curl http://<nodeIP>:8080, and it exhibits the exact same behavior, responding every other time. Any help would be greatly appreciated, as I’m somewhat new to Docker. I did follow another post where someone suggested running docker inspect gw_bridge to see if the IP of gw_bridge responds to port 8080 requests, and that did not work for me.

Thank you in advance,
AB

Just an idea: are you sure all the required ports mentioned are open between the nodes?

First, thank you for the formatting. I failed on that one! LOL
Second, yes (at least I hope so), all nodes have these ports open:
ports: 2377/tcp 7946/tcp 7946/udp 4789/tcp 4789/udp
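
For reference, the Docker docs list 2377/tcp (cluster management), 7946/tcp and 7946/udp (node-to-node communication), and 4789/udp (overlay network traffic) as the ports Swarm needs between nodes. Below is a rough sketch of opening them with firewalld on RHEL. It assumes firewalld is the active firewall and that the default zone is the right one; the run helper and the APPLY variable are illustrative names, not part of any tool. By default it only prints the commands.

```shell
# Dry run by default: print each command. Set APPLY=1 to actually execute.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Open the inter-node Swarm ports on this host (repeat on every node).
open_swarm_ports() {
  for port in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    run sudo firewall-cmd --permanent --add-port="$port"
  done
  # --permanent rules only take effect after a reload.
  run sudo firewall-cmd --reload
}

open_swarm_ports
```

Running it with APPLY=1 set in the environment executes the commands instead of printing them.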

Wait, are you talking about ports 8080 and 80?

No, I meant the ports in your previous message.

I’ll add them now and reply with results

OK, so all ports are added, 8080 and 80, and I still get the same results: it responds to every other curl, not each one. If I do:
curl http://<nodeName1>:8080 it responds
curl http://<nodeName2>:8080 will not respond/timeout
curl http://<nodeName2>:8080 responds
curl http://<nodeName1>:8080 will not respond/timeout

I don’t know, maybe I’m missing something. I just expected it to ‘hand off’ requests round-robin, so to speak, if a node was busy.
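
To put a number on the “every other time” pattern, something like the following could be run against each node. This is only a sketch: probe_node is a made-up helper name, and the 3-second timeout is an arbitrary choice.

```shell
# Send COUNT requests to http://HOST:PORT/ and report how many succeeded.
# A request counts as failed on timeout, refused connection, or HTTP >= 400.
probe_node() {
  host="$1"; port="$2"; count="$3"
  ok=0
  i=0
  while [ "$i" -lt "$count" ]; do
    if curl --silent --fail --output /dev/null --max-time 3 "http://${host}:${port}/"; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  echo "${host}: ${ok}/${count} requests succeeded"
}

# Example (substitute a real node name or IP):
#   probe_node <nodeName1> 8080 10
```

With a healthy routing mesh, every node should answer every request, whether or not a replica runs on that node.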

When the required ports are opened in the firewall, and it still doesn’t work, the usual suspects are:

  • not using a low-latency network among the nodes
  • mismatch in MTU size among the nodes
  • nodes running in VMs on VMware vSphere/ESXi
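
The MTU suspect can be checked roughly as follows. Swarm overlay traffic is VXLAN-encapsulated, which adds roughly 50 bytes of headers to each packet, so a path that cannot carry full-size packets between nodes tends to show up as hangs rather than clean errors. This sketch assumes Linux iputils ping (for the -M do option); the function names are made up for illustration.

```shell
# ICMP payload size that exactly fills one packet of the given MTU:
# MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
ping_payload() {
  echo $(( $1 - 28 ))
}

# Send non-fragmentable pings sized for the given MTU.
# Errors such as "message too long" suggest a smaller MTU on the path.
check_path_mtu() {
  peer="$1"; mtu="$2"
  ping -M do -c 3 -s "$(ping_payload "$mtu")" "$peer"
}

# Show the MTU configured on each local interface:
#   ip -o link show | awk '{print $2, $4, $5}'
# Example, run from one node against another node's IP:
#   check_path_mtu <nodeIP> 1500
```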

A published port of a swarm service should be bound on every node, and the traffic should be forwarded internally using the ingress routing mesh.

Maybe this topic can provide some new ideas: Docker Swarm - The web service without loading

Creating a Docker Swarm service with a simple port will create an ingress network. All nodes will open the port and Docker will load-balance requests to the replicated service instances.

Is the Swarm cluster set up correctly? Can you list nodes, services and service instances?
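
The listing asked for here could be gathered on a manager node along these lines. It is a sketch only: run and APPLY are illustrative names, and by default the commands are printed rather than executed.

```shell
# Dry run by default: print each command. Set APPLY=1 to actually execute.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

show_swarm_state() {
  service="$1"
  # Node list with availability and manager status.
  run docker node ls
  # Services with their replica counts and published ports.
  run docker service ls
  # One line per task, including the node each replica landed on.
  run docker service ps "$service"
  # The ingress network carries routing-mesh traffic between nodes.
  run docker network inspect ingress
}

show_swarm_state my-web
```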

This is exactly our environment. I will read the link you posted below, but if there is any more reading that you can point me to, I’ll be more than happy to read up on that.

Thank you,
AB

Only the links I shared in one of the posts in the topic you read: Docker Swarm - The web service without loading - #12 by meyay

A coworker seems to think maybe I missed a step in the setup. I’m going to try a simple 2-node environment from scratch. I’m going to do everything by hand, following the Docker tutorial to set up Swarm. When I set the last environment up, I did everything through Ansible. It’s possible I just missed something.

Well, I just want to check in to report some findings. It has to do with Docker’s data path port (4789) and VMware’s VXLAN running on the same port. I completely rebuilt a new environment, was super careful with the Docker install, and it exhibited the same intermittent behavior as before. I tore down the Docker Swarm, added 7789 to the firewalls, reinitialized a new swarm using 7789 as the data path port (--data-path-port), and voila, it works like an absolute charm.
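
For anyone landing here with the same symptom, the fix described above might look roughly like this. Treat it as a sketch under assumptions: firewalld is in use, 192.0.2.10 is a placeholder manager IP, and run, APPLY and reinit_swarm are illustrative names. As far as I know, the data path port can only be chosen at swarm init, which is why the old swarm is torn down first. By default the commands are only printed.

```shell
# Dry run by default: print each command. Set APPLY=1 to actually execute.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Recreate the swarm with a non-default VXLAN data path port.
reinit_swarm() {
  addr="$1"; port="$2"
  # The new UDP port must be open on every node, not just this one.
  run sudo firewall-cmd --permanent --add-port="${port}/udp"
  run sudo firewall-cmd --reload
  # Leave the old swarm (every node has to do this).
  run docker swarm leave --force
  # Initialize a fresh swarm that uses the new port for overlay traffic.
  run docker swarm init --advertise-addr "$addr" --data-path-port "$port"
}

# 7789 is the port used in this thread; the IP is a placeholder.
reinit_swarm 192.0.2.10 7789
```

The remaining nodes then rejoin with the join tokens from the new swarm. The Swarm section of docker info should report the data path port in use.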

My problem now is that my coworker says, “Well, 4789 belongs to Docker, so change the default port for VXLAN,” and my reply is that 4789 is assigned by IANA for VXLAN. It would be so much easier to just assign a new port for Docker.

Is there anything on the horizon from Docker about this predicament?

Disregard. I just had a meeting with him and he said, “Oh, IANA? Well, let’s just move the default Docker data-port.”

Thank you all for your assistance!

Well, both VMware NSX and the Swarm overlay network mode use VXLAN. So one of them needs to be configured to use a non-default port to prevent collisions :smiley: