Docker Swarm - Web service not loading

Hello everyone, I have a question about docker swarm using docker-compose.yaml

this is my .yaml:

version: "3.7"
services:
  web:
    image: httpd:latest
    deploy:
      placement:
        constraints:
          - node.role == worker
      replicas: 3
    ports:
      - "8888:80"

but when doing a:

docker stack deploy -c docker-compose.yaml webpage

and I try to open http://ip:8888, it does not load. If I remove the line “- node.role == worker” so that a container also lands on the manager, then the website does load. The page is only shown if a container is running on the manager node; if all the containers are on the workers and none is on the manager, the web page does not load.

pc@manager01:~/swarm$ docker node ls
ID                            HOSTNAME    STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
k4ean1zqjydcyc9l7858seku9 *   manager01   Ready     Active         Leader           24.0.6
yf4eemen3b2vhkx6s66u2ksgi     worker01    Ready     Active                          24.0.6
mxv2in62u0earrhyedwuc3k1f     worker02    Ready     Active                          24.0.6
pc@manager01:~/swarm$ docker service ls
ID             NAME          MODE         REPLICAS   IMAGE          PORTS
snwsob823vxe   webpage_web   replicated   3/3        httpd:latest   *:8888->80/tcp
pc@manager01:~/swarm$ docker service ps webpage_web
ID             NAME            IMAGE          NODE       DESIRED STATE   CURRENT STATE                ERROR     PORTS
2gkvuhqcoomq   webpage_web.1   httpd:latest   worker02   Running         Running about a minute ago             
knnv2qge66ed   webpage_web.2   httpd:latest   worker01   Running         Running about a minute ago             
xamw3ehx2tx1   webpage_web.3   httpd:latest   worker01   Running         Running about a minute ago      

It is running on a physical server in the office; each node is a VM, and all the VMs are on the same network and can see each other without problems.

:wink:

From my understanding, when using a Docker Swarm service with published ports, it should open the port on all nodes (managers and workers) and forward traffic to one of the containers in a round-robin fashion.
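
For reference, the long form of the ports section makes this routing-mesh behaviour explicit; it should be equivalent to the short "8888:80" mapping in your compose file above:

ports:
  - target: 80        # container port
    published: 8888   # port published on every swarm node by the routing mesh
    protocol: tcp
    mode: ingress     # default: ingress routing mesh / load balancing across tasks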

Not sure what IP you are using to access it.

In general, you would use a reverse proxy (like Traefik, nginx, haproxy, caddy) in front of your workers in an HTTP setup, which can route by (sub-)domain to different services and create the necessary TLS/SSL certificates. See simple Traefik example.
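
To give a rough idea, a minimal Traefik stack in swarm mode could look something like this (a sketch only: the Traefik version, domain, and router/service names are placeholders, and it would go into a stack file in the same style as the one above):

services:
  traefik:
    image: traefik:v2.10
    command:
      - --providers.docker.swarmMode=true
      - --providers.docker.exposedByDefault=false
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      placement:
        constraints:
          - node.role == manager   # Traefik reads services via the docker socket of a manager

  web:
    image: httpd:latest
    deploy:
      replicas: 3
      labels:                       # in swarm mode Traefik reads labels from deploy.labels
        - traefik.enable=true
        - traefik.http.routers.web.rule=Host(`web.example.com`)
        - traefik.http.routers.web.entrypoints=web
        - traefik.http.services.web.loadbalancer.server.port=80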

Thank you for responding… I am accessing the manager’s IP to view the website

manager: 192.168.1.198
worker01: 192.168.1.199
worker02: 192.168.1.200

I browse to http://192.168.1.198 and, if the web container is on the manager, it shows me the page; but if the containers are deployed on the workers and not on the manager, I get nothing when using the manager's IP.

I assume that the worker ports are open; at least on the VMs nothing is blocked.

How do you expect it to work with 3 replicas using the same port on the same node? :slight_smile: I don’t use Swarm in production, but only one container should be able to listen on a given port. When you remove the constraint, each node can get one service task.

Quote:

  • If you expect to run multiple service tasks on each node (such as when you have 5 nodes but run 10 replicas), you cannot specify a static target port. Either allow Docker to assign a random high-numbered port (by leaving off the published), or ensure that only a single instance of the service runs on a given node, by using a global service rather than a replicated one, or by using placement constraints.

I would try the service with two replicas to see if that is the problem. You could also test with a single replica and try to access the port on worker02 while the container is running on worker01. That way you can check whether the communication between the manager and the workers is the problem, or whether none of the nodes can forward requests to the node where the container runs.
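
A possible way to run that test, assuming the stack name webpage and the node IPs from this thread:

# on the manager: scale down to a single task
docker service scale webpage_web=1

# see which node the task landed on
docker service ps webpage_web

# then try the published port against every node IP (e.g. from your notebook);
# with the routing mesh all three should answer, regardless of where the task runs
curl -I http://192.168.1.198:8888
curl -I http://192.168.1.199:8888
curl -I http://192.168.1.200:8888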

Make sure the ingress and the docker_gwbridge networks are present on each node.

Quote:

  • an overlay network called ingress, which handles the control and data traffic related to swarm services. When you create a swarm service and do not connect it to a user-defined overlay network, it connects to the ingress network by default.
  • a bridge network called docker_gwbridge, which connects the individual Docker daemon to the other daemons participating in the swarm.
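
To verify that, something like this on every node should show one entry per network:

# both networks should be listed on every swarm node
docker network ls --filter name=ingress
docker network ls --filter name=docker_gwbridge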

If the stack is indeed deployed as a swarm stack (as in docker stack deploy), there is no problem running multiple replicas on the same node. The service VIP will balance the traffic across the tasks. Of course this wouldn’t work with docker compose, as there is no service abstraction that owns a virtual IP, and no ingress :slight_smile:

The quote from the documentation is from the section “Bypass the routing mesh”, where mode: host is used instead of mode: ingress for publishing ports; a host-mode port can of course only be bound by a single replica per node.
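
For completeness, a rough sketch of what that bypass-the-routing-mesh variant looks like (not what you want here, just to illustrate where the one-replica-per-node limit comes from):

services:
  web:
    image: httpd:latest
    deploy:
      mode: global            # at most one task per node, so the host port is never taken twice
    ports:
      - target: 80
        published: 8888
        protocol: tcp
        mode: host            # bind directly on the node, bypassing the ingress routing mesh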

I would put my money on the usual suspects:

  • firewall
  • mtu size mismatch
  • no low latency network

Good to know, thanks. I actually read about the VIP, but apparently didn’t understand that this is what solves the problem.

My first idea was the firewall, then I read

On the other hand, the port could be open and traffic still be blocked later…

You know how it is Ákos: always be suspicious :slight_smile:

@accuhealth you can run this snippet on the manager node, to generate snippets that can be executed on each node, which check if ports are reachable:

# on manager
node_ids=$(docker node ls -q)
check_ips=""
for node in ${node_ids}; do
  check_ips="${check_ips} $(docker node inspect ${node} --format '{{.Status.Addr}}')"
done
cat << EOF
# execute this on each node:
check_ips="${check_ips}"
for _ip in \${check_ips}; do
  echo "## ip: \${_ip}"
  nc -zv \${_ip} -t 2377 7946
  nc -zv \${_ip} -u 7946 4789
done
EOF

If I remember right, port 2377/tcp should only be open on the manager node.

hi @meyay,

Sorry, I was a little busy with work… The snippet you gave me produced the following:

# execute this on each node:
check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
netstat -plta
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      -                   
tcp        0      0 localhost:domain        0.0.0.0:*               LISTEN      -                   
tcp        0      0 manager01:43896         192.168.1.199:7946      TIME_WAIT   -                   
tcp        0      0 manager01:42470         192.168.1.200:7946      TIME_WAIT   -                   
tcp        0    628 manager01:ssh           192.168.150.2:59662     ESTABLISHED -                   
tcp        0      0 manager01:37364         ubuntu-mirror-1.ps:http TIME_WAIT   -                   
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      -                   
tcp6       0      0 [::]:2377               [::]:*                  LISTEN      -                   
tcp6       0      0 [::]:7946               [::]:*                  LISTEN      -                   
tcp6       0      0 manager01:2377          192.168.1.200:55782     ESTABLISHED -                   
tcp6       0      0 manager01:2377          192.168.1.199:52928     ESTABLISHED -                   

Then I pasted the following code on the nodes:

check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
accuhealth@worker01:~$ check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
## ip: 192.168.1.198
Connection to 192.168.1.198 2377 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [udp/*] succeeded!
Connection to 192.168.1.198 4789 port [udp/*] succeeded!
## ip: 192.168.1.199
nc: connect to 192.168.1.199 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.199 7946 port [tcp/*] succeeded!
Connection to 192.168.1.199 7946 port [udp/*] succeeded!
Connection to 192.168.1.199 4789 port [udp/*] succeeded!
## ip: 192.168.1.200
nc: connect to 192.168.1.200 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.200 7946 port [tcp/*] succeeded!
Connection to 192.168.1.200 7946 port [udp/*] succeeded!
Connection to 192.168.1.200 4789 port [udp/*] succeeded!
accuhealth@worker01:~$ check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
## ip: 192.168.1.198
Connection to 192.168.1.198 2377 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [udp/*] succeeded!
Connection to 192.168.1.198 4789 port [udp/*] succeeded!
## ip: 192.168.1.199
nc: connect to 192.168.1.199 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.199 7946 port [tcp/*] succeeded!
Connection to 192.168.1.199 7946 port [udp/*] succeeded!
Connection to 192.168.1.199 4789 port [udp/*] succeeded!
## ip: 192.168.1.200
nc: connect to 192.168.1.200 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.200 7946 port [tcp/*] succeeded!
Connection to 192.168.1.200 7946 port [udp/*] succeeded!
Connection to 192.168.1.200 4789 port [udp/*] succeeded!

Now we know that worker01 can reach all required ports on all other nodes. I would suggest running the snippet on each node, to make sure no firewall prevents outgoing traffic. Note: you seem to have shared only the output of worker01, twice. If the output is identical on all nodes, then we can rule out firewall problems.

What about the mtu sizes? Please run this command on each node and share the output:

ip addr show scope global | grep mtu

Can you describe your setup? Are you using Docker Swarm in VMs? All on the same host? Are you using a VPN/VLAN/VSwitch between Swarm nodes? From where do you try to access the web server service?


Hi @bluepuma77

Yes, the 3 nodes are on one physical machine; each node is a VM (ESXi), and I am trying to access the website from my notebook… The strange thing is that if the containers only land on the workers, I cannot reach the website.

accuhealth@worker01:~$ check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
## ip: 192.168.1.198
Connection to 192.168.1.198 2377 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [udp/*] succeeded!
Connection to 192.168.1.198 4789 port [udp/*] succeeded!
## ip: 192.168.1.199
nc: connect to 192.168.1.199 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.199 7946 port [tcp/*] succeeded!
Connection to 192.168.1.199 7946 port [udp/*] succeeded!
Connection to 192.168.1.199 4789 port [udp/*] succeeded!
## ip: 192.168.1.200
nc: connect to 192.168.1.200 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.200 7946 port [tcp/*] succeeded!
Connection to 192.168.1.200 7946 port [udp/*] succeeded!
Connection to 192.168.1.200 4789 port [udp/*] succeeded!

Is it okay for workers to have port 2377 closed?
On each node (M and W) I have installed:
Docker version 24.0.6, build ed223bc

The manager replicates the containers to the workers just fine, but if no container is deployed on the manager then the service (for example, web) cannot be seen… therefore viewing the replicas is not working.

This port is used amongst managers for the cluster state. So yes, it is normal that workers don’t bind that port.

I am not sure if it’s related, but I kind of recall that back when I still used ESXi, I had to enable promiscuous mode on the vswitch.

Your problems are most likely related to: networking - Docker-swarm overlay network is not working for containers in different hosts - Stack Overflow

Update2:
you might also find this known issues post useful:
Known issues with VMware


thank you!!

The solution was to create the cluster again with the --data-path-port=7789 option:

docker swarm init --data-path-port=7789

“The VTEP port is reserved or restricted for VMware use; a virtual machine cannot use this port for any other purpose or for any other application.”

But we can change the Docker Swarm data-path-port (by default port 4789 is used, which collides with that reserved VTEP port) to another one, as shown below.
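
For anyone who lands here later, a rough outline of the re-initialization (IPs taken from this thread; the data path port cannot be changed on an existing swarm, so the cluster has to be created again):

# on each worker: leave the old swarm
docker swarm leave

# on the manager: tear down the old swarm and create a new one with a different data path port
docker swarm leave --force
docker swarm init --advertise-addr 192.168.1.198 --data-path-port=7789

# on each worker: rejoin using the token printed by the init command
docker swarm join --token <worker-token> 192.168.1.198:2377

# redeploy the stack
docker stack deploy -c docker-compose.yaml webpage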

Glad it’s sorted out now.