Docker Swarm - Web service not loading

Hello everyone, I have a question about docker swarm using docker-compose.yaml

this is my .yaml:

version: "3.7"
services:
  web:
    image: httpd:latest
    deploy:
      placement:
        constraints:
          - node.role == worker
      replicas: 3
    ports:
      - "8888:80"

but when doing a:

docker stack deploy -c docker-compose.yaml webpage

and I try to open http://ip:8888, it does not load. If I remove the line “- node.role == worker” so that a container also lands on the manager, then the website does load. The page is only shown if a container is running on the manager node; if all the containers are on the workers and none is on the manager, the web page does not load.

pc@manager01:~/swarm$ docker node ls
ID                            HOSTNAME    STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
k4ean1zqjydcyc9l7858seku9 *   manager01   Ready     Active         Leader           24.0.6
yf4eemen3b2vhkx6s66u2ksgi     worker01    Ready     Active                          24.0.6
mxv2in62u0earrhyedwuc3k1f     worker02    Ready     Active                          24.0.6
pc@manager01:~/swarm$ docker service ls
ID             NAME          MODE         REPLICAS   IMAGE          PORTS
snwsob823vxe   webpage_web   replicated   3/3        httpd:latest   *:8888->80/tcp
pc@manager01:~/swarm$ docker service ps webpage_web
ID             NAME            IMAGE          NODE       DESIRED STATE   CURRENT STATE                ERROR     PORTS
2gkvuhqcoomq   webpage_web.1   httpd:latest   worker02   Running         Running about a minute ago             
knnv2qge66ed   webpage_web.2   httpd:latest   worker01   Running         Running about a minute ago             
xamw3ehx2tx1   webpage_web.3   httpd:latest   worker01   Running         Running about a minute ago      

It is running on a physical server in the office; each node is a VM, and all the VMs are on the same network and can see each other without problems.

:wink:

From my understanding, when using a Docker Swarm service with published ports, it should open the port on all nodes (managers and workers) and forward traffic to one of the containers in a round-robin fashion.
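
For reference, the long form of the ports section makes this routing-mesh behaviour explicit; it should be equivalent to the short "8888:80" mapping in your compose file above:

ports:
  - target: 80        # container port
    published: 8888   # port published on every swarm node by the routing mesh
    protocol: tcp
    mode: ingress     # default: ingress routing mesh / load balancing across tasks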

Not sure what IP you are using to access it.

In general, you would use a reverse proxy (like Traefik, nginx, haproxy, caddy) in front of your workers in an HTTP setup, which can route by (sub-)domain to different services and create the necessary TLS/SSL certificates. See simple Traefik example.
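
To give a rough idea, a minimal Traefik stack in swarm mode could look something like this (a sketch only: the Traefik version, domain, and router/service names are placeholders, and it would go into a stack file in the same style as the one above):

services:
  traefik:
    image: traefik:v2.10
    command:
      - --providers.docker.swarmMode=true
      - --providers.docker.exposedByDefault=false
      - --entrypoints.web.address=:80
    ports:
      - "80:80"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      placement:
        constraints:
          - node.role == manager   # Traefik reads services via the docker socket of a manager

  web:
    image: httpd:latest
    deploy:
      replicas: 3
      labels:                       # in swarm mode Traefik reads labels from deploy.labels
        - traefik.enable=true
        - traefik.http.routers.web.rule=Host(`web.example.com`)
        - traefik.http.routers.web.entrypoints=web
        - traefik.http.services.web.loadbalancer.server.port=80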

Thank you for responding… I am accessing the manager’s IP to view the website

manager: 192.168.1.198
worker01: 192.168.1.199
worker02: 192.168.1.200

I browse to http://192.168.1.198 and, if the web container is on the manager, it shows me the page; but if the containers are deployed on the workers and not on the manager, I get nothing when using the manager's IP.

I assume that the worker ports are open; at least on the VMs nothing is blocked.

How do you expect it to work with 3 replicas using the same port on the same node? :slight_smile: I don’t use Swarm in production, but only one container should be able to listen on a given port. When you remove the constraint, each node can get one service task.

Quote:

  • If you expect to run multiple service tasks on each node (such as when you have 5 nodes but run 10 replicas), you cannot specify a static target port. Either allow Docker to assign a random high-numbered port (by leaving off the published), or ensure that only a single instance of the service runs on a given node, by using a global service rather than a replicated one, or by using placement constraints.

I would try the service with two replicas to see if that is the problem. You could also test with a single replica and try to access the port on worker02 while the container is running on worker01. That way you can check whether the communication between the manager and the workers is the problem, or whether none of the nodes can forward requests to the node where the container runs.
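
A possible way to run that test, assuming the stack name webpage and the node IPs from this thread:

# on the manager: scale down to a single task
docker service scale webpage_web=1

# see which node the task landed on
docker service ps webpage_web

# then try the published port against every node IP (e.g. from your notebook);
# with the routing mesh all three should answer, regardless of where the task runs
curl -I http://192.168.1.198:8888
curl -I http://192.168.1.199:8888
curl -I http://192.168.1.200:8888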

Make sure the ingress and the docker_gwbridge networks are present on each node.

Quote:

  • an overlay network called ingress, which handles the control and data traffic related to swarm services. When you create a swarm service and do not connect it to a user-defined overlay network, it connects to the ingress network by default.
  • a bridge network called docker_gwbridge, which connects the individual Docker daemon to the other daemons participating in the swarm.
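
To verify that, something like this on every node should show one entry per network:

# both networks should be listed on every swarm node
docker network ls --filter name=ingress
docker network ls --filter name=docker_gwbridge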

If the stack is indeed deployed as a swarm stack (as in docker stack deploy), there is no problem running multiple replicas on the same node. The service VIP will balance the traffic across the tasks. Of course this wouldn’t work with docker compose, as there is no service abstraction that owns a virtual IP, and no ingress :slight_smile:

The quote from the documentation is from the section “Bypass the routing mesh”, where mode: host is used instead of mode: ingress for publishing ports; a host-mode port can of course only be bound by a single replica per node.
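
For completeness, a rough sketch of what that bypass-the-routing-mesh variant looks like (not what you want here, just to illustrate where the one-replica-per-node limit comes from):

services:
  web:
    image: httpd:latest
    deploy:
      mode: global            # at most one task per node, so the host port is never taken twice
    ports:
      - target: 80
        published: 8888
        protocol: tcp
        mode: host            # bind directly on the node, bypassing the ingress routing mesh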

I would put my money on the usual suspects:

  • firewall
  • mtu size mismatch
  • no low latency network

Good to know, thanks. I actually read about the VIP, but apparently didn’t understand that this is what solves the problem.

My first idea was the firewall, then I read

On the other hand, the port could be open and traffic still be blocked later…

You know how it is Ákos: always be suspicious :slight_smile:

@accuhealth you can run this snippet on the manager node, to generate snippets that can be executed on each node, which check if ports are reachable:

# on manager
node_ids=$(docker node ls -q)
check_ips=""
for node in ${node_ids}; do
  check_ips="${check_ips} $(docker node inspect ${node} --format '{{.Status.Addr}}')"
done
cat << EOF
# execute this on each node:
check_ips="${check_ips}"
for _ip in \${check_ips}; do
  echo "## ip: \${_ip}"
  nc -zv \${_ip} -t 2377 7946
  nc -zv \${_ip} -u 7946 4789
done
EOF

If I remember right, port 2377/tcp should only be open on the manager node.

hi @meyay,

Sorry, I was a little busy with work… The snippet you gave me produced the following:

# execute this on each node:
check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
netstat -plta
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      -                   
tcp        0      0 localhost:domain        0.0.0.0:*               LISTEN      -                   
tcp        0      0 manager01:43896         192.168.1.199:7946      TIME_WAIT   -                   
tcp        0      0 manager01:42470         192.168.1.200:7946      TIME_WAIT   -                   
tcp        0    628 manager01:ssh           192.168.150.2:59662     ESTABLISHED -                   
tcp        0      0 manager01:37364         ubuntu-mirror-1.ps:http TIME_WAIT   -                   
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      -                   
tcp6       0      0 [::]:2377               [::]:*                  LISTEN      -                   
tcp6       0      0 [::]:7946               [::]:*                  LISTEN      -                   
tcp6       0      0 manager01:2377          192.168.1.200:55782     ESTABLISHED -                   
tcp6       0      0 manager01:2377          192.168.1.199:52928     ESTABLISHED -                   

Then I pasted the following code on the nodes:

check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
accuhealth@worker01:~$ check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
## ip: 192.168.1.198
Connection to 192.168.1.198 2377 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [udp/*] succeeded!
Connection to 192.168.1.198 4789 port [udp/*] succeeded!
## ip: 192.168.1.199
nc: connect to 192.168.1.199 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.199 7946 port [tcp/*] succeeded!
Connection to 192.168.1.199 7946 port [udp/*] succeeded!
Connection to 192.168.1.199 4789 port [udp/*] succeeded!
## ip: 192.168.1.200
nc: connect to 192.168.1.200 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.200 7946 port [tcp/*] succeeded!
Connection to 192.168.1.200 7946 port [udp/*] succeeded!
Connection to 192.168.1.200 4789 port [udp/*] succeeded!
accuhealth@worker01:~$ check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
## ip: 192.168.1.198
Connection to 192.168.1.198 2377 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [udp/*] succeeded!
Connection to 192.168.1.198 4789 port [udp/*] succeeded!
## ip: 192.168.1.199
nc: connect to 192.168.1.199 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.199 7946 port [tcp/*] succeeded!
Connection to 192.168.1.199 7946 port [udp/*] succeeded!
Connection to 192.168.1.199 4789 port [udp/*] succeeded!
## ip: 192.168.1.200
nc: connect to 192.168.1.200 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.200 7946 port [tcp/*] succeeded!
Connection to 192.168.1.200 7946 port [udp/*] succeeded!
Connection to 192.168.1.200 4789 port [udp/*] succeeded!

Now we know that worker01 can reach all required ports on all other nodes. I would suggest running the snippet on each node, to make sure no firewall prevents outgoing traffic. Note: you seem to have shared only the output of worker01, twice. If the output is identical on all nodes, then we can rule out firewall problems.

What about the mtu sizes? Please run this command on each node and share the output:

ip addr show scope global | grep mtu

Can you describe your setup? Are you using Docker Swarm in VMs? All on the same host? Are you using a VPN/VLAN/VSwitch between Swarm nodes? From where do you try to access the web server service?


Hi @bluepuma77

Yes, the 3 nodes are on one physical machine; each node is a VM (ESXi), and I am trying to access the website from my notebook… The strange thing is that if the containers only land on the workers, I cannot reach the website.

accuhealth@worker01:~$ check_ips=" 192.168.1.198 192.168.1.199 192.168.1.200"
for _ip in ${check_ips}; do
  echo "## ip: ${_ip}"
  nc -zv ${_ip} -t 2377 7946
  nc -zv ${_ip} -u 7946 4789
done
## ip: 192.168.1.198
Connection to 192.168.1.198 2377 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [tcp/*] succeeded!
Connection to 192.168.1.198 7946 port [udp/*] succeeded!
Connection to 192.168.1.198 4789 port [udp/*] succeeded!
## ip: 192.168.1.199
nc: connect to 192.168.1.199 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.199 7946 port [tcp/*] succeeded!
Connection to 192.168.1.199 7946 port [udp/*] succeeded!
Connection to 192.168.1.199 4789 port [udp/*] succeeded!
## ip: 192.168.1.200
nc: connect to 192.168.1.200 port 2377 (tcp) failed: Connection refused
Connection to 192.168.1.200 7946 port [tcp/*] succeeded!
Connection to 192.168.1.200 7946 port [udp/*] succeeded!
Connection to 192.168.1.200 4789 port [udp/*] succeeded!

Is it okay for workers to have port 2377 closed?
On each node (M and W) I have installed:
Docker version 24.0.6, build ed223bc

The manager replicates the containers to the workers just fine, but if no container is deployed on the manager then the service (for example, web) cannot be seen… therefore viewing the replicas is not working.

This port is used amongst managers for the cluster state. So yes, it is normal that workers don’t bind that port.

I am not sure if it’s related, but I kind of recall that back when I still used ESXi, I had to enable promiscuous mode on the vswitch.

Your problems are most likely related to: networking - Docker-swarm overlay network is not working for containers in different hosts - Stack Overflow

Update2:
you might also find this known issues post useful:
Known issues with VMware


thank you!!

The solution was to create the cluster again with the --data-path-port=7789 option:

docker swarm init --data-path-port=7789

“The VTEP port is reserved or restricted for VMware use; a virtual machine cannot use this port for any other purpose or for any other application.”

But we can change the Docker Swarm data-path-port (by default port 4789 is used, which collides with that reserved VTEP port) to another one, as shown below.
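
For anyone who lands here later, a rough outline of the re-initialization (IPs taken from this thread; the data path port cannot be changed on an existing swarm, so the cluster has to be created again):

# on each worker: leave the old swarm
docker swarm leave

# on the manager: tear down the old swarm and create a new one with a different data path port
docker swarm leave --force
docker swarm init --advertise-addr 192.168.1.198 --data-path-port=7789

# on each worker: rejoin using the token printed by the init command
docker swarm join --token <worker-token> 192.168.1.198:2377

# redeploy the stack
docker stack deploy -c docker-compose.yaml webpage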

Glad it’s sorted out now.