Hi,
I’m using Docker Swarm and deploy services through docker stack deploy.
I have two problems, which are probably connected to each other.
I cannot expose every detail of my configuration, so I created a separate stack for test purposes, where the problem is the same.
services:
  nginx-no-expose:
    image: nginx
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.env.node == 1
    volumes:
      - /home/docker/templates:/etc/nginx/templates
    networks:
      - test-network
    environment:
      - NGINX_HOST=foobar.com
      - NGINX_PORT=80
  nginx-exposed:
    image: nginx
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.env.node == 1
    volumes:
      - /home/docker/templates:/etc/nginx/templates
    ports:
      - "8080:81"
    networks:
      - test-network
    environment:
      - NGINX_HOST=foobar.com
      - NGINX_PORT=81
networks:
  test-network:
    driver: overlay
    attachable: true
    external: true
I created test-network first:
docker network create --driver overlay --attachable test-network
Then I deployed the stack:
docker stack deploy --detach=true --with-registry-auth --compose-file test-stack.yml test && watch -n2 docker service ls
180.0.0.4 is my database (MySQL) VIP.
After deploying the stack, I cannot connect to the database from within the service that exposes ports (nginx-exposed). nginx-no-expose works properly. A snippet from my tests is below.
root@env:/home/docker# docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dec7fe825b65 nginx:latest "/docker-entrypoint.…" 2 minutes ago Up 2 minutes 80/tcp test_nginx-no-expose.1.zi5x4g6z9x4j7ag2m36so5pgm
8ce2433d6631 nginx:latest "/docker-entrypoint.…" 2 minutes ago Up 2 minutes 80/tcp test_nginx-exposed.1.5jm9nch6a2i3j57vxu1d5a81l
root@env:/home/docker# docker exec -it dec7fe825b65 /bin/bash
root@dec7fe825b65:/# ping 180.0.0.4
PING 180.0.0.4 (180.0.0.4) 56(84) bytes of data.
64 bytes from 180.0.0.4: icmp_seq=1 ttl=63 time=0.363 ms
64 bytes from 180.0.0.4: icmp_seq=2 ttl=63 time=0.268 ms
64 bytes from 180.0.0.4: icmp_seq=3 ttl=63 time=0.400 ms
64 bytes from 180.0.0.4: icmp_seq=4 ttl=63 time=0.695 ms
^C
--- 180.0.0.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3069ms
rtt min/avg/max/mdev = 0.268/0.431/0.695/0.159 ms
root@dec7fe825b65:/# exit
exit
root@env:/home/docker# docker exec -it 8ce2433d6631 /bin/bash
root@8ce2433d6631:/# ping 180.0.0.4
PING 180.0.0.4 (180.0.0.4) 56(84) bytes of data.
From 180.0.0.40 icmp_seq=1 Destination Host Unreachable
From 180.0.0.40 icmp_seq=2 Destination Host Unreachable
From 180.0.0.40 icmp_seq=3 Destination Host Unreachable
^C
--- 180.0.0.4 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4093ms
pipe 3
root@8ce2433d6631:/#
The second problem (which is probably connected to the first) is the very common routing mesh problem.
When I deploy services on different machines using global mode, they cannot communicate with each other.
My machines are as follows:
Machine A - 180.0.0.1
Machine B - 180.0.0.2
Machine C - 180.0.0.3
Database (MySQL) - 180.0.0.4
services:
  nginx-exposed:
    image: nginx
    deploy:
      mode: global
    volumes:
      - /home/docker/templates:/etc/nginx/templates
    ports:
      - "8567:80"
    networks:
      - test-network
    environment:
      - NGINX_HOST=foobar.com
      - NGINX_PORT=80
  nginx-no-expose:
    image: nginx
    deploy:
      mode: global
    volumes:
      - /home/docker/templates:/etc/nginx/templates
    networks:
      - test-network
    environment:
      - NGINX_HOST=foobar.com
      - NGINX_PORT=81
networks:
  test-network:
    driver: overlay
    attachable: true
    external: true
Curl from machine A to machine A:
root@env:/home/docker# curl http://180.0.0.1:8567/hello
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.27.0</center>
</body>
</html>
Curl from machine B to machine A:
root@env2:/home/docker# curl http://180.0.0.1:8567/hello
(HANGS FOREVER)
Curl from machine B to machine B:
root@env2:/home/docker# curl http://180.0.0.2:8567/hello
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.27.0</center>
</body>
</html>
I’ve tested whether it is an overlay network problem, and it is not. I deployed a container outside of the swarm on machine A:
docker run --rm -p "4032:80" --network=test-network --env="NGINX_PORT=80" -it nginx
From machine B I can curl it on port 4032 and it works like a charm.
So it is not an overlay network problem. I suspect it is an ingress network problem, where the ingress network clashes with the network between my machines (that is my guess).
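One way to check this guess might be to compare the subnets Docker assigned to its ingress-related networks against the machine addresses listed above. The sketch below is only an illustration: the helper names (ip_to_int, ip_in_cidr, check_swarm_subnets) are made up for this post, and it assumes IPv4 subnets and the default network names ingress and docker_gwbridge.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Docker) for spotting subnet overlaps.

# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if address $1 lies inside CIDR block $2, e.g. 10.0.0.0/24.
ip_in_cidr() {
  cidr_net=${2%/*}
  cidr_bits=${2#*/}
  mask=$(( (4294967295 << (32 - $cidr_bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$cidr_net") & $mask )) ]
}

# Report every given host address that falls inside a subnet of the
# ingress or docker_gwbridge network (assumed default names).
check_swarm_subnets() {
  for net_name in ingress docker_gwbridge; do
    subnets=$(docker network inspect "$net_name" \
      --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}')
    for subnet in $subnets; do
      for host_ip in "$@"; do
        if ip_in_cidr "$host_ip" "$subnet"; then
          echo "$host_ip is inside $net_name subnet $subnet"
        fi
      done
    done
  done
}
```

Running check_swarm_subnets 180.0.0.1 180.0.0.2 180.0.0.3 180.0.0.4 on each node would print one line for every machine address that falls inside one of those Docker subnets. No output would only mean that this particular check found no overlap.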
I also have Podman installed on my machine. So far I cannot find any information on whether I should be concerned about it. I stopped the service for a moment, but the problem remains. Unfortunately, I cannot uninstall it because it’s used by other people in this environment.
What I’ve tried:
- checking firewalls (ufw inactive)
- clearing iptables
- checking ports required for docker swarm using netcat (all are open between machines)
- customizing --data-path-port using swarm init
- re-initializing the swarm a couple of times
- disabling tx on every node using the commands below:
sudo ethtool -K <iface> tx-checksum-ip-generic off
sudo ethtool -K <iface> tx off
- stopping podman service on my machine
- stopping zabbix and nginx on my machine
- changing MTU to 1450 or 1400 on my network
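For reference, the netcat port check can be sketched roughly as follows. The ports are the ones the Docker documentation lists for swarm mode: 2377/tcp, 7946/tcp, 7946/udp, and (by default) 4789/udp for the overlay data path. The function name swarm_port_checks and the DATA_PATH_PORT variable are hypothetical, and the nc flags assume an OpenBSD-style netcat as shipped on Ubuntu.

```shell
#!/bin/sh
# Assumption: DATA_PATH_PORT should match whatever was passed to
# --data-path-port; 4789 is the default when nothing was customized.
DATA_PATH_PORT=${DATA_PATH_PORT:-4789}

# Print one nc command per swarm port for the given peer host.
# The commands are only printed here, not executed.
swarm_port_checks() {
  peer=$1
  echo "nc -zv -w 3 $peer 2377"              # cluster management (TCP)
  echo "nc -zv -w 3 $peer 7946"              # node communication (TCP)
  echo "nc -zvu -w 3 $peer 7946"             # node communication (UDP)
  echo "nc -zvu -w 3 $peer $DATA_PATH_PORT"  # overlay VXLAN data path (UDP)
}
```

Piping the output to a shell, for example swarm_port_checks 180.0.0.2 | sh, would run the checks against machine B. The UDP results should be read with caution: UDP has no handshake, so nc can report a port as open even when the packets are being dropped along the way.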
Docker version:
Client: Docker Engine - Community
 Version: 27.0.3
 API version: 1.46
 Go version: go1.21.11
 Git commit: 7d4bcd8
 Built: Sat Jun 29 00:02:33 2024
 OS/Arch: linux/amd64
 Context: default
Server: Docker Engine - Community
 Engine:
  Version: 27.0.3
  API version: 1.46 (minimum version 1.24)
  Go version: go1.21.11
  Git commit: 662f78c
  Built: Sat Jun 29 00:02:33 2024
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.7.18
  GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version: 1.7.18
  GitCommit: v1.1.13-0-g58aa920
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
Ubuntu version:
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
What can I do to debug where the problem is?