Problem statement: in a Docker Swarm, a service fails to bind to its published port on the local node. As a result, other nodes in the network cannot connect to the service port (connection refused).
I created 6 nodes (3 managers and 3 workers), all running Ubuntu 20.04 LTS Server. To make matters interesting, all of them are LXD containers. A DHCP server assigns fixed IPs to them based on their MAC addresses. For folks interested in the LXD part, both security.nesting and security.privileged are set to true. Each node has 2 CPUs and 4.5GB RAM, based on a profile I created.
u2004dm1 - 192.168.2.2
u2004dm2 - 192.168.2.3
u2004dm3 - 192.168.2.4
u2004dw1 - 192.168.2.9
u2004dw2 - 192.168.2.10
u2004dw3 - 192.168.2.11
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
x2kosaerz2jiejcq0qck52yb4 * u2004dm1 Ready Active Reachable 19.03.12
142b9jp9aix20l0l74vyof8sc u2004dm2 Ready Active Reachable 19.03.12
nhew4gbfugvrld2qduc25ypse u2004dm3 Ready Active Leader 19.03.12
nb7wkfz14d9lfvsck095g69sg u2004dw1 Ready Active 19.03.12
eeda085k5j0h1syem0bw8752z u2004dw2 Ready Active 19.03.12
eqm3lq5skjyqwo87y5eqgpo6z u2004dw3 Ready Active 19.03.12
Here is the iptables output on u2004dm1; the other nodes produce identical output.
$ sudo iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
DOCKER-USER all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
DROP all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain DOCKER (2 references)
target prot opt source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target prot opt source destination
DROP all -- anywhere anywhere
DROP all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- anywhere anywhere
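A side note on the listing above: `iptables --list` prints only the filter table, while published ports are normally wired up through NAT rules, so the nat table is worth a look as well. Below is a minimal sketch for checking whether a given chain is declared. It assumes Docker's usual chain naming; as far as I know swarm nodes normally carry a `DOCKER-INGRESS` chain, but treat that name as something to verify on your version. The helper `has_chain` is made up for illustration and only scans text.

```shell
#!/bin/sh
# has_chain NAME: reads `iptables -L`-style text on stdin and succeeds
# if a chain with exactly that name is declared in it.
has_chain() {
    grep -q "^Chain $1 "
}

# Chain headers taken from the filter-table listing above (abridged).
sample='Chain INPUT (policy ACCEPT)
Chain FORWARD (policy ACCEPT)
Chain DOCKER (2 references)
Chain DOCKER-USER (1 references)'

for chain in DOCKER DOCKER-USER DOCKER-INGRESS; do
    if printf '%s\n' "$sample" | has_chain "$chain"; then
        echo "$chain: present"
    else
        echo "$chain: missing"
    fi
done
```

On a real node the same helper could be fed live output, for example `sudo iptables -t nat -L -n | has_chain DOCKER-INGRESS`.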
To test a few relevant theories, I ran the following from u2004dm1:
$ docker run --rm -it alpine ping -c4 u2004dw3
PING u2004dw3 (192.168.2.11): 56 data bytes
64 bytes from 192.168.2.11: seq=0 ttl=63 time=0.074 ms
My current swarm ports as seen on u2004dm1 (the other nodes bind the same ports on their own addresses):
$ netstat -lntu
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:2377 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:7946 0.0.0.0:* LISTEN
udp 0 0 192.168.2.2:7946 0.0.0.0:*
udp 0 0 127.0.0.53:53 0.0.0.0:*
udp 0 0 192.168.2.2:68 0.0.0.0:*
udp 0 0 0.0.0.0:4789 0.0.0.0:*
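For reference, the ports swarm mode relies on are 2377/tcp (cluster management), 7946/tcp and 7946/udp (node-to-node communication) and 4789/udp (overlay/VXLAN traffic), and all of them appear in the listing above. A small sketch for checking this kind of listing mechanically follows; the helper name `has_port` is hypothetical and it only parses `netstat -lntu` text.

```shell
#!/bin/sh
# has_port PROTO PORT: reads `netstat -lntu` text on stdin and succeeds
# if a local address for PROTO (tcp/udp, IPv4 or IPv6) ends in :PORT.
has_port() {
    awk -v proto="$1" -v port="$2" '
        $1 ~ ("^" proto "6?$") && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Abridged copy of the listing from u2004dm1.
listing='tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:2377 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:7946 0.0.0.0:* LISTEN
udp 0 0 192.168.2.2:7946 0.0.0.0:*
udp 0 0 0.0.0.0:4789 0.0.0.0:*'

for spec in tcp/2377 tcp/7946 udp/7946 udp/4789 tcp/80; do
    proto=${spec%/*}
    port=${spec#*/}
    if printf '%s\n' "$listing" | has_port "$proto" "$port"; then
        echo "$spec listening"
    else
        echo "$spec not listening"
    fi
done
```

Against a live node this would be `netstat -lntu | has_port tcp 80`.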
I can run an instance of nginx as a standalone container on port 80 on u2004dm1; the netstat -lntu listing that follows the command shows port 80 bound (tcp6 :::80):
$ docker run --name my-web --rm -p 80:80 -d nginx
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:2377 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:7946 0.0.0.0:* LISTEN
tcp6 0 0 :::80 :::* LISTEN
udp 0 0 192.168.2.2:7946 0.0.0.0:*
udp 0 0 127.0.0.53:53 0.0.0.0:*
udp 0 0 192.168.2.2:68 0.0.0.0:*
udp 0 0 0.0.0.0:4789 0.0.0.0:*
As the listing makes clear, port 80 is locally bound and accepting requests. To test that, I ran curl from u2004dm1 as well as from u2004dw3, and in both cases I see the following output:
$ curl -I http://u2004dm1
HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Tue, 21 Jul 2020 20:58:21 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes
Now, in order to test the real issue, I removed the running nginx container and created nginx as a swarm service with 1 replica:
$ docker service create --name test-webserver --publish published=80,target=80,mode=ingress --replicas 1 nginx
i1tzu5273nxprf46nmmsxukl7
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
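The long `--publish` form used above spells out three fields: the port exposed on the swarm nodes (`published`), the port inside the container (`target`) and the publish mode. As a sketch, the hypothetical helper `publish_flag` below only assembles that string, which makes it easy to compare `ingress` (routing mesh, where every node is expected to accept the port) with `host` (port bound only on the node that runs the task).

```shell
#!/bin/sh
# publish_flag PUBLISHED TARGET [MODE]: prints a long-form --publish
# argument for `docker service create`; MODE defaults to ingress.
publish_flag() {
    printf '%s\n' "--publish published=$1,target=$2,mode=${3:-ingress}"
}

publish_flag 80 80        # the form used in this post
publish_flag 80 80 host   # variant that bypasses the routing mesh
```

It could be used as `docker service create --name test-webserver $(publish_flag 80 80 host) --replicas 1 nginx` to see whether the port binds when the ingress network is taken out of the picture.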
Next I checked whether the service is up and running:
$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
i1tzu5273nxp test-webserver replicated 1/1 nginx:latest *:80->80/tcp
Just to verify that nginx is running as a service (for readability I am showing the output as key/value pairs):
$ docker service ps test-webserver
ID: npqivzarp685
NAME: test-webserver.1
IMAGE: nginx:latest
NODE: u2004dw1
DESIRED STATE: Running
CURRENT STATE: Running 2 minutes ago
ERROR
PORTS
Please note that no errors are reported, and the PORTS column is empty as well, so I ran the familiar command:
$ netstat -lntu
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:2377 0.0.0.0:* LISTEN
tcp 0 0 192.168.2.2:7946 0.0.0.0:* LISTEN
udp 0 0 192.168.2.2:7946 0.0.0.0:*
udp 0 0 127.0.0.53:53 0.0.0.0:*
udp 0 0 192.168.2.2:68 0.0.0.0:*
udp 0 0 0.0.0.0:4789 0.0.0.0:*
Interestingly, port 80 is missing from the list above. The question is: why? So I ran the command below; the output is long, so have patience. To save readers some time: I have checked that the output shows the published port as 80.
$ docker service inspect test-webserver
[
{
"ID": "i1tzu5273nxprf46nmmsxukl7",
"Version": {
"Index": 486255
},
"CreatedAt": "2020-07-21T21:02:31.367873172Z",
"UpdatedAt": "2020-07-21T21:02:31.41079312Z",
"Spec": {
"Name": "test-webserver",
"Labels": {},
"TaskTemplate": {
"ContainerSpec": {
"Image": "nginx:latest@sha256:a93c8a0b0974c967aebe868a186e5c205f4d3bcb5423a56559f2f9599074bbcd",
"Init": false,
"StopGracePeriod": 10000000000,
"DNSConfig": {},
"Isolation": "default"
},
"Resources": {
"Limits": {},
"Reservations": {}
},
"RestartPolicy": {
"Condition": "any",
"Delay": 5000000000,
"MaxAttempts": 0
},
"Placement": {
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
},
{
"OS": "linux"
},
{
"OS": "linux"
},
{
"Architecture": "arm64",
"OS": "linux"
},
{
"Architecture": "386",
"OS": "linux"
},
{
"Architecture": "mips64le",
"OS": "linux"
},
{
"Architecture": "ppc64le",
"OS": "linux"
},
{
"Architecture": "s390x",
"OS": "linux"
}
]
},
"ForceUpdate": 0,
"Runtime": "container"
},
"Mode": {
"Replicated": {
"Replicas": 1
}
},
"UpdateConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"RollbackConfig": {
"Parallelism": 1,
"FailureAction": "pause",
"Monitor": 5000000000,
"MaxFailureRatio": 0,
"Order": "stop-first"
},
"EndpointSpec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
}
},
"Endpoint": {
"Spec": {
"Mode": "vip",
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]
},
"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
],
"VirtualIPs": [
{
"NetworkID": "e0f9c9cmg3yitvttefr0s8qbz",
"Addr": "10.0.0.17/24"
}
]
}
}
]
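Rather than reading the whole document, the published ports can be pulled out directly. `docker service inspect` accepts a Go template through `--format`, for example `docker service inspect --format '{{json .Endpoint.Ports}}' test-webserver`. The sketch below does roughly the same with plain text tools on saved output; `published_ports` is a hypothetical helper, it assumes straight ASCII quotes in the JSON, and it relies on `grep -o` (available in GNU and BSD grep).

```shell
#!/bin/sh
# published_ports: reads `docker service inspect` JSON on stdin and
# prints each distinct PublishedPort value, one per line.
published_ports() {
    grep -o '"PublishedPort": *[0-9][0-9]*' | grep -o '[0-9][0-9]*$' | sort -un
}

# Fragment of the inspect output shown above.
sample='"Ports": [
{
"Protocol": "tcp",
"TargetPort": 80,
"PublishedPort": 80,
"PublishMode": "ingress"
}
]'

printf '%s\n' "$sample" | published_ports
```

On a manager node this would be `docker service inspect test-webserver | published_ports`.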
Next I wanted to test connectivity from u2004dm1 and the other nodes. Starting on u2004dm1, I ran the same curl command as above but got the following output:
curl: (7) Failed to connect to u2004dm1 port 80: Connection refused
I switched to the u2004dw3 node hoping to see a different result and ran the same curl command again, with the same output:
curl: (7) Failed to connect to u2004dm1 port 80: Connection refused
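With an ingress-published port every node is expected to answer on port 80, so the same probe can be repeated against each hostname. Here is a sketch, assuming the six hostnames listed earlier resolve from wherever it runs. `explain_curl_exit` and `probe_nodes` are hypothetical helpers; the exit codes covered are the ones documented in curl's man page.

```shell
#!/bin/sh
# explain_curl_exit CODE: maps a few documented curl exit codes to text.
explain_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect (e.g. connection refused)" ;;
        28) echo "timed out" ;;
        *)  echo "curl exit code $1" ;;
    esac
}

# probe_nodes HOST...: tries http://HOST/ on each node and reports the result.
probe_nodes() {
    for node in "$@"; do
        curl -s -o /dev/null --connect-timeout 3 "http://$node/"
        rc=$?
        echo "$node: $(explain_curl_exit "$rc")"
    done
}

# The failure seen in this post corresponds to exit code 7.
explain_curl_exit 7
```

Usage would be `probe_nodes u2004dm1 u2004dm2 u2004dm3 u2004dw1 u2004dw2 u2004dw3`, run from any machine on the 192.168.2.x network.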
So at this point the question is clear: why is the service not binding to port 80?