Routing Mesh not binding to service port on the node

Problem Statement: in a Docker Swarm, a service is failing to bind to its local port. As a result, other nodes in the network cannot connect to the service port (connection refused).

I created 6 nodes (3 managers and 3 workers), all running Ubuntu 20.04 LTS Server. To make matters interesting, all of them are LXD containers, with a DHCP server providing fixed IPs based on their MAC addresses. For folks interested in the LXD part, both security.nesting and security.privileged are set to true. Each node has 2 CPUs and 4.5GB RAM based on a profile I created.
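For reference, creating such a profile would look roughly like this (a sketch; the profile name swarm-node and the exact memory syntax are illustrative, not copied from my setup):

$ lxc profile create swarm-node
$ lxc profile set swarm-node security.nesting true
$ lxc profile set swarm-node security.privileged true
$ lxc profile set swarm-node limits.cpu 2
$ lxc profile set swarm-node limits.memory 4608MB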

u2004dm1 - 192.168.2.2
u2004dm2 - 192.168.2.3
u2004dm3 - 192.168.2.4
u2004dw1 - 192.168.2.9
u2004dw2 - 192.168.2.10
u2004dw3 - 192.168.2.11

$ docker node ls
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
x2kosaerz2jiejcq0qck52yb4 *   u2004dm1   Ready    Active         Reachable        19.03.12
142b9jp9aix20l0l74vyof8sc     u2004dm2   Ready    Active         Reachable        19.03.12
nhew4gbfugvrld2qduc25ypse     u2004dm3   Ready    Active         Leader           19.03.12
nb7wkfz14d9lfvsck095g69sg     u2004dw1   Ready    Active                          19.03.12
eeda085k5j0h1syem0bw8752z     u2004dw2   Ready    Active                          19.03.12
eqm3lq5skjyqwo87y5eqgpo6z     u2004dw3   Ready    Active                          19.03.12

Here is the iptables output on u2004dm1; the other nodes show identical output:

$ sudo iptables --list

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (2 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere
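One thing worth checking alongside the filter table: the routing mesh also programs NAT rules. On a swarm node with an ingress-published port, the port should show up in the DOCKER-INGRESS chain that Docker creates in the nat table, so a check along these lines is useful (a sketch):

$ sudo iptables -t nat -L DOCKER-INGRESS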

To test out a few relevant theories, I ran the following from u2004dm1:

$ docker run --rm -it alpine ping -c4 u2004dw3

PING u2004dw3 (192.168.2.11): 56 data bytes
64 bytes from 192.168.2.11: seq=0 ttl=63 time=0.074 ms

Here are the ports my current swarm config binds on u2004dm1; the other nodes bind the equivalent ports:
$ netstat -lntu

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN
tcp        0      0 192.168.2.2:2377        0.0.0.0:*               LISTEN
tcp        0      0 192.168.2.2:7946        0.0.0.0:*               LISTEN
udp        0      0 192.168.2.2:7946        0.0.0.0:*
udp        0      0 127.0.0.53:53           0.0.0.0:*
udp        0      0 192.168.2.2:68          0.0.0.0:*
udp        0      0 0.0.0.0:4789            0.0.0.0:*
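As an aside, these are exactly the ports Swarm needs between nodes: 2377/tcp for cluster management, 7946/tcp+udp for node gossip, and 4789/udp for the VXLAN overlay data plane. A quick spot-check from another node might look like this (a sketch using the OpenBSD netcat; the UDP probes are only indicative, since UDP is connectionless):

$ nc -zv u2004dm1 2377    # cluster management (TCP)
$ nc -zv u2004dm1 7946    # node gossip (TCP)
$ nc -zvu u2004dm1 7946   # node gossip (UDP)
$ nc -zvu u2004dm1 4789   # VXLAN overlay traffic (UDP)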

I can run an instance of nginx as a standalone container on port 80 on u2004dm1:

$ docker run --name my-web --rm -p 80:80 -d nginx

$ netstat -lntu

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN
tcp        0      0 192.168.2.2:2377        0.0.0.0:*               LISTEN
tcp        0      0 192.168.2.2:7946        0.0.0.0:*               LISTEN
tcp6       0      0 :::80                   :::*                    LISTEN
udp        0      0 192.168.2.2:7946        0.0.0.0:*
udp        0      0 127.0.0.53:53           0.0.0.0:*
udp        0      0 192.168.2.2:68          0.0.0.0:*
udp        0      0 0.0.0.0:4789            0.0.0.0:*

As is very clear, port 80 is locally bound and accepting requests. To confirm, I ran curl from u2004dm1 as well as from u2004dw3, and in both cases I see the following output:

$ curl -I http://u2004dm1

HTTP/1.1 200 OK
Server: nginx/1.19.1
Date: Tue, 21 Jul 2020 20:58:21 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 07 Jul 2020 15:52:25 GMT
Connection: keep-alive
ETag: "5f049a39-264"
Accept-Ranges: bytes

Now, in order to test the real issue, I removed the running nginx container and created nginx as a service with 1 replica:

$ docker service create --name test-webserver --publish published=80,target=80,mode=ingress --replicas 1 nginx

i1tzu5273nxprf46nmmsxukl7
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged

Then I checked whether the service is up and running:

$ docker service ls

ID             NAME             MODE         REPLICAS   IMAGE          PORTS
i1tzu5273nxp   test-webserver   replicated   1/1        nginx:latest   *:80->80/tcp

Next I verified that nginx is in fact running as a service; for readability I am showing the output as key/value pairs:

$ docker service ps test-webserver

ID: npqivzarp685
NAME: test-webserver.1
IMAGE: nginx:latest
NODE: u2004dw1
DESIRED STATE: Running
CURRENT STATE: Running 2 minutes ago
ERROR: (none)
PORTS: (none)

Note that no errors are reported, and no port is shown either, so I ran the familiar command:

$ netstat -lntu

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN
tcp        0      0 192.168.2.2:2377        0.0.0.0:*               LISTEN
tcp        0      0 192.168.2.2:7946        0.0.0.0:*               LISTEN
udp        0      0 192.168.2.2:7946        0.0.0.0:*
udp        0      0 127.0.0.53:53           0.0.0.0:*
udp        0      0 192.168.2.2:68          0.0.0.0:*
udp        0      0 0.0.0.0:4789            0.0.0.0:*

Interestingly, port 80 is missing from the list above. The question is: why? So I ran the command below. It produces a long output, so have patience; to save readers some time, I have verified that it does show the published port as 80.

$ docker service inspect test-webserver

[
    {
        "ID": "i1tzu5273nxprf46nmmsxukl7",
        "Version": {
            "Index": 486255
        },
        "CreatedAt": "2020-07-21T21:02:31.367873172Z",
        "UpdatedAt": "2020-07-21T21:02:31.41079312Z",
        "Spec": {
            "Name": "test-webserver",
            "Labels": {},
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "nginx:latest@sha256:a93c8a0b0974c967aebe868a186e5c205f4d3bcb5423a56559f2f9599074bbcd",
                    "Init": false,
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {},
                    "Isolation": "default"
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "Delay": 5000000000,
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Platforms": [
                        {
                            "Architecture": "amd64",
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "OS": "linux"
                        },
                        {
                            "Architecture": "arm64",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "386",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "mips64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "ppc64le",
                            "OS": "linux"
                        },
                        {
                            "Architecture": "s390x",
                            "OS": "linux"
                        }
                    ]
                },
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 80,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 80,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 80,
                    "PublishedPort": 80,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "e0f9c9cmg3yitvttefr0s8qbz",
                    "Addr": "10.0.0.17/24"
                }
            ]
        }
    }
]
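As an aside, rather than scanning the full JSON, the published ports can be pulled out directly with a Go template; a sketch:

$ docker service inspect --format '{{json .Endpoint.Ports}}' test-webserver

which should print just the Ports array shown above.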

Now I wanted to test the connectivity from u2004dm1 and the other nodes. Starting with u2004dm1, I ran the same curl command as above but got the following output:

curl: (7) Failed to connect to u2004dm1 port 80: Connection refused

I switched to the u2004dw3 node hoping for better results and ran the same curl command again; same output:

curl: (7) Failed to connect to u2004dm1 port 80: Connection refused

At this point the question is clear: why is the service not binding to port 80?

I never bothered to look that closely at the difference between the outputs of docker service ps ${servicename} and docker service ls --filter name=${servicename}.

While ps shows runtime details for each task of the service (a task being the unit that controls a container instance of the service), ls shows static details of the service itself. It is the same for me: my published mode=ingress ports are only listed with ls. If you had published them in mode=host, they would only show up in ps, as they would then belong to the task…

docker service inspect ${servicename} will return more static details than ls provides, and not a single detail that ps provides.

You can inspect task details with something like docker inspect $(docker service ps --format '{{.ID}}' ${servicename}). You will see that the JSON structure is not identical to the output of docker service inspect ${servicename}, as it includes runtime-specific details.

Do overlay networks generally work between your LXD nodes? Ingress is nothing more than a special overlay network. Unless you specify mode=host, a published port is by default nothing more than a VIP for your service connected to the ingress network.
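To illustrate: from inside a container attached to the same user-defined overlay network as a service, the service name resolves to exactly that VIP. A sketch (the container id is a placeholder, getent is assumed to be available in the image, and note this works on user-defined overlay networks, not on ingress itself):

$ docker exec <task-container-id> getent hosts test-webserver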

All of the LXD containers are reachable in the LAN. I can ping them from other physical computers in the LAN. By LXD design, the LXD host cannot reach the containers and vice versa, but the LXD containers can ping each other.

I re-read the Docker Swarm documentation today, followed the instructions, and included the optional flags (listen-port etc.) when creating the swarm.

It must be something fundamental somewhere. Here's the ingress network output:

$ docker network inspect ingress

[
    {
        "Name": "ingress",
        "Id": "e0f9c9cmg3yitvttefr0s8qbz",
        "Created": "2020-07-20T18:52:36.248623945-04:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": true,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "ingress-sbox": {
                "Name": "ingress-endpoint",
                "EndpointID": "424ee22e2fade7b1c240a9e236bd09de7f1a440de61ff2a58af0803672610c2e",
                "MacAddress": "02:42:0a:00:00:49",
                "IPv4Address": "10.0.0.73/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "557ef54b9f77",
                "IP": "192.168.2.4"
            },
            {
                "Name": "95038cc173ba",
                "IP": "192.168.2.9"
            },
            {
                "Name": "680235837c8b",
                "IP": "192.168.2.10"
            },
            {
                "Name": "67c2fa73099a",
                "IP": "192.168.2.3"
            },
            {
                "Name": "810c35ae7919",
                "IP": "192.168.2.11"
            },
            {
                "Name": "a85554e319c8",
                "IP": "192.168.2.2"
            }
        ]
    }
]

The test is only valid if it's done with swarm services connected to a Docker overlay network.
Of course a plain Docker container will always be able to ping another container on the same host.

Also, can you provide the output of docker inspect $(docker service ps --format '{{.ID}}' test-webserver)?

The test nginx service was created with mode=ingress. Based on the Docker Swarm documentation, that by itself is sufficient for all Docker nodes to participate in the overlay network.

The network connectivity tests clearly show that the nodes participate in the Docker swarm, as is also evident from the docker network inspect ingress output.

True, if overlay traffic is proven to be working between nodes.
How did you verify that it is?

[
{
    "ID": "npqivzarp6851pl1cfm7uqk75",
    "Version": {
        "Index": 486261
    },
    "CreatedAt": "2020-07-21T21:02:31.396546838Z",
    "UpdatedAt": "2020-07-21T21:02:38.844762964Z",
    "Labels": {},
    "Spec": {
        "ContainerSpec": {
            "Image": "nginx:latest@sha256:a93c8a0b0974c967aebe868a186e5c205f4d3bcb5423a56559f2f9599074bbcd",
            "Init": false,
            "DNSConfig": {},
            "Isolation": "default"
        },
        "Resources": {
            "Limits": {},
            "Reservations": {}
        },
        "Placement": {
            "Platforms": [
                {
                    "Architecture": "amd64",
                    "OS": "linux"
                },
                {
                    "OS": "linux"
                },
                {
                    "OS": "linux"
                },
                {
                    "Architecture": "arm64",
                    "OS": "linux"
                },
                {
                    "Architecture": "386",
                    "OS": "linux"
                },
                {
                    "Architecture": "mips64le",
                    "OS": "linux"
                },
                {
                    "Architecture": "ppc64le",
                    "OS": "linux"
                },
                {
                    "Architecture": "s390x",
                    "OS": "linux"
                }
            ]
        },
        "ForceUpdate": 0
    },
    "ServiceID": "i1tzu5273nxprf46nmmsxukl7",
    "Slot": 1,
    "NodeID": "nb7wkfz14d9lfvsck095g69sg",
    "Status": {
        "Timestamp": "2020-07-21T21:02:38.791153725Z",
        "State": "running",
        "Message": "started",
        "ContainerStatus": {
            "ContainerID": "11cac21836f94af1f481113f971fe57e9e33d29068f4f35468128bf83d12f958",
            "PID": 794,
            "ExitCode": 0
        },
        "PortStatus": {}
    },
    "DesiredState": "running",
    "NetworksAttachments": [
        {
            "Network": {
                "ID": "e0f9c9cmg3yitvttefr0s8qbz",
                "Version": {
                    "Index": 486107
                },
                "CreatedAt": "2020-07-18T19:41:40.952615402Z",
                "UpdatedAt": "2020-07-21T02:31:28.893853649Z",
                "Spec": {
                    "Name": "ingress",
                    "Labels": {},
                    "DriverConfiguration": {},
                    "Ingress": true,
                    "IPAMOptions": {
                        "Driver": {}
                    },
                    "Scope": "swarm"
                },
                "DriverState": {
                    "Name": "overlay",
                    "Options": {
                        "com.docker.network.driver.overlay.vxlanid_list": "4096"
                    }
                },
                "IPAMOptions": {
                    "Driver": {
                        "Name": "default"
                    },
                    "Configs": [
                        {
                            "Subnet": "10.0.0.0/24",
                            "Gateway": "10.0.0.1"
                        }
                    ]
                }
            },
            "Addresses": [
                "10.0.0.18/24"
            ]
        }
    ]
    }
]

Can you edit your post and wrap the JSON with the </> Preformatted text control? It makes reading the post easier.

Although the test service did not bind to a physical port and hence curl failed, I increased the replica count to 25 and that operation was successful:

$ docker service scale test-webserver=25

test-webserver scaled to 25
overall progress: 25 out of 25 tasks
verify: Service converged

I am expecting to see similar netstat -lntu output on the u2004dm1 node between the two following cases, i.e. that test-webserver binds to the physical port 80. docker run works, but, contrary to what the Docker Swarm documentation says, a service such as test-webserver does not bind. And the question is why?

  1. docker run …
  2. docker service create …

Although I am replying to myself, this time I see a difference, but still without an answer.

The docker run command binds using tcp6 (see the netstat output), whereas in Docker Swarm, docker service create apparently is oblivious to that. Mysterious…

Thank you for wrapping the output of the task in preformatted text - it makes it easier to read.
The .NetworksAttachments section looks about right. The empty .Status.PortStatus is correct for a published ingress port - it would have been populated if host ports had been published instead.
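That field can also be checked directly; a sketch reusing the task-id lookup from earlier:

$ docker inspect --format '{{json .Status.PortStatus}}' $(docker service ps --format '{{.ID}}' test-webserver)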

I assume the problem is located in the node's network stack (permissions, missing kernel modules?).

Scaling replicas or performing tasks on the services via the Docker CLI is not going to give you any insight into this particular situation. You really need to test whether node-spanning overlay network communication among swarm services generally works.

Dual-stack bindings are listed as tcp6, but still respond on IPv4 as well. Swarm services do bind ports, and they are listed in netstat the same way bindings for plain Docker containers are listed. Though, something seems broken in your setup.

And taking a look at the logs won’t hurt either: journalctl --no-pager --unit=docker
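To narrow the logs down, something like this can help (a sketch; the time window and filter pattern are arbitrary):

$ journalctl --no-pager --unit=docker --since "1 hour ago" | grep -iE "error|warn"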

Before I take down the swarm and redo it from scratch, I ran a few other service create options, this time using mode=host:

$ docker service create --name test-webserver --publish published=80,target=80,mode=host --replicas 1 nginx

$ docker service ps test-webserver

ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
w0qsggt4y5ni        test-webserver.1    nginx:latest        u2004dw2            Running             Running 4 minutes ago                       *:80->80/tcp

As is evident from the output, the service is running on the host u2004dw2. Switching over there to get the netstat output, sure enough, it is bound to port 80:

Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN     
tcp6       0      0 :::80                   :::*                    LISTEN     
tcp6       0      0 :::7946                 :::*                    LISTEN     
udp        0      0 127.0.0.53:53           0.0.0.0:*                          
udp        0      0 192.168.2.10:68         0.0.0.0:*                          
udp        0      0 0.0.0.0:4789            0.0.0.0:*                          
udp6       0      0 :::7946                 :::*

I keep bumping into this finding: every time I run either docker run or docker service create with mode=host, the port binds, but on tcp6. With the ingress-mode overlay driver it does not bind at all. Could that be the problem?

Like I already wrote:

Before destroying your setup, you should definitely try whether overlay traffic generally works. None of your posts indicate you have tested it so far…

You might want to take a look at GitHub - nicolaka/netshoot: a Docker + Kubernetes network trouble-shooting swiss-army container

Thanks for the link. I am going to test the overlay next.

I ran the overlay test in the following way.

First, I created an overlay network:

$ docker network create -d overlay my-ovl

Then I ran the nc utility in listening mode as a service:

$ docker service create --name service-a --network my-ovl -p 8080:8080 nicolaka/netshoot nc -l 8080

I inspected the service to figure out the container's IP addresses and the Docker node hosting the container. The container hosting the listening service had two IPs (a one-liner for pulling these out is sketched after the list):

10.0.0.21 (default ingress network)
10.0.2.8 (my-ovl)
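The sketch mentioned above (it assumes a single running task, so the task-id lookup returns exactly one id):

$ docker inspect --format '{{json .NetworksAttachments}}' $(docker service ps --format '{{.ID}}' service-a)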

I ran another service to be able to connect to port 8080 of service-a. The new service merely runs an eternal ping against one of the Docker nodes, but it gives me a container to land in, running on a different Docker node:

$ docker service create --name service-c --network my-ovl alpine ping u2004dm1

After inspecting the relevant details, I found the IP of the new container is 10.0.2.6. This is in the same subnet as my-ovl; service-a also has an IP in that subnet.

I ran docker exec on the container running service-c, and inside that container I ran the ultimate test:

/ # nc -vn 10.0.2.8 8080
10.0.2.8 (10.0.2.8:8080) open

The above output shows that the service-a port on the my-ovl network is reachable from the service-c container. The same test failed when connecting to the ingress IP address. This probably ultimately proves that the routing mesh network (ingress) is where the trouble is, which matches my original theory.
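For completeness, the failing counterpart was the same probe against the ingress-side address listed earlier (output omitted; it simply never reported the port as open):

/ # nc -vn 10.0.0.21 8080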

Next, I will have to take down the swarm, create a new overlay network, and recreate the swarm using the new Docker network.

Did you use the container IP of the task or the VIP of the service to test the communication?

Finally, found the difference. The issue was in the Linux kernel configuration for the LXD containers; the following has to be set:

linux.kernel_modules: bridge,ip_tables,nf_nat,overlay,br_netfilter
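For anyone landing here, applying that to an LXD profile looks roughly like this (a sketch; swarm-node is the illustrative profile name from earlier), followed by restarting the containers:

$ lxc profile set swarm-node linux.kernel_modules bridge,ip_tables,nf_nat,overlay,br_netfilter
$ lxc restart u2004dm1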
