Cannot connect to swarm service with published port on custom Linux

Hi,

I run a single node swarm.
I start a single web service.
I publish its container port 80 as host port 8088.
I expect to be able to connect to 127.0.0.1 port 8088.
But I get connection refused.
In fact, I don’t seem to be able to connect to that container at all.

Please note that the nature of this question is probably less about how to run Docker Swarm,
and more about how to configure Linux, cgroups2, iptables, etc. correctly so that Docker Swarm will fully work.

Because the thing is: this issue only happens on the minimal home-made Linux distro I’m developing (i.e. the test works as expected on my mainstream Linux workstation).
Other Swarm features do work, e.g. joining more nodes and replicating services to these nodes.
Also, standard docker run -p 8088:80 ... works as expected.
So I feel like I’m just missing some tiny bit of configuration, to make everything click.
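
For reference, the non-swarm control test mentioned above is roughly this (a sketch; the container name is just an example):

docker run -d --rm --name www-plain -p 8088:80 nginx:1.25.2-alpine
telnet 127.0.0.1 8088  # OK, connects outside of swarm
docker stop www-plain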

I’m adhering to the prerequisites (link in reply below), though I’m using cgroup v2, and I think the notes only apply to v1.

I did run https://raw.githubusercontent.com/moby/moby/master/contrib/check-config.sh
and I’m only missing:

  • CONFIG_MEMCG_SWAP, though the script reports “(cgroup swap accounting is currently enabled)”.
  • CONFIG_SECURITY_SELINUX
  • CONFIG_SECURITY_APPARMOR
  • and everything zfs

I’m intentionally running without the mentioned security mechanisms (so far), and I don’t use zfs. I can’t find CONFIG_MEMCG_SWAP in Linux 6.4.
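
For reference, this is roughly how I run the script (a sketch; as far as I remember it also accepts the path to a kernel config as an argument):

wget https://raw.githubusercontent.com/moby/moby/master/contrib/check-config.sh
chmod +x check-config.sh
./check-config.sh /proc/config.gz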

This is the swarm test:

docker swarm init --default-addr-pool 10.22.0.0/16

docker network create --driver=overlay --attachable www-net
docker network inspect www-net

docker service create --publish published=8088,target=80 --name=www --network=www-net nginx:1.25.2-alpine

telnet 127.0.0.1 8088  # Connection refused!
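
For what it’s worth, these are the kinds of host-side checks I’ve been doing to see whether the ingress path is set up at all (a sketch; it assumes iproute2, iptables and nsenter are available, and the chain/namespace names are what I see on the workstation where this works):

ss -lnt | grep 8088                          # anything listening on the published port?
iptables -t nat -nL DOCKER-INGRESS           # DNAT rules for the routing mesh
nsenter --net=/var/run/docker/netns/ingress_sbox \
    iptables -t mangle -nL                   # rules inside the ingress sandbox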

I explicitly created the www-net network so I could make it attachable and use it for debugging:

docker run -it --rm --network=www-net busybox ping -c2 www    # OK
docker run -it --rm --network=www-net busybox telnet www 8088 # Connection refused
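
For completeness, a few more in-network checks along the same lines (a sketch; note that inside the overlay the service answers on its target port 80, while the published 8088 only exists on the host):

docker run -it --rm --network=www-net busybox nslookup www        # the service VIP
docker run -it --rm --network=www-net busybox nslookup tasks.www  # the individual task IPs
docker run -it --rm --network=www-net busybox wget -qO- http://www:80/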

I’ve placed everything needed to reproduce and investigate the issue here:
https://lightwhale.asklandd.dk/dev/swarm/

This directory includes:

  • swarm-test.log, a script log file with the experiment and lots of debugging info. It’s readable in its raw form, but more so using cat swarm-test.log. However, it’s probably easier to just boot the OS in QEMU.
  • lightwhale-2.0.5-dev3-kernel, the kernel.
  • lightwhale-2.0.5-dev3-rootfs, the rootfs.
  • start-lightwhale, a script that downloads said kernel and rootfs and boots Lightwhale in qemu-system-x86_64.

I would really appreciate any pointers that could help me understand and resolve what is causing swarm to fail on my system. And if more information is needed, I’ll of course be happy to provide it.

Thanks,
Stephan

The mentioned prerequisites are here: Install Docker Engine from binaries | Docker Docs
(I could only post two links in one post).

From your host or from inside a container? A container’s localhost is not your host’s localhost.

From the same host that just started the container. The “swarm test” described above is literally the test.

What does docker ps or docker inspect <cid> tell you?

I just researched running Docker more securely with sysbox, but it has the issue that the VIP (virtual IP) is not working. So you have to set the compose endpoint_mode to dnsrr (doc) or use host networking (compose example):

services:
  traefik:
    image: traefik:v2.10
    hostname: '{{.Node.Hostname}}'
    ports:
      # listen on host without ingress network
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
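
The dnsrr alternative mentioned above looks roughly like this on the CLI (a sketch; service and network names are just placeholders, and note that dnsrr cannot be combined with ingress-published ports, only host-mode ones):

docker network create --driver overlay my-overlay
docker service create --name whoami --network my-overlay --endpoint-mode dnsrr traefik/whoami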

So, continuing from the test case specified above…

op@lightwhale:~$ d ps
CONTAINER ID   IMAGE                 COMMAND                  CREATED          STATUS          PORTS     NAMES
58fd588e1604   nginx:1.25.2-alpine   "/docker-entrypoint.…"   47 seconds ago   Up 45 seconds   80/tcp    www.1.zvmo0rdapfrgpz4qx10ja6qz4

And:

op@lightwhale:~$ d inspect 58fd588e1604
[
    {
        "Id": "58fd588e1604e1fc18063f8f0cae2892ef0389d8a2b7167ed8b3124b2163cfb7",
        "Created": "2023-10-09T11:17:35.762082356Z",
        "Path": "/docker-entrypoint.sh",
        "Args": [
            "nginx",
            "-g",
            "daemon off;"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 973,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2023-10-09T11:17:36.644440429Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:d571254277f6a0ba9d0c4a08f29b94476dcd4a95275bd484ece060ee4ff847e4",
        "ResolvConfPath": "/mnt/lightwhale-data/docker/containers/58fd588e1604e1fc18063f8f0cae2892ef0389d8a2b7167ed8b3124b2163cfb7/resolv.conf",
        "HostnamePath": "/mnt/lightwhale-data/docker/containers/58fd588e1604e1fc18063f8f0cae2892ef0389d8a2b7167ed8b3124b2163cfb7/hostname",
        "HostsPath": "/mnt/lightwhale-data/docker/containers/58fd588e1604e1fc18063f8f0cae2892ef0389d8a2b7167ed8b3124b2163cfb7/hosts",
        "LogPath": "/mnt/lightwhale-data/docker/containers/58fd588e1604e1fc18063f8f0cae2892ef0389d8a2b7167ed8b3124b2163cfb7/58fd588e1604e1fc18063f8f0cae2892ef0389d8a2b7167ed8b3124b2163cfb7-json.log",
        "Name": "/www.1.zvmo0rdapfrgpz4qx10ja6qz4",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "ConsoleSize": [
                0,
                0
            ],
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "private",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "Isolation": "default",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": [],
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ],
            "Init": false
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/mnt/lightwhale-data/docker/overlay2/a9c1dd49cd31e890e1fbeb8633cab58504e7e031822ec720e74d3a2526f8c694-init/diff:/mnt/lightwhale-data/docker/overlay2/16ea02fd244867a1c4f695fc8b08337499d1c61d16d4ad609c8276ebd1f63a34/diff:/mnt/lightwhale-data/docker/overlay2/8f831ce946b1fd4dc29d4c236e01e7c78233a181cfd2cc88c57f7a36e73cf023/diff:/mnt/lightwhale-data/docker/overlay2/227dc1fbfb12d02764ea66bd8ae03605d0025e4c4df4fe687db837c0ab577b23/diff:/mnt/lightwhale-data/docker/overlay2/1cfbd4f6681ec74c1ebb6792f865ac09833eb2caf77bae418329b2fde96c39d7/diff:/mnt/lightwhale-data/docker/overlay2/c3bf1feb189150e28f03f76fb9f60f3cdb73b29008f0b7cfd8877e89d1e2ebea/diff:/mnt/lightwhale-data/docker/overlay2/c87a5502976edbe728c9af98aa8ffad998ae0df0b079a420dc909054c8d743b7/diff:/mnt/lightwhale-data/docker/overlay2/e6d8a8ba9a954281c40628e8f62712b51bcd09bcfede0e84c063c8d7f5a4c38e/diff:/mnt/lightwhale-data/docker/overlay2/dcb1da9ae6dcc7fd368700e869bae6e1dc3d8e792eb5aadd7c7c8aad58091054/diff",
                "MergedDir": "/mnt/lightwhale-data/docker/overlay2/a9c1dd49cd31e890e1fbeb8633cab58504e7e031822ec720e74d3a2526f8c694/merged",
                "UpperDir": "/mnt/lightwhale-data/docker/overlay2/a9c1dd49cd31e890e1fbeb8633cab58504e7e031822ec720e74d3a2526f8c694/diff",
                "WorkDir": "/mnt/lightwhale-data/docker/overlay2/a9c1dd49cd31e890e1fbeb8633cab58504e7e031822ec720e74d3a2526f8c694/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [],
        "Config": {
            "Hostname": "58fd588e1604",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "80/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "NGINX_VERSION=1.25.2",
                "PKG_RELEASE=1",
                "NJS_VERSION=0.8.0"
            ],
            "Cmd": [
                "nginx",
                "-g",
                "daemon off;"
            ],
            "Image": "nginx:1.25.2-alpine@sha256:4c93a3bd8bf95412889dd84213570102176b6052d88bb828eaf449c56aca55ef",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": [
                "/docker-entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "com.docker.swarm.node.id": "u76h6xar8mb5ppuqshumlqfg9",
                "com.docker.swarm.service.id": "rfdkkfpluyjk94o6q3ytf0ki5",
                "com.docker.swarm.service.name": "www",
                "com.docker.swarm.task": "",
                "com.docker.swarm.task.id": "zvmo0rdapfrgpz4qx10ja6qz4",
                "com.docker.swarm.task.name": "www.1.zvmo0rdapfrgpz4qx10ja6qz4",
                "maintainer": "NGINX Docker Maintainers <docker-maint@nginx.com>"
            },
            "StopSignal": "SIGQUIT"
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "4a132810258da8b94e002f3878affa05f57b77f682d830bbadce1e5ffe6fef55",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "80/tcp": null
            },
            "SandboxKey": "/var/run/docker/netns/4a132810258d",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "ingress": {
                    "IPAMConfig": {
                        "IPv4Address": "10.22.0.4"
                    },
                    "Links": null,
                    "Aliases": [
                        "58fd588e1604"
                    ],
                    "NetworkID": "a9rwtcuos38b17o7lxqrtndez",
                    "EndpointID": "b5fe807d5f1ba594a29f6b9ff0930365ed355479ae9358682304b426d8ed3094",
                    "Gateway": "",
                    "IPAddress": "10.22.0.4",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:16:00:04",
                    "DriverOpts": null
                },
                "www-net": {
                    "IPAMConfig": {
                        "IPv4Address": "10.22.1.3"
                    },
                    "Links": null,
                    "Aliases": [
                        "58fd588e1604"
                    ],
                    "NetworkID": "1jio72n635ftl4pgvayup2vpi",
                    "EndpointID": "2ec12414857d333d68f9f6c978c01a04608c45bad474aa05cd66ed7200b552fa",
                    "Gateway": "",
                    "IPAddress": "10.22.1.3",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:16:01:03",
                    "DriverOpts": null
                }
            }
        }
    }
]

Using host networking mode “works”, in that it allows me to connect. But host networking is not what I want, and I’d consider it a work-around in this case. First and foremost I’d really like swarm to work. And I’d also like to experiment with swarm mesh routing.
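
(For reference, per-node host publishing can be requested like this; just a sketch, and again not what I actually want:)

docker service create --publish published=8088,target=80,mode=host --name=www --network=www-net nginx:1.25.2-alpine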

The official Docker doc uses

docker service create --name my_web --replicas 3 --publish published=8080,target=80 nginx
curl http://localhost:8080

Which works for me.

I doubt someone here will spend half an hour digging into your custom Linux setup to debug it. Try installing Docker on a regular Linux with the same swarm setup and compare the service and container there to your custom Linux with docker inspect.

This is not helpful, but thanks for your time. The idea of comparing systems is valid, and I’ve tried, but not succeeded in finding the actual cause. Most systems come with a firewall, which clutters iptables and makes comparing more difficult. But yeah, I can give it another look.

I think @bluepuma77 wanted to say that most of us here work with Docker on supported systems, so we don’t have much experience with custom Linux distributions, and we certainly don’t have experience creating one. Probably the only way I could solve an issue like this is what @bluepuma77 suggested, but for that I would have to be there, maybe with someone like you by my side :slight_smile:

You can still share your thoughts here, which will be valuable in the future. Let’s hope someone who can give you more ideas sees your post, or that we’ll have more once you can share more of the details you find out.

On second thought, there is one idea I have, but I’m pretty sure you already tried it: using Wireshark or tshark to trace network packets in the containers and on the host.
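
Something along these lines might work even without a full Wireshark install (a sketch, assuming tcpdump is available; the netns path for a container is the SandboxKey shown in the inspect output above):

tcpdump -ni any tcp port 8088                                                  # on the host, during the telnet test
nsenter --net=/var/run/docker/netns/<sandbox-id> tcpdump -ni any tcp port 80   # inside a container's netns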

Sorry if I came across as rude. It’s just that I really tried to be up-front about this being a problem “less about how to run Docker Swarm, and more about how to configure Linux, cgroups2, iptables”, and then when I’m linked to the docs and told “works on my machine” it feels like a waste of everyone’s time.

Sharing thoughts and ideas is the way to go about this, I think. I haven’t tried Wireshark; it’s a good idea, but since it isn’t built into Lightwhale, it’ll be … difficult. But I’ll try spinning up a supported Linux like Debian in QEMU next to Lightwhale. That way the Debian and Lightwhale instances will get the same IP addresses (as provided by QEMU), which will make comparing the systems a great deal easier.
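
Concretely, I’m thinking of dumping the relevant state on both VMs and diffing it, roughly like this (a sketch):

iptables-save > /tmp/$(hostname)-iptables.txt
lsmod | sort > /tmp/$(hostname)-modules.txt
docker network inspect ingress > /tmp/$(hostname)-ingress.json
# then collect the files in one place and, e.g.:
diff debian-iptables.txt lightwhale-iptables.txt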

I’ve noticed and I’ve really appreciated that. :+1:


From my side I just wanted to point out that it “normally” works, so it had to be specific to your custom Linux.

And I wanted to set expectations that you probably won’t get support for your very special and niche issue here. I wanted to avoid the usual “why is no one helping here” :wink:

Now it really gets into the tech details: tracing packets, checking the firewall (which is usually managed by Docker). And as stated before, another “Linux wrapper” has problems with VIP, which is automatically used when publishing a port not in host mode.

Just in case someone else ends up in here, and for general closure: the issue was a missing kernel option, IP_NF_MANGLE: Add IP_NF_MANGLE to check-config.sh by stephan-henningsen · Pull Request #46667 · moby/moby · GitHub
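
If you want to check your own kernel for it, something like this should do (assuming the config is exposed via CONFIG_IKCONFIG_PROC, or you have the build tree at hand):

zgrep CONFIG_IP_NF_MANGLE /proc/config.gz   # on a running system
grep CONFIG_IP_NF_MANGLE .config            # in the kernel build tree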
