Understanding VIPs with overlay networks in swarm mode

I am setting up my first Swarm installation and I am having trouble understanding how container-to-container communication works with overlay networks in swarm mode.

I have created an overlay network as below

host$ docker network create -d overlay --attachable t2_proxy

And I have deployed the following compose file with host$ docker stack deploy -c docker-compose.yaml test:

version: "3.8"

networks:
  t2_proxy:
    external:
      name: t2_proxy

services:
  whoami:
    image: "traefik/whoami"
    networks:
      - t2_proxy
  alpine:
    image: alpine:latest
    networks:
      - t2_proxy
    command: [ tail, '-f', '/dev/null' ]

And launched a shell into the alpine container:

host$ docker exec -it $(docker ps | grep test_alpine | awk '{print $1}') ash

I want to be able to curl http://whoami, but I am getting connection refused

/ container# curl -v http://whoami
*   Trying 10.0.1.27:80...
* connect to 10.0.1.27 port 80 failed: Connection refused
* Failed to connect to whoami port 80 after 2 ms: Couldn't connect to server
* Closing connection 0
curl: (7) Failed to connect to whoami port 80 after 2 ms: Couldn't connect to server

If I check the whoami container's IP address, it is not 10.0.1.27, but:

host$ docker inspect $(docker ps | grep test_whoami | awk '{print $1}') | jq -r '.[0].NetworkSettings.Networks.t2_proxy.IPAddress'
10.0.1.28

If I use the container's IP address directly, it works fine:

/ # curl http://10.0.1.28
Hostname: 200f1efe2677
IP: 127.0.0.1
IP: 10.0.1.28
IP: 172.18.0.3
RemoteAddr: 10.0.1.31:46892
GET / HTTP/1.1
Host: 10.0.1.28
User-Agent: curl/7.87.0
Accept: */*

So I assume this has something to do with the default --endpoint-mode of vip.
I can confirm this if I add endpoint_mode as described in bypass-the-routing-mesh:

services:
  whoami:
    deploy:
      endpoint_mode: dnsrr
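
In full, the whoami service then looks roughly like this (a sketch based on my compose file above):

services:
  whoami:
    image: "traefik/whoami"
    networks:
      - t2_proxy
    deploy:
      endpoint_mode: dnsrr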

While it works, it feels like I am working around a gap in my basic understanding of how this system works.

What am I missing?

With endpoint_mode: vip a virtual IP is used for the service; it balances incoming connections in a round-robin way amongst all replicas of the service. Each user-defined network provides DNS-based service discovery, which allows using the service name to communicate with other services. The service name will resolve to a single A record pointing at the VIP.

The swarm scheduler creates tasks, which then create the final containers. If the container of a task is stopped and respawned, a new container is created, which will have a different IP. The service VIP, on the other hand, should remain stable. Still, it is an anti-pattern to communicate with containers using their IPs.

endpoint_mode: vip is not suited for long-lived connections, for instance a database connection pool whose connections are kept open in an idle state for a long time, because a VIP connection times out after 900 seconds. In those cases endpoint_mode: dnsrr is required. With endpoint_mode: dnsrr the service name resolves to a multi-value A record.

Even if endpoint_mode: vip is used, you can use tasks.{servicename} to resolve a multi-value A record.
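
For example, from inside your alpine container you can compare the two lookups (a sketch; busybox nslookup in the alpine image, and you may need the stack-qualified name test_whoami instead of the alias whoami):

/ # nslookup whoami          # with endpoint_mode: vip this returns a single A record, the VIP
/ # nslookup tasks.whoami    # multi-value A record, one entry per task/container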

If the response is not what you were looking for: please phrase an exact question, as “what am I missing” is quite ambiguous :slight_smile:

Yup, it is, apologies, let me be a little more specific.

From configure-service-discovery

You don’t need to publish ports which are used between services on the same network. For instance, if you have a WordPress service that stores its data in a MySQL service, and they are connected to the same overlay network, you do not need to publish the MySQL port to the client, only the WordPress HTTP port.

I took this to mean that I should not need any additional configuration to make these things work. In my example, whoami is the MySQL container and alpine is the WordPress container; they are on the same network, yet I cannot make a connection to whoami from alpine without going behind the scenes and using the container IP, which, I agree, is an anti-pattern.

My assumption was that I should be able to make this connection.

It can be seen below that whoami exposes 80/tcp.

$ docker ps
CONTAINER ID   IMAGE                   COMMAND               CREATED          STATUS          PORTS     NAMES
0848ba6b30d0   traefik/whoami:latest   "/whoami"             30 minutes ago   Up 30 minutes   80/tcp    test_whoami.1.57baq4bu9nr34ar8e3qkm6876
b05c68b0feec   alpine:latest           "tail -f /dev/null"   30 minutes ago   Up 30 minutes             test_alpine.1.yn7bbfh68ks4t66nbr2sf3z8a

So my assumption was that the VIP fronts the containers and, as such, should be load balancing port 80.

But if I inspect the service, I can see the VIP defined (the IP addresses have changed because I have restarted the stack), but there are no ports being forwarded.

$ docker service ls
ID             NAME          MODE         REPLICAS   IMAGE                   PORTS
jr2mhgb9nzp9   test_alpine   replicated   1/1        alpine:latest
u9shqwufyuow   test_whoami   replicated   1/1        traefik/whoami:latest

$ docker service inspect u9shqwufyuow | jq '.[0].Endpoint'
{
  "Spec": {
    "Mode": "vip"
  },
  "VirtualIPs": [
    {
      "NetworkID": "w6k3s5sb93l4yb1ekii1hmvn5",
      "Addr": "10.0.1.35/24"
    }
  ]
}
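
(For what it's worth, the same information can be pulled without jq; a sketch using docker's built-in template formatting:)

$ docker service inspect --format '{{.Endpoint.Spec.Mode}} {{json .Endpoint.VirtualIPs}}' test_whoami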

This is where I think there is a gap in my knowledge.

Coming back to this statement from the documentation:

You don’t need to publish ports which are used between services on the same network.

This doesn’t seem to be the case with my setup.

How do I get the VIP to forward port 80 to whoami so I can make connections to it from another container on the same network?

So this has something to do with the platform I am trying to run it on: OpenWRT on an arm64 platform (RPi CM4). I created the same setup on Ubuntu, on both an amd64 EC2 instance and an older RPi 3, and it worked.

Does the VIP rely on any kernel modules? I had to install the vxlan module to allow it to attach to the overlay network.
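
(For reference, this is roughly how I checked the vxlan module on the host; a sketch:)

host$ lsmod | grep vxlan     # is the module currently loaded?
host$ modprobe vxlan         # load it if it is built as a module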

Is there any documentation about how the VIPs are set up and how they work?

Of course it does.

Here is a link to a description of the config check: Verify your Linux Kernel for Container Compatibility · Docker Pirates ARMed with explosive stuff
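
If your kernel exposes /proc/config.gz (as it does here), you can also fetch and run the underlying check script from the moby repo directly, roughly like this (URL assumed from the moby contrib directory, worth double-checking):

host$ curl -fsSL https://raw.githubusercontent.com/moby/moby/master/contrib/check-config.sh -o check-config.sh
host$ sh check-config.sh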

Ok, that is super awesome.

info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: missing
- CONFIG_CGROUP_FREEZER: missing
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: missing
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_NETFILTER_XT_MARK: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_CGROUP_BPF: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: missing
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: missing
- CONFIG_CGROUP_NET_PRIO: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_SECURITY_SELINUX: missing
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: missing
- CONFIG_EXT4_FS_SECURITY: missing
    enable these ext4 configs if you are using ext3 or ext4 as backing filesystem
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: missing
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled (as module)
    - CONFIG_DM_THIN_PROVISIONING: missing
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Missing on OpenWRT

- CONFIG_CGROUP_DEVICE: missing
- CONFIG_CGROUP_FREEZER: missing
- CONFIG_IP_NF_TARGET_MASQUERADE: missing
- CONFIG_CGROUP_PERF: missing
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: missing
- CONFIG_CGROUP_NET_PRIO: missing
- CONFIG_IP_NF_TARGET_REDIRECT: missing
- CONFIG_SECURITY_SELINUX: missing
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT4_FS_POSIX_ACL: missing
- CONFIG_EXT4_FS_SECURITY: missing
    - CONFIG_AUFS_FS: missing
    - CONFIG_BTRFS_FS_POSIX_ACL: missing
    - CONFIG_DM_THIN_PROVISIONING: missing
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Missing on Ubuntu

- CONFIG_RT_GROUP_SCHED: missing
    - CONFIG_AUFS_FS: missing
    - zfs command: missing
    - zpool command: missing

CONFIG_IP_NF_TARGET_MASQUERADE and CONFIG_IP_NF_TARGET_REDIRECT seem related.

The missing modules on Ubuntu look fine: they are storage-driver related and irrelevant (unless you use those storage drivers; AUFS is deprecated in favor of overlay2). The Realtime Group Scheduler is not required either.

The CONFIG_IP_NF_* modules seem like good candidates for OpenWRT.
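
You can confirm the candidates against the running kernel config (a sketch, assuming /proc/config.gz is available as in your output):

host$ zcat /proc/config.gz | grep -E 'IP_NF_TARGET_(MASQUERADE|REDIRECT)|NETFILTER_XT_MATCH_IPVS'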

Though, there is no official release for OpenWRT, so it is highly likely they use a custom-built Docker engine, which may behave like vanilla Docker in some areas and not in others. Not every Docker distribution behaves like the official Docker releases from Docker itself.

Yup, agree.

Just found the following in the logs:

Wed Jan 25 19:54:07 2023 daemon.info modprobe: ip_vs is already loaded
Wed Jan 25 19:54:07 2023 daemon.err dockerd[31859]: time="2023-01-25T19:54:07.424604476+11:00" level=debug msg="Creating service for vip 10.0.1.2 fwMark 256 ingressPorts libnetwork.portConfigs(nil) in sbox lb_0llo (lb-t2_p)"
Wed Jan 25 19:54:07 2023 daemon.err dockerd[31859]: time="2023-01-25T19:54:07+11:00" level=error msg="set up rule failed, [-t nat -A POSTROUTING -m ipvs --ipvs -d 10.0.1.0/24 -j SNAT --to-source 10.0.1.4]:  (iptables failed: iptables --wait -t nat -A POSTROUTING -m ipvs --ipvs -d 10.0.1.0/24 -j SNAT --to-source 10.0.1.4: iptables v1.8.7 (legacy): Couldn't load match `ipvs':No such file or directory\n\nTry `iptables -h' or 'iptables --help' for more information.\n (exit status 2))"
Wed Jan 25 19:54:07 2023 daemon.err dockerd[31859]: time="2023-01-25T19:54:07.516665626+11:00" level=error msg="Failed to add firewall mark rule in sbox lb_0llo (lb-t2_p): reexec failed: exit status 8"
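
(A quick check of whether the ipvs match the log complains about is available at all, as a sketch:)

host$ iptables -m ipvs --help    # fails with the same "Couldn't load match" error if the extension is missing
host$ lsmod | grep xt_ipvs       # kernel-side match module, CONFIG_NETFILTER_XT_MATCH_IPVS in the check above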

At least I have something to go on now.

Thanks for your time on this.

This log snippet looks like a winner :slight_smile: