Swarm web service not available to External Load Balancer

lepe · June 16, 2022, 3:04am

I’m relatively new to Docker Swarm. I have a swarm with one manager and two workers. Accessing the site directly through the workers’ IP addresses works fine, but when I go through a load balancer service (provided by my cloud provider), the traffic seems to reach the workers and yet the connection to the port fails. This is my setup (public IPs have been replaced):

LOAD BALANCER:
    Public IP: 130.0.0.130
    Private IP: 172.16.1.1
WORKER1:
    Public IP: 130.0.0.140
    Private IP (LB): 172.16.1.2
    Private IP (Docker): 10.0.0.10
WORKER2:
    Public IP: 130.0.0.150
    Private IP (LB): 172.16.1.3
    Private IP (Docker): 10.0.0.11
MANAGER:
    Public IP: 130.0.0.160
    Private IP (Docker): 10.0.0.5

I have 2 stacks running:

1. HAProxy:

version: '3.2'

services:
  ha:
    image: haproxytech/haproxy-debian:2.7
    ports:
      - published: 80
        target: 80
        protocol: tcp
        mode: host
    volumes:
      - type: bind
        source: "/etc/haproxy/"
        target: "/etc/haproxy/"
        read_only: true
    networks:
      - hanet
    dns:
      - 127.0.0.1
    deploy:
      mode: global
      placement:
        constraints: [node.role==worker]

networks:
  hanet:
    driver: overlay

haproxy.cfg:

global
    log          fd@2 local2
    chroot       /var/lib/haproxy
    pidfile      /var/run/haproxy.pid
    maxconn      4000
    user         haproxy
    group        haproxy
    stats socket /var/lib/haproxy/stats expose-fd listeners
    master-worker

resolvers docker
    nameserver dns1 127.0.0.11:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold other      10s
    hold refused    10s
    hold nx         10s
    hold timeout    10s
    hold valid      10s
    hold obsolete   10s

defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    log global
    mode http
    option httplog

frontend  fe_web
    bind *:80
    default_backend be_service 

backend be_service
    balance roundrobin
    server-template wp- 2 site_wordpress:80 check resolvers docker init-addr libc,none

2. WordPress:

version: '3.2'
    
services:
  wordpress:
    image: wordpress:6.0.0-apache
    volumes:
      - type: bind
        source: "/var/www/"
        target: "/var/www/html/wp-content/"
    networks:
      - hanet
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
      placement:
        constraints: [node.role==worker]

networks:
    hanet:
      external:
        name: haproxy_hanet

General checks:

Nodes are up and running
Services are working fine
Even with only 1 WordPress instance running, the site is reachable through either worker's public IP address (HAProxy is working without issues)
Firewall in each worker is disabled (for testing)
Ping works from any worker to the load balancer

Docker networks:

NETWORK ID     NAME                  DRIVER    SCOPE
8j3slzxfgoqd   agent_agent_network   overlay   swarm
fb3a56f97fd5   bridge                bridge    local
04f5faf70051   docker_gwbridge       bridge    local
weijdpq2bpcj   haproxy_hanet         overlay   swarm
7462a6a83e27   host                  host      local
s6fx2m3hh5ba   ingress               overlay   swarm
7b7ed9c3c89c   none                  null      local

Interfaces (workers):

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:59:db:13 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    inet 130.0.0.140/24 brd 130.0.0.255 scope global ens3
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:43:c8:ac brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    inet 10.0.0.10/24 brd 10.0.0.255 scope global ens4
       valid_lft forever preferred_lft forever
4: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:3f:40:83 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 172.16.1.2/24 brd 172.16.1.255 scope global ens5
       valid_lft forever preferred_lft forever
5: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:d1:04:00:7e brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:0b:3e:b2:0f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
# and some other virtual interfaces ...
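As a quick sanity check on the addressing, here is a minimal sketch that tests which of the addresses from this post fall inside the load balancer network seen on ens5 (172.16.1.2/24). The function name `on_lb_network` is made up for illustration, and it assumes worker2 uses the same /24 prefix, which the post does not show.

```python
import ipaddress

# Network derived from worker1's ens5 address as printed by `ip addr` above.
LB_NETWORK = ipaddress.ip_interface("172.16.1.2/24").network


def on_lb_network(address: str) -> bool:
    """Return True if the address is inside the load balancer's private network."""
    return ipaddress.ip_address(address) in LB_NETWORK


# Addresses taken from the setup listed in this post.
ADDRESSES = ["172.16.1.1", "172.16.1.3", "130.0.0.130", "10.0.0.10"]

if __name__ == "__main__":
    for addr in ADDRESSES:
        print(addr, "on LB network" if on_lb_network(addr) else "outside LB network")
```

Anything outside that network, such as an external client's address, is not directly reachable over ens5, so which interface a reply leaves through depends on the worker's routing table, which is not shown here.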

Netstat shows the port is bound on all interfaces:

root@worker1: netstat -na | grep "LISTEN "
tcp        0      0 0.0.0.0:9001            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:34175         0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN ***
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN
tcp6       0      0 :::9001                 :::*                    LISTEN
tcp6       0      0 :::111                  :::*                    LISTEN
tcp6       0      0 :::80                   :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 :::7946                 :::*                    LISTEN

Nmap reports the port as open from the manager (same result for worker2):

root@manager:  nmap -p80 worker1
Starting Nmap 7.80 ( https://nmap.org ) at 2022-06-16 02:39 UTC
Nmap scan report for worker1 (127.0.1.1)
Host is up (0.00014s latency).

PORT   STATE SERVICE
80/tcp open  http

Nmap done: 1 IP address (1 host up) scanned in 0.10 seconds

Nmap shows the port as filtered when run from inside the worker node itself (except against 127.0.0.1):

root@worker1: nmap -p80 130.0.0.140
80/tcp filtered http
root@worker1: nmap -p80 10.0.0.10
80/tcp filtered http
root@worker1: nmap -p80 172.16.1.2
80/tcp filtered http
root@worker1: nmap -p80 172.18.0.1
80/tcp filtered http
root@worker1: nmap -p80 172.17.0.1
80/tcp filtered http

root@worker1: nmap -p80 127.0.0.1
80/tcp open http
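For anyone who wants to repeat this check without nmap, here is a rough sketch of a plain TCP connect probe. It is not how nmap works internally (nmap run as root defaults to a SYN scan); it only mimics the open / closed / filtered distinction using a normal `connect()`. The function name `probe` and the 2-second timeout are my own choices.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP port using an ordinary connect() attempt.

    open        -> the three-way handshake completed
    closed      -> the peer answered with a reset (connection refused)
    filtered    -> nothing came back before the timeout
    unreachable -> any other socket error (no route, DNS failure, ...)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        return "filtered"
    except OSError:
        return "unreachable"
    finally:
        sock.close()


def report(addresses, port=80):
    """Print one line per address, similar to the nmap summaries above."""
    for addr in addresses:
        print(f"{addr}:{port} {probe(addr, port)}")
```

Calling `report(["130.0.0.140", "10.0.0.10", "172.16.1.2", "127.0.0.1"])` on a worker would repeat the checks above. A "filtered" result means no answer arrived in time, which usually points at packets being dropped somewhere rather than at the service not listening.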

tcpdump on the workers shows the connections reaching them when I access the load balancer's public IP address (130.0.0.130):

root@worker1: tcpdump -i ens5 -vvv
02:25:59.405938 IP (tos 0x0, ttl 55, id 18362, offset 0, flags [DF], proto TCP (6), length 60)
    static-222-222-111-111.b-fam.host.example.com.49208 > 172.16.1.3.http: Flags [S], cksum 0x7e2d (correct), seq 574746115, win 64240, options [mss 1414,sackOK,TS val 3130339086 ecr 0,nop,wscale 7], length 0
02:25:59.405938 IP (tos 0x0, ttl 55, id 29373, offset 0, flags [DF], proto TCP (6), length 60)
    static-222-222-111-111.b-fam.host.example.com.49206 > worker1.http: Flags [S], cksum 0x4a6b (correct), seq 947981193, win 64240, options [mss 1414,sackOK,TS val 3130339086 ecr 0,nop,wscale 7], length 0
02:25:59.409034 IP (tos 0x0, ttl 55, id 18362, offset 0, flags [DF], proto TCP (6), length 60)
02:25:59.539267 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has worker1 tell 172.16.1.1, length 28
02:25:59.539290 ARP, Ethernet (len 6), IPv4 (len 4), Reply worker1 is-at fa:16:3e:3f:40:83 (oui Unknown), length 28

It seems to me that both workers were contacted on port 80 through the load balancer, but no data was exchanged (only packets with the S flag are shown).
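To make the "only flag S" observation easier to check on a longer capture, here is a small sketch that counts handshake packets in tcpdump text output. In tcpdump's notation `Flags [S]` is a SYN and `Flags [S.]` is a SYN-ACK. The helper name `summarize_handshakes` is made up, and the sample lines are shortened copies of the capture above.

```python
import re

# Matches the flag field tcpdump prints for TCP packets, e.g. "Flags [S.]".
FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")


def summarize_handshakes(lines):
    """Count SYN, SYN-ACK and other TCP packets in tcpdump text output."""
    counts = {"syn": 0, "syn_ack": 0, "other": 0}
    for line in lines:
        match = FLAGS_RE.search(line)
        if match is None:
            continue  # header lines, ARP, etc. carry no TCP flags
        flags = match.group(1)
        if flags == "S":
            counts["syn"] += 1
        elif flags == "S.":
            counts["syn_ack"] += 1
        else:
            counts["other"] += 1
    return counts


# Shortened lines from the capture in this post.
SAMPLE = [
    "static-222-222-111-111.b-fam.host.example.com.49208 > 172.16.1.3.http: Flags [S], cksum 0x7e2d (correct)",
    "static-222-222-111-111.b-fam.host.example.com.49206 > worker1.http: Flags [S], cksum 0x4a6b (correct)",
    "02:25:59.539267 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has worker1 tell 172.16.1.1, length 28",
]

if __name__ == "__main__":
    print(summarize_handshakes(SAMPLE))  # {'syn': 2, 'syn_ack': 0, 'other': 0}
```

SYNs with no matching SYN-ACK on the same interface mean the worker either never answered or sent its answer out through a different interface, so capturing on the other interfaces as well may help narrow it down.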

Questions:

What do I need to do to be able to access the site using the load balancer public address?
Is it normal for a stack service to show ‘filtered’ on every address except 127.0.0.1?

Thank you

lepe · June 22, 2022, 12:54am

I fixed the issue. The load balancer required a specific network setup in order to forward the packets.
