curl request times out, but ping works, between swarm services on different nodes

Hi,
I am seeing strange behavior with my swarm services.

I’m using a swarm of 2 nodes: I deploy some services on one node (n1) and some on the other (n2). I chose this layout because some services need more resources than others, so placing them on specific nodes lets me balance the load. Note that the nodes are virtual machines.

I deploy a stack using the following compose file:

version: '3.7'

networks:
  test_nodes_intercomm_net:
    driver: overlay

services:
  serviceN1_ng:
    image: nginx
    ports:
      - 8889:80
    deploy:
      placement:
        constraints:
          - node.labels.cloud == true
    networks:
     - test_nodes_intercomm_net
    logging:
      driver: json-file
    labels:
      - "stack=nodes_intercomm"
    restart: always

  serviceN2_ng:
    image: nginx
    expose:
      - "80"
    deploy:
      placement:
        constraints:
          - node.labels.stream == true
    networks:
     - test_nodes_intercomm_net
    logging:
      driver: json-file
    labels:
      - "stack=nodes_intercomm"
    restart: always

The file above is a reduced version of the real file, which I cannot share, but the issue is the same.

I can confirm that the services are created on the proper node with the command: docker service ps <serviceID>

The problem is that services on different nodes cannot seem to exchange data or requests.
I can ping between them:

# From n2 to n1:
root@aa53baafe938:/# ping serviceN1_ng
PING serviceN1_ng (10.0.4.2): 56 data bytes
64 bytes from 10.0.4.2: icmp_seq=0 ttl=64 time=0.724 ms
64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=0.175 ms
64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.187 ms

# From n1 to n2
root@844afa392516:/# ping serviceN2_ng
PING serviceN2_ng (10.0.4.5): 56 data bytes
64 bytes from 10.0.4.5: icmp_seq=0 ttl=64 time=1.078 ms
64 bytes from 10.0.4.5: icmp_seq=1 ttl=64 time=0.187 ms
64 bytes from 10.0.4.5: icmp_seq=2 ttl=64 time=0.138 ms
64 bytes from 10.0.4.5: icmp_seq=3 ttl=64 time=0.157 ms

And I can nslookup between them:

# From n2 to n1
root@aa53baafe938:/# nslookup serviceN1_ng
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   serviceN1_ng
Address: 10.0.4.2

# From n1 to n2
root@844afa392516:/# nslookup serviceN2_ng
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   serviceN2_ng
Address: 10.0.4.5

But when I curl, the request times out:

# From n2 to n1
curl http://serviceN1_ng:80/

However, when I curl from the container, using the real IP of the VM, with the proper port, I get a response.

I don’t understand why.
All the suggested ports are open, and the services seem able to communicate (ping and nslookup).
However, a curl request using either the service name or the VIP does not work.

With my real setup, which uses nginx with upstreams and reverse proxies, I have the same issue: my reverse proxy requests never arrive and time out.
One difference is that my own images are built and manually loaded on the second node. But even with images pulled from a public registry, I can’t get them to interconnect.

Any idea?

Usually it’s an MTU size issue with the overlay network. Is any VLAN in use? Try a ping with a payload of 1500 bytes.
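
A minimal sketch of such an MTU probe, assuming a ping that supports the iputils-style `-M do` flag (Don't Fragment); the ping shipped in some images may not have it. Without Don't Fragment, an oversized ping can simply be fragmented and still succeed, so it may not reveal an MTU problem. The helper name `icmp_payload_for_mtu` is made up for illustration, and the 1450 value assumes an overlay MTU of 1500 minus 50 bytes of VXLAN overhead; adjust it to your network.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 1500   # prints 1472
icmp_payload_for_mtu 1450   # prints 1422

# Probe from inside a container (assumes iputils ping):
#   ping -c 3 -M do -s "$(icmp_payload_for_mtu 1450)" serviceN1_ng
# If this fails while a smaller -s value succeeds, the path MTU is
# probably lower than the value being tested.
```
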

Hi,

I’ll try pinging with a payload and see what happens.

However, I’ve done the same test on my personal setup, and it works: I can curl between the nginx containers just fine.

At the time I was writing the post, I forgot that the VMs are behind a virtual network I don’t control, so I will start investigating there. It could be a restrictive firewall rule.

Thanks!

Hi,

I re-did the tests today and tried the ping with a payload. The ping test passed, but not the curl.
I believe it is something with the virtual network the machines are in.
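
For anyone checking the network between the nodes, here is a rough sketch that lists the ports Docker documents for swarm mode and probes the TCP ones on a peer node. It assumes a netcat that supports `-z` and `-w`; the function names and the example address `192.0.2.10` are placeholders, not taken from this setup. Port 2377 is only expected to answer on manager nodes, and the UDP ports cannot be verified reliably with a simple connect test, so they are only listed.

```shell
#!/bin/sh
# Ports Docker documents for swarm mode, one "port/proto purpose" per line.
swarm_ports() {
    cat <<'EOF'
2377/tcp cluster management
7946/tcp node-to-node communication
7946/udp node-to-node communication
4789/udp overlay network (VXLAN) data traffic
EOF
}

# Probe the TCP ports on a peer node; $1 is the peer's address.
check_tcp_ports() {
    peer=$1
    swarm_ports | while read -r spec purpose; do
        port=${spec%/*}     # text before the slash
        proto=${spec#*/}    # text after the slash
        if [ "$proto" = "tcp" ]; then
            if nc -z -w 3 "$peer" "$port"; then
                echo "$spec reachable ($purpose)"
            else
                echo "$spec NOT reachable ($purpose)"
            fi
        else
            echo "$spec is UDP, check the firewall rules manually ($purpose)"
        fi
    done
}

# Usage (hypothetical address): check_tcp_ports 192.0.2.10
```
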