Docker Swarm networking: hostnames and service names

I have a docker-compose file that works as I want when run with docker-compose. I am trying to use it with Swarm, but I am having some issues with the DNS side of things.

I am running Swarm on Linux, Docker version 19.03.7.

I have this compose file:

version: "3.4"

x-common-config: &common-config
  volumes:
    - ./conf:/etc/hadoop/conf
    - /home/centos/hadoop/hadoop-dist/target/hadoop-3.3.0-SNAPSHOT:/hadoop
    - /mnt/data
  build: ../
  image: hadoop-docker

services:
  nn:
    <<: *common-config
    environment:
      - ENSURE_NAMENODE_DIR=/mnt/data/nn/current
    ports:
      - "9870:9870"
    command: hdfs namenode

  snn:
    <<: *common-config
    environment:
      - WAITFOR=nn:8020
      - ENSURE_SECONDARY_NAMENODEDIR=/mnt/data/snn/current
    ports:
      - "9871:9870"
    command: hdfs secondarynamenode
    hostname: snamenode

  dn:
    <<: *common-config
    command: hdfs datanode
    deploy:
      replicas: 3

I deploy this to swarm like this:

sudo docker stack deploy --compose-file docker-compose.yml hdfstest
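
To confirm the tasks come up, I check with the usual commands:

sudo docker stack services hdfstest
sudo docker stack ps hdfstest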

This works, and the containers start, but due to some Hadoop/HDFS requirements, the "nn" service needs to do a reverse DNS lookup on the "dn" services which connect to it, and this is where things break down.

Connecting to the NN service, I can see its hostname, which is the container ID as expected. I can resolve that hostname and do a reverse lookup on it too:

docker exec -it 55f90c2045d5 /bin/bash
# hostname
55f90c2045d5
[root@55f90c2045d5 /]# nslookup 55f90c2045d5
Server:		127.0.0.11
Address:	127.0.0.11#53

Non-authoritative answer:
Name:	55f90c2045d5
Address: 10.0.9.11

[root@55f90c2045d5 /]# dig -x 10.0.9.11

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el8 <<>> -x 10.0.9.11
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26408
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;11.9.0.10.in-addr.arpa.		IN	PTR

;; ANSWER SECTION:
11.9.0.10.in-addr.arpa.	600	IN	PTR	hdfstest_nn.1.1haheykwhhp62markmrv6k460.hdfstest_default.

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Mar 12 16:58:55 UTC 2020
;; MSG SIZE  rcvd: 132

This all looks good. However, I have tried to make the other services connect to this host using the docker-compose service name, "nn". Checking that name:

nslookup nn
Server:		127.0.0.11
Address:	127.0.0.11#53

Non-authoritative answer:
Name:	nn
Address: 10.0.9.10

[root@55f90c2045d5 /]# dig -x 10.0.9.10

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el8 <<>> -x 10.0.9.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25276
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;10.9.0.10.in-addr.arpa.		IN	PTR

;; Query time: 8 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Mar 12 17:00:23 UTC 2020
;; MSG SIZE  rcvd: 51

It does map to this same container, but with a different IP (10.0.9.10 vs 10.0.9.11), and if I try to do a reverse lookup on 10.0.9.10 it does not resolve (NXDOMAIN above).

Other services can connect to "nn", but then the source address seems to be an address which is not associated with any of my containers - perhaps a VIP of some sort from each of the Docker Engine hosts? As HDFS cannot do a reverse lookup on it, it does not work correctly.

Could anyone explain why the "nn" container seems to have two IPs - one associated with its hostname and the other associated with the docker-compose service name?

Does something different happen with network routing if you connect to the "service name" IP rather than the one related to the hostname? It seems as if the request is coming from a gateway or something.

I tried another test which makes this easier to see. I recreated my environment so the IPs have changed.

The address for "nn" is 10.0.10.2.
The address associated with the container hostname is 10.0.10.3.
The IP of the container connecting to the NN is 10.0.10.10.

If I run "nc 10.0.10.3 8020", I see the real container IP at the destination:

tcp        0      0 10.0.10.3:8020          10.0.10.10:51154        ESTABLISHED

But if I run "nc nn 8020", I see what I reckon is a gateway IP for all containers on that Docker Engine host. It's not the container IP:

tcp        0      0 10.0.10.3:8020          10.0.10.7:36980         ESTABLISHED

What I want is to be able to connect to "nn" via a defined hostname, where the NN host gets to see the real IP of the source connection and not a gateway address.

I thought adding "hostname: namenode" to my docker-compose file would do this, but it simply changed the hostname on that container without creating a DNS entry for it :(

By default, Swarm services resolve the service name to a VIP, which takes care of forwarding traffic to the containers.
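
You can see that VIP by inspecting the service from a manager node (the service name below comes from your stack deploy):

docker service inspect --format '{{json .Endpoint.VirtualIPs}}' hdfstest_nn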

If you switch the endpoint_mode to 'dnsrr', no VIP is used and DNS returns the IPs of the container replicas directly (a single IP if no replicas are configured):

services:
  nn:
    deploy:
      mode: replicated
      endpoint_mode: dnsrr
    ..
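
After redeploying, you can verify it from inside another task: the service name should now resolve straight to the container IP, and a reverse lookup on that IP should work again (<ip> is a placeholder for whatever address is returned):

nslookup nn    # should now return the task/container IP instead of a VIP
dig -x <ip>    # reverse lookup on that IP should resolve to the task name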

Thank you. That does indeed seem to work. I had eventually figured out VIPs were involved, but I had not come across the endpoint_mode yet.

Enabling dnsrr gets my service working; however, it seems to prevent me from publishing ports for those services.

Do you know if I can have dnsrr and somehow publish a port to the docker hosts too?

You are right, dnsrr and ingress are incompatible:

version: '3.7'
services:
  docker-demo:
    image: ehazlett/docker-demo
    deploy:
      mode: replicated
      replicas: 3
      endpoint_mode: dnsrr
    ports:
    - target: 8080
      published: 8080
      protocol: tcp
      mode: ingress

docker stack deploy -c docker-compose.yml docker-demo

failed to create service docker-demo_docker-demo: Error response from daemon: rpc error: code = InvalidArgument desc = EndpointSpec: port published with ingress mode can't be used with dnsrr mode

I wasn't aware the two cannot be combined. Usually I publish ports on a reverse proxy container and declare the target containers as upstreams. Typically I use dnsrr for databases or Kafka - neither requires published ports in my use cases, though my client applications are troubled by the VIP's timeout.

Actually, I am surprised this doesn't work. Ingress is nothing more than a globally bound port on all nodes, where traffic is forwarded to a target service and port using an overlay network… honestly, nothing fancy. I have no idea why this was designed not to work with dnsrr…
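
Since the error only complains about ingress mode, publishing the port in host mode (which bypasses the routing mesh and binds the port directly on each node that runs a task) might be worth a try - an untested sketch:

version: '3.7'
services:
  docker-demo:
    image: ehazlett/docker-demo
    deploy:
      mode: replicated
      replicas: 1   # a fixed host-mode port allows only one task per node
      endpoint_mode: dnsrr
    ports:
    - target: 8080
      published: 8080
      protocol: tcp
      mode: host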

As Swarm seems to expect each "service" you run in the cluster to be a replicated thing, I guess it makes sense that it cannot publish the port if you are in dnsrr mode.

Normally it forwards a port to a VIP, and the VIP load balances across the service containers. If you are in dnsrr mode, it cannot simply forward the port to the VIP; it would need to forward to one of the containers directly. I guess it could do that via round robin too, but it's somewhat different.

What I am trying to do is have a service which will only ever have 1 replica, but that does not seem compatible with how the port forwarding works. I guess I could work around this using a reverse proxy container, like you said, and publish any ports that way.

Ultimately, I guess I would like a way to have the VIPs but also let the internal nodes communicate using the container IPs - kind of like two service names for each service. I'm not sure that is possible, though.


Though, even dnsrr with a single replica works for service task (=container) to service task communication. The overlay network's DNS service returns either the VIP or the IPs of the dnsrr pool; if there is only one replica, dnsrr should always return the same IP. As the ingress network is an overlay network too, this restriction makes no sense to me.

In production environments you usually have a load balancer in front of Docker and want as few entrypoint ports as possible. Using a containerized reverse proxy is not that uncommon.
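
A minimal sketch of that pattern with nginx (the image, config path and single backend here are assumptions for illustration - adjust to your stack):

  proxy:
    image: nginx:alpine
    ports:
      - "9870:9870"    # only the proxy uses the ingress/VIP path
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro

with an nginx.conf along the lines of:

server {
    listen 9870;
    location / {
        # "nn" resolves via the overlay network's DNS (dnsrr -> container IP)
        proxy_pass http://nn:9870;
    }
}

Keep in mind nginx resolves the upstream name once at startup, so a backend task that restarts with a new IP needs a proxy reload.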

Alternatively, you could remove endpoint_mode: dnsrr and use tasks.<service-name> for task-to-task communication (see: https://docs.docker.com/network/overlay/#container-discovery). I would expect that to bypass the VIP.
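
That would effectively give you both names per service, which is what you asked for earlier - from inside any task on the stack network (use tasks.hdfstest_nn if the short name does not resolve):

nslookup nn          # the service name resolves to the VIP (default endpoint_mode)
nslookup tasks.nn    # tasks.<service-name> resolves to the individual task IPs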