I have a Compose file that works as intended with docker-compose. I am trying to use the same file with swarm, but I am running into DNS issues.
I am running swarm on Linux, Docker version “19.03.7”.
I have this compose file:
version: "3.4"
x-common-config:
  &common-config
  volumes:
    - ./conf:/etc/hadoop/conf
    - /home/centos/hadoop/hadoop-dist/target/hadoop-3.3.0-SNAPSHOT:/hadoop
    - /mnt/data
  build: ../
  image: hadoop-docker
services:
  nn:
    <<: *common-config
    environment:
      - ENSURE_NAMENODE_DIR=/mnt/data/nn/current
    ports:
      - "9870:9870"
    command: hdfs namenode
  snn:
    <<: *common-config
    environment:
      - WAITFOR=nn:8020
      - ENSURE_SECONDARY_NAMENODEDIR=/mnt/data/snn/current
    ports:
      - "9871:9870"
    command: hdfs secondarynamenode
    hostname: snamenode
  dn:
    <<: *common-config
    command: hdfs datanode
    deploy:
      replicas: 3
I deploy this to swarm like this:
sudo docker stack deploy --compose-file docker-compose.yml hdfstest
This works and the containers start. However, because of how Hadoop / HDFS behaves, the “nn” service needs to do a reverse DNS lookup on each “dn” task that connects to it, and that is where things break down.
Inside the “nn” container, the hostname is the container ID, as expected. I can resolve that hostname and do a reverse lookup on its address too:
docker exec -it 55f90c2045d5 /bin/bash
# hostname
55f90c2045d5
[root@55f90c2045d5 /]# nslookup 55f90c2045d5
Server: 127.0.0.11
Address: 127.0.0.11#53
Non-authoritative answer:
Name: 55f90c2045d5
Address: 10.0.9.11
[root@55f90c2045d5 /]# dig -x 10.0.9.11
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el8 <<>> -x 10.0.9.11
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26408
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;11.9.0.10.in-addr.arpa. IN PTR
;; ANSWER SECTION:
11.9.0.10.in-addr.arpa. 600 IN PTR hdfstest_nn.1.1haheykwhhp62markmrv6k460.hdfstest_default.
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Mar 12 16:58:55 UTC 2020
;; MSG SIZE rcvd: 132
This all looks good. However, I have configured the other services to connect to this host using the Compose service name, “nn”. Checking that name from the same container:
nslookup nn
Server: 127.0.0.11
Address: 127.0.0.11#53
Non-authoritative answer:
Name: nn
Address: 10.0.9.10
[root@55f90c2045d5 /]# dig -x 10.0.9.10
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el8 <<>> -x 10.0.9.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25276
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;10.9.0.10.in-addr.arpa. IN PTR
;; Query time: 8 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Mar 12 17:00:23 UTC 2020
;; MSG SIZE rcvd: 51
The name “nn” does reach this same container, but it resolves to a different IP (10.0.9.10 rather than 10.0.9.11), and a reverse lookup on 10.0.9.10 returns NXDOMAIN.
Other services can connect to “nn”, but the source address the namenode sees is one that is not associated with any of my containers. Perhaps it is a VIP of some sort on each of the Docker Engine hosts? Because HDFS cannot do a reverse lookup on that address, it does not work correctly.
Could anyone explain why the “nn” container seems to have two IPs, one associated with its hostname and the other associated with the Compose service name?
Does something different happen with network routing when a client connects to the “service name IP” rather than the one tied to the hostname, so that the request appears to come from a gateway or something similar?
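If 10.0.9.10 really is a per-service virtual IP, one experiment I could try is switching the “nn” service from the default endpoint_mode (vip) to dnsrr, so that the name “nn” resolves directly to the task's own address. Below is an untested sketch of just the nn entry, meant to replace the one in the file above. As far as I know, ports published through the ingress routing mesh cannot be combined with dnsrr, so the port is published in host mode using the long ports syntax. Whether this actually fixes the HDFS reverse lookup is an assumption, not something I have verified.

```yaml
services:
  nn:
    <<: *common-config
    environment:
      - ENSURE_NAMENODE_DIR=/mnt/data/nn/current
    ports:
      # Long syntax: host mode publishes the port on the node running the task
      # and bypasses the ingress routing mesh.
      - target: 9870
        published: 9870
        protocol: tcp
        mode: host
    command: hdfs namenode
    deploy:
      # Assumption: with DNS round-robin, "nn" resolves to task IPs
      # instead of a single service VIP.
      endpoint_mode: dnsrr
```

With host-mode publishing, the web UI on 9870 would only be reachable on the node where the nn task is running, which is a trade-off to keep in mind.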