I have been trying to track down this issue for a while and can’t pinpoint the actual cause. It started when I tried to run HDFS in Docker swarm mode on AWS EC2. The services start up correctly, and the logs show that each HDFS datanode gets the correct container IP. However, when a datanode connects to the namenode, the request carries an IP that is in the right subnet but has the wrong host part. I also noticed that when I run netstat on the namenode, all the connections from the datanodes show the same Foreign Address, with an ec2.internal suffix:
[root@hdfs-namenode /]# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 hdfs-namenod:cslistener ip-10-0-255-10.ec:54714 ESTABLISHED
tcp 0 0 hdfs-namenode:ssh ip-10-0-255-10.ec:53178 ESTABLISHED
tcp 0 0 hdfs-namenode:ssh ip-10-0-255-10.ec:34606 ESTABLISHED
tcp 0 0 hdfs-namenod:cslistener ip-10-0-255-10.ec:54548 ESTABLISHED
tcp 0 79 hdfs-namenod:cslistener ip-10-0-255-10.ec:59888 ESTABLISHED
Active UNIX domain sockets (w/o servers)
The swarm overlay network subnet is 10.0.255.0/24. If I change the subnet, the reported IP address always has the right subnet but the wrong last octet.
If you check the netstat output above, the ssh connections come from different containers on the same host, yet they all show the same foreign address.
When I add another EC2 instance as a worker, I see the same issue. The datanodes that connect to the namenode all show the same IP address. It is in the same subnet but has a different last octet from the one for the manager node. It looks like some NAT is happening on the host network, but I can’t find any information about it. This happens only in swarm mode and only on AWS EC2. The swarm network works correctly when I run it on my Mac.
I also looked at /etc/resolv.conf:
[root@hdfs-namenode /]# cat /etc/resolv.conf
search ec2.internal
nameserver 127.0.0.11
options ndots:0
I am not sure whether this configuration would affect DNS resolution.
netstat -n shows the same Foreign Address for every connection, and it is not the correct one:
[root@hdfs-namenode /]# netstat -n
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 10.0.255.3:9000 10.0.255.10:54714 ESTABLISHED
tcp 0 0 10.0.255.3:22 10.0.255.10:53178 ESTABLISHED
tcp 0 0 10.0.255.3:22 10.0.255.10:34606 ESTABLISHED
tcp 0 0 10.0.255.3:9000 10.0.255.10:54548 ESTABLISHED
tcp 0 0 10.0.255.3:9000 10.0.255.10:59888 ESTABLISHED
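To make the symptom easier to see, here is a minimal Python sketch that tallies the foreign addresses from the `netstat -n` lines above and checks them against the overlay subnet. The function name `foreign_ips` and the constants are just names I picked for illustration, and the input is simply the output pasted above, not anything read live from the host.

```python
import ipaddress
from collections import Counter

# Lines copied from the `netstat -n` output above.
NETSTAT_LINES = """\
tcp 0 0 10.0.255.3:9000 10.0.255.10:54714 ESTABLISHED
tcp 0 0 10.0.255.3:22 10.0.255.10:53178 ESTABLISHED
tcp 0 0 10.0.255.3:22 10.0.255.10:34606 ESTABLISHED
tcp 0 0 10.0.255.3:9000 10.0.255.10:54548 ESTABLISHED
tcp 0 0 10.0.255.3:9000 10.0.255.10:59888 ESTABLISHED
"""

# The swarm overlay subnet mentioned in the post.
OVERLAY_SUBNET = ipaddress.ip_network("10.0.255.0/24")


def foreign_ips(netstat_text):
    """Count connections per foreign IP in `netstat -n` TCP lines."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local, foreign, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Split "ip:port" on the last colon and keep the IP part.
        host, _, _port = fields[4].rpartition(":")
        counts[host] += 1
    return counts


counts = foreign_ips(NETSTAT_LINES)
for ip, n in counts.items():
    in_subnet = ipaddress.ip_address(ip) in OVERLAY_SUBNET
    print(f"{ip}: {n} connections, in overlay subnet: {in_subnet}")
```

With the lines above, every connection collapses onto a single foreign address inside the overlay subnet, even though the ssh sessions and the datanode connections come from different containers.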
The Docker version is:
Server Version: 19.03.1
Storage Driver: overlay2
Hopefully this is the right forum to post this question. Any advice would be greatly appreciated.