I have a stack that contains 15 services. There’s a Mongo database (db), a message queue (msgs), and a proxy (proxy), with the rest being Node.js-based services. The Node services are all built from the official Node.js 4.5 Dockerfile and none of the services are linked; I just rely on the built-in networking features with the exposed ports for each.
And it’s 95% working. From each container, every other service name resolves properly, except one. For example, from a container in the “auth” service, I can ping any other service, the db, etc. and get something like the following:
# ping event
PING scope.xxx.dockerapp.io (10.7.0.19): 56 data bytes
64 bytes from 10.7.0.19: icmp_seq=0 ttl=64 time=3.300 ms
64 bytes from 10.7.0.19: icmp_seq=1 ttl=64 time=1.341 ms
...
And this is the same for all of the services except for one named “docs”. When I ping docs, I get:
# ping docs
PING docs (127.0.53.53): 56 data bytes
64 bytes from 127.0.53.53: icmp_seq=0 ttl=64 time=0.068 ms
64 bytes from 127.0.53.53: icmp_seq=1 ttl=64 time=0.072 ms
So it appears it can see something called “docs” but if I try to do something like:
It won’t work. It’s just this one service and, other than the name, it is built and deployed in the same way as the others. Not sure what else to try so looking for some suggestions/help.
Update 1: If I create a new service from the same repository and just call it something else (e.g. “cdoc”), it resolves to the public DNS name just fine. It really seems like it’s only the name “docs” that causes an issue.
Update 2: The IP address 127.0.53.53 seemed odd. Networking isn’t really my area, but this address apparently signals a name collision. This is all hosted on AWS and if I ssh into one of the actual nodes and then:
I get the same thing. So maybe it’s something specific to AWS?
Update 3: After a bit more research, I think I have some idea of what’s going on, but I don’t pretend to have a deep understanding. Looks like some kind of DNS leakage where an internal name (“docs” in this case) ends up colliding with the gTLD. More info here:
Turns out that a bunch of these new TLD entries cause a problem, perhaps depending on the Linux distro. The Docker AMI for AWS uses Ubuntu. There are a number of TLDs that you can ping that return 127.0.53.53 rather than unknown host. I’m trying to find a way to prevent this leakage. Otherwise, I’ll likely need to rename our service to something that doesn’t collide with anything.
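For anyone who wants to check their own service names for the same problem, here is a minimal Python sketch. It assumes the only symptom worth flagging is a name resolving to 127.0.53.53, the address I saw above. The function name check_service_name and the injectable resolve parameter are my own inventions for illustration, not part of any Docker tooling.

```python
import socket

# The address "docs" resolved to in the ping output above; it marks a
# collision with a newly delegated TLD rather than a real host.
COLLISION_SENTINEL = "127.0.53.53"


def check_service_name(name, resolve=socket.gethostbyname):
    """Classify a short service name as 'collision', 'ok', or 'unresolved'.

    `resolve` is any callable mapping a host name to an IPv4 string; it
    defaults to the system resolver and can be swapped out for testing.
    """
    try:
        address = resolve(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "unresolved"
    return "collision" if address == COLLISION_SENTINEL else "ok"
```

Run from inside a container, calling check_service_name on each service name in the stack should show which ones are leaking out to public DNS before they bite you.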
Update #4: After some more reading and experimenting, it could be related to the issue outlined in this other forum thread:
In there, they recommended removing the ndots:0 option from resolv.conf and that does seem to help. Just not sure how to “officially” inject this into my stack or service configurations.
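In case it helps anyone experimenting with the same workaround, this is a rough Python sketch of the edit I have been making by hand. It only rewrites resolv.conf text passed in as a string; the function name strip_ndots_zero is hypothetical, and this is not an official Docker Cloud mechanism, just a way to script the manual change.

```python
def strip_ndots_zero(resolv_conf_text):
    """Return resolv.conf text with the 'ndots:0' option removed.

    Other options on the same line are kept; an 'options' line that
    contained only 'ndots:0' is dropped entirely.
    """
    cleaned = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "options":
            kept = [opt for opt in fields[1:] if opt != "ndots:0"]
            if not kept:
                continue  # nothing left on this options line
            line = "options " + " ".join(kept)
        cleaned.append(line)
    return "\n".join(cleaned) + ("\n" if cleaned else "")
```

Something like this could presumably run at container start-up, reading /etc/resolv.conf and writing the result back before the Node process launches, assuming the container user is allowed to write that file. I have not confirmed that the change survives a redeploy.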