Intermittent faults of Docker-internal DNS with IPv6

Hi all,

I’m running a small Docker environment on a Raspberry Pi, and I’ve recently enabled IPv6.

For various reasons (isolation, the daily changing IPv6 prefix…) I went down the road of disabling the userland proxy (via daemon.json) and adding the ipv6nat container. The IPv6 addresses for the containers are chosen from a randomly generated ULA prefix (fd00::/8).
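For reference, the relevant daemon.json bit and the ipv6nat setup look roughly like this (a sketch; the actual ULA subnet is defined on the compose network, not here):

{
  "userland-proxy": false
}

pi@raspberrypi:~ $ docker run -d --name ipv6nat --privileged --network host \
      -v /var/run/docker.sock:/var/run/docker.sock:ro robbertkl/ipv6nat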
After convincing a number of containers (node-red, influx…) to actually use IPv6, they can reach each other and can be reached from outside via either the host machine’s IPv4 or IPv6 address. DNS resolution in my home network also works as expected.

However, a somewhat unexpected problem surfaced: the node-red container complains roughly every 10 minutes that it couldn’t reach the influx container due to a DNS resolution error (getaddrinfo ENOTFOUND). It queries influx every minute, firing about 10 concurrent requests each time; the failures are random and hit only about 1 % of the attempts.

I tried two things to debug this, at the same time: logging into the container to run nslookup manually, and watching the daemon logs after setting dockerd into debug mode.
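In case anyone wants to reproduce the setup: I enabled debug logging by adding "debug": true to /etc/docker/daemon.json and sending dockerd a SIGHUP (it reloads that option without a restart), then followed the daemon logs on the host while running lookups from a shell inside the container (the node-red container name below is just the compose-generated one, as a placeholder):

pi@raspberrypi:~ $ sudo kill -HUP $(pidof dockerd)
pi@raspberrypi:~ $ journalctl -u docker.service -f
pi@raspberrypi:~ $ docker exec -it smart-home_node-red_1 sh

Here are the results: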

Normal, expected case:

~ $ nslookup influx
nslookup: can't resolve '(null)': Name does not resolve
Name:      influx
Address 1: 172.18.0.5 smart-home_influx_1.smart-home_smart-home-net
Address 2: fdc0:655:b8ac:6cee::5 smart-home_influx_1.smart-home_smart-home-net
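(The “can’t resolve '(null)'” line appears even on successful lookups; as far as I can tell it’s just a cosmetic quirk of busybox nslookup and can be ignored.)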

Logs look like this:

Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.547353704+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.547496905+01:00" level=debug msg="[resolver] lookup for influx.: IP [172.18.0.5]"
Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.548540662+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.548655456+01:00" level=debug msg="[resolver] lookup for influx.: IP [fdc0:655:b8ac:6cee::5]"
Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.550023928+01:00" level=debug msg="IP To resolve 5.0.18.172"
Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.550145093+01:00" level=debug msg="[resolver] lookup for IP 5.0.18.172: name smart-home_influx_1.smart-home_smart-home-net"
Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.551397179+01:00" level=debug msg="IP To resolve 5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.e.e.c.6.c.a.8.b.5.5.6.0.0.c.d.f"
Dec 15 19:24:50 raspberrypi dockerd[22914]: time="2019-12-15T19:24:50.551562583+01:00" level=debug msg="[resolver] lookup for IP 5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.e.e.c.6.c.a.8.b.5.5.6.0.0.c.d.f: name smart-home_influx_1.smart-home_smart-home-net"
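To read these logs: the two “Name To resolve” pairs are the parallel A and AAAA queries for influx, and the “IP To resolve” lines are nslookup’s subsequent reverse (PTR) lookups, with the addresses written in reversed in-addr.arpa / ip6.arpa notation (5.0.18.172 is 172.18.0.5 with its octets reversed; the long string of nibbles is the IPv6 address reversed).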

Every now and then, one of the following happens instead, i.e. either the IPv4 or the IPv6 address is missing from the answer:

~ $ nslookup influx
nslookup: can't resolve '(null)': Name does not resolve
Name:      influx
Address 1: 172.18.0.5 smart-home_influx_1.smart-home_smart-home-net

~ $ nslookup influx
nslookup: can't resolve '(null)': Name does not resolve
Name:      influx
Address 1: fdc0:655:b8ac:6cee::5

The logs then look like this:

Dec 15 19:23:54 raspberrypi dockerd[22914]: time="2019-12-15T19:23:54.352789971+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:23:54 raspberrypi dockerd[22914]: time="2019-12-15T19:23:54.353013948+01:00" level=debug msg="[resolver] lookup for influx.: IP [172.18.0.5]"
Dec 15 19:23:54 raspberrypi dockerd[22914]: time="2019-12-15T19:23:54.353400015+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:23:54 raspberrypi dockerd[22914]: time="2019-12-15T19:23:54.353554437+01:00" level=debug msg="[resolver] lookup for influx.: IP [fdc0:655:b8ac:6cee::5]"
Dec 15 19:23:54 raspberrypi dockerd[22914]: time="2019-12-15T19:23:54.355044500+01:00" level=debug msg="IP To resolve 5.0.18.172"
Dec 15 19:23:54 raspberrypi dockerd[22914]: time="2019-12-15T19:23:54.355546509+01:00" level=debug msg="[resolver] lookup for IP 5.0.18.172: name smart-home_influx_1.smart-home_smart-home-net"

Dec 15 19:25:24 raspberrypi dockerd[22914]: time="2019-12-15T19:25:24.075780659+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:25:24 raspberrypi dockerd[22914]: time="2019-12-15T19:25:24.076029617+01:00" level=debug msg="[resolver] lookup for influx.: IP [fdc0:655:b8ac:6cee::5]"
Dec 15 19:25:24 raspberrypi dockerd[22914]: time="2019-12-15T19:25:24.075807307+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:25:24 raspberrypi dockerd[22914]: time="2019-12-15T19:25:24.076460497+01:00" level=debug msg="[resolver] lookup for influx.: IP [172.18.0.5]"
Dec 15 19:25:24 raspberrypi dockerd[22914]: time="2019-12-15T19:25:24.077406423+01:00" level=debug msg="IP To resolve 5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.e.e.c.6.c.a.8.b.5.5.6.0.0.c.d.f"
Dec 15 19:25:24 raspberrypi dockerd[22914]: time="2019-12-15T19:25:24.077777971+01:00" level=debug msg="[resolver] lookup for IP 5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.e.e.c.6.c.a.8.b.5.5.6.0.0.c.d.f: name smart-home_influx_1.smart-home_smart-home-net"
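Interestingly, in both failure cases the daemon still logs successful lookups for both the A and the AAAA record; only the reverse lookup for the address that nslookup actually printed follows. So the embedded resolver appears to find both addresses every time, but one of the two answers never makes it back to the client.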

The worst case is this one, happening less frequently, but still in about 1 % of the cases:

~ $ nslookup influx
nslookup: can't resolve '(null)': Name does not resolve
nslookup: can't resolve 'influx': Name does not resolve

Logs:

Dec 15 19:22:27 raspberrypi dockerd[22914]: time="2019-12-15T19:22:27.002926895+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:22:27 raspberrypi dockerd[22914]: time="2019-12-15T19:22:27.005502807+01:00" level=debug msg="[resolver] lookup for influx.: IP [172.18.0.5]"
Dec 15 19:22:27 raspberrypi dockerd[22914]: time="2019-12-15T19:22:27.003265666+01:00" level=debug msg="Name To resolve: influx."
Dec 15 19:22:27 raspberrypi dockerd[22914]: time="2019-12-15T19:22:27.005844967+01:00" level=debug msg="[resolver] lookup for influx.: IP [fdc0:655:b8ac:6cee::5]"

Does anyone have an idea what is going on? And if not, any suggestion where I could look or what I could try to get more information? To me it feels like a race condition or something along those lines.
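One thing I’m planning to try next is to capture the DNS traffic inside the container, to see whether the missing answer ever leaves the embedded resolver at 127.0.0.11 (assuming tcpdump is available in the image):

~ $ tcpdump -ni any host 127.0.0.11

If the answer shows up in the capture but nslookup still reports a failure, the problem would be on the client side; if it never appears, it’s lost inside dockerd.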

Cheers,
thewaldschrat

Hi,

I’m seeing the exact same issue in my setup; I’m using globally routable IPv6 addresses (via ndpd on my host).

I also see the DNS lookup switching between IPv4 and IPv6, or failing altogether:

/usr/src/app # for i in 1 2 3 4 5 6 7 8; do ping -c 1 cerebro-http; sleep 2; done
PING cerebro-http (172.19.0.7): 56 data bytes
64 bytes from 172.19.0.7: seq=0 ttl=64 time=0.071 ms

--- cerebro-http ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.071/0.071/0.071 ms
PING cerebro-http (172.19.0.7): 56 data bytes
64 bytes from 172.19.0.7: seq=0 ttl=64 time=0.089 ms

--- cerebro-http ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.089/0.089/0.089 ms
PING cerebro-http (2xxx:xxxx:xxxx:18:ffff:2:0:ff02): 56 data bytes
64 bytes from 2xxx:xxxx:xxxx:18:ffff:2:0:ff02: seq=0 ttl=64 time=0.057 ms

--- cerebro-http ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.057/0.057/0.057 ms
PING cerebro-http (172.19.0.7): 56 data bytes
64 bytes from 172.19.0.7: seq=0 ttl=64 time=0.093 ms

--- cerebro-http ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.093/0.093/0.093 ms
PING cerebro-http (2xxx:xxxx:xxxx:18:ffff:2:0:ff02): 56 data bytes
64 bytes from 2xxx:xxxx:xxxx:18:ffff:2:0:ff02: seq=0 ttl=64 time=0.087 ms

--- cerebro-http ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.087/0.087/0.087 ms
ping: bad address 'cerebro-http'
PING cerebro-http (172.19.0.7): 56 data bytes
64 bytes from 172.19.0.7: seq=0 ttl=64 time=0.326 ms

--- cerebro-http ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.326/0.326/0.326 ms
PING cerebro-http (172.19.0.7): 56 data bytes
64 bytes from 172.19.0.7: seq=0 ttl=64 time=0.096 ms

This is with Docker version 18.09.6, build 481bc77156, and docker-compose version 1.24.1, build 4667896b.

But it doesn’t seem to interest anyone. I even opened an issue on GitHub in the repository that contains the offending code: https://github.com/docker/libnetwork/issues/2492
But that wasn’t acknowledged by anyone even remotely responsible either. Maybe it all fell through the cracks when they sold their business.