Consul DNS round robin works for host but not for containers

craig1234 · February 25, 2016, 3:54am

I've set up Consul and Registrator; both seem to be working well and all my containers are registering as services. Consul's DNS is bound to the docker0 bridge IP and all containers point to this IP as their DNS server.

I have one service that contains 2 containers, and because I am using the -internal switch when I start Registrator, both of the container IPs are registered in Consul's DNS. I can ping the service name (appserver.service.consul) from the host server, as I've set it to use Consul's DNS as well, and I get round robin responses as expected. However, the containers do not behave the same: if I docker exec -it bash into them and ping appserver.service.consul, I always get the same IP returned.

I've installed dnsutils on one of the containers so I can dig the Consul DNS server, and I get both A records returned, but I don't understand why I never get a round robin response. Does anyone have any ideas on why round robin isn't working for my containers?

thanks

jjohnston · April 6, 2016, 2:44pm

Did you ever figure this out? We have the same problem right now with the same setup. If we use nslookup or dig we get the IPs of both services, but if we use ping it only brings back one IP. This is also a container for Apache (httpd), and it only serves up one IP. We do have “disablereuse=On” when using ProxyPass, and that should prevent any caching by Apache. From what I have researched, the only tools in Linux that do DNS caching are nscd and dnsmasq, but neither of those is in this container.

The really strange thing for us is that it is only happening in this container. The container extends from the httpd image and only adds some Apache configuration. My plan now is to keep stripping this container down and see if there is something special about it, but it looks pretty stripped down as-is. The httpd image also extends from the same image as our other containers (debian:jessie), so I am not very hopeful.
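
One way to narrow down the dig-versus-ping difference described above: dig and nslookup query the DNS server directly, while ping normally resolves names through the C library's getaddrinfo. The sketch below is only a diagnostic aid, not something from this thread; the function names are hypothetical. It repeats the lookup through getaddrinfo (via Python's socket module) and counts which address comes back first, so the result can be compared across containers and the host.

```python
import socket
from collections import Counter


def libc_resolve(name):
    """Return the IPv4 addresses for name in the order getaddrinfo returns them."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) pair.
    return [info[4][0] for info in infos]


def first_address_counts(name, attempts=20, resolve=libc_resolve):
    """Count how often each address is listed first across repeated lookups.

    resolve is injectable so the counting logic can be exercised
    without a live DNS server.
    """
    counts = Counter()
    for _ in range(attempts):
        addresses = resolve(name)
        if addresses:
            counts[addresses[0]] += 1
    return counts
```

Calling `first_address_counts("appserver.service.consul")` inside a container and again on the host would show whether the first address rotates. If dig rotates the records but this count stays on a single address, the ordering is being fixed somewhere between the DNS response and the application rather than in Consul.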

nathanleclaire · April 6, 2016, 5:07pm

I’d highly suggest opening an issue on GitHub with a detailed, minimally reproducible example, and CCing @mrjana and @mavenugo. They should be able to help you get sorted out if the problem is reproducible.

jjohnston · April 6, 2016, 6:05pm

@nathanleclaire, I will do that once I try out a few more things. It was starting to look like my global DNS settings were the problem. With those settings in place, I could break it just by firing up a debian:jessie container and seeing ping always return one IP. Once I took out my global DNS settings, my debian:jessie container would round robin with ping. But when I fire up my httpd container, it is still stuck getting only one IP (ping or Apache). If I run that same container with --net=host, then it round robins. I want to see how this behaves on Red Hat 7 as well. Once I have that, I will throw together an example, and hopefully it is reproducible. I keep checking nslookup and dig, and it looks like Consul is totally fine.

craig1234 · April 6, 2016, 9:38pm

No, I ended up parking Consul and Registrator for now. My workaround was to use dnsmasq on the host, add it as a DNS server for the containers, and update the A records whenever a container spawned.

I’ve since moved to the latest version of Docker and Docker Compose and use network aliases and the internal Docker DNS to group my services (so I’ve dropped running dnsmasq on the host). Round robin doesn’t work with that setup either, but at least if a container in a service disappears, the DNS is updated instantly and the other containers can find the rest of the containers in that service. At this stage that’s all I need for HA, but it would be perfect to have RR working out of the box, since I could then use it to load balance the backend. I’m using haproxy to load balance the front end.

jjohnston · April 6, 2016, 10:49pm

I was able to reproduce this with a simple example, and I created an issue at https://github.com/docker/docker/issues/21823.

If we need to, we will repackage our containers using the heavier ubuntu image. We have our own base images, so we can get away with doing that. Hopefully the Docker team is able to reproduce the error and it makes sense.

campbech · April 12, 2016, 8:37pm

Have you tried using https://github.com/gliderlabs/connectable?

I know this is the solution gliderlabs had for doing proper load balancing across all of the service instances.
