DNS name resolution is unstable, alternating between working and failing within minutes

Hello everyone.

I have a “Ubuntu 22.04.5 LTS” machine with Docker version 26.1.3.

The simplified container setup is:

  • I have a Ubuntu server, say “lxm1234.my.company” with IP “10.198.140.203”
  • a certificate for this machine, say “myserver.my.company”
  • “traefik” container (version 3.3.6) for routing HTTPS requests
  • “wildfly” container (version 26.1.3.Final) for REST requests
  • “client” container with “cURL” installed
  • all containers use the same custom bridge network with default settings (I do not configure anything but the name); a minimal sketch of this layout follows the list
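
This is roughly how such a layout could be created with plain docker commands; the network name, image tags, and the client image are assumptions, and the Traefik routing configuration is omitted:

docker network create app-net
docker run -d --name traefik --network app-net -p 443:443 traefik:v3.3.6
docker run -d --name wildfly --network app-net quay.io/wildfly/wildfly:26.1.3.Final
docker run -it --name client --network app-net --entrypoint sh curlimages/curl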

From my Windows machine, it is always possible to make a GET like this:
https://myserver.my.company/myapp/restservice
The call is routed from “traefik” to “wildfly” and the response is a “200” with a JSON.

When I make the same call from the “client” container, in the past this worked without a problem.
For the past few days it has not always been possible.

The behaviour is strange in the sense that it works for a couple of minutes, then the request fails for some minutes, and then works again. This alternation keeps going all day.

When it works, an “nslookup” from within the “client” container returns the correct IP address.

nslookup myserver.my.company
Server:         127.0.0.11
Address:        127.0.0.11:53

myserver.my.company canonical name = lxm1234.my.company

Non-authoritative answer:
myserver.my.company canonical name = lxm1234.my.company
Name:   lxm1234.my.company
Address: 10.198.140.203

A “ping” works fine as well.

ping -c 1 myserver.my.company
PING myserver.my.company (10.198.140.203): 56 data bytes
64 bytes from 10.198.140.203: seq=0 ttl=64 time=0.302 ms

When the problem occurs, “nslookup” does not return the real IP address but “127.0.1.1” instead.

nslookup myserver.my.company
Server:         127.0.0.11
Address:        127.0.0.11:53

myserver.my.company canonical name = lxm1234.my.company
Name:   lxm1234.my.company
Address: 127.0.1.1

myserver.my.company canonical name = lxm1234.my.company

“ping” uses this wrong IP.

ping -c 1 myserver.my.company
PING myserver.my.company (127.0.1.1): 56 data bytes
64 bytes from 127.0.1.1: seq=0 ttl=64 time=0.058 ms

As a consequence, the “cURL” fails:

curl -v -k https://myserver.my.company/myapp/restservice
* Host myserver.my.company:443 was resolved.
* IPv6: (none)
* IPv4: 127.0.1.1
*   Trying 127.0.1.1:443...
* connect to 127.0.1.1 port 443 from 127.0.0.1 port 48358 failed: Connection refused
* Failed to connect to myserver.my.company port 443 after 2 ms: Could not connect to server
* closing connection #0
curl: (7) Failed to connect to myserver.my.company port 443 after 2 ms: Could not connect to server

Since I can always reach my REST service when the request originates from outside the Docker network, this looks to me like a problem with DNS resolution within the Docker network.
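
To double-check that only name resolution is affected, the lookup can be bypassed completely by pinning the name to the known address (a sketch using the IP from the nslookup above):

curl -v -k --resolve myserver.my.company:443:10.198.140.203 https://myserver.my.company/myapp/restservice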

What I have checked over time:

  • on the server: “iptables --list”; the rules do not change
  • in the “client” container: “/etc/resolv.conf”; it does not change:
nameserver 127.0.0.11
search my.company
options edns0 trust-ad ndots:0
# Based on host file: '/etc/resolv.conf' (internal resolver)
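
To pin down exactly when the behaviour flips, the lookup can also be repeated in a loop inside the “client” container, for example (a simple sketch):

while true; do
  echo -n "$(date) "
  nslookup myserver.my.company 2>&1 | grep 'Address' | tail -n 1
  sleep 10
done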

On the server, debugging is enabled in “/etc/docker/daemon.json”.
Checking with “journalctl -u docker.service” I cannot see errors, but I can see entries whose timestamps match my “works” and “fails” behaviour.
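
The resolver entries can be filtered out of the log with something like:

journalctl -u docker.service --since today | grep '\[resolver\]'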

If it works:

May 08 16:13:28 lxm1234 dockerd[1374]: time="2025-05-08T16:13:28.726511615+02:00" level=debug msg="[resolver] forwarding query" client-addr="udp:127.0.0.1:51656" dns-server="udp:127.0.0.53:53" question=";myserver.my.company.\tIN\t A" spanID=8e3d456ec4ec5dc5 traceID=f6ff48e5fb33f62034fd41754093eec5

If it fails:

May 08 16:13:28 lxm1234 dockerd[1374]: time="2025-05-08T16:13:28.726923632+02:00" level=debug msg="[resolver] received A record \"127.0.1.1\" for \"lxm1234.my.company.\" from udp:127.0.0.53" spanID=191d7e681bacc8b1 traceID=f6ff48e5fb33f62034fd41754093eec5
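
Since the record comes from the host’s stub resolver (127.0.0.53), the answer should be reproducible directly on the server with something like:

resolvectl query lxm1234.my.company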

I cannot understand what the problem is, since everything worked fine in the past.
It would be great to get a hint on what I can check to find the cause of this behaviour.

It seems there is something wrong with your DNS setting. Maybe you already fixed it, but Docker was not restarted since then, or the containers were not recreated since then.

Have you configured multiple DNS servers, either in /etc/docker/daemon.json, when you started the containers, or anywhere else on the host?

I can imagine that the host either has a different DNS setting or handles multiple DNS servers differently. If, for example, 127.0.0.1 is also set somewhere, or one of the DNS servers returns it, that could cause the issue.
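
To see which server hands out a loopback address, each configured upstream could be queried directly on the host, for example (10.0.0.1 is just a placeholder for whatever resolvectl lists):

resolvectl status | grep 'DNS Servers'
nslookup lxm1234.my.company 10.0.0.1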

Thanks for your message.

I have restarted the server (and thus the Docker Daemon) multiple times. Also, I have removed all Docker containers, volumes, networks, and used my Docker Compose file to recreate everything. Unfortunately this did not help.

The /etc/docker/daemon.json file does not contain any DNS setting; this is the complete file:

{
  "debug": true,
  "live-restore": true,
  "log-driver": "json-file",
  "log-opts": {
    "max-file": "10",
    "max-size": "10m"
  },
  "storage-driver": "overlay2"
}

The file /etc/resolv.conf on the server is a symlink to /run/systemd/resolve/stub-resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
search my.company

While 127.0.0.11 is used inside the containers, the server setting 127.0.0.53 should be fine (the systemd-resolved process listens there).

A check with resolvectl status lists all Docker networks and the containers’ virtual Ethernet interfaces like this:

Link 8 (docker0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

That is not the important part. Of course you see 127.0.0.53, since that is the DNS stub your system sends its requests to, but there are always actual external DNS servers behind it; they are just not listed in stub-resolv.conf.

Try

/run/systemd/resolve/resolv.conf
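
It lists the actual upstream DNS servers. Each of them could then be queried directly to see which one returns 127.0.1.1, for example (the address is a placeholder for a nameserver from that file):

nslookup lxm1234.my.company 10.0.0.1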