Hello everyone.
I have an “Ubuntu 22.04.5 LTS” machine with Docker version 26.1.3.
The simplified container setup is:
- I have an Ubuntu server, say “lxm1234.my.company”, with IP “10.198.140.203”
- a certificate for this machine, say “myserver.my.company”
- “traefik” container (version 3.3.6) for routing HTTPS requests
- “wildfly” container (version 26.1.3.Final) for REST requests
- “client” container with “cURL” installed
- all containers share the same custom bridge network, created with default settings (I do not configure anything but the name); see the sketch below
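A minimal sketch of how the containers are attached (image and container names other than “traefik” are placeholders, and the real configuration of each container is omitted):

docker network create my-net                  # user-defined bridge, only the name is set
docker run -d --name traefik --network my-net -p 443:443 traefik:v3.3.6
docker run -d --name wildfly --network my-net my-wildfly-image
docker run -it --name client --network my-net my-client-image sh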
From my Windows machine, it is always possible to make a GET request like this:
https://myserver.my.company/myapp/restservice
The call is routed from “traefik” to “wildfly” and the response is a “200” with a JSON.
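For context, the routing is a plain host rule; roughly, the “wildfly” container carries Traefik labels along these lines (router name, entrypoint name, and the WildFly port 8080 are assumptions, not my exact configuration):

traefik.http.routers.myapp.rule=Host(`myserver.my.company`)
traefik.http.routers.myapp.entrypoints=websecure
traefik.http.routers.myapp.tls=true
traefik.http.services.myapp.loadbalancer.server.port=8080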
When I make the same call from the “client” container, it used to work without a problem.
For some days now, it has not always worked.
The behaviour is strange: the call works for a couple of minutes, then fails for some minutes, and then works again. This alternation keeps going all day.
When things work, an “nslookup” from within the “client” container returns the correct IP address.
nslookup myserver.my.company
Server: 127.0.0.11
Address: 127.0.0.11:53
myserver.my.company canonical name = lxm1234.my.company
Non-authoritative answer:
myserver.my.company canonical name = lxm1234.my.company
Name: lxm1234.my.company
Address: 10.198.140.203
A “ping” works fine as well.
ping -c 1 myserver.my.company
PING myserver.my.company (10.198.140.203): 56 data bytes
64 bytes from 10.198.140.203: seq=0 ttl=64 time=0.302 ms
When the problem occurs, “nslookup” does not resolve to the real IP address but to “127.0.1.1”.
nslookup myserver.my.company
Server: 127.0.0.11
Address: 127.0.0.11:53
myserver.my.company canonical name = lxm1234.my.company
Name: lxm1234.my.company
Address: 127.0.1.1
myserver.my.company canonical name = lxm1234.my.company
“ping” uses this wrong IP.
ping -c 1 myserver.my.company
PING myserver.my.company (127.0.1.1): 56 data bytes
64 bytes from 127.0.1.1: seq=0 ttl=64 time=0.058 ms
As a consequence, the “cURL” call fails:
curl -v -k https://myserver.my.company/myapp/restservice
* Host myserver.my.company:443 was resolved.
* IPv6: (none)
* IPv4: 127.0.1.1
* Trying 127.0.1.1:443...
* connect to 127.0.1.1 port 443 from 127.0.0.1 port 48358 failed: Connection refused
* Failed to connect to myserver.my.company port 443 after 2 ms: Could not connect to server
* closing connection #0
curl: (7) Failed to connect to myserver.my.company port 443 after 2 ms: Could not connect to server
Since I can always reach my REST service when the request originates from outside the Docker network, this looks to me like a problem with DNS resolution inside the Docker network.
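To make that comparison concrete, the resolution can be checked from both sides like this (the “resolvectl” call assumes the server uses systemd-resolved as its stub resolver):

docker exec client nslookup myserver.my.company   # via the embedded Docker DNS (127.0.0.11)
nslookup myserver.my.company                      # directly on the server
resolvectl query lxm1234.my.company               # what systemd-resolved on the server answers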
What I have checked over time (the commands are sketched after this list):
- on the server: “iptables --list”; the rules do not change
- in the “client” container: “/etc/resolv.conf”; it does not change
nameserver 127.0.0.11
search my.company
options edns0 trust-ad ndots:0
# Based on host file: '/etc/resolv.conf' (internal resolver)
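The checks themselves are nothing more than this (the container name “client” is how I refer to it here):

sudo iptables --list                          # on the server, output compared over time
docker exec client cat /etc/resolv.conf       # inside the “client” container, also compared over time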
On the server, debugging is enabled in “/etc/docker/daemon.json”.
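For reference, a minimal “/etc/docker/daemon.json” that enables the debug output (any other settings I have are omitted here):

{
  "debug": true
}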
Checking with “journalctl -u docker.service”, I cannot see errors, but I can see entries whose timestamps match my “works” and “fails” phases.
If it works:
May 08 16:13:28 lxm1234 dockerd[1374]: time="2025-05-08T16:13:28.726511615+02:00" level=debug msg="[resolver] forwarding query" client-addr="udp:127.0.0.1:51656" dns-server="udp:127.0.0.53:53" question=";myserver.my.company.\tIN\t A" spanID=8e3d456ec4ec5dc5 traceID=f6ff48e5fb33f62034fd41754093eec5
If it fails:
May 08 16:13:28 lxm1234 dockerd[1374]: time="2025-05-08T16:13:28.726923632+02:00" level=debug msg="[resolver] received A record \"127.0.1.1\" for \"lxm1234.my.company.\" from udp:127.0.0.53" spanID=191d7e681bacc8b1 traceID=f6ff48e5fb33f62034fd41754093eec5
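To correlate these entries with the “works” and “fails” phases, I filter the resolver lines like this (the time window is only an example):

journalctl -u docker.service --since "2025-05-08 16:00" | grep "\[resolver\]"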
I cannot understand what the problem is, since everything worked fine in the past.
It would be great to get a hint on what I can check to find the cause of this behaviour.