Strange DNS issue appeared all of a sudden

Hi,

I am using Docker v24 on Ubuntu 22.04, and everything worked for a year until recently. DNS resolution broke in containers that use the default bridge network. The host network and a newly created custom bridge network both work as expected. Any container started on the default bridge fails to resolve any name.

Ubuntu uses netplan, and here are my 2 netplan config files.

# default one
network:
  ethernets:
    ens192:
      addresses:
      - 10.12.0.10/24
      gateway4: 10.12.0.1
      nameservers:
        addresses:
        - 8.8.8.8
        search: []
  version: 2

# secondary one (for some nfs nas connection)
network:
  ethernets:
    ens224:
      addresses:
      - 10.191.241.103/21
      gateway4: 10.191.240.1
      nameservers:
        addresses:
        - 8.8.8.8
        search: []
  version: 2

Host resolv.conf:

nameserver 127.0.0.53
options edns0 trust-ad
search .

The container on default bridge network resolv.conf (not working):

nameserver 8.8.8.8
nameserver 8.8.8.8
search .

The container on custom bridge network resolv.conf (it works):

nameserver 127.0.0.11
options edns0 trust-ad ndots:0

The default bridge network:

[
    {
        "Name": "bridge",
        "Id": "918b3a55ef92c5e6d2d00fdddf331a9afff4bfaf5a1067c5c08810f277064f5f",
        "Created": "2024-10-18T14:32:35.329498902+03:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]

The custom bridge network:

[
    {
        "Name": "my_custom_network",
        "Id": "3f859cff5f98feafb5131b3391ebfea4570b4499362fa9e3a8a0fbd2fb23cbdc",
        "Created": "2024-10-18T15:17:28.355709568+03:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.20.0.0/16",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]

I haven’t touched the server for a long time, except for running apt upgrade a few times. I don’t understand why it stopped working. I tried changing a few settings and DNS addresses with no luck. Does anybody have any idea?

Thanks in advance.

How did you test that the problem is the DNS resolution and not any other network issue?
Does name resolution work when you specify the DNS server and set it to, for example, one in your LAN? Do you have a local DNS server?

Because host networking works as expected. As I wrote before, even a new custom bridge network works fine. Resolution fails only in containers created on the default network. As a last resort, I am going to erase everything related to Docker and reinstall it if I can’t find a fix otherwise.

Name resolution does not work even if I specify a server with the --dns option. I don’t have a local DNS server to test with; I use public DNS for testing.

It actually proves my point. If the connection works with the host network when there is no network isolation and works with custom networks which are different than the default bridge, you could still have a network issue affecting the connection to the DNS server. In fact, if you define the DNS parameter and the DNS resolution still doesn’t work, it could be another sign of a network issue.

Without the DNS option in the docker run command, Docker on the default bridge and on the host network uses the DNS config of your host, basically copying /run/systemd/resolve/resolv.conf
So whatever you have in there should be accessible from the container. If you override it with the DNS option and it still doesn’t work, I would expect other network operations to fail too.
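
To make the copying behaviour easier to reason about, here is a minimal Python sketch that pulls the nameserver entries out of a resolv.conf and flags loopback ones. The function name parse_nameservers is made up for this illustration, and the sample text is simply the host and default-bridge resolv.conf content posted earlier in this topic. A loopback address such as 127.0.0.53 is only meaningful inside the network namespace that owns it, which is, as far as I know, why the container ends up with the upstream servers instead of the host's stub resolver.

```python
import ipaddress


def parse_nameservers(resolv_conf_text):
    """Return a list of (address, is_loopback) for each nameserver line."""
    result = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            addr = ipaddress.ip_address(fields[1])
            result.append((str(addr), addr.is_loopback))
    return result


# Content copied from the posts above.
HOST_RESOLV = """nameserver 127.0.0.53
options edns0 trust-ad
search .
"""
DEFAULT_BRIDGE_RESOLV = """nameserver 8.8.8.8
nameserver 8.8.8.8
search .
"""

if __name__ == "__main__":
    print("host:     ", parse_nameservers(HOST_RESOLV))
    print("container:", parse_nameservers(DEFAULT_BRIDGE_RESOLV))
```

A non-loopback entry in the container's file means the container has to reach that server through the bridge and the host's routing, so plain connectivity matters before DNS does.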

Try this:

docker run --rm -it nicolaka/netshoot curl --header 'Host: example.com' $(dig +short example.com)

This resolves the IP address of example.com on the host and uses that IP address in the container to access the website with curl. If that works, you probably have a DNS issue. If it doesn’t work, it is a network issue, and you either have a firewall on the host or there is some kind of incompatibility between your host network and the default Docker bridge.
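
The same decision logic can be written down as a small Python sketch, in case you prefer to run it from inside a container that has Python. The function name classify_failure and its parameters are my own naming, not part of Docker or curl. The idea is to test raw connectivity first, with an IP address that was resolved somewhere else (for example on the host), and to blame DNS only if that connection succeeds.

```python
import socket


def classify_failure(known_ip, hostname, port=80, timeout=3.0,
                     connect=socket.create_connection,
                     resolve=socket.gethostbyname):
    """Return "network", "dns", or "ok" for the environment this runs in.

    known_ip must have been resolved outside the environment under test.
    connect and resolve are injectable so the logic can be checked offline.
    """
    # Step 1: plain TCP connectivity, no DNS involved.
    try:
        conn = connect((known_ip, port), timeout=timeout)
    except OSError:
        return "network"
    conn.close()

    # Step 2: only now does a resolver failure point at DNS itself.
    try:
        resolve(hostname)
    except OSError:
        return "dns"
    return "ok"
```

Inside the container you would call something like classify_failure("<IP from dig on the host>", "example.com"). A result of "network" means the DNS errors are most likely just a symptom.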

You are right, there is something wrong going on.

curl: (7) Failed to connect to 93.184.215.14 port 80 after 3072 ms: Couldn’t connect to server

But why is this not happening with a custom bridge network?

You can try some debugging like I described in another topic

You have a different problem, but checking the IP addresses and trying to access a port on the host could also give you new ideas. If you can’t even access a Python server on the host, that is worse than not being able to access the internet.

Make sure no firewall blocks the requests and that your Docker bridge network does not collide with any other IP on that host or with the gateway IP on your LAN.
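
The collision check can be done by hand, but a short Python sketch with the standard ipaddress module makes it harder to miss something. The function name find_overlaps is hypothetical; the subnets and interface addresses are the ones posted in this topic (the two Docker networks and the two netplan files).

```python
import ipaddress


def find_overlaps(docker_subnets, host_interfaces):
    """Return (docker_subnet, host_network) pairs whose ranges overlap."""
    overlaps = []
    for subnet in docker_subnets:
        docker_net = ipaddress.ip_network(subnet)
        for cidr in host_interfaces:
            # ip_interface accepts a host address with a prefix, e.g. 10.12.0.10/24
            host_net = ipaddress.ip_interface(cidr).network
            if docker_net.overlaps(host_net):
                overlaps.append((str(docker_net), str(host_net)))
    return overlaps


# Values taken from the docker network inspect output and netplan files above.
DOCKER_SUBNETS = ["172.17.0.0/16", "172.20.0.0/16"]
HOST_INTERFACES = ["10.12.0.10/24", "10.191.241.103/21"]

if __name__ == "__main__":
    print(find_overlaps(DOCKER_SUBNETS, HOST_INTERFACES))
```

With the values from this topic the list comes back empty, so an address collision is probably not the cause here.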

I tried the local access as you instructed, and it works. The only thing I can’t access is the internet. Disabling ufw did not help either.

EDIT: reinstalling after purging docker and /var/lib/docker also did not change anything.

I’ve installed tshark and monitored traffic while pinging google.com inside the container. Neither the default nor the custom bridge network gets a response back, and the default one also can’t resolve names.

Default bridge network:

$ curl ifconfig.me
 Could not resolve host: ifconfig.me
* Closing connection
curl: (6) Could not resolve host: ifconfig.me
$ tshark -i docker0
Capturing on 'docker0'
 ** (tshark:2124047) 12:04:07.636013 [Main MESSAGE] -- Capture started.
 ** (tshark:2124047) 12:04:07.636144 [Main MESSAGE] -- File: "/tmp/wireshark_docker09MSBW2.pcapng"
    1 0.000000000   172.17.0.2 → 8.8.8.8      DNS 82 Standard query 0xa3d0 A ifconfig.me OPT
    2 0.000090926   172.17.0.2 → 8.8.8.8      DNS 82 Standard query 0x140d AAAA ifconfig.me OPT
    3 2.001435165   172.17.0.2 → 8.8.8.8      DNS 82 Standard query 0xa3d0 A ifconfig.me OPT
    4 2.001488717   172.17.0.2 → 8.8.8.8      DNS 82 Standard query 0x140d AAAA ifconfig.me OPT
    5 3.061420009 10.191.241.103 → 172.17.0.2   ICMP 110 Destination unreachable (Host unreachable)
    6 3.061437895 10.191.241.103 → 172.17.0.2   ICMP 110 Destination unreachable (Host unreachable)
    7 3.061442410 10.191.241.103 → 172.17.0.2   ICMP 110 Destination unreachable (Host unreachable)
    8 3.061447533 10.191.241.103 → 172.17.0.2   ICMP 110 Destination unreachable (Host unreachable)
    9 4.809458035   172.17.0.2 → 8.8.8.8      DNS 82 Standard query 0x140d AAAA ifconfig.me OPT
   10 4.931837672   172.17.0.2 → 8.8.8.8      DNS 82 Standard query 0xa3d0 A ifconfig.me OPT
   11 7.865410687 10.191.241.103 → 172.17.0.2   ICMP 110 Destination unreachable (Host unreachable)
   12 7.865432041 10.191.241.103 → 172.17.0.2   ICMP 110 Destination unreachable (Host unreachable)

Custom bridge network:

$ curl ifconfig.me
curl: (7) Failed to connect to ifconfig.me port 80 after 3070 ms: Couldn't connect to server
$ tshark -i br-8d4945cda37b
Capturing on 'br-8d4945cda37b'
 ** (tshark:2125422) 12:05:31.403603 [Main MESSAGE] -- Capture started.
 ** (tshark:2125422) 12:05:31.403753 [Main MESSAGE] -- File: "/tmp/wireshark_br-8d4945cda37b0KRSV2.pcapng"
    1 0.000000000 02:42:ac:12:00:02 → Broadcast    ARP 42 Who has 172.18.0.1? Tell 172.18.0.2
    2 0.000039117 02:42:c6:ef:ce:15 → 02:42:ac:12:00:02 ARP 42 172.18.0.1 is at 02:42:c6:ef:ce:15
    3 0.000044621   172.18.0.2 → 34.160.111.145 TCP 74 35704 → 80 [SYN] Seq=0 Win=42340 Len=0 MSS=1460 SACK_PERM=1 TSval=3134608780 TSecr=0 WS=512
    4 1.009309740   172.18.0.2 → 34.160.111.145 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 35704 → 80 [SYN] Seq=0 Win=42340 Len=0 MSS=1460 SACK_PERM=1 TSval=3134609790 TSecr=0 WS=512
    5 3.021344508   172.18.0.2 → 34.160.111.145 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 35704 → 80 [SYN] Seq=0 Win=42340 Len=0 MSS=1460 SACK_PERM=1 TSval=3134611802 TSecr=0 WS=512
    6 3.053342010 10.191.241.103 → 172.18.0.2   ICMP 102 Destination unreachable (Host unreachable)
    7 3.053363723 10.191.241.103 → 172.18.0.2   ICMP 102 Destination unreachable (Host unreachable)
    8 3.053369945 10.191.241.103 → 172.18.0.2   ICMP 102 Destination unreachable (Host unreachable)
    9 8.269368360 02:42:c6:ef:ce:15 → 02:42:ac:12:00:02 ARP 42 Who has 172.18.0.2? Tell 172.18.0.1
   10 8.269439023 02:42:ac:12:00:02 → 02:42:c6:ef:ce:15 ARP 42 172.18.0.2 is at 02:42:ac:12:00:02
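
One detail worth pulling out of these captures: the ICMP errors are sent by 10.191.241.103, which is the address configured on ens224 in the second netplan file. The Python sketch below extracts the senders of "Destination unreachable" messages from tshark summary lines and maps them to an interface. The function names are made up for this illustration, the interface table is copied from the netplan files, and the sample is two lines from the capture above (the TCP line is trimmed). It may hint that the host is trying to send container traffic out through that interface, but that is an interpretation, not something the capture proves.

```python
import ipaddress

# Interface addresses as listed in the two netplan files above.
HOST_INTERFACES = {"ens192": "10.12.0.10/24", "ens224": "10.191.241.103/21"}


def icmp_unreachable_sources(capture_text):
    """Collect source addresses of ICMP 'unreachable' lines in tshark output."""
    sources = set()
    for line in capture_text.splitlines():
        fields = line.split()
        if "→" in fields and "ICMP" in fields and "unreachable" in line:
            # The source address is the field right before the first arrow.
            sources.add(fields[fields.index("→") - 1])
    return sources


def owning_interface(address, interfaces=HOST_INTERFACES):
    """Return the interface name configured with this address, or None."""
    wanted = ipaddress.ip_address(address)
    for name, cidr in interfaces.items():
        if ipaddress.ip_interface(cidr).ip == wanted:
            return name
    return None


SAMPLE = """    3 0.000044621   172.18.0.2 → 34.160.111.145 TCP 74 35704 → 80 [SYN] Seq=0
    6 3.053342010 10.191.241.103 → 172.18.0.2   ICMP 102 Destination unreachable (Host unreachable)
"""

if __name__ == "__main__":
    for src in icmp_unreachable_sources(SAMPLE):
        print(src, "belongs to", owning_interface(src))
```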

Wait, you wrote before that custom bridge networks worked

Does it mean a new custom bridge network works, but you had a previously created bridge network that doesn’t work?

Network issues are often MTU issues, but your error messages don’t look like that.

Neither works, but only the default network can’t resolve domains. I created another network and got the same result. I think it wasn’t working before either; I was just too focused on DNS earlier.

curl -v 34.160.111.145

*   Trying 34.160.111.145:80...
* connect to 34.160.111.145 port 80 from 172.19.0.2 port 60416 failed: Host is unreachable
* Failed to connect to 34.160.111.145 port 80 after 3056 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 34.160.111.145 port 80 after 3056 ms: Couldn't connect to server

tshark output:

Capturing on 'br-8d4945cda37b'
 ** (tshark:3091525) 11:02:39.686217 [Main MESSAGE] -- Capture started.
 ** (tshark:3091525) 11:02:39.686334 [Main MESSAGE] -- File: "/tmp/wireshark_br-8d4945cda37bL0XUV2.pcapng"
    1 0.000000000 fe80::42:c6ff:feef:ce15 → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    2 0.028439395 02:42:ac:12:00:02 → Broadcast    ARP 42 Who has 172.18.0.1? Tell 172.18.0.2
    3 0.028478487 02:42:c6:ef:ce:15 → 02:42:ac:12:00:02 ARP 42 172.18.0.1 is at 02:42:c6:ef:ce:15
    4 0.028483873   172.18.0.2 → 34.160.111.145 TCP 74 42404 → 80 [SYN] Seq=0 Win=42340 Len=0 MSS=1460 SACK_PERM=1 TSval=3217235970 TSecr=0 WS=512
    5 0.436049066 fe80::42:c6ff:feef:ce15 → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    6 1.043998387   172.18.0.2 → 34.160.111.145 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 42404 → 80 [SYN] Seq=0 Win=42340 Len=0 MSS=1460 SACK_PERM=1 TSval=3217236986 TSecr=0 WS=512
    7 3.059994388   172.18.0.2 → 34.160.111.145 TCP 74 [TCP Retransmission] [TCP Port numbers reused] 42404 → 80 [SYN] Seq=0 Win=42340 Len=0 MSS=1460 SACK_PERM=1 TSval=3217239002 TSecr=0 WS=512
    8 3.092013377 10.191.241.103 → 172.18.0.2   ICMP 102 Destination unreachable (Host unreachable)
    9 3.092030889 10.191.241.103 → 172.18.0.2   ICMP 102 Destination unreachable (Host unreachable)
   10 3.092036860 10.191.241.103 → 172.18.0.2   ICMP 102 Destination unreachable (Host unreachable)

Finally solved the issue. For some reason, another default route had been added for the NFS connection, and traffic from inside the Docker containers preferred it. Removing that default route fixed the issue. Thanks for your help, I learned some valuable things.
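
For anyone who lands here with the same symptoms, here is a small Python sketch for spotting more than one default route in the output of ip route show. The function name default_routes is made up, and the sample routing table is hypothetical: I reconstructed it from the gateways in the two netplan files above, it is not the real output from the server. Both netplan files set gateway4, which would be consistent with two default routes being installed.

```python
def default_routes(ip_route_output):
    """Return one dict per 'default' line: gateway, device, and metric."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        # A route without an explicit metric has metric 0.
        route = {"via": None, "dev": None, "metric": 0}
        # Walk over consecutive (keyword, value) pairs such as ("via", "10.12.0.1").
        for key, value in zip(fields[1:], fields[2:]):
            if key in ("via", "dev"):
                route[key] = value
            elif key == "metric":
                route["metric"] = int(value)
        routes.append(route)
    return routes


# Hypothetical table, assumed from the netplan gateways; not real server output.
SAMPLE = """default via 10.12.0.1 dev ens192 proto static
default via 10.191.240.1 dev ens224 proto static
10.12.0.0/24 dev ens192 proto kernel scope link src 10.12.0.10
10.191.240.0/21 dev ens224 proto kernel scope link src 10.191.241.103
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
"""

if __name__ == "__main__":
    found = default_routes(SAMPLE)
    if len(found) > 1:
        print("More than one default route:")
    for route in found:
        print(route)
```

If two default routes show up with the same metric, it is worth checking whether the secondary interface really needs a gateway at all. Dropping gateway4 from the NFS-only netplan file is one possible way to remove the extra route, but that is my assumption about the fix, not necessarily what was done here.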
