Dockerd reporting DNS timeouts

I am running 350 containers, and they are having DNS timeout issues.

The following errors appear in the docker.service logs:

Oct 20 22:18:54 node1 dockerd[22149]: time="2023-10-20T22:18:54.208340628Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.1.248:59201->1.1.1.1:53: i/o timeout"
Oct 20 22:20:14 node1 dockerd[22149]: time="2023-10-20T22:20:14.302824917Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.1.35:52059->1.1.1.1:53: i/o timeout"
Oct 20 22:20:59 node1 dockerd[22149]: time="2023-10-20T22:20:59.359135519Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.2.97:39852->1.1.1.1:53: i/o timeout"
Oct 20 22:23:01 node1 dockerd[22149]: time="2023-10-20T22:23:01.080541412Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;collectora.storj.io.\tIN\t A" error="read udp 10.1.1.169:46591->1.1.1.1:53: i/o timeout"
Oct 20 22:23:48 node1 dockerd[22149]: time="2023-10-20T22:23:48.319370297Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t AAAA" error="read udp 10.1.1.211:48580->1.1.1.1:53: i/o timeout"
Oct 20 22:24:25 node1 dockerd[22149]: time="2023-10-20T22:24:25.994840345Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t A" error="read udp 10.1.1.32:39979->1.1.1.1:53: i/o timeout"
Oct 20 22:26:18 node1 dockerd[22149]: time="2023-10-20T22:26:18.656264587Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.1.146:57250->1.1.1.1:53: i/o timeout"
Oct 20 22:29:51 node1 dockerd[22149]: time="2023-10-20T22:29:51.320022392Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t AAAA" error="read udp 10.1.1.233:47655->1.1.1.1:53: i/o timeout"
Oct 20 22:29:51 node1 dockerd[22149]: time="2023-10-20T22:29:51.320369184Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t A" error="read udp 10.1.1.233:42511->1.1.1.1:53: i/o timeout"
Oct 20 22:32:31 node1 dockerd[22149]: time="2023-10-20T22:32:31.558775625Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t AAAA" error="read udp 10.1.1.15:34058->1.1.1.1:53: i/o timeout"

I tested by running a container on the same network, and it is able to resolve names without issue.

root@node1:~# docker run -it --rm --network my_custom_network alpine:latest
/ # cat /etc/resolv.conf 
nameserver 127.0.0.11
options ndots:0
/ # ping google.com
PING google.com (142.250.64.78): 56 data bytes
64 bytes from 142.250.64.78: seq=0 ttl=118 time=10.165 ms
^C
--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 10.165/10.165/10.165 ms
/ # ping version.storj.io
PING version.storj.io (34.173.164.90): 56 data bytes
64 bytes from 34.173.164.90: seq=0 ttl=59 time=32.469 ms
64 bytes from 34.173.164.90: seq=1 ttl=59 time=32.599 ms

The network section of the Docker Compose file is structured as follows:

networks:
  my_custom_network:
    driver: bridge
    ipam:
      config:
        - subnet: 10.1.0.0/22

It would appear that DNS is working fine for the containers, but dockerd is for some reason timing out on some of the requests. How can I troubleshoot the reason for these timeouts? Is it possible there is virtual network congestion?
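Since a single ping only exercises the resolver once, one way to look for intermittent loss is to send many queries straight to the upstream server and count how many go unanswered. The sketch below does that with only the standard library. The function names, the query count, and the 2-second timeout are arbitrary choices for illustration; it does not reproduce how dockerd's resolver behaves.

```python
import random
import socket
import struct
import time


def build_query(name, qtype=1, txid=None):
    """Build a minimal DNS query packet (qtype 1 = A, 28 = AAAA)."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    parts = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in parts) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def probe(server, name, count=100, timeout=2.0, port=53):
    """Send `count` UDP queries one after another and count the timeouts."""
    timeouts = 0
    slowest = 0.0
    for _ in range(count):
        query = build_query(name)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.monotonic()
            sock.sendto(query, (server, port))
            try:
                sock.recvfrom(512)
                slowest = max(slowest, time.monotonic() - start)
            except socket.timeout:
                timeouts += 1
    return {"sent": count, "timeouts": timeouts, "slowest_reply_s": slowest}
```

Calling `probe("1.1.1.1", "us1.storj.io")` on the host and again inside a container on `my_custom_network` would give two timeout counts to compare. If only the container run loses replies, the bridge network or NAT path is the more likely suspect; if both do, the problem is probably upstream of Docker.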

root@node1:~# docker version
Client: Docker Engine - Community
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:31:44 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:31:44 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.24
  GitCommit:        61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
 runc:
  Version:          1.1.9
  GitCommit:        v1.1.9-0-gccaecfc
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Did you check your daemon config file? I have read that people set a proxy there and were disappointed that the config was not applied to containers, so network settings can differ between the daemon and the containers.
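A quick way to check is to print only the DNS and proxy related keys from the daemon config. This is a minimal sketch that assumes the default Linux location `/etc/docker/daemon.json`; the helper name `network_settings` is made up for illustration, and the key list covers only the settings I know to be relevant (`dns`, `dns-opts`, `dns-search`, `proxies`).

```python
import json
from pathlib import Path

# daemon.json keys that affect DNS or proxy behaviour.
KEYS = ("dns", "dns-opts", "dns-search", "proxies")


def network_settings(path="/etc/docker/daemon.json"):
    """Return the DNS/proxy related entries of daemon.json, if any."""
    config_file = Path(path)
    if not config_file.exists():
        return {}  # no config file means the daemon runs with defaults
    text = config_file.read_text().strip()
    config = json.loads(text) if text else {}
    return {key: config[key] for key in KEYS if key in config}
```

An empty result means the daemon is not overriding DNS there, in which case the upstream servers come from the host's resolver configuration instead.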

I have run into a similar issue.
I am running some Docker containers to generate HTTP/TCP or UDP traffic to a system.
Every once in a while traffic fails with a connection ‘TIME OUT’, with no sign of overload, port exhaustion, etc.

Eventually I found that Docker was using containerd 1.6.x.
After I switched to containerd 1.7.x, the issue went away.
I believe the issue is in containerd v1.6.x and was fixed in v1.7.x.