I’m facing a highly unusual and persistent network issue with the Docker daemon that we’ve been unable to solve after many days of in-depth troubleshooting. I’m hoping someone here might have insights into what could be happening at a deeper system level.
As a non-English speaker and NPP, most of the work and the problem description below were written with the help of Gemini from Google.
The Docker daemon (dockerd), running as a systemd service, consistently fails to connect to the internet to pull base images during a docker build, resulting in an i/o timeout. This happens even when the daemon is explicitly configured to use a working proxy. However, a user-initiated docker pull for the same image works perfectly. The evidence points to a system-level policy (such as AppArmor or an iptables conflict) that is specifically blocking the dockerd process’s network access, but we’ve been unable to pinpoint the exact cause.
System Environment
- OS: Ubuntu-based Linux distribution
- Docker Version: 28.5.0
- Proxy: Clash, running in a Docker container with `--network host`
- Other Relevant Software: MicroK8s is also installed on the system.
The Problem
When running docker build on any Dockerfile that requires pulling a base image from Docker Hub, the process fails at the FROM instruction after about 30 seconds with an I/O timeout.
Example Error:
```
ERROR [internal] load metadata for docker.io/library/python:3.10-slim-bullseye: failed to solve: failed to fetch anonymous token: Get "https://auth.docker.io/token?...": dial tcp 74.86.151.162:443: i/o timeout
```
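A one-line Dockerfile is enough to reproduce the failure, since the error occurs while resolving the `FROM` image’s metadata before any build steps run:

```
# Dockerfile -- minimal reproduction; fails at FROM with the i/o timeout above
FROM python:3.10-slim-bullseye
```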
We have tried a huge number of diagnostic and configuration steps. Here is a summary:
1. Initial Proxy Configuration: We first configured the Docker daemon to use a running Clash proxy. We tried two methods, both of which ultimately failed:
- `daemon.json`: Adding `http-proxy` and `https-proxy` keys to `/etc/docker/daemon.json`.
- systemd drop-in file: Creating `/etc/systemd/system/docker.service.d/http-proxy.conf` to set the `HTTP_PROXY` and `HTTPS_PROXY` environment variables for the service. This is the method we are currently using, as it’s considered more robust.
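For reference, the two configurations looked roughly like this (the proxy address shown is the one we started with; we later changed it, as described below). On recent Docker versions the daemon-side keys live under a top-level `proxies` object in `daemon.json`:

```
// /etc/docker/daemon.json
{
  "proxies": {
    "http-proxy": "http://127.0.0.1:7890",
    "https-proxy": "http://127.0.0.1:7890",
    "no-proxy": "localhost,127.0.0.1"
  }
}
```

```
# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:7890"
Environment="HTTPS_PROXY=http://127.0.0.1:7890"
Environment="NO_PROXY=localhost,127.0.0.1"
```

After editing either file, the daemon must be reloaded and restarted (`sudo systemctl daemon-reload && sudo systemctl restart docker`) for the change to take effect.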
2. Verifying the Proxy Service: We confirmed the proxy itself is not the issue.
- The proxy is a Clash instance running in a Docker container with `--network host`.
- The container is stable and running (not in a restart loop).
- The proxy service is listening on `0.0.0.0:7890` and the API on `0.0.0.0:9099`.
- A manual `curl --proxy http://127.0.0.1:7890 https://google.com` from the host succeeds.
- We created a script to automatically select a working node via the Clash API, and this script works perfectly, confirming the proxy is active and ready before we run `docker build`.
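Our node-selection script is roughly equivalent to the sketch below. The Clash external-controller endpoints (`GET /proxies/<name>/delay` and `PUT /proxies/<group>`) are standard, but the port `9099` and the selector group name `GLOBAL` are assumptions specific to our setup:

```python
import json
import urllib.parse
import urllib.request

# Assumptions: Clash's external controller is on 127.0.0.1:9099 (our setup)
# and the selector group is named "GLOBAL" -- both depend on your Clash config.
CLASH_API = "http://127.0.0.1:9099"
SELECTOR = "GLOBAL"


def measure_delays(nodes, test_url="http://www.gstatic.com/generate_204",
                   timeout_ms=3000):
    """Ask Clash to delay-test each node; unreachable nodes map to None."""
    delays = {}
    for name in nodes:
        url = (f"{CLASH_API}/proxies/{urllib.parse.quote(name)}/delay"
               f"?url={urllib.parse.quote(test_url, safe='')}&timeout={timeout_ms}")
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                delays[name] = json.load(resp).get("delay")
        except Exception:
            delays[name] = None
    return delays


def pick_fastest(delays):
    """Pure helper: return the node with the lowest measured delay."""
    alive = {name: ms for name, ms in delays.items() if ms is not None}
    if not alive:
        raise RuntimeError("no reachable proxy node")
    return min(alive, key=alive.get)


def switch_node(name):
    """Make the selector group use the given node via PUT /proxies/<group>."""
    req = urllib.request.Request(
        f"{CLASH_API}/proxies/{SELECTOR}",
        data=json.dumps({"name": name}).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req, timeout=5)
```

The point of the script is simply to guarantee a live upstream node before `docker build` runs, so a dead node can be ruled out as the cause of the timeout.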
3. The 172.17.0.1 vs 127.0.0.1 Issue: We spent a long time determining the correct IP for the daemon to use. We concluded that the daemon process should use the docker0 bridge IP (172.17.0.1) to connect to the host-networked proxy. The systemd file is currently configured with this IP.
4. Deep Dive with strace: strace on the dockerd PID revealed a critical contradiction: even when systemd was configured to use 172.17.0.1, the connect system call from the dockerd process was attempting to connect to 127.0.0.1:7890. This led us to discover that ~/.docker/config.json was also being read by the daemon and seemed to have a higher priority. We have since unified all configuration files (systemd drop-in and ~/.docker/config.json) to use 172.17.0.1, but the timeout persists.
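For completeness, the client-side file we had to unify is the `proxies` stanza in `~/.docker/config.json` (the Docker CLI uses these values to inject proxy settings into builds and containers, which would be consistent with the stale `127.0.0.1` connects we saw in strace):

```
{
  "proxies": {
    "default": {
      "httpProxy": "http://172.17.0.1:7890",
      "httpsProxy": "http://172.17.0.1:7890",
      "noProxy": "localhost,127.0.0.1"
    }
  }
}
```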
5. Investigating System Policies: This led us to believe a system-level policy is the root cause.
- AppArmor: We found that the `docker-default` profile is in `enforce` mode. We tried setting it to `complain` mode (`sudo aa-complain /etc/apparmor.d/docker-default`) and restarting Docker, but the build still failed. This suggests AppArmor may not be the culprit, or the issue lies elsewhere.
- systemd unit file: We inspected the output of `systemctl cat docker.service` and found no unusual security sandboxing options like `PrivateNetwork=yes`.
- Firewall (`iptables`): The output of `sudo iptables-save` showed a complex ruleset due to the presence of MicroK8s. This creates a high probability of a rule conflict, but we couldn’t identify a specific rule that would block the `dockerd` process’s outbound traffic.
The Current “Impossible” State & The Core Question
This is where we are now:
- The Docker daemon’s systemd service is definitively configured to use the proxy at `http://172.17.0.1:7890`.
- The proxy service is definitively working and accessible at that address.
- The AppArmor profile for Docker is in “complain” mode, so it should not be blocking the connection.
- The user-initiated `docker pull python:3.10-slim-bullseye` command works perfectly.
- The daemon-initiated `docker build` still fails with a timeout when it needs to fetch the same image.
The core question for the community is:
What could be blocking a systemd-launched dockerd process from making a proxied network connection, when all configurations appear correct and a user-launched docker client command can successfully make the same connection? The presence of MicroK8s seems highly relevant.
The Only Workaround
The only way to get a build to succeed is to first run `sudo docker pull <base_image>`, and only then run `sudo docker build ...`. This “pre-warms” the cache, so the daemon doesn’t need to make a network request.
Any insights into what could cause this specific daemon behavior or how to further diagnose the system-level block would be greatly appreciated. Thank you for your time.