Docker containers with images based on Debian can't access the internet but containers based on Alpine work just fine

I have been running a bunch of containers for over a year alongside UFW on my system (which has Docker and non-Docker services running on it).

Some of my containers are using regular docker, some are using docker-compose. I have not had any networking issues before.

Out of nowhere, my containers no longer have internet access when UFW is enabled. With UFW disabled, all my Alpine-based containers can run apk update, connect to the repos and update successfully. On my Python containers (based on Debian), however, apt-get update reports “Temporary failure in name resolution”. Reaching deb.debian.org fails both by domain name and by IP address.

On all my containers /etc/resolv.conf reads:

nameserver 127.0.0.11
options ndots:0

I use a Pi-hole on my network at IP 10.64.187.1 as the DNS server for this computer. The host’s /etc/resolv.conf reads:

nameserver 10.64.187.1
nameserver 127.0.0.53
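One detail worth checking when comparing these two files: inside a container, any 127.x.x.x address refers to the container’s own loopback interface, so a host entry like 127.0.0.53 (the systemd-resolved stub) cannot be used from a container as-is, while 127.0.0.11 is Docker’s embedded DNS server. A small sketch (the function name `list_nameservers` is hypothetical) that prints each nameserver in a resolv.conf-style file and tags it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each "nameserver" entry of a resolv.conf-style file
# with a tag describing where that address is reachable from.
list_nameservers() {
  local file="${1:-/etc/resolv.conf}"
  awk '$1 == "nameserver" {
    if ($2 == "127.0.0.11")                 tag = "docker-embedded-dns"
    else if ($2 ~ /^127\./ || $2 == "::1")  tag = "loopback"
    else                                    tag = "remote"
    print $2, tag
  }' "$file"
}

# Usage: run on the host, then inside a container, and compare.
#   list_nameservers /etc/resolv.conf
```

On the host, a `loopback` entry only works for host processes; in a container the only loopback entry that should appear is the `docker-embedded-dns` one.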

The host can update its own packages. The host OS is Bodhi Linux (based on Ubuntu 20.04), and Docker is version 20.10.24, build 297e128, installed via snap.

I tried running sudo wg-quick down wg0 to disconnect from the Pi-hole, which changes /etc/resolv.conf to:

nameserver 127.0.0.53

The host can still update its repos, but these Debian-based containers cannot.

Here are a few things I have tried.

Installed ufw-docker and set up the rules – changed nothing
Ran sudo snap remove docker --purge and then sudo snap install docker
Ran the following:

sudo pkill docker
sudo iptables -t nat -F
sudo ifconfig docker0 down
sudo brctl delbr docker0
sudo snap restart docker
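Flushing the nat table with iptables -t nat -F also removes the MASQUERADE rules Docker adds for each bridge subnet; without them, outbound container traffic is not source-NATed and normally cannot reach the internet until the daemon recreates the rules when it starts. A hedged sketch (the helper name is hypothetical) that lists the source subnets covered by a MASQUERADE rule, so they can be compared with the subnets of the Docker networks:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `iptables -t nat -S POSTROUTING` output on stdin and
# print the source subnet (-s) of every MASQUERADE rule.
masqueraded_sources() {
  awk '$1 == "-A" && $2 == "POSTROUTING" && / -j MASQUERADE/ {
    for (i = 3; i < NF; i++)
      if ($i == "-s") print $(i + 1)
  }'
}

# Usage (needs root):
#   sudo iptables -t nat -S POSTROUTING | masqueraded_sources
# A typical Docker rule looks like:
#   -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
```

A Docker network whose subnet is missing from this list would be a strong hint that the NAT rules were not restored.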

I have documented a lot of other information in this reddit post and the comments

Please help me troubleshoot and pinpoint the issue. I have a project that requires internet access, which I have been developing for six months, and its development is on hold until this is solved.

I have included my compose.yaml and Dockerfiles here: pastebin .com/qxU5vUGC

I think the first thing you should do is remove the snap version of Docker and install Docker from the official repository provided by Docker Inc., which you can find in the official documentation:

I seriously don’t know why that snap version exists. It is not officially supported by Docker, not even created by Docker, and it very often causes problems. LXD can be installed from snap too, but LXD was created by Canonical and is optimized to work as a snap package.

When you install Docker from the official repository, make sure you install a specific version, not the latest. The latest is now 24.0.1, but I would still install 23.0.x, or 20.10.x if you want to stay on that. The documentation shows how to install a specific version. After the installation, I also recommend running the following command to hold the installed version; otherwise you would upgrade it to 24.0.1 later.

sudo apt-mark hold docker-ce docker-ce-cli docker-ce-rootless-extras containerd.io

Holding containerd.io is optional, but if you want to make sure containerd stays compatible with the installed Docker version, it is better to hold it too and periodically check for new patch versions.
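The install guide pins a version by listing the available strings with `apt-cache madison docker-ce` and passing one of them to `apt-get install`. A small sketch, assuming the usual `package | version | source` column layout of madison output, that picks the newest version string in a given series (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `apt-cache madison docker-ce` output on stdin and print
# the newest version string whose upstream part starts with "<series>.".
newest_in_series() {
  local series="$1"
  awk -F'|' -v prefix="${series}." '{
    v = $2; gsub(/[ \t]/, "", v)    # version column without padding
    u = v; sub(/^[0-9]+:/, "", u)   # drop the epoch, e.g. "5:"
    if (index(u, prefix) == 1) print v
  }' | sort -V | tail -n 1
}

# Usage:
#   VERSION_STRING=$(apt-cache madison docker-ce | newest_in_series 23.0)
#   sudo apt-get install docker-ce="$VERSION_STRING" docker-ce-cli="$VERSION_STRING" containerd.io
#   sudo apt-mark hold docker-ce docker-ce-cli docker-ce-rootless-extras containerd.io
```

Sorting with `sort -V` rather than trusting the listing order keeps e.g. a 23.0.10 ahead of 23.0.6.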

update:

I guess the snap version was created so Docker could be installed on unsupported distributions, but it still doesn’t seem to be stable. Since you don’t have Ubuntu, just a distro based on Ubuntu, you can try the installation guide for Ubuntu. It is not guaranteed to work, but if the distribution didn’t change anything Docker requires, it could work better than the snap package.

What is the reasoning for not using the latest version (24.0.1)?

23.0.0 came out in February. That is still fairly new, and it had some issues at the beginning. 24.0 is even newer, so I would wait for a few patch releases before upgrading, since 23.0.x has already proven that it works for me. I also haven’t found any announcements for 24.0 yet.

On the other hand, if you run Docker on your local machine and not in a production environment, you can try 24.0 too.

OK, since it is not a prod environment, I will try 24.0 for now. So far I am able to pull from the repos when building, but I need to test further. I will report back with my findings.

OK, I initially installed the latest version from the apt repository. I was able to build the image, which included an update, an upgrade and multiple installs from the repos, so I was optimistic that the container could connect to the internet. However, when I attached a root shell to the container and ran apt update, it failed.

I then purged 24.0.1, installed Docker version 23.0.6 (build ef23cbc) from the apt repo and tried again. Same result, but now even the Alpine-based containers are unable to access the internet.

Note that container names don’t resolve either. One of my Python containers needs to connect to a RabbitMQ container, but it cannot reach it by its service name.
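Since the reported error is a name-resolution failure, it can help to separate the DNS step from the connection step when testing from inside the Python container. A minimal sketch, assuming the image has bash, getent and timeout (Debian-based Python images should); the function name, and the service name rabbitmq and port 5672 in the usage lines, are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tell a DNS failure apart from a TCP connection failure.
probe() {
  local host="$1" port="$2"
  # Step 1: name resolution through NSS (/etc/hosts, then the DNS servers in /etc/resolv.conf).
  if ! getent hosts "$host" > /dev/null; then
    echo "dns-failure"
    return 1
  fi
  # Step 2: TCP connection via bash's /dev/tcp pseudo-device, capped at 3 seconds.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "ok"
  else
    echo "connect-failure"
    return 2
  fi
}

# Usage inside the Python container (service name and port are assumptions):
#   probe rabbitmq 5672
#   probe deb.debian.org 80
```

`dns-failure` for the RabbitMQ service points at the embedded DNS server, while `connect-failure` means the name resolved but traffic is being blocked or misrouted.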

What other steps can I take to troubleshoot?

Also note that my firewall is disabled.

EDIT: I just noticed something. When I run ip a, I can see a ton of br-******** interfaces, some with the same IP range, including 172.10.10.0/24, which is the IP range of the specific network I’m using.

I deleted all br-******* and veth-******** network interfaces, and the issue still persists.

“br-*” interfaces are the Docker network bridges. If your host, gateway or DNS server is in the same subnet as one of them, that could indeed cause internet issues.
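Checking for that kind of overlap by hand is error-prone with many br-* networks. A rough IPv4-only sketch (helper names hypothetical) that tests whether two CIDR ranges overlap; a single address such as the Pi-hole at 10.64.187.1 can be passed as a /32. As a side note, 172.10.10.0/24 lies outside the RFC 1918 private block 172.16.0.0/12, so it is public address space, and any real hosts in it would become unreachable from this machine while the Docker network exists.

```shell
#!/usr/bin/env bash
# Hypothetical IPv4-only helpers.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds if the two CIDR ranges share at least one address.
# Two ranges overlap exactly when they agree on the bits of the shorter prefix.
cidrs_overlap() {
  local n1="${1%/*}" p1="${1#*/}" n2="${2%/*}" p2="${2#*/}"
  local p=$(( p1 < p2 ? p1 : p2 ))
  local mask=$(( (0xFFFFFFFF << (32 - p)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$n1") & mask) == ($(ip_to_int "$n2") & mask) ))
}

# Usage: compare every IPv4 subnet of every Docker network with the Pi-hole address.
#   for net in $(docker network ls -q); do
#     for s in $(docker network inspect --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}' "$net"); do
#       [[ $s == *:* ]] && continue   # skip IPv6 subnets
#       cidrs_overlap "$s" 10.64.187.1/32 && echo "network $net ($s) contains 10.64.187.1"
#     done
#   done
```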

If you have nothing important in the Docker data root (/var/lib/docker), you can stop Docker, remove the folder and start Docker again. Sometimes that helps if something remained in that folder from a previous installation.
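The daemon has to be fully stopped before the folder is removed, and because the package also installs a docker.socket unit that can start dockerd on demand, stopping only docker.service may not be enough. A sketch of the sequence, assuming a systemd host with the apt-installed packages (the function name is hypothetical, and running it deletes all images, containers, volumes and networks):

```shell
#!/usr/bin/env bash
# Hypothetical helper. WARNING: deletes all images, containers, volumes and networks.
reset_docker_data() {
  # Stop the socket as well, otherwise a client call can re-activate the daemon.
  sudo systemctl stop docker.socket docker.service || return 1
  # Refuse to continue if the daemon is somehow still running.
  if systemctl is-active --quiet docker.service; then
    echo "docker.service is still active, not deleting anything" >&2
    return 1
  fi
  sudo rm -rf /var/lib/docker
  sudo systemctl start docker.socket docker.service
}

# Usage:
#   reset_docker_data
```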

If that doesn’t help, or you can’t do that, can you run the following commands?

docker run --rm -it nicolaka/netshoot ping 8.8.8.8
docker run --rm -it nicolaka/netshoot nslookup google.com 8.8.8.8
docker run --rm -it nicolaka/netshoot nslookup google.com 
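A plain docker run attaches to the default bridge network, where the container’s resolv.conf is derived from the host’s, while compose services sit on a user-defined network and resolve names through Docker’s embedded DNS at 127.0.0.11. A sketch that runs the same three checks with an explicit --network, so they can also be repeated on the network the Python containers use. It assumes nslookup exits non-zero when a lookup fails, and the network name in the usage lines is a hypothetical placeholder (docker network ls shows the real one):

```shell
#!/usr/bin/env bash
# Run the three netshoot checks on a given network and report the first layer that fails.
net_checks() {
  local network="${1:-bridge}"
  local run=(docker run --rm --network "$network" nicolaka/netshoot)
  if ! "${run[@]}" ping -c 2 8.8.8.8 > /dev/null 2>&1; then
    echo "no IPv4 connectivity to 8.8.8.8"
    return 1
  fi
  if ! "${run[@]}" nslookup google.com 8.8.8.8 > /dev/null 2>&1; then
    echo "routing works, but DNS queries to 8.8.8.8 fail"
    return 2
  fi
  if ! "${run[@]}" nslookup google.com > /dev/null 2>&1; then
    echo "external DNS works, but the network's configured resolver fails"
    return 3
  fi
  echo "all three checks passed"
}

# Usage ("myproject_default" is a hypothetical compose network name):
#   net_checks bridge
#   net_checks myproject_default
```

If the checks pass on bridge but the third one fails on the compose network, the problem is narrowed down to the embedded DNS path rather than general connectivity.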

OK, I deleted /var/lib/docker and restarted.

When I try to run the Docker commands you gave, I get this:

Unable to find image 'nicolaka/netshoot:latest' locally
latest: Pulling from nicolaka/netshoot
8a49fdb3b6a5: Pulling fs layer 
f08cc7654b42: Pulling fs layer 
bacdb080ad6d: Pulling fs layer 
df75a2676b1d: Pulling fs layer 
d30ac41fb6a9: Waiting 
3f3eebe79603: Waiting 
086410b5650d: Waiting 
4f4fb700ef54: Waiting 
5a7fe97d184f: Waiting 
a6d1b2d7a50e: Waiting 
599ae1c27c63: Waiting 
dd5e50b27eb9: Waiting 
2681a5bf3176: Waiting 
2517e0a2f862: Waiting 
7b5061a1528d: Waiting 
docker: open /var/lib/docker/tmp/GetImageBlob1007678346: no such file or directory.
See 'docker run --help'.

I attempted to reinstall Docker, and it won’t install due to this error:

Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
invoke-rc.d: initscript docker, action "start" failed.
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: activating (auto-restart) (Result: exit-code) since Sun 2023-05-21 16:02:32 CDT; 10ms ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
    Process: 14687 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
   Main PID: 14687 (code=exited, status=1/FAILURE)

May 21 16:02:32 bodhilinux systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
May 21 16:02:32 bodhilinux systemd[1]: docker.service: Failed with result 'exit-code'.
May 21 16:02:32 bodhilinux systemd[1]: Failed to start Docker Application Container Engine.
dpkg: error processing package docker-ce (--configure):
 installed docker-ce package post-installation script subprocess returned error exit status 1
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for systemd (245.4-4ubuntu3.21) ...
Errors were encountered while processing:
 docker-ce
E: Sub-process /usr/bin/dpkg returned an error code (1)

OK, I was able to force the install to go through by repeatedly running the install command until it succeeded.

Ran your commands with the following results:

docker run --rm -it nicolaka/netshoot ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=58 time=23.3 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=58 time=18.0 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=58 time=19.0 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=58 time=18.8 ms
docker run --rm -it nicolaka/netshoot nslookup google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   google.com
Address: 142.251.32.206
Name:   google.com
Address: 2607:f8b0:4000:80a::200e
docker run --rm -it nicolaka/netshoot nslookup google.com 
Server:         68.105.28.11
Address:        68.105.28.11#53

Non-authoritative answer:
Name:   google.com
Address: 142.251.45.46
Name:   google.com
Address: 2607:f8b0:4023:1004::65
Name:   google.com
Address: 2607:f8b0:4023:1004::71
Name:   google.com
Address: 2607:f8b0:4023:1004::66
Name:   google.com
Address: 2607:f8b0:4023:1004::64

The Python containers still cannot access other containers in the compose network or external sites.

EDIT: Wait, it is working now. I will continue testing; so far it is still working after enabling UFW.

Didn’t you stop Docker before deleting the folder? If Docker is still running when you delete the folder, it could keep something in memory and write it back when it stops.

If it was not a clean install, there is a risk that something is still missing, but let’s hope not. Don’t ask what could be missing; it is just not normal for a package to require multiple installation attempts, and a post-installation script that failed partway could have unexpected results.

When you reinstalled Docker, how did you do that? Are you sure you removed everything before trying to install it again?

If the Python containers are trying to access a domain that you didn’t try from other containers, it is also possible that the problem is somehow caused by IPv6. It wouldn’t be the first time. For example, if apk update accesses an apk repository that (I have no idea if this is true) doesn’t have an IPv6 address, that could work, while other containers could try to resolve different domains and get IPv6 addresses. If your network doesn’t support IPv6, that would explain why the internet doesn’t work.

I would still check the IPv6 addresses. The required domains could also resolve randomly to either IPv6 or IPv4 addresses.
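One way to check this from inside each container is to look at which address families a name actually resolves to there. getent ahosts (part of glibc, so available in the Debian-based images) prints every address returned; the hypothetical helper below just groups them into IPv4 and IPv6:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `getent ahosts <name>` output on stdin and list the
# distinct addresses by family (an address containing ":" is treated as IPv6).
address_families() {
  awk 'NF { print $1 }' | sort -u | awk '
    /:/ { v6 = v6 " " $1; next }
        { v4 = v4 " " $1 }
    END {
      print "IPv4:" (v4 == "" ? " none" : v4)
      print "IPv6:" (v6 == "" ? " none" : v6)
    }'
}

# Usage inside a container:
#   getent ahosts deb.debian.org | address_families
# If IPv6 addresses come back but the network has no IPv6 route, forcing apt to
# IPv4 is one way to test the theory:
#   apt-get -o Acquire::ForceIPv4=true update
```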