Docker breaks network after short period

Seeing something odd with the default network that ends up breaking my host networking.

Debian 12.4, docker-ce installed as per Debian instructions (Install Docker Engine on Debian | Docker Docs)

version 25.0.3, build 4debf41

I noticed failed network operations when adding packages to an Ubuntu 18.04 image, then discovered that my host networking had also failed. It appears to be caused by an extra default route that Docker installs after some period of network activity (10-30 seconds typically).

Reproducible using a simple bash container and just pinging an external host for a while.

Routes before:

default via 192.168.99.254 dev enp0s31f6
default via 192.168.99.254 dev enp0s31f6 proto dhcp src 192.168.99.193 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.99.128/25 dev enp0s31f6 proto kernel scope link src 192.168.99.193 metric 100

Run a basic bash container (docker run -it --rm bash) and ping an external host; after ~30 seconds the routes are now:

0.0.0.0 dev vetha636f44 scope link
default dev vetha636f44 scope link
default via 192.168.99.254 dev enp0s31f6
default via 192.168.99.254 dev enp0s31f6 proto dhcp src 192.168.99.193 metric 100
169.254.0.0/16 dev vetha636f44 proto kernel scope link src 169.254.240.111
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.99.128/25 dev enp0s31f6 proto kernel scope link src 192.168.99.193 metric 100

Note the new routes via dev veth*… external networking is broken at this point; it can be recovered by removing the default dev veth* route.

Once the bad route is removed, networking is restored but exiting container and restarting will cause it to fail again the same way.
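
For reference, the recovery step is just an iproute2 one-liner; the veth name below is the one from the route listing above (it changes on every container start, so substitute yours), and it needs root:

```shell
# Remove the bogus default route that points at the veth device.
# "vetha636f44" is taken from the route output above; yours will differ.
sudo ip route del default dev vetha636f44
```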

I don’t see this on an almost identical system running docker-ce 24.0.5, build ced0996

Update: tried downgrading the docker version to match my other system, same problem (+/- a second or two on the loss of network).

So it is likely not specific to the version but to some configuration? Hints on where to look are appreciated, thanks.

Default routes should not be added, but I don’t think 169.254.0.0/16 should be added to the routes either. That is the link-local range, usually self-assigned when a machine could not get an IP address from DHCP, and I think I have seen it used for other special cases as well. It should not depend on starting a container. If it does, my guess is that the container could not get an IP address from the original range for some reason and got one some other way, but I have no idea how or why.

Where is your Debian 12 host machine? Is it in a cloud? I think I saw a similar issue before, but I only have a vague memory of it; I think that issue was related to a cloud provider, but I am not sure.

What happens if you create a custom docker network and use that instead of the default network?

It is a local Debian 12 machine. I also saw some issues with cloud instances in searching but the solutions didn’t seem applicable here.

I tried a non-default bridge network, same issue as before. Using --network=host is OK though and might be my fallback… weird that my other server works fine with default (bridge) networking; I haven’t found a significant difference yet.

Using host networking shouldn’t be your fallback. You can compare your two machines: what is installed on one that isn’t on the other, or how the configuration is different.

I also found this for you

Thanks, that link appears to be a very similar problem. Their solution was blacklisting in connman, which I’m not using (NetworkManager instead), and when I investigated blacklisting in NM it seems the veth* devices are already unmanaged?

Comparing with my working host, the primary eth0 was not managed by NM there either, so I tried that change, but no improvement. It turns out network activity in the container is not required: I can start bash and do nothing, and eventually those extra veth* routes appear, despite the fact that NM isn’t managing them.

Something must be adding them for the Docker containers though; I will need to identify what… it’s always about 30 seconds after the container starts.

I meant comparing installed packages, running services, and maybe the content of /etc. This is one way to write all the content of /etc into a single file, so you can use the diff command to see the difference between the two servers’ /etc folders:

find /etc -type f -exec awk '{print FILENAME ": " $0}' {} + > etc.txt

Each line in the file will start with the filename in which the line was found.
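
To illustrate how that prefixed output diffs between two hosts, here is a self-contained sketch (filenames and contents are made up):

```shell
# Stand-ins for the etc.txt dumps produced on each host.
printf '/etc/foo.conf: option=1\n' > etc-working.txt
printf '/etc/foo.conf: option=2\n' > etc-broken.txt

# diff exits non-zero when the files differ; each hit shows file and line.
diff etc-working.txt etc-broken.txt || true
```

Because every line carries its filename, a single diff over the combined dumps points straight at the differing file and setting.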

You can also export the list of installed apt packages by running the following command:

dpkg -l > apt.txt

and run

ps -e -o command > proc.txt

to export the running processes, but I wouldn’t use diff here. It is easier to just look at the output.

I found many similar issues easily, so I won’t link all of them, but you can search for

networkmanager docker default routes

on Google. Hopefully one of the results will help.


It turns out the problem was connman! … I didn’t think it was running, but apparently it was, likely because I had installed multiple desktop environments on this system.

Fixing the blacklist in /etc/connman/main.conf and restarting the daemon got it working :slight_smile:
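
In case it helps someone else: the setting lives in the [General] section of /etc/connman/main.conf. The exact prefix list below is just an example from my setup, so adjust it to your interfaces:

```ini
[General]
# Interface-name prefixes ConnMan must leave alone; the veth and docker
# entries are the ones that matter for this issue.
NetworkInterfaceBlacklist=vmnet,vboxnet,virbr,ifb,veth,docker
```

followed by systemctl restart connman.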

It is interesting that many of the issues found via Google remain unresolved; thanks for your assistance in moving this one into the right column!