Docker breaks network after short period

Seeing something odd with the default network that ends up breaking my host networking.

Debian 12.4, docker-ce installed as per Debian instructions (Install Docker Engine on Debian | Docker Docs)

version 25.0.3, build 4debf41

I noticed failed network operations when adding packages to an Ubuntu 18.04 image, then discovered that my host networking had also failed. It appears to be caused by an extra default route that Docker installs after some period of network activity (10-30 seconds typically).

Reproducible using a simple bash container and just pinging an external host for a while.

Routes before:

default via 192.168.99.254 dev enp0s31f6
default via 192.168.99.254 dev enp0s31f6 proto dhcp src 192.168.99.193 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.99.128/25 dev enp0s31f6 proto kernel scope link src 192.168.99.193 metric 100

Run a basic bash container (docker run -it --rm bash) and ping an external host; after ~30 seconds the routes are now:

0.0.0.0 dev vetha636f44 scope link
default dev vetha636f44 scope link
default via 192.168.99.254 dev enp0s31f6
default via 192.168.99.254 dev enp0s31f6 proto dhcp src 192.168.99.193 metric 100
169.254.0.0/16 dev vetha636f44 proto kernel scope link src 169.254.240.111
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.99.128/25 dev enp0s31f6 proto kernel scope link src 192.168.99.193 metric 100

Note the new routes via dev veth*… external networking is broken at this point; it can be recovered by removing the default dev veth* route.

Once the bad route is removed, networking is restored but exiting container and restarting will cause it to fail again the same way.
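
For reference, the recovery step is just an iproute2 one-liner; the veth name below is the one from the route listing above (it changes on every container start, so substitute yours), and it needs root:

```shell
# Remove the bogus default route that points at the veth device.
# "vetha636f44" is taken from the route output above; yours will differ.
sudo ip route del default dev vetha636f44
```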

I don’t see this on an almost identical system running docker-ce 24.0.5, build ced0996

Update: tried downgrading the docker version to match my other system, same problem (+/- a second or two on the loss of network).

So it is likely not specific to the version but to some configuration? Hints on where to look are appreciated, thanks.

Default routes should not be added, but I don’t think 169.254.0.0/16 should be added to the routes either. That is the link-local range, usually self-assigned when a machine could not get an IP address from DHCP, and I think I have seen it used for other special cases as well. It should not depend on starting a container. If it does, my guess is that the container could not get an IP address from the original range for some reason and got one some other way, but I have no idea how or why.

Where is your Debian 12 host machine? Is it in a cloud? I think I saw a similar issue before, but I only have a vague memory of it; I think that issue was related to a cloud provider, but I am not sure.

What happens if you create a custom docker network and use that instead of the default network?

It is a local Debian 12 machine. I also saw some issues with cloud instances in searching but the solutions didn’t seem applicable here.

I tried a non-default bridge network, same issue as before. Using --network=host is OK though and might be my fallback… weird that my other server works fine with default (bridge) networking; I haven’t found a significant difference yet.

Using host networking shouldn’t be your fallback. You can compare your two machines: what is installed on one that isn’t on the other, or how the configuration is different.

I also found this for you

Thanks, that link appears to be a very similar problem. Their solution was blacklisting in connman, which I’m not using (NetworkManager instead), and when I investigated blacklisting in NM it seems the veth* devices are already unmanaged?

Comparing with my working host, the primary eth0 was not managed by NM there either, so I tried that change, but no improvement. It turns out network activity in the container is not required: I can start bash and do nothing, and eventually those extra veth* routes appear, despite the fact that NM isn’t managing them.

Something must be adding them for the Docker containers though; I will need to identify what… it’s always about 30 seconds after the container starts.

I meant comparing installed packages, running services, and maybe the content of /etc. This is one way to write all the content of /etc into a single file, so you can use the diff command to see the difference between the two servers’ /etc folders:

find /etc -type f -exec awk '{print FILENAME ": " $0}' {} + > etc.txt

Each line in the file will start with the filename in which the line was found.
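
To illustrate how that prefixed output diffs between two hosts, here is a self-contained sketch (filenames and contents are made up):

```shell
# Stand-ins for the etc.txt dumps produced on each host.
printf '/etc/foo.conf: option=1\n' > etc-working.txt
printf '/etc/foo.conf: option=2\n' > etc-broken.txt

# diff exits non-zero when the files differ; each hit shows file and line.
diff etc-working.txt etc-broken.txt || true
```

Because every line carries its filename, a single diff over the combined dumps points straight at the differing file and setting.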

You can also export the list of installed apt packages by running the following command:

dpkg -l > apt.txt

and run

ps -e -o command > proc.txt

to export the running processes, but I wouldn’t use diff here. It is easier to just look at the output.

I found many similar issues easily, so I won’t link all of them, but you can search for

networkmanager docker default routes

on Google. Hopefully one of the results will help.


It turns out the problem was connman! … I didn’t think it was running, but apparently it was, likely because I had installed multiple desktop environments on this system.

Fixing the blacklist in /etc/connman/main.conf and restarting the daemon got it working :slight_smile:
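
In case it helps someone else: the setting lives in the [General] section of /etc/connman/main.conf. The exact prefix list below is just an example from my setup, so adjust it to your interfaces:

```ini
[General]
# Interface-name prefixes ConnMan must leave alone; the veth and docker
# entries are the ones that matter for this issue.
NetworkInterfaceBlacklist=vmnet,vboxnet,virbr,ifb,veth,docker
```

followed by systemctl restart connman.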

It is interesting that many of the issues found via Google remain unresolved; thanks for your assistance in moving this one into the right column!