Host machine loses internet connection ~20-30 hours after starting Docker container

I originally posted this question on the Raspberry Pi Stack Exchange.

TL;DR: I have two Raspberry Pis running different Docker/docker-compose containers. Both work for about a day, and then the host machine suddenly loses internet connectivity.

I’m having this issue on both a Pi 3B and a Pi 4 4GB model. I’m running some programs I wrote with docker-compose, and on most days, if not every day, at around the same time I suddenly can’t connect to the Pis from the local network.

From my computer I can’t ping or SSH into them, or load any web servers they host (even the non-Docker ones). When I connect a monitor, I can see they’re still running but are having DNS resolution issues, so they can’t connect to anything. They also still show up in my router’s list of connected devices. I tried using Cloudflare’s 1.1.1.1 DNS on the 3B (via the Pi’s network settings dialog), but the issue still came back the next day.

Additionally, the first time it happened on the Pi 4, I could reach a web server hosted on it through my external IP (the port is forwarded), but not through its local IP, and it wouldn’t respond to SSH. That was really strange; restarting the Pi fixed it.

This only started happening after I started using Docker. Both Pis had been running totally fine for a long time; then I dockerized something I was running on the Pi 3B, and it started having this issue soon after. A few days later I made a new project and hosted it on the Pi 4, and then it started having the issue as well. Both of them are connected via Ethernet. I have a Pi 3B+ running the same thing as the Pi 3B (a Python script in docker-compose that records with the Pi camera module), but it uses Wi-Fi and has not had this issue. Restarting a Pi fixes it, but I don’t want to have to power-cycle them every day.

All of them use static IPs outside the router’s DHCP address pool, so I know it’s not an IP address conflict with another device on the network. I tried disabling IPv6, and that didn’t help.
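The “outside the DHCP pool” reasoning can be checked mechanically. Below is a minimal sketch using Python’s standard `ipaddress` module. The pool bounds 192.168.1.2 to 192.168.1.199 are hypothetical, since the router’s actual range isn’t given. The subnet 192.168.1.0/24 and the address 192.168.1.200 are the ones that appear in the syslog excerpt later in this post.

```python
import ipaddress

def is_safe_static_ip(static_ip, subnet, pool_start, pool_end):
    """True if static_ip is a usable host address in subnet and lies outside
    the inclusive DHCP pool pool_start..pool_end."""
    net = ipaddress.IPv4Network(subnet)
    ip = ipaddress.IPv4Address(static_ip)
    start = ipaddress.IPv4Address(pool_start)
    end = ipaddress.IPv4Address(pool_end)
    # Must be on the LAN, and not the network or broadcast address.
    in_subnet = ip in net and ip not in (net.network_address, net.broadcast_address)
    # Must not be an address the router could hand out via DHCP.
    in_pool = start <= ip <= end
    return in_subnet and not in_pool

# Hypothetical pool bounds: substitute your router's real DHCP range.
print(is_safe_static_ip("192.168.1.200", "192.168.1.0/24", "192.168.1.2", "192.168.1.199"))
```

This only rules out collisions with DHCP leases. Two devices manually configured with the same static address would still conflict.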

They’re all running Raspberry Pi OS and should be fairly up to date. I installed Docker using the convenience script recommended for Raspberry Pis, and installed docker-compose with pip3.

This is my Docker setup running on the Pi 4.

docker-compose.yml

version: '3.7'
services:
  app:
    build:
      context: .
      target: prod
    depends_on:
      - postgres
    restart: always
    ports:
      - 4042:3000
  postgres:
    image: postgres:13-alpine
    environment:
      POSTGRES_USER: $PGUSER
      POSTGRES_PASSWORD: $PGPASSWORD
      POSTGRES_DB: $PGDATABASE
    volumes:
      - type: volume
        source: postgres
        target: /var/lib/postgresql/data
        volume:
          nocopy: true
      - type: bind
        source: ./config/postgres
        target: /docker-entrypoint-initdb.d/
    restart: always
volumes:
  postgres:

Dockerfile

FROM node:12 AS dev
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install

CMD ["npx", "nodemon", "konshuu-server.js"]

FROM dev AS prod
COPY . .
RUN npm run build
CMD ["node", "konshuu-server.js"]

I saw similar issues in other threads (like this one: Can’t access internet after installing docker in a fresh ubuntu 18.04 machine), but that one sounds like it happens immediately, whereas mine is fine for about a day before it cuts out.

Docker version 19.03.11, build 42e35e6
docker-compose version 1.26.0, build unknown

Using an up-to-date version of Raspberry Pi OS.

I think I’ve fixed it. It’s been three days, and I think it would otherwise have broken by now, but things are going well.

I checked /var/log/syslog and saw a couple of suspicious-looking things. One was a flood of logs from dhcpcd that looked like:

dhcpcd[411]: eth0: pid 411 deleted route to 192.168.1.0/24
dhcpcd[411]: eth0: received approval for 192.168.1.200 (that's that device's static IP)
dhcpcd[411]: eth0: adding route to 192.168.1.0/24
dhcpcd[411]: eth0: adding default route via 192.168.1.1

There were also a lot of similar-looking messages involving eth0 and other Docker-created network interfaces. Those messages were coming in at around 20 per second for hours on end. The log was completely flooded with them: hundreds of thousands of lines.

Then, right around when the internet would cut out, there would be several logs like this:

dhcpcd[411]: eth0: checksum failure from 192.168.1.1
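To quantify the flood described above and line it up with the outage, a short script can count dhcpcd messages per interface and collect the timestamps of the checksum-failure lines. This is a sketch under the assumption that /var/log/syslog uses the classic syslog line layout (`Mon DD HH:MM:SS host dhcpcd[pid]: iface: message`); the function name `scan_dhcpcd` and the regex are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Assumed layout, e.g.:
# "Jun 10 03:14:15 raspberrypi dhcpcd[411]: eth0: adding route to 192.168.1.0/24"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"dhcpcd\[\d+\]: (?P<iface>[^:\s]+): (?P<msg>.*)$"
)

def scan_dhcpcd(lines):
    """Return (messages per interface, timestamps of checksum failures)."""
    per_iface = Counter()
    checksum_failures = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not a dhcpcd line, or an unexpected log format
        per_iface[m.group("iface")] += 1
        if m.group("msg").startswith("checksum failure"):
            checksum_failures.append(m.group("ts"))
    return per_iface, checksum_failures
```

Usage: open the log with `open("/var/log/syslog", errors="replace")`, pass the file object to `scan_dhcpcd`, then print `counts.most_common()` and the first few failure timestamps. A large count on eth0 alongside the Docker interfaces matches the spam described above.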

I’m not sure what causes that checksum failure. That said, I searched for it and found threads from other people who were having similar issues, and, in my case at least, it came down to how the static IP was set. Raspberry Pi OS lets you edit your network settings through a GUI. That GUI has a checkbox for “Automatically configure empty options”, and I had left it checked and filled in only the IPv4 Address. That had always been fine before, but for whatever reason it doesn’t seem to work since I started using Docker.

I undid everything in that GUI and instead edited /etc/dhcpcd.conf as suggested in the Raspberry Pi website’s instructions for setting a static IP. Since then I haven’t had any issues on either of the two Raspberry Pis.
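For reference, the static-IP block described in the Raspberry Pi documentation goes in /etc/dhcpcd.conf and uses the `interface`, `static ip_address`, `static routers` and `static domain_name_servers` options. The sketch below just renders that block as text, filling in every field explicitly rather than leaving any to automatic configuration. The address 192.168.1.200, the /24 prefix and the router 192.168.1.1 come from the logs above. Listing the router and 1.1.1.1 as DNS servers is an assumption; `dhcpcd_static_stanza` is a hypothetical helper.

```python
import ipaddress

def dhcpcd_static_stanza(interface, address_cidr, router, dns_servers):
    """Render a dhcpcd.conf static IP block with every option set explicitly."""
    iface = ipaddress.IPv4Interface(address_cidr)  # validates address and prefix length
    gw = ipaddress.IPv4Address(router)
    if gw not in iface.network:
        raise ValueError(f"router {gw} is not inside {iface.network}")
    # dhcpcd.conf takes multiple DNS servers separated by spaces.
    dns = " ".join(str(ipaddress.ip_address(s)) for s in dns_servers)
    return (
        f"interface {interface}\n"
        f"static ip_address={iface.with_prefixlen}\n"
        f"static routers={gw}\n"
        f"static domain_name_servers={dns}\n"
    )

print(dhcpcd_static_stanza("eth0", "192.168.1.200/24", "192.168.1.1", ["192.168.1.1", "1.1.1.1"]))
```

The printed block is what would be appended to /etc/dhcpcd.conf (editing it requires root). Reboot afterwards so dhcpcd picks up the change. The router check catches a common mistake: a gateway outside the configured subnet.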