172.17.0.1 starts refusing connections to a port

Hello,

I have two docker-compose deployments (A and B) on the same machine, each running several containers. A publishes port XXXX on the host, and B opens many connections to 172.17.0.1:XXXX.

Initially everything works, but after a while (and somewhere between tens and hundreds of connections from B) every attempt to connect to the service in A via 172.17.0.1:XXXX is refused. The problem clears up on its own after a long while.

Before the problem starts I can also reach localhost:XXXX from the host machine. After it starts, connections from the host fail with ECONNREFUSED, just as they do from B.
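To pin down exactly when the port flips from accepting to refusing, a small probe like the one below could be left running on the host or inside a B container. This is only a sketch: the function names `probe` and `watch` are made up for illustration, and the port number in the usage comment is a placeholder for XXXX.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a single TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else (e.g. no route to host) is reported as-is.
        return f"error: {exc}"


def watch(host: str, port: int, interval: float = 5.0) -> None:
    """Print a timestamped line whenever the probe result changes."""
    last = None
    while True:
        state = probe(host, port)
        if state != last:
            stamp = datetime.now(timezone.utc).isoformat()
            print(f"{stamp} {host}:{port} {state}")
            last = state
        time.sleep(interval)


# Example (placeholder port standing in for XXXX):
# watch("172.17.0.1", 8080)
```

Running one copy against 172.17.0.1 and another against localhost would show whether both paths start refusing at the same moment.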

The interesting part is that the port is still available and serving from within the relevant container in A, both before and after the problem. So it’s not the service that refuses those connections, but rather something in the bridge or in the OS.

Running netstat -anpt within the container shows :::XXXX in the LISTEN state both before and after the problem. Running the same command on the host shows 0.0.0.0:XXXX in LISTEN before the problem, but that entry disappears once the problem starts.

I haven’t found anything relevant in the logs (dmesg shows nothing either), and increasing ulimit doesn’t help. There are about 350 connections from multiple containers at the point where the listening port disappears on the host.
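In case it helps anyone reproduce the count, here is a rough sketch of how the connection states for one port could be tallied from the text of /proc/net/tcp on the host (the same column layout applies to /proc/net/tcp6). The helper name `summarize_port` is hypothetical, and only a few of the kernel's state codes are mapped to names; others are reported as their raw hex code.

```python
from collections import Counter

# Subset of the kernel's TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def summarize_port(proc_net_tcp: str, port: int) -> Counter:
    """Count sockets per state whose local port equals `port`."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "0100007F:1F90" (hex address:hex port)
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Example (placeholder port standing in for XXXX):
# with open("/proc/net/tcp") as f:
#     print(summarize_port(f.read(), 8080))
```

Sampling this every few seconds would show how many connections exist at the moment the LISTEN entry drops out.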

Linux: 5.15.0-1028-aws
Docker version 20.10.17, build 100c70180f

Any help would be highly appreciated!