Hello,
I have two docker-compose deployments (A and B), each running several containers, on the same machine. A exposes port XXXX to the machine and B creates many connections to 172.17.0.1:XXXX.
Initially everything works well, but after a while (once B has opened somewhere in the tens to hundreds of connections) every attempt to connect to the service in A via 172.17.0.1:XXXX is refused. The problem clears up on its own after a long while.
Before the problem starts I can also reach localhost:XXXX from the host machine; after it starts I get ECONNREFUSED from the host as well as from B.
The interesting part is that the port is available and serving from within the relevant container in A, both before and after the problem. So it’s not the service that refuses those connections, but rather something in the bridge or in the OS.
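In case it helps anyone reproduce this, here is a minimal sketch of the kind of probe that can be run from the host, from a container in B, and from inside the container in A to timestamp when the port starts refusing connections. The function names `probe` and `watch` are my own, and the host/port in the usage comment are placeholders (8080 stands in for XXXX), not my real values.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a single TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or something in front of it) answered with a reset.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"


def watch(host: str, port: int, interval: float = 1.0, attempts: int = 60) -> None:
    """Probe repeatedly and print a timestamped line whenever the state changes."""
    last = None
    for _ in range(attempts):
        state = probe(host, port)
        if state != last:
            print(f"{time.strftime('%H:%M:%S')} {host}:{port} -> {state}")
            last = state
        time.sleep(interval)


# Example (placeholder port): watch("172.17.0.1", 8080, interval=1.0, attempts=600)
```

Running the same probe from all three vantage points at once should show whether the host and B flip to "refused" at the same moment while the container itself stays "open".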
Running netstat -anpt within the container shows :::XXXX in the LISTEN state both before and after the problem. Running the same command on the host shows 0.0.0.0:XXXX before the problem, but that entry disappears once the problem starts.
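The netstat check above can also be automated. Below is a rough sketch, assuming the usual Linux `netstat -ant` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The helper name `has_listener` is hypothetical, and the sample lines use 8080 as a stand-in for XXXX; they are illustrative, not copied from my machine.

```python
def has_listener(netstat_output: str, port: int) -> bool:
    """Return True if any TCP socket in the netstat output is in LISTEN on `port`."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip header lines and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # rsplit handles both "0.0.0.0:8080" and the IPv6 form ":::8080".
        if state == "LISTEN" and local_addr.rsplit(":", 1)[-1] == str(port):
            return True
    return False


# Illustrative rows only (made-up port, program column left as "-").
SAMPLE_CONTAINER = "tcp6       0      0 :::8080                 :::*                    LISTEN      -"
SAMPLE_HOST_OK = "tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -"
SAMPLE_HOST_BROKEN = "tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -"

print(has_listener(SAMPLE_CONTAINER, 8080))
print(has_listener(SAMPLE_HOST_OK, 8080))
print(has_listener(SAMPLE_HOST_BROKEN, 8080))
```

Feeding it the captured output of netstat on the host at a fixed interval would give a timestamp for the moment the listening entry vanishes, which could then be lined up against the connection count.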
I haven’t found any relevant logs (nothing in dmesg either), and increasing ulimit doesn’t help. There are about 350 connections from multiple containers at the point where the listening port disappears on the host.
Linux: 5.15.0-1028-aws
Docker version 20.10.17, build 100c70180f
Any help would be highly appreciated!