Hello,
I’m trying to investigate an intermittent issue where communication between two containers, running in a stack deployed on a single-machine swarm, sometimes fails. One container runs nginx and behaves as a reverse proxy. The other container is the upstream server.
Intermittently, nginx generates 504 errors after a 60-second timeout. My best guess so far is that nginx is reusing a previously established keep-alive connection to send a new request. I can confirm that the upstream side never receives the request sent on that specific connection, while other traffic (using distinct connections) does go through. I believe the NAT state for that connection must have been lost or discarded somehow, but I do not know how to prove that hypothesis. Can you recommend tools that could help me do so?
To confirm that nginx really produces a 504 error in such a situation, I created a scenario in which I blocked traffic between the two containers using iptables, and it did.
Thanks for your help!