The Java and Redis containers communicate over the network. The web application sits behind an NGINX reverse proxy.
After a certain period, client browsers start to get 502s. The 502s appear randomly and go away after refreshing multiple times.
The solution was moving the Redis container to another Docker network, “Network B”. The Java container was given access to both “Network A” and “Network B”. This fixed the 502s immediately.
Generally, I understand that I’ve reduced congestion in “Network A” by moving Redis out of it. But I don’t exactly understand the nuances.
Could someone please explain why? And what are the best practices for handling HTTP traffic and internal Docker networking?
Thank you
Update:
Before switching the networks, the containers were restarted + recreated multiple times.
The logs showed no errors, and the HTTP requests never reached the web servers in the containers. Only NGINX showed the 502s, which meant it was never even able to reach the web servers.
Here is the simplified version of the docker-compose file:
You state the problem occurs after a certain period. And then it was “immediately” fixed? Maybe because you restarted the containers with the new networks and it just seems fixed?
Even when using a different Docker network, I don’t think that any “congestion” is avoided by that. So personally I don’t think this is really “the solution”.
I fully agree with @bluepuma77. “Moving” a container from one network to another also requires recreating it, which means restarting, which could “solve” some issues. But I would check the logs in all containers and enable verbose or debug logging wherever possible. A gateway error can be returned when a target server is not running or not behaving as expected.
Changing networks can help mostly when you have multiple compose services with the same name on the same network, so the proxy load balances among all of them while only some are actually listening on the required port.
You can share your config if you need more help and someone might be able to catch what is happening.
@bluepuma77 @rimelek
Hey, thanks for the response. I forgot to mention that, so I updated my question. The containers were restarted + recreated multiple times before changing networks. I also added the docker-compose file to the question.
I can’t explain that, but I wouldn’t try to guess without a more verbose error message. An HTTP 502 is returned by a server, and when a server returns it, it has to know why. When I wrote about error messages, I didn’t mean the container that couldn’t be reached, but the proxy server that was supposed to reach it: your NGINX reverse proxy.
Your networks in the compose file are external. So there is no way to tell what else is on that network if we assume the reverse proxy was trying to reach another container.
I’m replying only now because I found it hard to connect the pieces together: what is Network A and B, which is which in your compose file, which network is used by the proxy server, and whether the compose file is the original setup or the fixed one. If you explained it, I missed it, but when you simplify things, it is better to use the right names to refer to them. After reading the updated post multiple times, I now understand that it is the new compose file, as you have two networks.
So if you can reproduce the issue again with the original setup and configure your reverse proxy to show you a verbose error message, I can’t promise anything, but at least there is a chance that we can give you a better answer.
I wouldn’t rule out a weird Docker network error completely, but I wouldn’t jump to that conclusion yet either.
Can you also tell us more about your Docker environment?
We usually need the following information to understand the issue:
What platform are you using? Windows, Linux or macOS? Which version of the operating systems? In case of Linux, which distribution?
How did you install Docker? Sharing the platform almost answers it, but only almost. Direct links to the followed guide can be useful.
On Debian-based Linux, the following commands can give us some idea and help recognize an incorrectly installed Docker:
docker info
docker version
Review the output before sharing and remove confidential data if any appears (a public IP, for example).
nginx is known to cache resolved IPs indefinitely. The post shows how to mitigate the problem.
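The usual mitigation is to give nginx a `resolver` directive and put the upstream hostname in a variable, which forces nginx to re-resolve it at request time instead of only once at startup. A minimal sketch, assuming nginx runs in a container and `webapp:8080` is a hypothetical service name and port:

```nginx
# With a literal hostname in proxy_pass, nginx resolves it once when the
# config is loaded. Using a variable makes nginx resolve it per request,
# honoring the resolver's "valid" TTL.
resolver 127.0.0.11 valid=10s;   # Docker's embedded DNS (container setups only)

server {
    listen 80;

    location / {
        set $upstream http://webapp:8080;  # placeholder service name/port
        proxy_pass $upstream;
    }
}
```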
You might want to consider switching to traefik as a reverse proxy, as it uses the docker event stream to register/unregister the reverse proxy rules for containers.
The networks were created in Docker. Here are the networks:
$ docker network ls
NETWORK ID    NAME             DRIVER    SCOPE
e3df3df02ed   appnetwork       bridge    local
ad9a91a838b   shared-network   bridge    local
6f9a2047a75   bridge           bridge    local
246cb50ee64   host             host      local
The issue hasn’t recurred since the networks were changed, so I can’t check the logs.
I spoke to a few network people and they suggested it could be something to do with Docker network port/IP exhaustion. Is that a possibility? If so, is there a way I can reset the networks every night via a cron task?
NGINX in my setup is not a Docker container. It is managed by server management software and installed on the bare-metal server. The container ports are published by Docker, and NGINX connects to these ports.
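For context, a setup like that usually looks something like the following host-level nginx sketch, assuming a hypothetical published port 5002 and a placeholder domain:

```nginx
# nginx runs on the bare-metal host and forwards to a container port
# published by Docker (e.g. a "ports: 5002:8080" entry in compose).
server {
    listen 80;
    server_name example.com;   # placeholder

    location / {
        proxy_pass http://127.0.0.1:5002;   # placeholder published port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```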
There are multiple things I have to point out. One was mentioned by @bluepuma77 already
You are using AlmaLinux, which is not officially supported by Docker. Even if a distro is based on an officially supported one, it is not the same, and there can be differences. Here are the supported distros: Install | Docker Docs
Your Docker client is v25.0.4 and the server is v26.1.3. The versions don’t have to be the same, but it is best if they are. Also, I don’t think either version is supported, as the current latest version is v28.1.1 and only v28 and v27 are mentioned on the documentation’s “Release notes” summary page: Release notes | Docker Docs
Your cgroup version is v1, which is a legacy version, but it is probably not related to your issue.
I guess anything is possible, but I’m not a network guy myself, so I deal with issues when I meet them, and I haven’t met this one. I don’t see how it could be IP exhaustion if your container could start: Docker would not let you create a container on a network when there is no available IP left. The same goes for ports, unless you mean dynamic ports for TCP communication, but if you are running out of those, something must be seriously wrong with your app, and I don’t think that is the case. But again, not a network expert here.
If you are satisfied with how it works now, that’s okay, but if it was indeed the changed network that solved the issue, you will not be able to reproduce it. So if you want to make sure you know what fixed it, you could try to run another test project set up as it was before the fix. Or just wait and see if it ever occurs again. The topic will be automatically closed after 30 months, so if you have the issue again after that, you can open a new topic (we can merge that into this one if needed).
So how do you refer to the services from there? Using loopback IPs like 127.0.0.1:5002? Then why are your networks external? Normally you would have a compose-project-level network for internal communication between containers in the same project, and one additional network for the reverse proxy container so that it can access your web server in the compose project.
Be careful with external networks, because you cannot always use service names to refer to another container in the project: it doesn’t matter which project a service belongs to as long as it is on the same network, and requests to containers with the same service name will be load balanced.
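The layout described above can be sketched in compose roughly like this (all service and network names here are placeholders, not taken from the original file):

```yaml
# The web service talks to Redis over the project's own default network;
# only the web service also joins the shared proxy network.
services:
  web:
    image: my-java-app        # hypothetical image
    networks:
      - default               # project-internal network, created by compose
      - proxy                 # shared with the reverse proxy container
  redis:
    image: redis:7
    networks:
      - default               # not reachable from the proxy network

networks:
  proxy:
    external: true            # e.g. created with: docker network create proxy
```

This way Redis is isolated from the proxy, and the reverse proxy only sees the containers it actually needs to reach.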