I’m completely baffled by this.
I’m running two identical Ubuntu 24.04 (minimal) VMs (6 cores, 4 GB RAM each) that run only the Docker runtime and a couple of other utilities (git, samba, nano). Both are completely fresh installs. I’ll call these two (identical) Docker hosts Docker 1 and Docker 2. Each host runs about 17 or 18 containers (a mix of WordPress, phpMyAdmin, uLogger, Gitea, BookStack, MariaDB, etc.), has plenty of free RAM, and shows no apparent CPU contention. Both VMs run on the same physical machine, but each guest is on its own SSD.
The issue is that I’m seeing random spikes/plateaus from SOME containers on either host. The pattern is nearly identical each time: the response time jumps to about 11 seconds, stays there for a while, and then returns to normal. The spike value is also consistent: 11 seconds in this case, on BOTH hosts. Some other apps, such as Portainer and Gitea, never seem to be affected. It SEEMS to affect WordPress sites mostly, but not exclusively.
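To check numerically whether the slow periods really sit at one fixed value, a minimal sketch like the one below groups consecutive slow samples into plateaus and reports the median latency of each. It assumes you already have a list of per-request latencies in seconds, however they were collected; the 5-second threshold and the names `find_plateaus` and `Plateau` are arbitrary choices for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Plateau:
    start: int        # index of the first slow sample in the run
    length: int       # number of consecutive slow samples
    median_s: float   # median latency of the run, in seconds


def find_plateaus(latencies_s, threshold_s=5.0):
    """Group consecutive samples above threshold_s into plateaus.

    Assumption: threshold_s (5 s by default) is an illustrative cutoff and
    must be >= 0 so the 0.0 sentinel below always ends a run.
    """
    samples = list(latencies_s)
    plateaus = []
    run_start = None
    # A trailing 0.0 sentinel closes a run that reaches the end of the data.
    for i, value in enumerate(samples + [0.0]):
        if value > threshold_s and run_start is None:
            run_start = i
        elif value <= threshold_s and run_start is not None:
            run = samples[run_start:i]
            plateaus.append(Plateau(run_start, len(run), median(run)))
            run_start = None
    return plateaus
```

With samples such as `[0.2, 0.3, 11.1, 11.0, 11.2, 0.2, 0.3, 11.0, 0.2]`, it returns two plateaus, with medians of 11.1 s and 11.0 s. Plateaus that all land near the same value match the pattern described above.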
Here’s the weirder thing… sometimes after a reboot or a container restart, the issue will “jump” to another container, which then shows the same 11-second spikes/plateaus. It even shows up for extremely lightweight apps, such as uLogger or phpMyAdmin (i.e., unauthenticated login screens). WTF?
In the screenshot below, I’m using Uptime Kuma to make an HTTP request every minute, with the default 48-second delay threshold. The red lines are 502 errors reported by the reverse proxy.
So far I have tried:
- Recreating the hosts: The problem started on Ubuntu 20.04, and rebuilding the hosts on 24.04 made no difference. The only change is that the spikes dropped from 15 seconds to 11 seconds on the new hosts.
- Swapping reverse proxies: I first noticed the issue with Nginx on Ubuntu 20.04, and the same behavior appeared after moving to Traefik (on both Ubuntu 20.04 and 24.04).
- Pinging containers directly by port: When I was using Nginx, the containers were published on randomly assigned host ports, so I could reach them directly over HTTP. This made no difference, though I did not test it much. I stopped once I saw that Traefik showed the same issue.
The only thing the two hosts have in common is that Uptime Kuma is monitoring them. I think I’ve also seen random spikes with an HTTP ping utility I wrote myself in C#, though I’m not 100% certain. UK pings every minute, while my utility pings every 5 seconds. With my utility I see occasional spikes of 20 to 40 seconds, but not the consistent plateaus that UK shows.
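A probe along the lines of that home-made utility can be sketched in Python with only the standard library. This is not the original C# tool: the 5-second interval mirrors the one described, while the 60-second timeout, the 5-second "slow" cutoff, and the function names are assumptions. Each output line logs a timestamp, the HTTP status (or error), and the elapsed time, so spikes can be lined up against Uptime Kuma's graph.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_s=60.0):
    """Time a single HTTP GET and return (elapsed_seconds, status_text).

    Assumption: timeout_s=60 is an illustrative default, not from the original tool.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include body transfer time in the measurement
            status = str(resp.status)
    except urllib.error.HTTPError as exc:  # non-2xx replies, e.g. a 502 from the proxy
        status = str(exc.code)
    except (OSError, http.client.HTTPException) as exc:  # refused, timeout, bad reply
        status = f"error: {exc}"
    return time.monotonic() - start, status


def run_probe(url, interval_s=5.0, slow_s=5.0, count=None):
    """Probe url every interval_s seconds; count=None runs until interrupted.

    Assumption: slow_s is an illustrative cutoff for flagging a request as SLOW.
    """
    sent = 0
    while count is None or sent < count:
        elapsed, status = probe_once(url)
        flag = "SLOW" if elapsed > slow_s else "ok"
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {url} {status} {elapsed:.2f}s {flag}", flush=True)
        sent += 1
        # Keep a roughly fixed cadence by subtracting the time the request took.
        time.sleep(max(0.0, interval_s - elapsed))
```

For example, `run_probe("http://wp.example.internal/")` (a hypothetical URL) runs until interrupted. Running one copy against the Traefik hostname and another against a container's published port would show whether both paths slow down at the same moments.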
It SEEMS like some sort of networking issue, but I can’t tell what, since everything uses default settings and Traefik routes to the internal container IPs.
Screenshots
(Since I’m new I’m allowed only 1 screenshot…)
Here’s an example from Roundcube running on Docker 1. Some other containers on the same host, as well as on Docker 2, show similar spikes/plateaus (around 11 seconds on both hosts).
Roundcube running on Docker 1:
Any insight is appreciated, as I’m completely baffled by what is going on here. I was hoping that rebuilding the VMs would help, but apparently not. Thank you.