We have about 15 individual Apache containers running on a server, and everything had been working fine for about two years. Yesterday, something happened overnight that caused the Docker daemon to restart. When the daemon came back up, three interesting things happened.
1 - The host's published ports seemed to be locked up somehow. We could not connect to any of the containers with curl/wget. We checked, and nothing else was listening on those ports. Restarting the Docker daemon again seemed to clear it.
2 - 12 of the containers started back up normally. Three of them came back up running an untagged image ID (no name) that looks like what we had about three months ago. The image they should have started from was still present in docker images, but it wasn't being used. (The checks we ran for points 1 and 2 are shown right after this list.)
3 - All of the containers use the exact same compose file, and all are set to restart: unless-stopped. 13 of them came back up; two did not. Why? All 15 were running before the daemon was restarted.
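In case it helps, this is roughly how we verified points 1 and 2: what was listening on the published ports, and which image each container was actually running. The port number and container name below are placeholders for our real ones.

```bash
# Check what is actually listening on one of the published ports (port is a placeholder)
sudo ss -tlnp | grep ':8080'

# List each container with the image it is currently running from
docker ps -a --format '{{.Names}}\t{{.Image}}\t{{.Status}}'

# For one affected container, compare the image reference it was created from
# with the image ID it is actually using (container name is a placeholder)
docker inspect --format '{{.Config.Image}} -> {{.Image}}' my-apache-01

# Confirm the tagged image we expected is still present locally
docker images
```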
We restarted the daemon again, and a different set of three containers came back with the strange outdated image.
We manually ran docker rmi on the invalid image ID, brought the containers back up, and tried the daemon restart again. This time the ports locked up again, and another set of three containers had their images switched.
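For reference, the cleanup step looked roughly like this (the image ID is a placeholder, and it's written with the Compose v2 CLI, so substitute docker-compose if that's what you run):

```bash
# Remove the untagged/stale image by ID (placeholder ID)
docker rmi 3f2a9c1d4e5b

# Recreate the affected containers from their compose files
docker compose up -d --force-recreate
```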
Has anybody else had this happen? Where are these rogue images being stored and pulled from, and why does this happen? There are no unused images showing in docker images, and we've run system prunes to clean up.
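And this is what we run to look for leftover images and to prune, in case we're missing something obvious:

```bash
# Look for dangling (untagged) images that could explain the rogue image IDs
docker images --filter dangling=true

# The cleanup we run: removes stopped containers, dangling images,
# unused networks, and dangling build cache
docker system prune
```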
Is there some kind of limit on how many containers will restart gracefully after the daemon restarts? Does a daemon restart spike CPU/memory and then crash the restart process partway through? This was never a problem for the two years we only had 10 containers; I wonder whether the extra five we recently added have pushed Docker past what it can handle.
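If it would help, these are the checks we can run and post results from after the next restart (assuming a systemd host):

```bash
# Daemon logs around the restart window
journalctl -u docker.service --since "yesterday" --no-pager

# Whether live-restore is enabled (it keeps containers running while the
# daemon itself restarts)
docker info --format '{{.LiveRestoreEnabled}}'
```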
Anybody have any ideas here?