Thanks for your reply. I have done some digging on the Docker GitHub, and it looks like the issue isn't actually related to AWS. Instead, I may be running into a longstanding bug in Docker networking. Although the issue was filed in the Windows repo and describes accessing IPs on the host, people report the same behavior on Mac (and in one case on Linux), and also when connecting to addresses other than the host.
If this turns out to be the issue, it would explain a lot of perplexing things about the problem:
- why restarting the container doesn’t fix the issue (because the problem is with Docker networking itself)
- why the host is still reachable from outside the container (ditto)
- why my teammates are seeing the same behavior in different networks and countries (we're all on recent versions of Docker)
- why it seems to come and go (it gets worse over time until Docker is restarted)
- and why it seems to preferentially affect certain hosts (the problem has something to do with incomplete TCP handshakes, so connections start failing first to the hosts you've exchanged the most traffic with).
Several comments on the issue provide steps to reproduce it. While the problem seems to affect more people since release 4.5.0 (from Feb 2022), the original bug was filed in 2020. The original reporter discussed it with the Docker support team in December 2020, and they know what the problem is but have yet to provide a fix.
Other than using an earlier version, the only reliable workaround discussed in the issue is to restart Docker itself. For the time being I have gone back to 4.5.0, and we'll see whether the issue reappears after a few days of Docker uptime. I launched 4.5.0 one day ago and haven't seen the problem yet, so fingers crossed that it holds up for me until a fix is released.