Intermittent ConnectTimeoutError from within Docker, but only when accessing AWS SSM

I have a Docker container which I am starting locally using compose. It is added to a user-created network:

version: "3.9"

services:
  web:
    image: my-image:latest
    ports:
      - ${API_PORT:-5050}:5000
    networks:
      - flasknetwork
# ... etc

networks:
  flasknetwork:
    name: flasknetwork
    driver: bridge

My app uses AWS SSM Parameter Store from within the web container both on Fargate instances and locally. I’m accessing it with Boto3 from Python. For close to a year, multiple developers on my team, in different countries, have seen an intermittent connectivity issue. It crops up maybe a few times a day during continuous development, where for 10 minutes or so, calls to SSM will fail with this error:

botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://ssm.us-east-2.amazonaws.com/"

As far as I’m aware, the ECS instances do not see the issue; it is only a problem when we access the endpoint from our home networks.

I have added a connectivity test which fetches 4 URLs right before the Boto3 call which would fail. The first 3 always succeed. The 4th only succeeds sometimes:

https://www.apple.com                       status_code=200 len=104235 0.0448sec
https://cognito-idp.us-east-2.amazonaws.com status_code=400 len=113    0.3786sec
https://xxx.dkr.ecr.us-east-2.amazonaws.com status_code=401 len=15     0.3859sec
https://ssm.us-east-2.amazonaws.com         status_code=404 len=29     0.3849sec

(Don’t be confused by the 40x status codes. Those are just because I haven’t sent a real, authenticated request. The key thing is that I received a timely response.)
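A connectivity check of this kind can be sketched roughly as follows. This is not the exact code from the app: it uses only the standard library (urllib) instead of requests, the names `fetch_status` and `check_endpoints` are hypothetical, and the ECR host is left out because its account prefix is elided above.

```python
import time
import urllib.error
import urllib.request

# Endpoints taken from the output above (the elided ECR host is omitted).
URLS = [
    "https://www.apple.com",
    "https://cognito-idp.us-east-2.amazonaws.com",
    "https://ssm.us-east-2.amazonaws.com",
]


def fetch_status(url, timeout):
    """Return (status_code, body_length) for a GET request.

    A 4xx/5xx answer is treated as a normal result, because it still
    proves that the host was reachable.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, len(err.read())


def check_endpoints(urls, timeout=3.0, fetch=fetch_status):
    """Fetch each URL once and return one formatted result line per URL."""
    results = []
    for url in urls:
        start = time.perf_counter()
        try:
            status, length = fetch(url, timeout)
            line = f"{url} status_code={status} len={length}"
        except OSError as err:
            # URLError and socket timeouts are both OSError subclasses.
            line = f"{url} FAILED {type(err).__name__}: {err}"
        elapsed = time.perf_counter() - start
        results.append(f"{line} {elapsed:.4f}sec")
    return results
```

Calling `check_endpoints(URLS)` right before the Boto3 call and printing each returned line gives output in the same shape as the listing above. The `fetch` parameter exists only so the function can be exercised without network access.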

This same request fails other times:

requests.exceptions.ConnectTimeout: HTTPSConnectionPool(
host='ssm.us-east-2.amazonaws.com', port=443): Max retries
exceeded with url: / (Caused by ConnectTimeoutError(
<urllib3.connection.HTTPSConnection object at 0xffff8e6af550>,
'Connection to ssm.us-east-2.amazonaws.com timed out.
(connect timeout=3)'))

I set the timeout to 3 seconds here, but it has also timed out when I let the connection wait for over 2 minutes. Unlike the first error, this is a direct HTTPS fetch with requests, so I’m not even using Boto3. But it’s still failing.

Some things we’ve tried:

  1. Restarting the Docker container sometimes seems to help, but other times doesn’t. It’s possible that it’s just the act of waiting that’s fixing the immediate problem.
  2. Reducing the number of calls we make to SSM. It’s now down to about 2/sec per user at the maximum, with effectively no other users concurrently hitting the API. So we’re never getting anywhere near the 40 requests/second limit. In looking at the logs, the most I can see is 12 requests in one minute. We’re just not using this very aggressively, so it doesn’t seem possible that the problem is throttling. All of our calls are paginated calls to GetParametersByPath, and we are using WithDecryption=true.
  3. Changing the Boto3 retry method from Legacy to Standard. This is probably a good thing to do anyway, but has not fixed the problem.
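For context, items 2 and 3 correspond to a client setup along these lines. This is a minimal sketch, not our actual code: the helper names are hypothetical, and the `max_attempts`, `connect_timeout` and `read_timeout` values are illustrative assumptions rather than the values we run with.

```python
def standard_retry_settings(max_attempts=5, connect_timeout=3, read_timeout=10):
    """Keyword arguments for botocore.config.Config (numbers are illustrative)."""
    return {
        "retries": {"mode": "standard", "max_attempts": max_attempts},
        "connect_timeout": connect_timeout,
        "read_timeout": read_timeout,
    }


def make_ssm_client(region_name="us-east-2"):
    """Build an SSM client that uses the Standard retry mode."""
    # Imported here so the other helpers can be used without boto3 installed.
    import boto3
    from botocore.config import Config

    config = Config(**standard_retry_settings())
    return boto3.client("ssm", region_name=region_name, config=config)


def get_parameters_by_path(client, path):
    """Collect every parameter under `path` via the paginated API call."""
    paginator = client.get_paginator("get_parameters_by_path")
    values = {}
    for page in paginator.paginate(Path=path, WithDecryption=True):
        for param in page["Parameters"]:
            values[param["Name"]] = param["Value"]
    return values
```

Usage would be something like `get_parameters_by_path(make_ssm_client(), "/myapp/")`, where `/myapp/` is a placeholder path. Note that the retry mode only changes how failures are retried; a connect timeout still has to expire before each retry starts.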

The only reliable solution I’ve come up with is to wait. Eventually, the endpoint comes back and my application begins working again. But this is really an unacceptable level of service interruption, and I feel like I must be doing something wrong.

What would make only the SSM host become unreachable so often? I don’t see how this could be an issue with my Docker container if other URLs work just fine. But equally, if requests AND Boto3 are both failing, then it seems like it has to be either my container or the AWS endpoint itself. And obviously the us-east-2 SSM host isn’t constantly going down for minutes at a time.

I have tried pinging the endpoint during the problem, but from the host machine, outside of Docker. And the results are… strange:

PING ssm.us-east-2.amazonaws.com (52.95.21.209): 56 data bytes
64 bytes from 52.95.21.209: icmp_seq=0 ttl=229 time=89.665 ms
64 bytes from 52.95.21.209: icmp_seq=1 ttl=229 time=92.928 ms
64 bytes from 52.95.21.209: icmp_seq=2 ttl=229 time=89.970 ms
64 bytes from 52.95.21.209: icmp_seq=3 ttl=229 time=92.004 ms
64 bytes from 52.95.21.209: icmp_seq=4 ttl=229 time=93.007 ms
64 bytes from 52.95.21.209: icmp_seq=5 ttl=229 time=93.066 ms
64 bytes from 52.95.21.209: icmp_seq=6 ttl=229 time=93.358 ms
64 bytes from 52.95.21.209: icmp_seq=7 ttl=229 time=87.980 ms
64 bytes from 52.95.21.209: icmp_seq=8 ttl=229 time=92.416 ms
64 bytes from 52.95.21.209: icmp_seq=9 ttl=229 time=92.361 ms
64 bytes from 52.95.21.209: icmp_seq=10 ttl=229 time=88.709 ms
64 bytes from 52.95.21.209: icmp_seq=11 ttl=229 time=91.613 ms
64 bytes from 52.95.21.209: icmp_seq=12 ttl=229 time=93.175 ms
64 bytes from 52.95.21.209: icmp_seq=13 ttl=229 time=93.545 ms
Request timeout for icmp_seq 14
Request timeout for icmp_seq 15
Request timeout for icmp_seq 16
Request timeout for icmp_seq 17
64 bytes from 52.95.21.209: icmp_seq=18 ttl=229 time=89.668 ms
64 bytes from 52.95.21.209: icmp_seq=19 ttl=229 time=93.205 ms
64 bytes from 52.95.21.209: icmp_seq=20 ttl=229 time=92.234 ms
64 bytes from 52.95.21.209: icmp_seq=21 ttl=229 time=92.995 ms
64 bytes from 52.95.21.209: icmp_seq=22 ttl=229 time=93.140 ms
64 bytes from 52.95.21.209: icmp_seq=23 ttl=229 time=92.720 ms
64 bytes from 52.95.21.209: icmp_seq=24 ttl=229 time=93.945 ms
64 bytes from 52.95.21.209: icmp_seq=25 ttl=229 time=93.641 ms
64 bytes from 52.95.21.209: icmp_seq=26 ttl=229 time=93.599 ms
64 bytes from 52.95.21.209: icmp_seq=27 ttl=229 time=91.851 ms
64 bytes from 52.95.21.209: icmp_seq=28 ttl=229 time=90.349 ms
64 bytes from 52.95.21.209: icmp_seq=29 ttl=229 time=95.998 ms
64 bytes from 52.95.21.209: icmp_seq=30 ttl=229 time=93.568 ms
64 bytes from 52.95.21.209: icmp_seq=31 ttl=229 time=93.292 ms
64 bytes from 52.95.21.209: icmp_seq=32 ttl=229 time=93.491 ms
64 bytes from 52.95.21.209: icmp_seq=33 ttl=229 time=93.167 ms
Request timeout for icmp_seq 34
64 bytes from 52.95.21.209: icmp_seq=35 ttl=229 time=93.613 ms
64 bytes from 52.95.21.209: icmp_seq=36 ttl=229 time=91.564 ms
64 bytes from 52.95.21.209: icmp_seq=37 ttl=229 time=96.495 ms
64 bytes from 52.95.21.209: icmp_seq=38 ttl=229 time=93.870 ms
64 bytes from 52.95.21.209: icmp_seq=39 ttl=229 time=93.629 ms
64 bytes from 52.95.21.209: icmp_seq=40 ttl=229 time=93.487 ms
64 bytes from 52.95.21.209: icmp_seq=41 ttl=229 time=96.892 ms
64 bytes from 52.95.21.209: icmp_seq=42 ttl=229 time=91.220 ms
64 bytes from 52.95.21.209: icmp_seq=43 ttl=229 time=93.394 ms
64 bytes from 52.95.21.209: icmp_seq=44 ttl=229 time=91.774 ms
64 bytes from 52.95.21.209: icmp_seq=45 ttl=229 time=94.031 ms
Request timeout for icmp_seq 46
64 bytes from 52.95.21.209: icmp_seq=47 ttl=229 time=96.748 ms
64 bytes from 52.95.21.209: icmp_seq=48 ttl=229 time=93.024 ms
64 bytes from 52.95.21.209: icmp_seq=49 ttl=229 time=92.414 ms
64 bytes from 52.95.21.209: icmp_seq=50 ttl=229 time=96.475 ms
64 bytes from 52.95.21.209: icmp_seq=51 ttl=229 time=93.447 ms
64 bytes from 52.95.21.209: icmp_seq=52 ttl=229 time=92.959 ms
64 bytes from 52.95.21.209: icmp_seq=53 ttl=229 time=93.353 ms
64 bytes from 52.95.21.209: icmp_seq=54 ttl=229 time=93.371 ms
64 bytes from 52.95.21.209: icmp_seq=55 ttl=229 time=92.530 ms
64 bytes from 52.95.21.209: icmp_seq=56 ttl=229 time=94.401 ms
64 bytes from 52.95.21.209: icmp_seq=57 ttl=229 time=93.797 ms
64 bytes from 52.95.21.209: icmp_seq=58 ttl=229 time=92.076 ms
Request timeout for icmp_seq 59
64 bytes from 52.95.21.209: icmp_seq=60 ttl=229 time=91.602 ms
64 bytes from 52.95.21.209: icmp_seq=61 ttl=229 time=92.835 ms
Request timeout for icmp_seq 62
64 bytes from 52.95.21.209: icmp_seq=63 ttl=229 time=92.903 ms
64 bytes from 52.95.21.209: icmp_seq=64 ttl=229 time=93.302 ms
64 bytes from 52.95.21.209: icmp_seq=65 ttl=229 time=93.623 ms
64 bytes from 52.95.21.209: icmp_seq=66 ttl=229 time=93.638 ms
64 bytes from 52.95.21.209: icmp_seq=67 ttl=229 time=93.395 ms
64 bytes from 52.95.21.209: icmp_seq=68 ttl=229 time=92.432 ms

Those points where the pings time out are exactly when the container prints the “ConnectTimeoutError” to the Docker console. I don’t know what to make of this.

Is there a setting I have overlooked? Does anyone have other debugging ideas?

If the timeout also happens from your host, I don’t see how it is related to Docker. I was going to suggest asking on an AWS forum, but then I found this, so I guess you already did that two months ago:

https://repost.aws/questions/QUpGBax4uuR82ubxwKjIZz8g/intermittent-connect-timeout-error-accessing-ssm

Unfortunately I can’t give you more ideas, except that I tried pinging that hostname and I also get timeouts sometimes, though not as often as you:

PING ssm.us-east-2.amazonaws.com (52.95.21.209): 56 data bytes
64 bytes from 52.95.21.209: icmp_seq=0 ttl=220 time=121.813 ms
64 bytes from 52.95.21.209: icmp_seq=1 ttl=220 time=121.547 ms
64 bytes from 52.95.21.209: icmp_seq=2 ttl=220 time=122.071 ms
64 bytes from 52.95.21.209: icmp_seq=3 ttl=220 time=121.920 ms
64 bytes from 52.95.21.209: icmp_seq=4 ttl=220 time=122.074 ms
64 bytes from 52.95.21.209: icmp_seq=5 ttl=220 time=122.011 ms
64 bytes from 52.95.21.209: icmp_seq=6 ttl=220 time=121.953 ms
64 bytes from 52.95.21.209: icmp_seq=7 ttl=220 time=122.256 ms
64 bytes from 52.95.21.209: icmp_seq=8 ttl=220 time=121.772 ms
64 bytes from 52.95.21.209: icmp_seq=9 ttl=220 time=121.810 ms
64 bytes from 52.95.21.209: icmp_seq=10 ttl=220 time=121.619 ms
64 bytes from 52.95.21.209: icmp_seq=11 ttl=220 time=122.394 ms
64 bytes from 52.95.21.209: icmp_seq=12 ttl=220 time=121.942 ms
64 bytes from 52.95.21.209: icmp_seq=13 ttl=220 time=121.849 ms
64 bytes from 52.95.21.209: icmp_seq=14 ttl=220 time=121.976 ms
64 bytes from 52.95.21.209: icmp_seq=15 ttl=220 time=122.379 ms
64 bytes from 52.95.21.209: icmp_seq=16 ttl=220 time=121.983 ms
64 bytes from 52.95.21.209: icmp_seq=17 ttl=220 time=121.980 ms
64 bytes from 52.95.21.209: icmp_seq=18 ttl=220 time=121.927 ms
64 bytes from 52.95.21.209: icmp_seq=19 ttl=220 time=122.304 ms
64 bytes from 52.95.21.209: icmp_seq=20 ttl=220 time=121.853 ms
64 bytes from 52.95.21.209: icmp_seq=21 ttl=220 time=120.973 ms
64 bytes from 52.95.21.209: icmp_seq=22 ttl=220 time=122.048 ms
64 bytes from 52.95.21.209: icmp_seq=23 ttl=220 time=121.942 ms
64 bytes from 52.95.21.209: icmp_seq=24 ttl=220 time=121.751 ms
64 bytes from 52.95.21.209: icmp_seq=25 ttl=220 time=125.305 ms
64 bytes from 52.95.21.209: icmp_seq=26 ttl=220 time=122.202 ms
64 bytes from 52.95.21.209: icmp_seq=27 ttl=220 time=122.254 ms
64 bytes from 52.95.21.209: icmp_seq=28 ttl=220 time=122.135 ms
64 bytes from 52.95.21.209: icmp_seq=29 ttl=220 time=121.831 ms
64 bytes from 52.95.21.209: icmp_seq=30 ttl=220 time=121.926 ms
64 bytes from 52.95.21.209: icmp_seq=31 ttl=220 time=121.826 ms
64 bytes from 52.95.21.209: icmp_seq=32 ttl=220 time=121.760 ms
64 bytes from 52.95.21.209: icmp_seq=33 ttl=220 time=121.615 ms
64 bytes from 52.95.21.209: icmp_seq=34 ttl=220 time=121.842 ms
64 bytes from 52.95.21.209: icmp_seq=35 ttl=220 time=121.899 ms
64 bytes from 52.95.21.209: icmp_seq=36 ttl=220 time=121.954 ms
64 bytes from 52.95.21.209: icmp_seq=37 ttl=220 time=121.403 ms
64 bytes from 52.95.21.209: icmp_seq=38 ttl=220 time=122.560 ms
64 bytes from 52.95.21.209: icmp_seq=39 ttl=220 time=121.889 ms
64 bytes from 52.95.21.209: icmp_seq=40 ttl=220 time=122.105 ms
64 bytes from 52.95.21.209: icmp_seq=41 ttl=220 time=121.829 ms
64 bytes from 52.95.21.209: icmp_seq=42 ttl=220 time=122.184 ms
64 bytes from 52.95.21.209: icmp_seq=43 ttl=220 time=121.903 ms
64 bytes from 52.95.21.209: icmp_seq=44 ttl=220 time=122.012 ms
64 bytes from 52.95.21.209: icmp_seq=45 ttl=220 time=122.133 ms
64 bytes from 52.95.21.209: icmp_seq=46 ttl=220 time=122.037 ms
64 bytes from 52.95.21.209: icmp_seq=47 ttl=220 time=121.982 ms
64 bytes from 52.95.21.209: icmp_seq=48 ttl=220 time=121.973 ms
64 bytes from 52.95.21.209: icmp_seq=49 ttl=220 time=122.100 ms
64 bytes from 52.95.21.209: icmp_seq=50 ttl=220 time=121.878 ms
64 bytes from 52.95.21.209: icmp_seq=51 ttl=220 time=121.806 ms
64 bytes from 52.95.21.209: icmp_seq=52 ttl=220 time=121.689 ms
64 bytes from 52.95.21.209: icmp_seq=53 ttl=220 time=121.934 ms
64 bytes from 52.95.21.209: icmp_seq=54 ttl=220 time=122.304 ms
64 bytes from 52.95.21.209: icmp_seq=55 ttl=220 time=121.800 ms
64 bytes from 52.95.21.209: icmp_seq=56 ttl=220 time=121.437 ms
64 bytes from 52.95.21.209: icmp_seq=57 ttl=220 time=121.904 ms
64 bytes from 52.95.21.209: icmp_seq=58 ttl=220 time=121.484 ms
64 bytes from 52.95.21.209: icmp_seq=59 ttl=220 time=121.966 ms
64 bytes from 52.95.21.209: icmp_seq=60 ttl=220 time=122.030 ms
64 bytes from 52.95.21.209: icmp_seq=61 ttl=220 time=122.087 ms
64 bytes from 52.95.21.209: icmp_seq=62 ttl=220 time=121.739 ms
64 bytes from 52.95.21.209: icmp_seq=63 ttl=220 time=122.242 ms
64 bytes from 52.95.21.209: icmp_seq=64 ttl=220 time=121.904 ms
64 bytes from 52.95.21.209: icmp_seq=65 ttl=220 time=121.893 ms
64 bytes from 52.95.21.209: icmp_seq=66 ttl=220 time=121.840 ms
64 bytes from 52.95.21.209: icmp_seq=67 ttl=220 time=122.122 ms
64 bytes from 52.95.21.209: icmp_seq=68 ttl=220 time=122.187 ms
64 bytes from 52.95.21.209: icmp_seq=69 ttl=220 time=122.181 ms
64 bytes from 52.95.21.209: icmp_seq=70 ttl=220 time=121.626 ms
64 bytes from 52.95.21.209: icmp_seq=71 ttl=220 time=121.958 ms
64 bytes from 52.95.21.209: icmp_seq=72 ttl=220 time=121.841 ms
64 bytes from 52.95.21.209: icmp_seq=73 ttl=220 time=121.665 ms
64 bytes from 52.95.21.209: icmp_seq=74 ttl=220 time=123.574 ms
64 bytes from 52.95.21.209: icmp_seq=75 ttl=220 time=121.961 ms
64 bytes from 52.95.21.209: icmp_seq=76 ttl=220 time=122.190 ms
64 bytes from 52.95.21.209: icmp_seq=77 ttl=220 time=122.089 ms
64 bytes from 52.95.21.209: icmp_seq=78 ttl=220 time=122.101 ms
64 bytes from 52.95.21.209: icmp_seq=79 ttl=220 time=122.040 ms
64 bytes from 52.95.21.209: icmp_seq=80 ttl=220 time=121.556 ms
64 bytes from 52.95.21.209: icmp_seq=81 ttl=220 time=121.836 ms
64 bytes from 52.95.21.209: icmp_seq=82 ttl=220 time=121.870 ms
64 bytes from 52.95.21.209: icmp_seq=83 ttl=220 time=121.890 ms
64 bytes from 52.95.21.209: icmp_seq=84 ttl=220 time=121.833 ms
64 bytes from 52.95.21.209: icmp_seq=85 ttl=220 time=122.067 ms
Request timeout for icmp_seq 86
64 bytes from 52.95.21.209: icmp_seq=87 ttl=220 time=122.570 ms
64 bytes from 52.95.21.209: icmp_seq=88 ttl=220 time=121.706 ms
64 bytes from 52.95.21.209: icmp_seq=89 ttl=220 time=121.929 ms
64 bytes from 52.95.21.209: icmp_seq=90 ttl=220 time=121.995 ms
64 bytes from 52.95.21.209: icmp_seq=91 ttl=220 time=122.703 ms
64 bytes from 52.95.21.209: icmp_seq=92 ttl=220 time=121.861 ms
64 bytes from 52.95.21.209: icmp_seq=93 ttl=220 time=121.725 ms
64 bytes from 52.95.21.209: icmp_seq=94 ttl=220 time=121.969 ms
64 bytes from 52.95.21.209: icmp_seq=95 ttl=220 time=122.283 ms
64 bytes from 52.95.21.209: icmp_seq=96 ttl=220 time=122.152 ms
64 bytes from 52.95.21.209: icmp_seq=97 ttl=220 time=122.085 ms
64 bytes from 52.95.21.209: icmp_seq=98 ttl=220 time=121.916 ms
64 bytes from 52.95.21.209: icmp_seq=99 ttl=220 time=121.833 ms
64 bytes from 52.95.21.209: icmp_seq=100 ttl=220 time=121.867 ms
64 bytes from 52.95.21.209: icmp_seq=101 ttl=220 time=121.820 ms
64 bytes from 52.95.21.209: icmp_seq=102 ttl=220 time=121.947 ms
64 bytes from 52.95.21.209: icmp_seq=103 ttl=220 time=121.901 ms
64 bytes from 52.95.21.209: icmp_seq=104 ttl=220 time=121.851 ms
64 bytes from 52.95.21.209: icmp_seq=105 ttl=220 time=122.035 ms
64 bytes from 52.95.21.209: icmp_seq=106 ttl=220 time=121.859 ms
64 bytes from 52.95.21.209: icmp_seq=107 ttl=220 time=122.009 ms
64 bytes from 52.95.21.209: icmp_seq=108 ttl=220 time=122.104 ms
64 bytes from 52.95.21.209: icmp_seq=109 ttl=220 time=122.077 ms
64 bytes from 52.95.21.209: icmp_seq=110 ttl=220 time=121.714 ms
64 bytes from 52.95.21.209: icmp_seq=111 ttl=220 time=121.457 ms
64 bytes from 52.95.21.209: icmp_seq=112 ttl=220 time=121.934 ms
64 bytes from 52.95.21.209: icmp_seq=113 ttl=220 time=121.941 ms
64 bytes from 52.95.21.209: icmp_seq=114 ttl=220 time=121.932 ms
64 bytes from 52.95.21.209: icmp_seq=115 ttl=220 time=122.042 ms
64 bytes from 52.95.21.209: icmp_seq=116 ttl=220 time=121.673 ms
64 bytes from 52.95.21.209: icmp_seq=117 ttl=220 time=122.011 ms
64 bytes from 52.95.21.209: icmp_seq=118 ttl=220 time=122.324 ms
64 bytes from 52.95.21.209: icmp_seq=119 ttl=220 time=121.903 ms
64 bytes from 52.95.21.209: icmp_seq=120 ttl=220 time=121.835 ms
64 bytes from 52.95.21.209: icmp_seq=121 ttl=220 time=121.631 ms
64 bytes from 52.95.21.209: icmp_seq=122 ttl=220 time=121.922 ms
64 bytes from 52.95.21.209: icmp_seq=123 ttl=220 time=121.930 ms
64 bytes from 52.95.21.209: icmp_seq=124 ttl=220 time=122.127 ms
64 bytes from 52.95.21.209: icmp_seq=125 ttl=220 time=122.192 ms
64 bytes from 52.95.21.209: icmp_seq=126 ttl=220 time=121.971 ms
64 bytes from 52.95.21.209: icmp_seq=127 ttl=220 time=121.994 ms
64 bytes from 52.95.21.209: icmp_seq=128 ttl=220 time=121.899 ms
64 bytes from 52.95.21.209: icmp_seq=129 ttl=220 time=122.064 ms
64 bytes from 52.95.21.209: icmp_seq=130 ttl=220 time=121.984 ms
64 bytes from 52.95.21.209: icmp_seq=131 ttl=220 time=121.705 ms
64 bytes from 52.95.21.209: icmp_seq=132 ttl=220 time=121.966 ms
64 bytes from 52.95.21.209: icmp_seq=133 ttl=220 time=121.876 ms
64 bytes from 52.95.21.209: icmp_seq=134 ttl=220 time=121.590 ms
64 bytes from 52.95.21.209: icmp_seq=135 ttl=220 time=122.004 ms
64 bytes from 52.95.21.209: icmp_seq=136 ttl=220 time=121.868 ms
64 bytes from 52.95.21.209: icmp_seq=137 ttl=220 time=121.910 ms
64 bytes from 52.95.21.209: icmp_seq=138 ttl=220 time=121.894 ms
64 bytes from 52.95.21.209: icmp_seq=139 ttl=220 time=121.887 ms
64 bytes from 52.95.21.209: icmp_seq=140 ttl=220 time=121.953 ms
64 bytes from 52.95.21.209: icmp_seq=141 ttl=220 time=121.867 ms
64 bytes from 52.95.21.209: icmp_seq=142 ttl=220 time=121.907 ms
64 bytes from 52.95.21.209: icmp_seq=143 ttl=220 time=122.145 ms
64 bytes from 52.95.21.209: icmp_seq=144 ttl=220 time=121.715 ms
64 bytes from 52.95.21.209: icmp_seq=145 ttl=220 time=121.988 ms
64 bytes from 52.95.21.209: icmp_seq=146 ttl=220 time=121.977 ms
64 bytes from 52.95.21.209: icmp_seq=147 ttl=220 time=122.038 ms
64 bytes from 52.95.21.209: icmp_seq=148 ttl=220 time=122.033 ms
64 bytes from 52.95.21.209: icmp_seq=149 ttl=220 time=122.155 ms
64 bytes from 52.95.21.209: icmp_seq=150 ttl=220 time=121.829 ms
64 bytes from 52.95.21.209: icmp_seq=151 ttl=220 time=122.279 ms
64 bytes from 52.95.21.209: icmp_seq=152 ttl=220 time=121.916 ms
64 bytes from 52.95.21.209: icmp_seq=153 ttl=220 time=122.078 ms
64 bytes from 52.95.21.209: icmp_seq=154 ttl=220 time=122.014 ms
64 bytes from 52.95.21.209: icmp_seq=155 ttl=220 time=121.953 ms
64 bytes from 52.95.21.209: icmp_seq=156 ttl=220 time=122.003 ms
64 bytes from 52.95.21.209: icmp_seq=157 ttl=220 time=121.852 ms
64 bytes from 52.95.21.209: icmp_seq=158 ttl=220 time=121.941 ms
64 bytes from 52.95.21.209: icmp_seq=159 ttl=220 time=121.937 ms
64 bytes from 52.95.21.209: icmp_seq=160 ttl=220 time=122.339 ms
64 bytes from 52.95.21.209: icmp_seq=161 ttl=220 time=121.851 ms
64 bytes from 52.95.21.209: icmp_seq=162 ttl=220 time=122.182 ms
64 bytes from 52.95.21.209: icmp_seq=163 ttl=220 time=121.868 ms
64 bytes from 52.95.21.209: icmp_seq=164 ttl=220 time=121.706 ms
64 bytes from 52.95.21.209: icmp_seq=165 ttl=220 time=121.599 ms
64 bytes from 52.95.21.209: icmp_seq=166 ttl=220 time=121.676 ms
64 bytes from 52.95.21.209: icmp_seq=167 ttl=220 time=122.022 ms
64 bytes from 52.95.21.209: icmp_seq=168 ttl=220 time=122.012 ms
64 bytes from 52.95.21.209: icmp_seq=169 ttl=220 time=122.111 ms
64 bytes from 52.95.21.209: icmp_seq=170 ttl=220 time=121.996 ms
64 bytes from 52.95.21.209: icmp_seq=171 ttl=220 time=122.016 ms
64 bytes from 52.95.21.209: icmp_seq=172 ttl=220 time=121.959 ms
64 bytes from 52.95.21.209: icmp_seq=173 ttl=220 time=121.536 ms
64 bytes from 52.95.21.209: icmp_seq=174 ttl=220 time=121.912 ms
64 bytes from 52.95.21.209: icmp_seq=175 ttl=220 time=121.734 ms
64 bytes from 52.95.21.209: icmp_seq=176 ttl=220 time=121.820 ms
64 bytes from 52.95.21.209: icmp_seq=177 ttl=220 time=121.888 ms
64 bytes from 52.95.21.209: icmp_seq=178 ttl=220 time=121.952 ms
64 bytes from 52.95.21.209: icmp_seq=179 ttl=220 time=121.937 ms
64 bytes from 52.95.21.209: icmp_seq=180 ttl=220 time=121.990 ms
64 bytes from 52.95.21.209: icmp_seq=181 ttl=220 time=121.920 ms
64 bytes from 52.95.21.209: icmp_seq=182 ttl=220 time=121.830 ms
64 bytes from 52.95.21.209: icmp_seq=183 ttl=220 time=122.055 ms
64 bytes from 52.95.21.209: icmp_seq=184 ttl=220 time=121.789 ms
64 bytes from 52.95.21.209: icmp_seq=185 ttl=220 time=121.759 ms
64 bytes from 52.95.21.209: icmp_seq=186 ttl=220 time=121.980 ms
64 bytes from 52.95.21.209: icmp_seq=187 ttl=220 time=121.853 ms
64 bytes from 52.95.21.209: icmp_seq=188 ttl=220 time=121.581 ms
64 bytes from 52.95.21.209: icmp_seq=189 ttl=220 time=122.064 ms
64 bytes from 52.95.21.209: icmp_seq=190 ttl=220 time=122.161 ms
64 bytes from 52.95.21.209: icmp_seq=191 ttl=220 time=121.890 ms
64 bytes from 52.95.21.209: icmp_seq=192 ttl=220 time=122.114 ms
64 bytes from 52.95.21.209: icmp_seq=193 ttl=220 time=121.978 ms
64 bytes from 52.95.21.209: icmp_seq=194 ttl=220 time=122.202 ms
64 bytes from 52.95.21.209: icmp_seq=195 ttl=220 time=122.152 ms
64 bytes from 52.95.21.209: icmp_seq=196 ttl=220 time=121.921 ms
64 bytes from 52.95.21.209: icmp_seq=197 ttl=220 time=121.920 ms
64 bytes from 52.95.21.209: icmp_seq=198 ttl=220 time=121.978 ms
64 bytes from 52.95.21.209: icmp_seq=199 ttl=220 time=121.975 ms
64 bytes from 52.95.21.209: icmp_seq=200 ttl=220 time=121.663 ms
64 bytes from 52.95.21.209: icmp_seq=201 ttl=220 time=121.983 ms
64 bytes from 52.95.21.209: icmp_seq=202 ttl=220 time=122.505 ms
64 bytes from 52.95.21.209: icmp_seq=203 ttl=220 time=121.836 ms
64 bytes from 52.95.21.209: icmp_seq=204 ttl=220 time=121.879 ms
64 bytes from 52.95.21.209: icmp_seq=205 ttl=220 time=121.709 ms
64 bytes from 52.95.21.209: icmp_seq=206 ttl=220 time=121.856 ms
64 bytes from 52.95.21.209: icmp_seq=207 ttl=220 time=122.295 ms
64 bytes from 52.95.21.209: icmp_seq=208 ttl=220 time=121.903 ms
64 bytes from 52.95.21.209: icmp_seq=209 ttl=220 time=121.840 ms
64 bytes from 52.95.21.209: icmp_seq=210 ttl=220 time=122.003 ms
64 bytes from 52.95.21.209: icmp_seq=211 ttl=220 time=122.050 ms
64 bytes from 52.95.21.209: icmp_seq=212 ttl=220 time=121.797 ms
64 bytes from 52.95.21.209: icmp_seq=213 ttl=220 time=121.695 ms
64 bytes from 52.95.21.209: icmp_seq=214 ttl=220 time=122.314 ms
64 bytes from 52.95.21.209: icmp_seq=215 ttl=220 time=121.946 ms
64 bytes from 52.95.21.209: icmp_seq=216 ttl=220 time=121.965 ms
64 bytes from 52.95.21.209: icmp_seq=217 ttl=220 time=121.841 ms
64 bytes from 52.95.21.209: icmp_seq=218 ttl=220 time=121.717 ms
64 bytes from 52.95.21.209: icmp_seq=219 ttl=220 time=121.650 ms
64 bytes from 52.95.21.209: icmp_seq=220 ttl=220 time=121.916 ms
64 bytes from 52.95.21.209: icmp_seq=221 ttl=220 time=121.959 ms
64 bytes from 52.95.21.209: icmp_seq=222 ttl=220 time=121.999 ms
64 bytes from 52.95.21.209: icmp_seq=223 ttl=220 time=121.924 ms
64 bytes from 52.95.21.209: icmp_seq=224 ttl=220 time=121.889 ms
64 bytes from 52.95.21.209: icmp_seq=225 ttl=220 time=122.162 ms
64 bytes from 52.95.21.209: icmp_seq=226 ttl=220 time=121.979 ms
64 bytes from 52.95.21.209: icmp_seq=227 ttl=220 time=122.379 ms
64 bytes from 52.95.21.209: icmp_seq=228 ttl=220 time=121.689 ms
64 bytes from 52.95.21.209: icmp_seq=229 ttl=220 time=121.823 ms
64 bytes from 52.95.21.209: icmp_seq=230 ttl=220 time=121.899 ms
64 bytes from 52.95.21.209: icmp_seq=231 ttl=220 time=122.129 ms
64 bytes from 52.95.21.209: icmp_seq=232 ttl=220 time=121.773 ms
64 bytes from 52.95.21.209: icmp_seq=233 ttl=220 time=121.910 ms
64 bytes from 52.95.21.209: icmp_seq=234 ttl=220 time=121.925 ms
64 bytes from 52.95.21.209: icmp_seq=235 ttl=220 time=122.247 ms
64 bytes from 52.95.21.209: icmp_seq=236 ttl=220 time=121.886 ms
64 bytes from 52.95.21.209: icmp_seq=237 ttl=220 time=121.889 ms
64 bytes from 52.95.21.209: icmp_seq=238 ttl=220 time=121.861 ms
64 bytes from 52.95.21.209: icmp_seq=239 ttl=220 time=121.969 ms
64 bytes from 52.95.21.209: icmp_seq=240 ttl=220 time=121.242 ms
64 bytes from 52.95.21.209: icmp_seq=241 ttl=220 time=121.948 ms
64 bytes from 52.95.21.209: icmp_seq=242 ttl=220 time=122.073 ms
64 bytes from 52.95.21.209: icmp_seq=243 ttl=220 time=121.680 ms
64 bytes from 52.95.21.209: icmp_seq=244 ttl=220 time=121.996 ms
64 bytes from 52.95.21.209: icmp_seq=245 ttl=220 time=121.796 ms
64 bytes from 52.95.21.209: icmp_seq=246 ttl=220 time=121.677 ms
64 bytes from 52.95.21.209: icmp_seq=247 ttl=220 time=122.352 ms
64 bytes from 52.95.21.209: icmp_seq=248 ttl=220 time=122.057 ms
64 bytes from 52.95.21.209: icmp_seq=249 ttl=220 time=121.580 ms
64 bytes from 52.95.21.209: icmp_seq=250 ttl=220 time=121.943 ms
64 bytes from 52.95.21.209: icmp_seq=251 ttl=220 time=121.980 ms
64 bytes from 52.95.21.209: icmp_seq=252 ttl=220 time=121.594 ms
64 bytes from 52.95.21.209: icmp_seq=253 ttl=220 time=121.454 ms
64 bytes from 52.95.21.209: icmp_seq=254 ttl=220 time=121.817 ms
64 bytes from 52.95.21.209: icmp_seq=255 ttl=220 time=122.185 ms
64 bytes from 52.95.21.209: icmp_seq=256 ttl=220 time=121.535 ms
64 bytes from 52.95.21.209: icmp_seq=257 ttl=220 time=121.639 ms
64 bytes from 52.95.21.209: icmp_seq=258 ttl=220 time=121.993 ms
64 bytes from 52.95.21.209: icmp_seq=259 ttl=220 time=121.862 ms
64 bytes from 52.95.21.209: icmp_seq=260 ttl=220 time=121.965 ms
64 bytes from 52.95.21.209: icmp_seq=261 ttl=220 time=121.760 ms
64 bytes from 52.95.21.209: icmp_seq=262 ttl=220 time=121.996 ms
Request timeout for icmp_seq 263
64 bytes from 52.95.21.209: icmp_seq=264 ttl=220 time=121.536 ms
64 bytes from 52.95.21.209: icmp_seq=265 ttl=220 time=121.595 ms
64 bytes from 52.95.21.209: icmp_seq=266 ttl=220 time=121.730 ms
64 bytes from 52.95.21.209: icmp_seq=267 ttl=220 time=130.770 ms
64 bytes from 52.95.21.209: icmp_seq=268 ttl=220 time=121.986 ms
64 bytes from 52.95.21.209: icmp_seq=269 ttl=220 time=122.069 ms
64 bytes from 52.95.21.209: icmp_seq=270 ttl=220 time=124.811 ms
64 bytes from 52.95.21.209: icmp_seq=271 ttl=220 time=124.216 ms
64 bytes from 52.95.21.209: icmp_seq=272 ttl=220 time=124.448 ms
64 bytes from 52.95.21.209: icmp_seq=273 ttl=220 time=123.949 ms
64 bytes from 52.95.21.209: icmp_seq=274 ttl=220 time=123.368 ms
64 bytes from 52.95.21.209: icmp_seq=275 ttl=220 time=123.573 ms
64 bytes from 52.95.21.209: icmp_seq=276 ttl=220 time=122.047 ms
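To put a number on "not as often", ping output like the two captures in this thread can be summarized with a small parser. This is only a sketch: `summarize_ping` is a hypothetical helper, and it assumes the macOS/BSD ping format shown here, where lost probes are printed as "Request timeout for icmp_seq N".

```python
import re

# One line per answered probe, e.g.
# "64 bytes from 52.95.21.209: icmp_seq=0 ttl=229 time=89.665 ms"
REPLY = re.compile(r"^\d+ bytes from \S+ icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms$")
# One line per lost probe, e.g. "Request timeout for icmp_seq 14"
TIMEOUT = re.compile(r"^Request timeout for icmp_seq (\d+)$")


def summarize_ping(output):
    """Count answered and lost probes in macOS-style ping output."""
    rtts, lost = [], []
    for raw in output.splitlines():
        line = raw.strip()
        match = REPLY.match(line)
        if match:
            rtts.append(float(match.group(2)))
            continue
        match = TIMEOUT.match(line)
        if match:
            lost.append(int(match.group(1)))
    total = len(rtts) + len(lost)
    return {
        "sent": total,
        "lost": len(lost),
        "loss_pct": round(100 * len(lost) / total, 2) if total else 0.0,
        "lost_seqs": lost,
        "avg_ms": round(sum(rtts) / len(rtts), 3) if rtts else None,
    }
```

Counting by hand, the first capture in this thread lost 8 of 69 probes (roughly 11.6%), while the capture above lost 2 of 277 (roughly 0.7%).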

Thanks very much for your reply. You’re right, I did indeed post on the AWS forums. The answer wasn’t very helpful, sadly. I am still experiencing this. The pings from outside the container do sometimes seem to drop at the same time as those inside during the repeated failed requests, but not always. And several people are experiencing this in completely different regions with separate ISPs. No one is using a VPN.

Another interesting thing I’ve noticed, which might or might not be related, is that sometimes the container seems to stop accepting incoming requests from the host. My Flask app will be sitting there, apparently waiting for a request, and my browser will be spinning waiting to connect to localhost:5050 (the port my container runs on in the host’s network). And then, eventually, requests will just start working again. Does this give any insight into what might be wrong? I’ll try to catch it again, open a shell, and send a request with requests while this is happening.

The other thing I’ve noticed is that, while apple.com and ECR always seem to have low ping times, SSM and Cognito can both take a while. SSM is the bigger problem, and can be much slower, but pretty often I do see Cognito take 0.5 to 3 seconds to respond to connections from inside the container.
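One way to measure this separately from HTTP is to time only the TCP handshake, and to run the same probe inside the container and on the host. The following is a sketch under assumptions: `tcp_connect_time` and `probe` are hypothetical names, and the measured time includes DNS resolution because `socket.create_connection` resolves the hostname itself.

```python
import socket
import time


def tcp_connect_time(host, port=443, timeout=3.0):
    """Return the seconds needed to open a TCP connection, or None on failure.

    The time includes DNS resolution. Refused connections, unreachable
    hosts and timeouts all come back as None.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        return None


def probe(hosts, port=443, timeout=3.0):
    """Take one timestamped sample per host."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    return [(stamp, host, tcp_connect_time(host, port, timeout)) for host in hosts]
```

Running `probe(["ssm.us-east-2.amazonaws.com", "cognito-idp.us-east-2.amazonaws.com"])` every few seconds in both places and comparing timestamps would show whether the slow or failed connects line up inside and outside the container.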

Not to me. As far as I can see, most of your issues are not related to containers at all but to the network between you and the destination server, so I can’t help you with that. As a matter of fact, I have a network connectivity issue with my VPS as well, so I had to contact the VPS provider (it sometimes takes 15 minutes before I can access it again). Maybe this one is related to containers, or it may just be caused by the application itself. Unfortunately I have no way to reproduce the issue. I don’t use AWS and I have never had a similar issue locally.

Thanks for your reply. I have done some digging on the Docker GitHub, and it looks like the issue isn’t actually related to AWS. Instead, it seems like I may be running into a longstanding bug with Docker networking. Although the issue is filed in the Windows repo and mentions accessing IPs on the host, people report it on Mac as well (there is even one report from Linux), and with non-host addresses too.

If this turns out to be the issue, it would explain a lot of perplexing things about the problem:

  • why restarting the container doesn’t fix the issue (because the problem is with Docker networking itself)
  • why the host is still reachable from outside the container (ditto)
  • why my teammates are seeing the same behavior in different networks/countries (we’re all on recent versions of Docker)
  • why it seems to come and go (it gets worse with time until Docker restart)
  • and why it seems to preferentially affect certain hosts (the problem has something to do with incomplete TCP handshakes, and so your connections will begin to fail to hosts you’ve exchanged more traffic with).
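If incomplete handshakes really are involved, one thing that could be checked from inside the (Linux) container while the problem is happening is how many outgoing connections are stuck waiting for a reply to their SYN. The sketch below parses the text of /proc/net/tcp for that. It is a diagnostic idea rather than something taken from the GitHub issue: the helper names are hypothetical, it only covers IPv4, and the address decoding assumes a little-endian machine (x86-64 or arm64).

```python
import socket
import struct
from collections import Counter

# State code that /proc/net/tcp uses for a socket that has sent a SYN
# and is still waiting for the SYN-ACK.
SYN_SENT = "02"


def decode_ipv4(hex_addr):
    """Decode e.g. 'D1155F34:01BB' into ('52.95.21.209', 443)."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def pending_handshakes(proc_net_tcp_text):
    """Count SYN_SENT sockets per remote (ip, port)."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        # fields: slot, local address, remote address, state, ...
        if len(fields) > 3 and fields[3] == SYN_SENT:
            counts[decode_ipv4(fields[2])] += 1
    return counts
```

Something like `pending_handshakes(open("/proc/net/tcp").read())` run during an outage would show whether the stuck connections all point at the SSM address while connections to other hosts complete normally.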

There are a number of comments on the issue providing steps to reproduce. While the problem seems to be affecting more people after release 4.5.0 (from Feb 2022), the original bug was filed in 2020. The original reporter discussed it with the Docker support team in December 2020, and they know what the problem is but have yet to provide a fix.

I have gone back to 4.5.0 for the time being, and we’ll see if the issue presents after a few days of Docker uptime. Other than using an earlier version, the only reliable solution discussed in the issue is to restart Docker itself. I launched version 4.5.0 one day ago and haven’t seen the problem yet, so fingers crossed that will solve the issue for me until a fix can be deployed.