Docker swarm mode routing mesh works intermittently

We are testing Docker swarm mode on Docker version 17.04.0-ce, build 4845c56. The cluster runs a lot of services, and services are started and stopped frequently.

The cluster runs fine at first, but after a day or two the routing mesh starts to act strangely:

[ec2-user@ip-192-168-17-159 ~]$ curl localhost:30001/arrowPing.json
{"success":true}
[ec2-user@ip-192-168-17-159 ~]$ curl localhost:30001/arrowPing.json
(this one hangs)
[ec2-user@ip-192-168-17-159 ~]$ curl localhost:30001/arrowPing.json
{"success":true}
[ec2-user@ip-192-168-17-159 ~]$ curl localhost:30001/arrowPing.json
(this one hangs)

Any idea how we can diagnose this issue?
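
One rough way to put a number on the hang rate is to repeat the request with a per-request timeout and count the failures. The sketch below is only an illustration: the function name probe_mesh and the default count and timeout are assumptions, and the URL is the one from the transcript above.

```shell
# probe_mesh URL [COUNT] [TIMEOUT_SECONDS]
# Sends COUNT requests to URL and prints how many succeeded and how many
# failed or timed out. probe_mesh is a hypothetical helper, not a Docker tool.
probe_mesh() {
  url=$1
  count=${2:-20}     # assumed default: 20 requests
  timeout=${3:-5}    # assumed default: 5 seconds per request
  ok=0
  failed=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # --max-time bounds the whole request, so a hung connection is
    # counted as a failure instead of blocking the loop.
    if curl --silent --fail --max-time "$timeout" --output /dev/null "$url"; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "ok=$ok failed=$failed total=$count"
}
```

For example, "probe_mesh http://localhost:30001/arrowPing.json 20 5" on a node prints a single summary line. Running it on several nodes may help show whether the hangs follow a particular node.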

Hi. This sometimes happens when the swarm has very slow consumers of the /events endpoint. That can happen, for example, when containers write a lot of text to stdout or have an interactive TTY attached (among other things). If possible, avoid running containers with the -it option, and use "docker container create…; docker container start" instead of "docker run".
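
As a minimal sketch of the create-then-start pattern, assuming a hypothetical wrapper named start_detached and placeholder container and image names:

```shell
# start_detached NAME IMAGE [ARGS...]
# Creates a container without -i/-t (no stdin kept open, no pseudo-TTY)
# and then starts it in the background.
# start_detached is a hypothetical wrapper; NAME and IMAGE are placeholders.
start_detached() {
  name=$1
  image=$2
  shift 2
  # "docker container create" prints the new container ID; discard it here.
  docker container create --name "$name" "$image" "$@" >/dev/null &&
    docker container start "$name"
}
```

A call would look like "start_detached web nginx:alpine", where both the name and the image are made up for illustration.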

Also, depending on how heavily you use the HTTP routing mesh (HRM), you may want to review the kernel tuning on the underlying nodes and enlarge the ARP neighbor table.
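
To see whether the neighbor table is actually close to its limit, something like the following could be run on each node. It is a sketch under assumptions: it checks only the IPv4 table against the Linux gc_thresh3 setting (net.ipv4.neigh.default.gc_thresh3), the 80% warning level is arbitrary, and the helper names are hypothetical.

```shell
# neigh_headroom ENTRIES LIMIT
# Prints "near-limit" when ENTRIES is at or above 80% of LIMIT, else "ok".
# The 80% cut-off is an arbitrary assumption for illustration.
neigh_headroom() {
  if [ $(( $1 * 100 )) -ge $(( $2 * 80 )) ]; then
    echo near-limit
  else
    echo ok
  fi
}

# neigh_check
# Compares the current number of IPv4 neighbor entries with gc_thresh3,
# the hard upper bound the kernel enforces on the table.
neigh_check() {
  entries=$(ip -4 neigh show | wc -l)
  limit=$(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)
  echo "entries=$entries gc_thresh3=$limit status=$(neigh_headroom "$entries" "$limit")"
}
```

Calling neigh_check prints one summary line. If it reports near-limit, the thresholds can be raised with sysctl (gc_thresh1, gc_thresh2 and gc_thresh3 under net.ipv4.neigh.default); suitable values depend on the size of the cluster.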

Thanks @bryceryan. I don't think any container is doing that. However, we run a Docker registry container that serves images to the rest of the swarm, so I wonder whether the swarm routing mesh can handle a large amount of traffic?