Timeouts when running a large number of instances of a web app in Python (Tornado, Flask)

Hello,
I’m running a web application based on Flask and Tornado in Python, and I often hit timeouts when requesting my service. The app runs on a server alongside other applications that generate high network load (all dockerized). To find out whether this issue is related to Docker, I ran a simple test:

  1. created a simple hello-world application (Python, Flask, Tornado): https://gist.github.com/anonymous/a828e94693532477b167 - use “make build-docker” to create the image
  2. started 101 containers with my application (run_dockers.sh) on ports 8080-8180
  3. started one standalone instance of the app, without Docker (“make run”)
  4. using the Siege tool, generated high load on the containers on ports 8080-8179 (“siege -c 1000 -r 100 -b -f list”)
  5. using ApacheBench, tested the connection to the container on port 8180 (“ab -n 1000 -c 100 localhost:8180/”)
  6. using ApacheBench, tested the connection to the standalone app on port 80.
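For anyone who wants to repeat the check in steps 5 and 6 without ApacheBench, the idea can be roughly sketched in plain Python using only the standard library. This is not what the test above used: the function names `probe` and `run_probes`, the 2-second timeout, and the default counts (chosen to mirror “ab -n 1000 -c 100”) are all assumptions made for illustration, and the numbers it produces are not comparable to ab's.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def probe(host, port, timeout=2.0):
    """Send one HTTP GET and return (ok, elapsed_seconds).

    `ok` is False on connection errors and timeouts.
    The 2-second default timeout is an arbitrary assumption.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            request = b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"
            sock.sendall(request)
            # Only the start of the reply is needed to see that the server answered.
            ok = sock.recv(1024).startswith(b"HTTP/")
    except OSError:
        # Covers refused connections and socket timeouts alike.
        ok = False
    return ok, time.monotonic() - start


def run_probes(host, port, total=1000, concurrency=100, timeout=2.0):
    """Fire `total` requests using `concurrency` worker threads and summarize."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: probe(host, port, timeout), range(total)))
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max(elapsed for _, elapsed in results)
    return {"total": total, "failures": failures, "slowest": slowest}
```

Calling `run_probes("localhost", 8180)` and `run_probes("localhost", 80)` while Siege is loading the other containers would give a failure count for the dockerized and the standalone instance respectively. A thread pool is used here only because it keeps the sketch short.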

When calling the dockerized app I ran into timeouts and the benchmark could not finish. I repeated this several times, almost always with similar results.
When calling the standalone app there were no such problems.
Screenshot: step 6 on the left, step 5 on the right
http://imgur.com/OhThlJP

My conclusion is that high network load on some Docker containers can cause networking problems for other containers, but not for applications running directly on the host. Is that normal behavior? Any suggestions on how to overcome this issue?

I’m running Docker version 1.7.0, build 0baf609

I am also struggling with this issue. Any news on this?

What happens if you use host networking? Does the issue go away?

Thanks for your response.
Host networking was one of the first things I tried.
It helped partially: I see fewer timeouts in my production environment. When I test using ApacheBench (as mentioned before), the results are better but still not as good as for the standalone app (timeouts occur less often).

Interesting, what version of the kernel is this? Does the situation improve with a later kernel?

Sorry for the late response.
Kernel version is 3.13.0-32-generic. I’ll do a kernel update, rerun tests and show the results here.

Hello, is there any news about this issue? We are encountering the same problem. Did you find a solution or even a workaround?