Can concurrent Docker containers speed up computations?

As a disclaimer, let me say up front that I am a beginner with Docker, so the question might sound a bit naive.

I am exploring parallelization options to speed up some computations. I’m working with Python, so I followed the official guidelines to create my first image and then run it as a container.

For the time being, I use a dummy program that generates a very large random NumPy matrix (say 4000 × 4000) and then counts how many elements in each row fall into a predefined range [min, max].
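Roughly, the dummy program looks like this (the bounds 0.2 and 0.8 are arbitrary placeholders):

import numpy as np

rng = np.random.default_rng()
matrix = rng.random((4000, 4000))  # very large random matrix

low, high = 0.2, 0.8  # predefined [min, max] range
# per-row count of elements that fall into [low, high]
counts = ((matrix >= low) & (matrix <= high)).sum(axis=1)
print(counts)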

I then launched a second container from the same image, obviously with a different port and name. I didn’t get any speed-up in the computations, which I somewhat expected, since:

  • a) I haven’t developed any mechanism for the two containers to “talk to each other” and share intermediate results, and
  • b) I am not sure whether the program itself lends itself to this kind of speed-up.

So my questions, corresponding to a) and b) above, are:

  1. Is parallelism a “feature” supported by Docker deployments, and in what sense? Is it just load sharing? And if I implement a load balancer, how does Docker know how to transfer intermediate results from one container to the other?
  2. If the previous question doesn’t apply, do I then need to write “parallel” versions of my programs and assign one to each container? Isn’t that equivalent to writing MPI versions of my program and assigning them to different cores on my system? What would the benefit of a Docker architecture be then?

Thanks in advance.

Docker’s main purpose is not parallelisation but isolation and portability. That said, you can use it to run processes in parallel, just as you can without Docker.

Multiple machines

You can run multiple instances of a container and load balance between them. That improves performance and can make an application faster, but not because of parallel execution of a single computation. You can also have a service on a machine with a GPU that is used by a web application running on a machine without a GPU. The same works with other resources like CPU and memory, so you can have an application running on multiple machines at the same time and collect the results in, say, your web application. Since those machines operate in parallel, you can get the result faster without the machines communicating with each other directly.

You could send one half of the matrix to one machine and the other half to another machine. Both of them would return a number that you can add up to get the final result.
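As a rough Python sketch of that idea (the worker URLs and the JSON format are hypothetical; each worker is assumed to count the matching elements in the half it receives and return that count):

import numpy as np
import requests

matrix = np.random.default_rng().random((4000, 4000))
top_half, bottom_half = np.vsplit(matrix, 2)  # split the matrix in two

# Hypothetical worker endpoints; each returns the count for its half.
total = 0
for url, half in zip(["http://machine-a:8000/count", "http://machine-b:8000/count"],
                     [top_half, bottom_half]):
    response = requests.post(url, json={"rows": half.tolist(), "min": 0.2, "max": 0.8})
    total += response.json()["count"]
print(total)  # combined result from both machines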

I don’t know if you are familiar with Apache Spark, but it does something like what I just described.

One machine

Let’s say you have only one machine. In that case all of your containers use the same resources, so it is possible that all of your calculations in different containers run on the same CPU core without running in parallel. For a process outside Docker, you can pin it to a core with taskset on Linux:

taskset -c 2 bash -c 'while true; do echo "hello test"; done'

where taskset -c 2 means the bash process will run on the third CPU core (cores are numbered 0, 1, 2, …). Since a Docker container can use all of your resources by default, you can do the same in a container:

docker run --init --rm -it ubuntu:20.04 taskset -c 2 bash -c 'while true; do echo "hello test"; done'

Note: the --init flag is important here so that CTRL+C stops the container.

Docker also has --cpuset-cpus to do the same:

docker run --init --rm -it --cpuset-cpus 2 ubuntu:20.04 bash -c 'while true; do echo "hello test"; done'

If you have a third container, you can use it to collect the results from the others, as described above. If you want two containers to communicate, you can use volumes or shared memory (the IPC namespace).

Start process A in a container with 1 gigabyte of shared memory:

docker run --init --rm -it --name process-a --ipc=shareable --shm-size 1g ubuntu:20.04 bash -c 'while true; do date > /dev/shm/date.txt; sleep 1; done'

Start process B using the shared memory of process A:

docker run --init --rm -it --name process-b --ipc container:process-a ubuntu:20.04 bash -c 'while true; do cat /dev/shm/date.txt; done'

You can combine this with CPU core selection. You will see a speed difference only if the communication doesn’t cost more than the calculation itself.

I have never run processes in parallel myself, let alone in Docker (that is something my colleagues do), so my description may not be perfect, but I am sure of this: the mere fact that you use multiple containers does not mean those containers will run your processes in parallel, so you have to prepare your application to work that way.
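To make that concrete, here is a minimal sketch (my own addition, assuming the row-counting task from the question) of what “preparing the application” could look like with Python’s standard library, with or without Docker:

import numpy as np
from concurrent.futures import ProcessPoolExecutor

def count_in_range(chunk, low=0.2, high=0.8):
    # count all elements of this chunk that fall into [low, high]
    return int(((chunk >= low) & (chunk <= high)).sum())

if __name__ == "__main__":
    matrix = np.random.default_rng().random((4000, 4000))
    chunks = np.array_split(matrix, 4)  # one piece of work per worker
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(count_in_range, chunks))
    print(total)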


Simply put, a container is an isolated/constrained process on the host kernel, which shares resources with other processes on that kernel.

Does your application benefit from multiple executions on the same machine today?
The same will be true when you use replicas of your container on the same host.

Do you want to distribute the replicas over multiple machines/nodes?
This would definitely speed up computation, as the replicas would run on different machines/nodes and use the resources of those nodes. You would need a multi-node container orchestrator like Swarm or Kubernetes.

True for replicas in general:

  • your application is completely stateless: this works out of the box
  • your application is stateful:
    • session state: add an external state store like Redis or Memcached and integrate it into your code
    • database state: make sure your application can handle multiple instances modifying the database state at the same time!
    • file state: if each instance writes different files → nothing to do; if the same files are modified by different instances, implement your own distributed locking (Redis can be useful to hold the lock; see the sketch after this list)
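For the file-state case, a minimal sketch of such a lock with the redis-py client might look like this (the key name and timeout are arbitrary; a production lock would also verify a unique token before releasing, so one instance cannot delete another’s lock):

import time
import redis

r = redis.Redis(host="localhost", port=6379)

def with_file_lock(work, lock_key="lock:shared-file", timeout=10):
    # SET with nx=True succeeds only if the key does not exist yet;
    # ex=timeout makes the lock expire if an instance crashes while holding it.
    while not r.set(lock_key, "locked", nx=True, ex=timeout):
        time.sleep(0.1)  # another instance holds the lock; retry shortly
    try:
        work()  # modify the shared file here
    finally:
        r.delete(lock_key)  # release the lock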

Application support for replicas is something that must be part of the application code; it is not something that can be attached from outside by Docker, Kubernetes, or a cloud platform.