Docker’s main purpose is not parallelisation but isolation and portability. You can still use it to run processes in parallel, just as you can without Docker.
Multiple machines
You can run multiple instances of a container and load balance among them for better performance. That can make your application faster, but not because a single task is executed in parallel. You can also run a service on a machine that has a GPU and use it from a web application running on a machine without a GPU. Of course, this also works with other resources like CPU and memory, so you can have an application running on multiple machines at the same time and collect the results in your web application, for example. Since those machines operate in parallel, you can get the result faster without the machines communicating with each other directly.
For example, you could send half of the matrix to one machine and the other half to another machine. Each of them would return a number, and you can add the two numbers up to get the final result.
I don’t know if you are familiar with Apache Spark, but it does something like I described.
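The split-and-add idea can be sketched in bash. This is only a minimal sketch under stated assumptions: the 4x3 matrix is hypothetical, the operation is assumed to be the sum of all elements, and the two machines are stood in for by two local background processes (started with &) whose partial sums are collected after wait. On real machines each half would be sent over the network instead.

```shell
#!/usr/bin/env bash
# Sketch: sum all elements of a matrix by splitting its rows in two halves.
# Two local background processes stand in for the two machines.
set -euo pipefail

# Hypothetical 4x3 matrix, one row per line.
matrix=$'1 2 3\n4 5 6\n7 8 9\n10 11 12'

# Print the sum of all numbers read from stdin.
sum_rows() {
  awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s + 0 }'
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

rows=$(printf '%s\n' "$matrix" | wc -l)
half=$(( rows / 2 ))

# '&' runs each half in its own process, concurrently.
printf '%s\n' "$matrix" | head -n "$half" | sum_rows > "$tmp/part1" &
printf '%s\n' "$matrix" | tail -n +"$(( half + 1 ))" | sum_rows > "$tmp/part2" &
wait  # block until both partial sums are written

total=$(( $(cat "$tmp/part1") + $(cat "$tmp/part2") ))
echo "total: $total"
```

Running it with bash prints total: 78 (21 from the first half plus 57 from the second).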
One machine
Let’s say you have only one machine. In that case all of your containers share the same resources, and the kernel scheduler decides which CPU core runs each process, so using different containers does not guarantee that your calculations run in parallel; they may even all run on the same CPU core. If you run a process without Docker, you can pin it to a specific core with taskset on Linux, like this:
taskset -c 2 bash -c 'while true; do echo "hello test"; done'
where taskset -c 2 means the bash process will run on the third CPU core (0, 1, 2, …). Since a Docker container can use all of your resources by default, you can do the same in a container:
docker run --init --rm -it ubuntu:20.04 taskset -c 2 bash -c 'while true; do echo "hello test"; done'
Note: The --init flag is important here so you can use CTRL+C to stop the container.
Docker also has the --cpuset-cpus option to do the same:
docker run --init --rm -it --cpuset-cpus 2 ubuntu:20.04 bash -c 'while true; do echo "hello test"; done'
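To check that pinning actually separates the work, here is a small bash sketch without Docker. It assumes Linux with taskset from util-linux installed; the busy-loop job is hypothetical. It reads the cores the current shell may use, pins two workers to the first and the last of them with taskset -c, starts both at once, and has each worker print its own affinity list. If only one core is available, both workers end up on that core.

```shell
#!/usr/bin/env bash
# Sketch (Linux, needs taskset from util-linux): start two CPU-bound workers
# at the same time, each pinned to its own core, and let each report the
# cores it is allowed to run on.
set -euo pipefail

# Cores this shell may use, e.g. "0-7" or "0,2-5"; take the first and last.
allowed=$(taskset -pc $$ | awk -F': ' '{ print $2 }')
core_a=${allowed%%[-,]*}
core_b=${allowed##*[-,]}

# Hypothetical CPU-bound job: spin for a while, then print its own affinity.
job='for ((i = 0; i < 200000; i++)); do :; done
taskset -pc $$ | awk -F": " "{ print \$2 }"'

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# '&' starts both workers concurrently; taskset -c pins each one to a core.
taskset -c "$core_a" bash -c "$job" > "$tmp/a" &
taskset -c "$core_b" bash -c "$job" > "$tmp/b" &
wait  # block until both workers have finished

echo "worker A was pinned to core $(cat "$tmp/a")"
echo "worker B was pinned to core $(cat "$tmp/b")"
```

Inside containers, --cpuset-cpus plays the same role by restricting which cores each container may use.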
If you have a third container, you can use it to collect the results from the others, as I described before. If you want two containers to communicate, you can use volumes or shared memory (IPC namespace).
Start process A in a container with 1 GB of shared memory:
docker run --init --rm -it --name process-a --ipc=shareable --shm-size 1g ubuntu:20.04 bash -c 'while true; do date > /dev/shm/date.txt; sleep 1; done'
Start process B using the shared memory of process A:
docker run --init --rm -it --name process-b --ipc container:process-a ubuntu:20.04 bash -c 'while true; do cat /dev/shm/date.txt; done'
You can combine this with CPU core selection. You will only see a speedup if the communication costs less than the calculation itself.
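The combination of core selection and shared memory can also be sketched without Docker. In this minimal bash sketch, two workers pinned to different cores each write a partial result to a file under /dev/shm (a tmpfs on most Linux systems), and the parent shell plays the role of the collecting container. The job (summing 1..100000 in two halves) and the directory name are hypothetical, and it assumes Linux with taskset available.

```shell
#!/usr/bin/env bash
# Sketch without Docker (Linux, needs taskset): two workers pinned to different
# cores write partial sums to shared memory (/dev/shm); this shell collects them.
set -euo pipefail

# Cores this shell may use, e.g. "0-7"; take the first and last.
allowed=$(taskset -pc $$ | awk -F': ' '{ print $2 }')
core_a=${allowed%%[-,]*}
core_b=${allowed##*[-,]}

# Hypothetical scratch directory on the /dev/shm tmpfs, removed on exit.
shm=$(mktemp -d /dev/shm/sketch.XXXXXX)
trap 'rm -rf "$shm"' EXIT

# Hypothetical job: print the sum of the integers from $1 to $2.
partial='s=0; for ((i = $1; i <= $2; i++)); do s=$((s + i)); done; echo "$s"'

# Each worker handles half of 1..100000 on its own core.
taskset -c "$core_a" bash -c "$partial" _ 1 50000 > "$shm/part-a" &
taskset -c "$core_b" bash -c "$partial" _ 50001 100000 > "$shm/part-b" &
wait

# The collector: read both partial results from shared memory and add them.
total=$(( $(cat "$shm/part-a") + $(cat "$shm/part-b") ))
echo "total: $total"
```

It prints total: 5000050000. With such a small job, starting the processes may cost more than the calculation, which is exactly the trade-off mentioned above.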
I have never run processes in parallel, let alone in Docker (this is something my colleagues do), so my description may not be perfect. But I am sure that using multiple containers does not by itself mean your processes will run in parallel, so you have to prepare your application to work that way.