R mcmapply (parallelized mapply) function broken with Docker (Linux)

I’m using Docker on Ubuntu. I confirmed that my OS is supported and that Docker is correctly installed (rootless mode).

Everything works swimmingly until my R code reaches the line that uses mcmapply(), the parallel version of R's mapply() function. The Linux instance I am using has 16 cores, and in the R code I call mcmapply() with the number of cores explicitly set to 8. When the program reaches that line, htop shows ALL 16 cores at 100% utilization and the program freezes. I tried running the Docker container with CPU usage explicitly constrained to cores 0-7, and that constraint worked. However, when the R program runs in the container, those 8 cores reach 100% utilization and the program still freezes.
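For anyone trying to reproduce this, here is a minimal sketch for checking what a pinned container actually sees. It assumes the container was started with Docker's `--cpuset-cpus` flag (for example `docker run --cpuset-cpus="0-7" ...`, with image and command omitted) and that GNU coreutils `nproc` is available inside it. This is independent of the `mc.cores` value passed to mcmapply(). It only shows how many CPUs a process in the container may be scheduled on.

```shell
#!/bin/sh
# Run inside the container. nproc reports the CPUs this process may run on
# (its affinity mask, which --cpuset-cpus restricts); nproc --all reports
# every CPU installed on the host. OMP_NUM_THREADS and OMP_THREAD_LIMIT are
# unset for these calls because nproc honours them and they would mask the result.
usable=$(env -u OMP_NUM_THREADS -u OMP_THREAD_LIMIT nproc)
installed=$(env -u OMP_NUM_THREADS -u OMP_THREAD_LIMIT nproc --all)
echo "CPUs usable by this process: $usable"
echo "CPUs installed on the host:  $installed"
```

On the 16-vCPU instance described here, pinning to cores 0-7 should make the first number 8 and leave the second at 16.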

Interestingly, the same R code also calls mclapply(), the parallel version of lapply(), and that works just fine. The only trouble is with mcmapply().

There are no unusual messages in the Docker logs or anywhere else. How would you suggest I troubleshoot this?

Many thanks.

Can you provide an example for others to reproduce?

Since you could limit the number of CPUs used by the container, isn’t this an R programming question? Have you run the exact same code on the same machine without Docker?

I’ve run this code more than a hundred times outside of Docker and it runs just fine. In fact, I’m currently running it on another machine so I can continue my research while I sort out this single issue with Docker.

Here are some recent observations that might be useful:

I’m running this on an AWS EC2 instance running Ubuntu 24.04 (the issue also occurs on Ubuntu 22).
This instance has 16 cores (vCPUs).

When I run the program outside of Docker, htop shows the R process using a single core.
When the code reaches a multiprocessing function (such as mclapply() or mcmapply()), htop outside of Docker shows 8 R processes (I limit the program to 8 cores).

When I run the program inside a (single) Docker container, htop shows 16 R processes as soon as the program starts, even before the R program reaches any function that uses multiprocessing.

When I run the function in question (mcmapply()) outside of Docker, htop shows the behavior described above: 8 R processes, one for each of the 8 cores specified in the program.

When I run the same function (mcmapply()) inside the single Docker container, each of the 16 R processes then spawns 8 more, for a total of 128 R processes in htop (16 × 8). At that point the program slows to a crawl and becomes unusable.
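One possible reading of these numbers, which is not confirmed here: by default htop lists userland threads as separate rows (pressing H toggles them). A single R process whose linear algebra library has started one worker thread per CPU could therefore appear as 16 "R processes". The sketch below separates processes from threads by reading /proc directly. It assumes the R processes are named `R` (Rscript launches a process with that name) and needs nothing beyond a POSIX shell.

```shell
#!/bin/sh
# Count R processes and the threads they own by reading /proc directly.
procs=0
threads=0
for dir in /proc/[0-9]*; do
  # comm holds the executable name; Rscript and forked workers show up as "R"
  [ "$(cat "$dir/comm" 2>/dev/null)" = "R" ] || continue
  procs=$((procs + 1))
  # each entry under task/ is one thread, the main thread included
  n=$(ls "$dir/task" 2>/dev/null | wc -l)
  threads=$((threads + n))
done
echo "R processes: $procs"
echo "R threads:   $threads"
```

Suppose the thread count is much larger than the process count, for example one R process with 16 threads before mcmapply() runs. In that case, the extra rows in htop are threads, not separate R processes.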

I’ve confirmed that docker info reports no warnings, and I’ve examined the corresponding logs. I’m also running Docker in rootless mode.

As a troubleshooting step, I limited the number of cores Docker may use when starting the container (I tried 8 and 1). This reduces the number of R processes that are spawned, but when mcmapply() runs, the cores R does use reach 100% and the program again slows to a crawl.

I’ll try to provide a straightforward reproducible example. I appreciate your time and consideration, of course!

Also, for what it’s worth, running the container with the --privileged flag produces the same issue with mcmapply() (16 cores × 8 specified worker cores = 128 R processes).

Answering this would still require more knowledge of R than I have, which is why an example, as @bluepuma77 asked, is important. With one, we could try it and see what happens. Docker itself will not run more processes or make the application use more resources, so the issue could be that mcmapply (or anything else running in the container) cannot handle running in a container. To solve this we would need to understand exactly what it does and how, and that is where an R programmer would be more valuable. But maybe you can be that programmer.

What we can explain is how Docker, and containers in general, work. Processes in the container see all the resources of the host. If mcmapply sizes its work based on the host's resources while the container is limited by its control groups, mcmapply can misbehave. If that is the case, it cannot be fixed in Docker: the library has to be prepared for containers, or it has to be parameterized appropriately if possible.
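To illustrate the control-group point, a limit such as `docker run --cpus=8` is typically enforced as a CPU-time quota rather than by hiding CPUs. Meanwhile, /proc/cpuinfo inside the container still lists every host CPU. The following is a minimal sketch that assumes cgroup v2 is mounted at /sys/fs/cgroup, which is the default on recent Ubuntu releases. `quota_to_cpus` is a hypothetical helper name.

```shell
#!/bin/sh
# quota_to_cpus: hypothetical helper that turns the two fields of cgroup v2
# cpu.max ("<quota> <period>", both in microseconds) into a whole number of CPUs.
quota_to_cpus() {
  if [ "$1" = "max" ]; then
    echo "unlimited"
  else
    # Round up, so a fractional limit such as --cpus=1.5 reports 2.
    echo $(( ($1 + $2 - 1) / $2 ))
  fi
}

# Inside a container, the cgroup namespace makes /sys/fs/cgroup refer to the
# container's own cgroup. The root cgroup has no cpu.max file.
if [ -r /sys/fs/cgroup/cpu.max ]; then
  read -r quota period < /sys/fs/cgroup/cpu.max
  echo "cgroup CPU limit: $(quota_to_cpus "$quota" "$period")"
else
  echo "cgroup CPU limit: not found (cgroup v1 or root cgroup)"
fi
# /proc/cpuinfo is not namespaced, so it still lists the host's CPUs.
echo "CPUs listed in /proc/cpuinfo: $(grep -c '^processor' /proc/cpuinfo)"
```

With `--cpus=8`, cpu.max typically reads `800000 100000`, which the helper reports as 8. A library that counts CPUs from /proc/cpuinfo would still see 16 on this instance.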

How did you install Docker? Can you share the guide you followed?

After setting up an MRE (minimal reproducible example) using mcmapply(), I found that the issue was not with mcmapply() itself but with a step inside the function it called, which used solve().

solve() relies, through LAPACK, on BLAS (Basic Linear Algebra Subprograms), a set of low-level routines for basic linear algebra operations. Unfortunately, the official R base images provided by Rocker include OpenBLAS libraries that have since been reported to have known issues.
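To check which BLAS and LAPACK an image actually uses, something like the following may help. It assumes a Debian/Ubuntu-based image, as the Rocker images are, where the active libraries are selected through /etc/alternatives. `blas_report` is a hypothetical function name.

```shell
#!/bin/sh
# blas_report: hypothetical helper listing the BLAS/LAPACK in use on a
# Debian/Ubuntu system and any installed OpenBLAS packages.
blas_report() {
  # Debian/Ubuntu select the active libblas/liblapack through alternatives.
  for link in /etc/alternatives/libblas.so.3* /etc/alternatives/liblapack.so.3*; do
    [ -e "$link" ] || continue   # skip glob patterns that matched nothing
    echo "$link -> $(readlink -f "$link")"
  done
  if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package} ${Status}\n' 'libopenblas*' 2>/dev/null \
      | grep ' installed$' || echo "no OpenBLAS packages installed"
  fi
  if command -v Rscript >/dev/null 2>&1; then
    # sessionInfo() prints the BLAS and LAPACK libraries R has loaded.
    Rscript -e 'sessionInfo()' | grep -E 'BLAS|LAPACK'
  fi
  return 0
}

blas_report
```

Running it before and after changing the Dockerfile shows whether the OpenBLAS libraries are still the ones R loads.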

To resolve it, I used the rocker/r-ver:4.4.0 image as my base and added the following line to my Dockerfile:

```dockerfile
RUN apt-get remove -y libopenblas0-pthread libopenblas0 libopenblas0-openmp libopenblas-openmp-dev libopenblas-dev
```
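Some may prefer an alternative to removing the packages. This is an untested assumption and not what was done in this thread: cap OpenBLAS at one thread per R process, so that the only parallelism comes from the forked mcmapply() workers. Pthread builds of OpenBLAS read `OPENBLAS_NUM_THREADS`, while OpenMP builds generally follow `OMP_NUM_THREADS`. In a Dockerfile, the equivalent would be `ENV OPENBLAS_NUM_THREADS=1` and `ENV OMP_NUM_THREADS=1`. The script name in the comment is hypothetical.

```shell
#!/bin/sh
# Limit BLAS threading for every process started from this shell.
# OPENBLAS_NUM_THREADS covers pthread builds of OpenBLAS;
# OMP_NUM_THREADS covers OpenMP builds.
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1

# Then launch the analysis as usual, for example (hypothetical script name):
#   Rscript my_analysis.R
echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```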

Everything now works as expected, and I consider this issue closed. Perhaps I should rename this post to something like “Beware libopenblas libraries in the official R base images.”

Many thanks for the support.

Relevant link:

https://github.com/OpenMathLib/OpenBLAS/issues/2642
