Docker Community Forums

Share and learn in the Docker community.

What is the recommended way to start a long-running containerized analysis in R from NodeJS?

I have a frontend through which users can submit analysis requests, which are inserted into a database. These analysis requests are received in a NodeJS server using Postgres notify/listen. Using Redis and a queueing solution like Bee or Bull, these analysis requests are queued. A typical analysis process is long and takes about 1 day. The analysis happens through a series of R scripts. Some R scripts use just one core, but some others can benefit from parallel execution and use multiple cores.

Currently, I use a largish EC2 instance (8 cores, 32GB RAM) to process the analysis request. To speed this up and optimize the use of resources, I am wondering if containerizing the R code could be helpful.

My approach is to use a small EC2 instance to host the NodeJS code that receives the analysis requests and queues them using Bull + Redis, and then to create a container with R on AWS ECS for each request. This approach has several advantages:

  1. I can create containers as the analysis requests come in. This way, the analysis requests start executing without having to wait for resources to free up.

  2. Since analysis requests happen randomly and there can be long periods of no use, I only incur the cost of running a smaller EC2 instance instead of a large one.

  3. The R code that needs parallelization can be scaled horizontally to use many vCPUs (10-100) as needed on ECS, giving a significant speed-up while only paying for the time used.

Is this approach reasonable? If yes, what would be the recommended way to initiate the containerized R process from NodeJS (one that can eventually be moved to AWS ECS or similar)? I have looked at the NodeJS package dockerode, but I am not sure whether that is the recommended approach or whether it is scalable.

Thanks

Deploy NodeJS and R inside the same Docker container. The NodeJS app calls the R script as a local child process, and the two communicate through that process's stdout/stderr. Additionally, TinyTeX (LaTeX), Pandoc and Rmarkdown are used to generate an analysis report PDF from an R script. Oh yeah!
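A minimal sketch of that idea, using only Node's built-in child_process module. The function names run and runRscript are hypothetical (they are not taken from the repository), and the sketch assumes Rscript is on the PATH:

```javascript
const { spawn } = require('node:child_process');

// Start a command and collect what it writes to stdout and stderr.
// Resolves with { code, stdout, stderr } once the process has closed;
// rejects only if the process could not be started at all
// (for example, when the command is not on PATH).
function run(command, args = []) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderr = '';
    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    child.on('error', reject);
    child.on('close', (code) => resolve({ code, stdout, stderr }));
  });
}

// Hypothetical convenience wrapper: assumes `Rscript` is on PATH.
function runRscript(scriptPath, scriptArgs = []) {
  return run('Rscript', [scriptPath, ...scriptArgs]);
}
```

For example, runRscript('hello.R', ['victor']).then(({ code, stdout }) => console.log(code, stdout)) would print the exit code and whatever hello.R wrote to stdout. Because spawn is asynchronous, the Node event loop stays free while a long analysis runs; for very chatty scripts you would probably stream the output somewhere rather than buffer it in a string.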

On local development:
This assumes R and NodeJS are already installed. If not, download and install R for your local development platform, and install Node with brew install node or nvm install node.

git clone https://github.com/victorskl/docker-nodejs-r.git
cd docker-nodejs-r

bash clean.sh
npm install
node spawn.js
The spawn.js demo uses the NodeJS built-in child_process module to call the Rscript command and run an R file directly. To test using the r-script npm package:

node example.js
Note that you may see NodeJS report stderr output on the first (few) runs of example.js. This is because ex-sync.R and ex-async.R require some R packages as dependencies. If these R packages are not present in your R environment, install.packages(…) is invoked on demand the first time they are needed. When this happens, the NodeJS script returns with stderr, which is a bit misleading: the R script has no issue, it is only waiting for the dependencies to be resolved. To work around this, you can install the R dependencies eagerly in advance, for example:

Rscript dependencies.R
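On the Node side, one way to avoid misreading that stderr output is to judge success by the exit status instead of by whether anything was written to stderr, since R writes messages and warnings to stderr even when a script succeeds. The sketch below is an assumption about how you might do this, not what example.js or the r-script package does; summarizeRun and runAndSummarize are hypothetical names, and spawnSync is used only to keep the example short (it blocks the event loop, so it is not suitable for day-long jobs):

```javascript
const { spawnSync } = require('node:child_process');

// Treat the exit status as the failure signal; keep stderr as
// diagnostics only, because it can be non-empty on a successful run.
function summarizeRun({ status, stderr }) {
  return { ok: status === 0, diagnostics: String(stderr || '').trim() };
}

// Run a command synchronously and summarize the outcome.
// Throws if the process could not be started at all.
function runAndSummarize(command, args = []) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  if (result.error) throw result.error;
  return summarizeRun(result);
}
```

With this, runAndSummarize('Rscript', ['ex-sync.R']) would report ok as true for a run that only printed package installation messages, and ok as false when Rscript exits with a non-zero status.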
On docker deployment:
Building the image takes a while, because the R dependencies listed in dependencies.R are installed eagerly, in advance, using pacman during the docker image build.

To build docker image:

docker build -t dev_tesk .
To run the container from the created image:

docker run -it --rm -p 8888:8888 --name tesk dev_tesk
Open another terminal, go into the container and give a trial run:

docker exec -it tesk bash
pwd
ls -l

which Rscript
Rscript hello.R victor

nvm ls

which node
node -v
npm -v

node spawn.js

node example.js
To generate a PDF report using TinyTeX through Pandoc and Rmarkdown:

Rscript report_gen.R
Then open http://localhost:8888. This triggers the spawn process in server.js and prints the hello.R output via console.log(…).

open -a Safari http://localhost:8888
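For readers who have not opened the repository, a server of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the actual server.js: createServer and its default arguments are hypothetical, and it assumes Rscript is on the PATH and hello.R is in the working directory.

```javascript
const http = require('node:http');
const { spawn } = require('node:child_process');

// Build an HTTP server that spawns the given command on every request,
// logs the command's stdout, and returns it as the response body.
function createServer(command = 'Rscript', args = ['hello.R', 'victor']) {
  return http.createServer((req, res) => {
    const child = spawn(command, args);
    let output = '';
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { output += chunk; });
    // The process could not be started (e.g. command not found).
    child.on('error', (err) => {
      res.statusCode = 500;
      res.end(`failed to start: ${err.message}\n`);
    });
    child.on('close', (code) => {
      if (res.writableEnded) return; // already answered in the error handler
      console.log(output);
      res.statusCode = code === 0 ? 200 : 500;
      res.end(output);
    });
  });
}
```

Calling createServer().listen(8888) would then serve on the same port that the docker run command above publishes.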
To stop the container:

docker stop tesk
The downside is that the docker image can become very large, since the R packages as well as the NodeJS packages are installed eagerly, especially if the application stack is complex. To check the size of the created docker image:

docker images dev_tesk
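And delete the container and image if desired. Note that because the container was started with --rm, it is removed automatically when stopped, so docker rm is only needed if you ran it without --rm.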

docker rm tesk
docker rmi dev_tesk