I have a frontend through which users can submit analysis requests, which are inserted into a database. A NodeJS server picks these requests up via Postgres NOTIFY/LISTEN and queues them in Redis using a queueing library such as Bee-Queue or Bull. A typical analysis is long-running, taking about a day, and consists of a series of R scripts. Some of the R scripts use just one core, but others can benefit from parallel execution across multiple cores.
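For context, the intake path is essentially the following (the channel name, Redis URL, and payload shape here are illustrative, not my exact setup):

```js
// Sketch of the current intake: Postgres NOTIFY/LISTEN feeding a Bull queue.
// The channel name, Redis URL, and payload shape are illustrative.
const { Client } = require('pg');
const Queue = require('bull');

const analysisQueue = new Queue('analysis', 'redis://127.0.0.1:6379');

async function listenForRequests() {
  const pg = new Client({ connectionString: process.env.DATABASE_URL });
  await pg.connect();
  await pg.query('LISTEN analysis_requests');

  pg.on('notification', async (msg) => {
    // A trigger on the requests table sends the new row as a JSON payload
    const request = JSON.parse(msg.payload);
    await analysisQueue.add(request, { removeOnComplete: true });
  });
}

listenForRequests().catch(console.error);
```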
Currently, I use a largish EC2 instance (8 cores, 32 GB RAM) to process these analysis requests. To speed this up and make better use of resources, I am wondering whether containerizing the R code could help.
My approach is to use a small EC2 instance to host the NodeJS code that receives the analysis requests, queue them using Bull + Redis, and then run the R code in containers on AWS ECS (sketched after the list below). This approach has several advantages:
- I can create containers as the analysis requests come in, so requests start executing without having to wait for resources to free up.
- Since analysis requests arrive at random and there can be long periods of no use, I only incur the cost of running a smaller EC2 instance instead of a large one.
- The R code that needs parallelization can be scaled horizontally across many vCPUs (10-100) on ECS as needed, giving a significant speed-up while only paying for the time used.
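Concretely, I imagine the Bull processor launching one ECS (Fargate) task per request, along these lines (the cluster name, task definition, subnet, and environment variable are placeholders, not working values):

```js
// Sketch: launch one ECS Fargate task per queued analysis request.
// Cluster, task definition, and subnet values are placeholders.
const Queue = require('bull');
const { ECSClient, RunTaskCommand } = require('@aws-sdk/client-ecs');

const analysisQueue = new Queue('analysis', 'redis://127.0.0.1:6379');
const ecs = new ECSClient({ region: 'us-east-1' });

analysisQueue.process(async (job) => {
  const result = await ecs.send(new RunTaskCommand({
    cluster: 'analysis-cluster',        // placeholder cluster name
    taskDefinition: 'r-analysis:1',     // placeholder task definition
    launchType: 'FARGATE',
    count: 1,
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets: ['subnet-XXXXXXXX'],   // placeholder subnet
        assignPublicIp: 'ENABLED',
      },
    },
    overrides: {
      containerOverrides: [{
        name: 'r-analysis',
        // Tell the R entrypoint which request to process
        environment: [{ name: 'REQUEST_ID', value: String(job.data.id) }],
      }],
    },
  }));
  // Note: this resolves when the task is *started*, not when the
  // day-long analysis finishes.
  return result.tasks?.[0]?.taskArn;
});
```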
Is this approach reasonable? If yes, what would be the recommended way to initiate the containerized R process from NodeJS (one that can eventually be moved to AWS ECS or similar)? I have looked at the NodeJS package dockerode (rough sketch below), but I am not sure whether that is the recommended approach, or whether it would scale.
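For reference, this is roughly what I had in mind with dockerode for running the container locally first (the image name and script path are made up):

```js
// Rough dockerode sketch: run the R container on the local Docker daemon.
// The image name and script path are made up for illustration.
const Docker = require('dockerode');
const docker = new Docker({ socketPath: '/var/run/docker.sock' });

async function runAnalysisContainer(requestId) {
  // docker.run resolves to [output, container] once the container exits
  const [output] = await docker.run(
    'r-analysis:latest',                // hypothetical image
    ['Rscript', '/app/run_analysis.R'], // hypothetical entry script
    process.stdout,                     // stream container logs
    {
      Env: [`REQUEST_ID=${requestId}`],
      HostConfig: { AutoRemove: true }, // clean up after exit
    }
  );
  return output.StatusCode;             // non-zero means the script failed
}
```

My worry is that this ties every analysis container to the Docker daemon on the single NodeJS instance, which seems to be exactly what I would be moving away from with ECS.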