I have a frontend through which users can submit analysis requests, which are inserted into a database. A NodeJS server picks these requests up via Postgres NOTIFY/LISTEN and queues them in Redis using a queueing library such as Bee-Queue or Bull. A typical analysis is long-running, taking about a day, and consists of a series of R scripts. Some of the R scripts use just one core, but others can benefit from parallel execution across multiple cores.
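For context, the intake path is essentially the following (the channel name, Redis URL, and payload shape here are illustrative, not my exact setup):

```js
// Sketch of the current intake: Postgres NOTIFY/LISTEN feeding a Bull queue.
// The channel name, Redis URL, and payload shape are illustrative.
const { Client } = require('pg');
const Queue = require('bull');

const analysisQueue = new Queue('analysis', 'redis://127.0.0.1:6379');

async function listenForRequests() {
  const pg = new Client({ connectionString: process.env.DATABASE_URL });
  await pg.connect();
  await pg.query('LISTEN analysis_requests');

  pg.on('notification', async (msg) => {
    // A trigger on the requests table sends the new row as a JSON payload
    const request = JSON.parse(msg.payload);
    await analysisQueue.add(request, { removeOnComplete: true });
  });
}

listenForRequests().catch(console.error);
```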
Currently, I use a largish EC2 instance (8 cores, 32 GB RAM) to process these analysis requests. To speed this up and make better use of resources, I am wondering whether containerizing the R code could help.
My approach is to use a small EC2 instance to host the NodeJS code that receives the analysis requests, queue them using Bull + Redis, and then run the R code in containers on AWS ECS (sketched after the list below). This approach has several advantages:
- I can create containers as the analysis requests come in, so requests start executing without having to wait for resources to free up.
- Since analysis requests arrive at random and there can be long periods of no use, I only incur the cost of running a smaller EC2 instance instead of a large one.
- The R code that needs parallelization can be scaled horizontally across many vCPUs (10-100) on ECS as needed, giving a significant speed-up while only paying for the time used.
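Concretely, I imagine the Bull processor launching one ECS (Fargate) task per request, along these lines (the cluster name, task definition, subnet, and environment variable are placeholders, not working values):

```js
// Sketch: launch one ECS Fargate task per queued analysis request.
// Cluster, task definition, and subnet values are placeholders.
const Queue = require('bull');
const { ECSClient, RunTaskCommand } = require('@aws-sdk/client-ecs');

const analysisQueue = new Queue('analysis', 'redis://127.0.0.1:6379');
const ecs = new ECSClient({ region: 'us-east-1' });

analysisQueue.process(async (job) => {
  const result = await ecs.send(new RunTaskCommand({
    cluster: 'analysis-cluster',        // placeholder cluster name
    taskDefinition: 'r-analysis:1',     // placeholder task definition
    launchType: 'FARGATE',
    count: 1,
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets: ['subnet-XXXXXXXX'],   // placeholder subnet
        assignPublicIp: 'ENABLED',
      },
    },
    overrides: {
      containerOverrides: [{
        name: 'r-analysis',
        // Tell the R entrypoint which request to process
        environment: [{ name: 'REQUEST_ID', value: String(job.data.id) }],
      }],
    },
  }));
  // Note: this resolves when the task is *started*, not when the
  // day-long analysis finishes.
  return result.tasks?.[0]?.taskArn;
});
```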
Is this approach reasonable? If yes, what would be the recommended way to initiate the containerized R process from NodeJS (one that can eventually be moved to AWS ECS or similar)? I have looked at the NodeJS package dockerode (rough sketch below), but I am not sure whether that is the recommended approach, or whether it would scale.
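For reference, this is roughly what I had in mind with dockerode for running the container locally first (the image name and script path are made up):

```js
// Rough dockerode sketch: run the R container on the local Docker daemon.
// The image name and script path are made up for illustration.
const Docker = require('dockerode');
const docker = new Docker({ socketPath: '/var/run/docker.sock' });

async function runAnalysisContainer(requestId) {
  // docker.run resolves to [output, container] once the container exits
  const [output] = await docker.run(
    'r-analysis:latest',                // hypothetical image
    ['Rscript', '/app/run_analysis.R'], // hypothetical entry script
    process.stdout,                     // stream container logs
    {
      Env: [`REQUEST_ID=${requestId}`],
      HostConfig: { AutoRemove: true }, // clean up after exit
    }
  );
  return output.StatusCode;             // non-zero means the script failed
}
```

My worry is that this ties every analysis container to the Docker daemon on the single NodeJS instance, which seems to be exactly what I would be moving away from with ECS.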