Some background information:
I am building a workflow pipeline using Airflow and Docker to support iterative machine learning and check-pointing to a local server.
For machine learning we are experimenting with a few open-source ML engines/APIs, such as Kaldi and TensorFlow.
Each of these has its own set of dependencies, which are containerized in its own Dockerfile.
I am also using an open-source dockerized Airflow workflow engine. The main intention is for the user endpoint to let users define an experiment with a specific tool and a specific input set, to be executed on the local server. For example:
- User A, trial 1 -> With Kaldi -> With Input Set A -> With Checkpoint A -> Run All and Persist
- User B, trial 5 -> With Tensorflow -> With Input Set A -> With Checkpoint A -> Run All
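For context, each trial above could be captured as a small spec that an Airflow task turns into a `docker run` invocation. This is only a sketch; the field names, image tags, and paths are illustrative, not my real API:

```python
# Hypothetical trial spec passed from the user endpoint to an Airflow task.
# Field names and image tags are illustrative only.
def build_trial_command(spec):
    """Turn a trial spec into the docker command a task would execute."""
    image = {"kaldi": "kaldi-gpu:latest", "tensorflow": "tf-gpu:latest"}[spec["tool"]]
    return [
        "docker", "run", "--rm", "--gpus", "all",
        "-v", f"{spec['input_set']}:/data:ro",   # mount the input set read-only
        "-v", f"{spec['checkpoint']}:/ckpt",     # mount the checkpoint dir read-write
        image, "run-trial",
    ]

spec = {"user": "A", "trial": 1, "tool": "kaldi",
        "input_set": "/srv/input_set_a", "checkpoint": "/srv/checkpoint_a"}
cmd = build_trial_command(spec)
```

A task for "User A, trial 1" would then hand `cmd` to something like Airflow's BashOperator or a subprocess call.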
Each of the ML tools needs CUDA. Is it better to use a multi-stage build in each ML tool's Dockerfile, or to separate CUDA into its own container?
If I were to separate CUDA into its own container, how could the ML tool's container interact with the drivers installed in the CUDA container?
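To illustrate the first option I am considering, each tool's Dockerfile would look roughly like this (image tags and paths are illustrative): build against the CUDA `devel` image, then copy the built artifacts onto the slimmer `runtime` image.

```dockerfile
# Multi-stage sketch (illustrative tags and paths).
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS build
COPY . /src
RUN cd /src && make            # compile the tool against the CUDA toolkit

FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
COPY --from=build /src/bin /opt/tool/bin   # keep only the built binaries
ENTRYPOINT ["/opt/tool/bin/run"]
```

This keeps the final image free of the build-time toolchain, but every tool image still carries its own CUDA runtime layer, which is what makes me wonder about the separate-container alternative.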