So, I am packaging a batch inference job (written in Python) as a Docker image that runs periodically. The job has a good number of dependencies, which are captured in a `requirement.txt` like this:

```
tensorflow==2.12.0
polars==1.12.0
```
I have to build the environment from this requirements file, which is straightforward enough. But I have a design choice to make (which prompted this question) between two approaches:
- Build the environment (download and install all the packages from PyPI) during the image build, so that at container runtime (once a day) I just run `main.py` inside the container.
- Defer building the environment to container runtime (just before `main.py` is run), which is possible because `requirement.txt` is available inside the container (rough sketches of both options below).
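To make the two options concrete, here is roughly what I have in mind. These are just sketches: the `python:3.10-slim` base image and the assumption that `requirement.txt` and `main.py` sit next to the Dockerfile are for illustration, not necessarily what I will end up using.

Option 1, install at build time:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Bake the dependencies into the image at build time
COPY requirement.txt .
RUN pip install --no-cache-dir -r requirement.txt

COPY main.py .

# At runtime the container only runs the job
CMD ["python", "main.py"]
```

Option 2, install at container runtime:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Only the code and the requirement file go into the image
COPY requirement.txt main.py ./

# Dependencies are downloaded and installed just before the job runs
CMD ["sh", "-c", "pip install --no-cache-dir -r requirement.txt && python main.py"]
```

With option 1 the pip layer is baked in (and cached across rebuilds), while with option 2 the image carries only the interpreter and the code.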
So far as I can see, each has its own advantage:

- Approach 1: quicker container runtime, as the environment is already built, saving the time to download and unpack all the PyPI packages.
- Approach 2: smaller image size, as the libraries taken together take up a good amount of space.
The job is not really latency-sensitive: pre-packaging the dependencies saves me about 4-5 minutes a day compared with installing them on every run, but in practice that does not matter.
But is there any other trade-off I am missing, or does the Docker community have a general recommendation, considering security and so on? Is there any other parameter I should look at to make the decision, or a best practice covering this scenario?