I am very new to Docker and I am learning how to create an image for my app.
It is a Python app, and there are two issues I'm trying to fix with the image:
- it takes ~1 hour to build,
- the image size is ~2 GB.
I have read a lot of blog posts trying to fix this; it seems to be a common issue with Python apps, so I assume I'm doing something wrong.
I use on-prem GitLab with a CI pipeline to build the image and store it in the GitLab registry.
The first thing I tried was to use the Docker cache to speed up the process, so I added a cache configuration to the GitLab pipeline.
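Roughly along these lines (a simplified sketch; the job name, image tags, and Docker versions are placeholders):

```yaml
build-image:
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_BUILDKIT: "1"
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    # pull the previously pushed image so its layers can be reused as a cache source
    - docker pull "$CI_REGISTRY_IMAGE:latest" || true
    - >
      docker build
      --cache-from "$CI_REGISTRY_IMAGE:latest"
      --build-arg BUILDKIT_INLINE_CACHE=1
      -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
      -t "$CI_REGISTRY_IMAGE:latest"
      .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
    - docker push "$CI_REGISTRY_IMAGE:latest"
```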
Well, if you install a 2 GB LLM library, then it may take a long time and take up space. So it depends on what kind of requirements you are installing. We don't know the contents of your app_requirements.txt file.
The one taking a lot of time to compile is mpi4py.
But I thought that with a multi-stage Dockerfile I would not have to compile the lib every time I modify the application?
A Dockerfile itself does nothing. If there is no cache for the first stage and you have a COPY that needs that previous stage, the stage will have to be built.
If you only change your app's source code, your second stage copies that and invalidates the following layers, including the next COPY that wants to copy from the builder stage. You could change the order of the COPY instructions so that copying the source code happens after the COPY instruction that refers to the previous stage; then that instruction will not be invalidated by your source code changes.
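A rough sketch of the ordering I mean, assuming your app_requirements.txt file (the base image, paths, and MPI package names are just examples):

```dockerfile
# --- builder stage: install/compile the heavy dependencies once ---
FROM python:3.11-slim AS builder
# build tools and MPI headers are only needed here, not in the final image
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libopenmpi-dev \
    && rm -rf /var/lib/apt/lists/*
COPY app_requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r app_requirements.txt

# --- runtime stage ---
FROM python:3.11-slim
# runtime shared libraries for compiled packages (e.g. OpenMPI for mpi4py) are still needed here
RUN apt-get update \
    && apt-get install -y --no-install-recommends libopenmpi3 \
    && rm -rf /var/lib/apt/lists/*
# copy the compiled dependencies first; this layer only changes when the builder stage changes
COPY --from=builder /install /usr/local
WORKDIR /app
# copy the application source last, so editing it only invalidates the layers from here on
COPY . .
CMD ["python", "main.py"]
```

With that ordering, if only the source code changes, everything up to and including the `COPY --from=builder` line comes from the cache, so nothing gets recompiled.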
But it was some time ago that I last used multi-stage builds, so I'm not 100% sure at the moment what happens with a COPY instruction that needs another stage when that stage is not there. Normally I would try it out, but I don't have time right now.
Thank you for your feedback.
I did more testing, and the Docker cache mechanism is working great. I had a configuration issue in my GitLab pipeline that prevented the cache from working. Since fixing it, rebuilds after just editing code are quite fast.
For the image size, multi-stage builds help a lot, but this doesn't work for every kind of situation.