First Docker image: long build & too large

Hello,

I am very new to Docker and I am learning how to create an image for my app.
It is a Python app, and there are two issues I'm trying to fix with the image:

  • it takes ~1 hour to build,
  • the image size is ~2 GB

I read a lot of blog posts trying to fix this; it seems to be a common issue with Python apps, so I assume I'm doing something wrong.

I use an on-prem GitLab CI pipeline to build the image and push it to the GitLab registry.

The first thing I tried was to use the Docker cache to speed up the process, so I use the following config in the GitLab pipeline:

build_docker_image:
  stage: build
  script:
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from $CI_REGISTRY_IMAGE:latest --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG --tag $CI_REGISTRY_IMAGE:latest -f ./Dockerfile_vad .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    - docker push $CI_REGISTRY_IMAGE:latest
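I also made sure BuildKit is enabled on the runner, since (as far as I understand) BUILDKIT_INLINE_CACHE only has an effect when the image is built with BuildKit. Something like this — the DOCKER_BUILDKIT variable is my assumption about a Docker-in-Docker style runner:

```yaml
build_docker_image:
  stage: build
  variables:
    # BUILDKIT_INLINE_CACHE only takes effect when BuildKit is enabled
    DOCKER_BUILDKIT: "1"
  script:
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    # ... same build and push commands as above ...
```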

I also moved my Dockerfile to a multi-stage build, as recommended on several websites:

FROM python:3.10.16-slim AS builder
 
WORKDIR /root
RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
      libopenmpi-dev \
      build-essential \
      git

ENV PATH=/root/.local/bin:$PATH
RUN pip install --upgrade pip

# Copy the dependency files first to take advantage of the Docker cache
COPY app_requirements.txt .

# Install the dependencies
RUN pip install --user --no-cache-dir -r app_requirements.txt

FROM python:3.10.16-slim

ENV PATH=/root/.local/bin:$PATH
COPY app/ /root/app/

COPY --from=builder /root/.local /root/.local

CMD ["python", "/root/app/server.py"]

Any thoughts on what I am doing wrong?
Regards

Well, if you install a 2 GB LLM library, then it may take long and take up space. So it kind of depends on what kind of requirements you are installing. We don't know the content of your app_requirements.txt file.

Hello,

thank you for your feedback.

Indeed, I use the following requirements:
aio_pika
aiohttp
mpi4py
numpy
omegaconf
soxr
starlette
torch
uvicorn[standard]

The one that takes a lot of time to compile is mpi4py.
But I thought that with a multi-stage Dockerfile I would not have to compile the lib every time I modify the application?
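I also came across BuildKit cache mounts; if I understand correctly, keeping pip's wheel cache between builds would mean mpi4py only compiles when the requirements actually change. Something like this in the builder stage (untested on my side; it needs BuildKit, and --no-cache-dir has to be dropped for the cache to help):

```dockerfile
# Persist pip's download/wheel cache across builds (requires BuildKit).
# Note: --no-cache-dir must be removed, otherwise the cache stays empty.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install --user -r app_requirements.txt
```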

Best regards,

A Dockerfile itself does nothing. If there is no cache for the first stage and a COPY instruction needs that stage, the stage will have to be built.

If you only change your app's source code, your second stage copies that and invalidates the following layers, including the next COPY that copies from the builder stage. You could change the order of the COPY instructions so that copying the source code happens after the COPY instruction that refers to the previous stage; then that layer will not be invalidated by your source code changes.
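For example, the reordering described above could look like this (a sketch of the second stage only, based on your Dockerfile):

```dockerfile
FROM python:3.10.16-slim

ENV PATH=/root/.local/bin:$PATH

# Copy the installed dependencies first: this layer only changes
# when the builder stage changes, not when the app source changes.
COPY --from=builder /root/.local /root/.local

# Copy the app source last, so editing it invalidates only this layer.
COPY app/ /root/app/

CMD ["python", "/root/app/server.py"]
```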

But it was some time ago when I last used multi-stage builds, so I'm not 100% sure at the moment what happens with a COPY instruction that needs another stage when it is not there. Normally I would try it out, but I don't have time right now.

Thank you for your feedback.
I did more testing, and the Docker cache mechanism is working great. I had a config issue in my GitLab pipeline preventing the cache from working. Since then, when I just edit code, the rebuild is quite fast.

For the size, multi-stage helps a lot. But it doesn't work in every kind of situation.

Thanks for your help