Reducing pull time for a multi-GB .venv layer in a multi-stage build

Hi,

I’m building a Python image with Docker BuildKit and a multi-stage Dockerfile.
The first stages create a virtual environment (.venv) that ends up several GB in size.
In the final (runtime) stage I currently copy that entire venv:

# runtime stage
FROM python-base AS algo
…
COPY --from=env-algo /.venv/ /.venv/

Our infrastructure is such that we have to pull the image quite often, and the pull
time of that single huge layer is significant.

Because Docker pulls layers in parallel, I experimented with manually splitting
the venv into ten “buckets”, each copied in its own layer:

COPY --from=env-algo /buckets/1 /.venv/
COPY --from=env-algo /buckets/2 /.venv/
…
COPY --from=env-algo /buckets/10 /.venv/

But this seems a bit hacky.

Is there a canonical or recommended way to break up a very large directory
across multiple layers, purely to speed up docker pull? Are there alternative best practices for this use-case?

Thank you

Each instruction creates a layer from what it adds, so if you install a single Python requirement in one instruction, that goes into a single layer. You could install packages (vendor dependencies) one by one, but if you do it into a single folder and then copy the entire folder at once, that will be a single layer again. But you can copy subfolders into the new stage, or even individual files.

So you don’t need “buckets”: just don’t copy the entire .venv to the new stage in a single COPY instruction. Copy subfolders and individual files and you will get multiple layers.
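As a rough sketch of that idea, reusing the stage names from the question (the Python version and the package names below are placeholders, not taken from the original post):

```dockerfile
# runtime stage - split the venv across several COPY instructions;
# each COPY becomes its own layer, and layers are pulled in parallel
FROM python-base AS algo

# small, rarely-changing parts first, so they cache well
COPY --from=env-algo /.venv/pyvenv.cfg /.venv/pyvenv.cfg
COPY --from=env-algo /.venv/bin/ /.venv/bin/

# give the heaviest packages their own layers (names are examples)
COPY --from=env-algo /.venv/lib/python3.11/site-packages/torch/ \
                     /.venv/lib/python3.11/site-packages/torch/
COPY --from=env-algo /.venv/lib/python3.11/site-packages/numpy/ \
                     /.venv/lib/python3.11/site-packages/numpy/
```

The remaining site-packages content would need further COPY instructions; a catch-all `COPY --from=env-algo /.venv/lib/ /.venv/lib/` at the end would re-copy the packages above and duplicate their data, so the splits have to cover the tree without overlapping.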

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.