Hi,
I’m building a Python image with Docker BuildKit and a multi-stage Dockerfile.
The first stages build a virtual environment (.venv) that ends up several GB in size.
In the final (runtime) stage I currently copy that entire venv:
# runtime stage
FROM python-base AS algo
…
COPY --from=env-algo /.venv/ /.venv/
Our infrastructure requires pulling the image quite often, and the pull time is
dominated by that single huge layer.
Because Docker pulls layers in parallel, I experimented with manually splitting
the venv into ten “buckets”, each copied in its own layer:
COPY --from=env-algo /buckets/1 /.venv/
COPY --from=env-algo /buckets/2 /.venv/
…
COPY --from=env-algo /buckets/10 /.venv/
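For context, this is roughly how I produce the buckets in the env-algo stage (the function name and round-robin scheme are just my current approach, not anything standard):

```shell
#!/bin/sh
# split_dir SRC DST N
# Copy every file under SRC into DST/1 .. DST/N, assigning files
# round-robin so each bucket holds a similar number of files.
# (Round-robin balances file *count*, not byte size, so buckets can
# still differ in size when a few files are very large.)
split_dir() {
  src=$1
  dst=$2
  n=$3
  i=0
  find "$src" -type f | sort | while read -r f; do
    bucket=$(( i % n + 1 ))
    rel="${f#"$src"/}"                       # path relative to SRC
    mkdir -p "$dst/$bucket/$(dirname "$rel")"
    cp "$f" "$dst/$bucket/$rel"
    i=$(( i + 1 ))
  done
}

# In the env-algo stage it is invoked as:
# split_dir /.venv /buckets 10
```

Since all buckets are copied back onto the same /.venv prefix, the relative paths are preserved and the merged tree matches the original venv.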
But this seems a bit hacky.
Is there a canonical or recommended way to split a very large directory
across multiple layers, purely to speed up docker pull? Or is there a better practice for this use case?
Thank you