I want to clarify some caching behavior that I have observed but not found clearly documented. Consider a Dockerfile like:
    FROM quay.io/centos/centos:stream8 AS base
    RUN dnf install -y git

    FROM base AS build
    RUN dnf install -y make gcc
    RUN mkdir -p /opt/build && curl -L https://example.com/code.tar.gz | tar xz -C /opt/build
    RUN cd /opt/build && ./configure --with-pic
    RUN make -C /opt/build

    FROM base AS final
    COPY --from=build /opt/build/app /bin/app
When I build this locally with `DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1 --target=final ...`, I see that changing the flags given to `./configure` in the `build` stage causes that `RUN` step and the following `make` step to be re-run, but not the `dnf install` and `curl` steps. If I then push the image to a registry, clear the local cache with `docker builder prune`, and run `docker build` again with `--cache-from` pointing at the pushed image, a change to the `./configure` flags results in all the steps in the `build` stage being re-run.
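For concreteness, the two cases I am comparing look roughly like the sketch below. The image reference is a placeholder and the function names are only there to label the two cases; the `docker` flags are the ones quoted above, plus `--all --force` on the prune so that it runs unattended.

```shell
#!/bin/sh
# Placeholder image reference; substitute a registry you can push to.
IMAGE="registry.example.com/demo/app:latest"

# Case 1: local build that also embeds inline cache metadata in the image.
build_local() {
    DOCKER_BUILDKIT=1 docker build \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --target=final -t "$IMAGE" .
}

# Case 2: push the image, drop the local build cache, then rebuild using
# the pushed image as the only cache source.
rebuild_from_registry() {
    docker push "$IMAGE"
    docker builder prune --all --force
    DOCKER_BUILDKIT=1 docker build \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --cache-from "$IMAGE" \
        --target=final -t "$IMAGE" .
}
```

Calling `build_local`, editing the `./configure` flags, and calling `build_local` again shows the first behavior; editing the flags and then calling `rebuild_from_registry` shows the second.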
My understanding is that, with the inline cache, only the `/opt/build/app` file is written into the image, along with hashed metadata about the steps in the `build` stage that generated it; no other data generated by the `build` stage is written into the image. Thus, with `--cache-from`, when Docker detects a change in the `build` stage it needs to run all of that stage's steps again. When building locally, however, Docker has an additional cache that matches steps based on their history and that can be used for the intermediate steps of a multi-stage build. These cached steps are not written into the image.
I believe these intermediate steps could be captured for future caching by building with `--target=build`, pushing the `build` image, and using `--cache-from` twice, once for the `build` image and once for the `final` image. Also, I believe using `--cache-to=type=registry,mode=max` could store all the intermediate layers.
Is this all correct?
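To make the two ideas above concrete, here is a minimal sketch of what I have in mind. The repository names (`app-build`, `app-cache`) are made up for illustration, and I am assuming the second option is run through `docker buildx build` with a builder that supports exporting a registry cache; I have not confirmed which builders do.

```shell
#!/bin/sh
# Placeholder references; the repository names are invented for this sketch.
FINAL_IMAGE="registry.example.com/demo/app:latest"
BUILD_IMAGE="registry.example.com/demo/app-build:latest"
CACHE_REF="registry.example.com/demo/app-cache"

# Option A, step 1: publish the build stage as its own image.
push_build_stage() {
    DOCKER_BUILDKIT=1 docker build \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --target=build -t "$BUILD_IMAGE" .
    docker push "$BUILD_IMAGE"
}

# Option A, step 2: name both images as cache sources for the final build.
build_final_with_both_caches() {
    DOCKER_BUILDKIT=1 docker build \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        --cache-from "$BUILD_IMAGE" \
        --cache-from "$FINAL_IMAGE" \
        --target=final -t "$FINAL_IMAGE" .
}

# Option B: export a separate registry cache with mode=max and read it back.
build_final_with_max_cache() {
    docker buildx build \
        --cache-to "type=registry,ref=$CACHE_REF,mode=max" \
        --cache-from "type=registry,ref=$CACHE_REF" \
        --target=final -t "$FINAL_IMAGE" --push .
}
```

Option A keeps everything as ordinary images at the cost of an extra push; option B keeps the cache in its own reference, separate from the image that gets deployed.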
My main motivation for asking is a discussion with a colleague. They would like the final three steps in the `build` stage to be condensed into a single step to avoid writing unnecessary data into the image. For a single-stage image, I agree that this is better practice for keeping the image small. Here, however, it seems that if we only target the `final` stage, we do not end up with intermediate data written into the image anyway, and keeping the steps separate allows for some local caching, which is convenient: for example, it saves needing to `curl` the source code every time the `./configure` flags are adjusted while debugging.
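For reference, the variant my colleague is proposing would look something like the sketch below, written out as a second Dockerfile so that both versions can be built side by side. `Dockerfile.combined` is a file name I picked for this sketch, and the `-y` and `-C` options are ones I am assuming a non-interactive build needs.

```shell
#!/bin/sh
# Write the single-step variant to its own file (name chosen for this sketch).
cat > Dockerfile.combined <<'EOF'
FROM quay.io/centos/centos:stream8 AS base
RUN dnf install -y git

FROM base AS build
RUN dnf install -y make gcc
# Download, configure and build in a single step (one layer in this stage).
RUN mkdir -p /opt/build \
 && curl -L https://example.com/code.tar.gz | tar xz -C /opt/build \
 && cd /opt/build \
 && ./configure --with-pic \
 && make

FROM base AS final
COPY --from=build /opt/build/app /bin/app
EOF
```

It can then be built with `docker build -f Dockerfile.combined --target=final ...` alongside the original. With this layout, any change to the `./configure` flags invalidates the download as well, which is the inconvenience I describe above.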
- Are there other nuances to caching and image size to consider here?
- Is there a way to inspect the inline cache to compare the two cases? Is that what
- Is using a build stage to split what would be one step into multiple for readability / caching a legitimate use case or is it overly relying on a detail of how inline caching works?
- I built the image with the steps combined and found that `docker image list` reported the same size.
- Breaking out a step into multiple in a separate stage for better caching could be seen as a debugging trick that should be removed when the debugging is done.
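The size comparison mentioned above can be repeated with something like the following sketch. The two tags are placeholders for the separate-step and combined-step builds. Note that this only looks at the layers of the resulting images, not at the inline cache metadata itself, which I still do not know how to inspect.

```shell
#!/bin/sh
# Placeholder tags for the two builds being compared.
SPLIT_TAG="app:split"
COMBINED_TAG="app:combined"

# Print the total size in bytes and the number of filesystem layers.
summarize_image() {
    docker image inspect \
        --format '{{.Size}} bytes, {{len .RootFS.Layers}} layers' "$1"
}

# Print the summary and the per-layer history for both images.
compare_images() {
    for tag in "$SPLIT_TAG" "$COMBINED_TAG"; do
        echo "== $tag"
        summarize_image "$tag"
        docker history --no-trunc "$tag"
    done
}
```

If both images report the same size and layer count, that would be consistent with the `build` stage layers not being written into the `final` image in either case.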