I want to clarify some caching behavior that I have observed but not found clearly documented. Consider a Dockerfile like:

    FROM quay.io/centos/centos:stream8 AS base
    # -y keeps dnf from waiting on a prompt during the non-interactive build
    RUN dnf install -y git

    FROM base AS build
    RUN dnf install -y make gcc
    # -C tells tar which directory to extract into
    RUN mkdir -p /opt/build && curl -L https://example.com/code.tar.gz | tar xz -C /opt/build
    RUN cd /opt/build && ./configure --with-pic
    RUN make -C /opt/build

    FROM base AS final
    COPY --from=build /opt/build/app /bin/app

When I build this locally with `DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1 --target=final ...`, I see that changes to the flags given to `./configure` in the `build` stage cause the `./configure` and `make` `RUN` steps to be re-run, but not the `dnf install` and `curl` steps. If I then push the image to a registry, clear the local cache with `docker builder prune`, and run `docker build` again with `--cache-from` pointing at the pushed image, a change to the `./configure` flags results in all the steps in the `build` stage being re-run.
My understanding is that with the inline cache, only the `/opt/build/app` file is written into the Docker image, along with hashed metadata about the steps in the `build` stage that generated it; no other data from the `build` stage is written into the image. Thus, with `--cache-from`, when Docker detects a change in the `build` stage, it has to run all of that stage's steps again. When building locally, however, Docker has an additional cache that matches steps based on their history and that can be used for the intermediate steps of a multi-stage build. These cached steps are not written into the image.
I believe these intermediate steps could be captured for future caching by building with `--target=build`, pushing the `build` image, and using `--cache-from` twice, once for the `build` image and once for the `final` image. Also, I believe using `--cache-to=type=registry,mode=max` could store all the intermediate layers.
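A rough sketch of the two approaches I have in mind follows. The image references under `registry.example.com` are hypothetical placeholders, and the `run` helper is my own addition: it only prints each command unless `DRY_RUN=0` is set, so the script can be read or tried without a registry. The `ref=` parameter on the registry cache is needed to say where the cache is stored, and I assume the registry exporter may need a builder other than the default one, depending on the setup.

```shell
# Hypothetical image references; substitute a real registry and repository.
BUILD_IMAGE="registry.example.com/app:build"
FINAL_IMAGE="registry.example.com/app:final"
CACHE_REF="registry.example.com/app:buildcache"

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

export DOCKER_BUILDKIT=1

# Approach 1: push the build stage as its own image, then list both
# images as inline cache sources when building the final stage.
run docker build --target=build --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from "$BUILD_IMAGE" -t "$BUILD_IMAGE" .
run docker push "$BUILD_IMAGE"

run docker build --target=final --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from "$BUILD_IMAGE" --cache-from "$FINAL_IMAGE" -t "$FINAL_IMAGE" .
run docker push "$FINAL_IMAGE"

# Approach 2: export a separate registry cache with mode=max, which is meant
# to include layers from intermediate stages as well.
# Assumption: this may require a buildx builder using the docker-container driver.
run docker buildx build --target=final \
  --cache-to "type=registry,ref=$CACHE_REF,mode=max" \
  --cache-from "type=registry,ref=$CACHE_REF" \
  -t "$FINAL_IMAGE" --push .
```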
Is this all correct?
My main motivation for asking is a discussion with a colleague. They would like the final three steps in the `build` stage to be condensed into a single step to avoid writing unnecessary data into the image. For a single-stage image, I agree that this is better practice for keeping the image small. Here, however, it seems that if we only target the `final` stage, no intermediate data ends up in the image anyway, and keeping the steps separate allows for some convenient local caching: for example, it avoids having to `curl` the source code every time the `./configure` flags are adjusted while debugging.
Some questions:
- Are there other nuances to caching and image size to consider here?
- Is there a way to inspect the inline cache to compare the two cases? Is that what `docker history` shows?
- Is using a build stage to split what would be one step into multiple for readability / caching a legitimate use case, or is it overly relying on a detail of how inline caching works?
Some notes:
- I built the image with the steps combined and found that `docker image list` reported the same size.
- Breaking out a step into multiple in a separate stage for better caching could be seen as a debugging trick that should be removed when the debugging is done.
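For reference, the size comparison from the first note could be scripted roughly as below. The tags `app:split` and `app:combined` and the file name `Dockerfile.combined` are hypothetical, and the `run` helper is my own addition that only prints each command unless `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build both variants of the final stage (hypothetical tags and file name).
run docker build --target=final -t app:split .
run docker build --target=final -f Dockerfile.combined -t app:combined .

# Compare the total size in bytes and the per-layer breakdown of each image.
for tag in app:split app:combined; do
  run docker image inspect --format '{{.Size}}' "$tag"
  run docker history "$tag"
done
```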