I’m building the following Dockerfile twice using buildx. For some reason I fail to understand, the layers produced by the RUN command have different digests. The two builds run within seconds of each other on the same host (Mac Intel, Docker Engine 24.06).
This is the Dockerfile:
RUN sh -c "echo hello"
These are the commands I’m using to run the build twice:
docker buildx build --build-arg BUILDKIT_STEP_LOG=1 --progress plain --no-cache --tag ca-test:1 .
docker buildx build --build-arg BUILDKIT_STEP_LOG=1 --progress plain --no-cache --tag ca-test:2 .
The resulting layer digests for the first build are:
While for the second build, they are (note the difference in the last layer):
Any help would be greatly appreciated.
Maybe the dive tool (link) can provide more insight into the layer and its content.
I’m familiar with dive, but not an expert. As far as I could see, it doesn’t provide any additional information about the impact of this layer on the content of the image (which is no impact whatsoever …).
Using ^u to filter out unmodified files, this layer doesn’t seem to have any impact on the content of the image (as expected).
And BTW - I was able to reproduce the same phenomenon on an Ubuntu host.
You disabled the cache so your new layer created by the RUN instruction will always be different.
Isn’t the digest of the layer computed based on its contents? If so, there’s no difference between the content of the layers in the two builds. Also, using a different base image (e.g. alpine) does produce the same layer digest time after time.
Docker builds are repeatable, but not reproducible.
If you build an image with the same Dockerfile twice, the content of the image can vary in package versions, and will vary in the data of files, and in the date of the image layer metadata.
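To illustrate how files with identical data can still end up in layers with different digests, here is a minimal Python sketch. It assumes, in line with the OCI image format, that a layer digest is a SHA-256 hash over the layer's tar archive, and that the tar headers carry per-file metadata such as the modification time. The file path and timestamps are made up for the demonstration, and the sketch does not establish which file, if any, changes in the build discussed in this thread.

```python
import hashlib
import io
import tarfile


def layer_digest(files, mtime):
    """Return a sha256 digest of an uncompressed tar archive built from `files`.

    `files` maps a path to its bytes; every entry gets the same `mtime`.
    This only imitates how a layer archive is hashed, it is not Docker's code.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = mtime  # stored in the tar header, so it is part of the hashed bytes
            tar.addfile(info, io.BytesIO(data))
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()


# Hypothetical layer content: one file, same bytes in every "build".
files = {"etc/example.conf": b"hello\n"}

same_a = layer_digest(files, mtime=1_700_000_000)
same_b = layer_digest(files, mtime=1_700_000_000)
later = layer_digest(files, mtime=1_700_000_005)  # same data, written 5 seconds later

print(same_a == same_b)  # True: identical data and identical metadata
print(same_a == later)   # False: identical data, different mtime
```

If something like this applies here, a file or directory touched during the RUN step would be enough to change the digest even though its data is unchanged, and whether that happens could depend on the base image.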
For your Dockerfile example, if the cache were used, I would have expected Docker to detect no change and use all image layers from the build cache.
@meyay - I’m aware of the fact that the resulting image may vary between two builds because of timestamps or external dependencies such as packages. My question, however, was about layer digests, not the image, and the Dockerfile quoted above clearly doesn’t have any external dependencies.
This is even stranger considering the fact that slight changes to the Dockerfile (a different base image or copying a file to the root directory of the image) do produce the same layer digest.
As far as I know, the cache works based on what the parent layer was and what the command was that generated it. That’s why running apt-get update in a separate RUN instruction is not a good idea: the command stays the same, so the cache would never be invalidated unless a parent layer is also invalidated. When you put it in front of apt-get install in the same RUN instruction and you change the package list, that invalidates the layer and apt-get update runs again. I know, it’s an image layer, not a filesystem layer, but these are related. Note that I’m not saying it works as I describe here, but this is how I imagine it now.
So you disable the cache, which also means you want to run the command in the RUN instructions again. That will result in some output, and Docker will not know in advance what the output will be. The same command could produce a random output as well, so it will be stored in a different folder. After that Docker could make a hash, try to find the same content somewhere else, and drop the temporary folder, but it seems this is not what is happening.
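The mental model above (cache lookup keyed on the parent layer plus the instruction text) can be sketched in Python as follows. This is only an illustration of that model, not how BuildKit actually computes its cache keys; `cache_key`, `build_keys`, the `"base"` placeholder, and the package names are all hypothetical.

```python
import hashlib


def cache_key(parent_key, instruction):
    """Hypothetical cache key that depends only on the parent key and the instruction text."""
    return hashlib.sha256(f"{parent_key}\n{instruction}".encode()).hexdigest()


def build_keys(base, instructions):
    """Chain the keys: each step's key becomes the parent of the next step."""
    keys = []
    parent = base
    for instruction in instructions:
        parent = cache_key(parent, instruction)
        keys.append(parent)
    return keys


# apt-get update in its own RUN: changing the package list leaves its key untouched,
# so under this model the stale package index would be reused from the cache.
separate_v1 = build_keys("base", ["RUN apt-get update", "RUN apt-get install -y curl"])
separate_v2 = build_keys("base", ["RUN apt-get update", "RUN apt-get install -y curl git"])

# Both commands in one RUN: changing the package list changes the key of the whole step.
combined_v1 = build_keys("base", ["RUN apt-get update && apt-get install -y curl"])
combined_v2 = build_keys("base", ["RUN apt-get update && apt-get install -y curl git"])

print(separate_v1[0] == separate_v2[0])  # True: the update step would be a cache hit
print(separate_v1[1] == separate_v2[1])  # False: only the install step is rebuilt
print(combined_v1[0] == combined_v2[0])  # False: update and install both run again
```

Note that in this model the key never looks at what the command writes to disk, which is why it says nothing about the resulting layer digest once the cache is disabled with --no-cache.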
Can you show an example file? I can’t imagine how that would work, as a different base image means different content in the image so every output could make different changes even if the commands are the same.
When you copy a file, Docker can check the content before copying, and the command is also the same. Since the filesystem layers are mounted on top of each other, it’s not a problem even when you mount it on top of totally different layers when the base images were different.