I recently experienced unexpected behaviour from `docker import` and `docker build` when starting with a static tarball.
The scenario can be reproduced with this series of commands:
echo hello >hello.txt
echo -e 'FROM hello\nENV foo=bar' >Dockerfile
tar -cf hello.tar hello.txt
docker import hello.tar hello
docker inspect -f '{{.Id}} {{.Created}}' hello
docker build .
docker build .
docker import hello.tar hello
docker inspect -f '{{.Id}} {{.Created}}' hello
docker build .
Explained:
- I begin with a tarball containing a single file, created once and not updated.
- I use `docker import` to import this tarball as an image named `hello`. This new image has a particular Id (e.g. `sha256:...`) and Created timestamp.
- I have a Dockerfile which uses this imported `hello` image as its base and adds a single layer which sets an environment variable.
- Executing `docker build` once performs the necessary steps. Executing `docker build` a second time recognises the layers from the previous build and reports `---> Using cache` as expected.
- Re-importing the same tarball, with the same contents and the same file timestamps, results in an image with a new Id and a new Created timestamp. This is unexpected when the source tarball is unchanged.
- Re-running `docker build` does not leverage the build cache, even though the base layer should have identical contents to the base layer used last time.
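One way to narrow this down is to compare the layer digests, rather than only the image Id, across two imports of the same tarball. The helpers below are a minimal sketch: the function names `show_image` and `compare_imports` are made up for illustration, and the sketch assumes a Docker release whose `docker inspect` output for images includes the `.RootFS.Layers` field.

```shell
# Print the Id, Created timestamp and layer digests of an image.
# Assumes `docker inspect` exposes .RootFS.Layers for images.
show_image() {
  docker inspect -f '{{.Id}} {{.Created}} {{json .RootFS.Layers}}' "$1"
}

# Import the same tarball twice under the same name and print the
# details after each import so the two lines can be compared by eye.
compare_imports() {
  tarball=$1
  name=$2
  docker import "$tarball" "$name" >/dev/null
  show_image "$name"
  sleep 1
  docker import "$tarball" "$name" >/dev/null
  show_image "$name"
}
```

Running `compare_imports hello.tar hello` prints one line per import. If the layer digests match while the Id and Created values differ, that would suggest the difference comes from image metadata rather than from the layer contents.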
I am assuming the changed Id and Created timestamp of the re-imported image are responsible for breaking the build cache. I feel that `docker import` should be deterministic, or at least accept an argument to specify a Created timestamp (or use the tarball timestamp), and hopefully lead to a consistent sha256 Id hash (it is supposed to be a content-derived hash, right?).
Once `docker import` behaves deterministically, I presume `docker build` would then use the build cache as expected.
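A possible explanation can be sketched without Docker at all, under the assumption that the image Id is a hash over image metadata that includes the Created timestamp, and not over the layer contents alone. The JSON below is a deliberately simplified, hypothetical stand-in for an image config, not Docker's real format.

```shell
# Illustration only: simplified, hypothetical "image configs".
workdir=$(mktemp -d)
cd "$workdir"

echo hello >hello.txt
tar -cf hello.tar hello.txt

# The "layer" digest depends only on the bytes of the tarball,
# so hashing the same file twice gives the same value.
layer_a=$(sha256sum hello.tar | cut -d' ' -f1)
layer_b=$(sha256sum hello.tar | cut -d' ' -f1)

# Two configs that reference the same layer but carry different
# "created" values, as two separate imports would.
config_a=$(printf '{"created":"%s","layer":"sha256:%s"}' 2020-01-01T00:00:00Z "$layer_a")
config_b=$(printf '{"created":"%s","layer":"sha256:%s"}' 2020-01-01T00:05:00Z "$layer_b")

# The "Id" is the hash of the config text, so it changes with the timestamp.
id_a=$(printf '%s' "$config_a" | sha256sum | cut -d' ' -f1)
id_b=$(printf '%s' "$config_b" | sha256sum | cut -d' ' -f1)

echo "layer digests: $layer_a $layer_b"
echo "config ids:    $id_a $id_b"
```

In this sketch the two layer digests are identical while the two ids differ, because the only input that changed is the timestamp inside the hashed config. If Docker derives the Id in a similar way, a content-derived hash and a changing Id are not contradictory: the Created value is part of the content being hashed.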
Before I raise this as a GitHub issue, are my expectations out of sync with the Docker image system?