Docker save images: image tarball has symlinks between layers?


We’ve got an automated docker build process that runs every few hours, feeds new data files into the build using --build-arg, creates a new image, tags it, and saves it as a tarball via docker save $IMAGE:$TAG | gzip > image.tgz.

For our use case, we have to publish the image to our own registry without using the docker daemon, so we’ve hand-rolled our own solution that unpacks the tarball and publishes each layer one-by-one. The untarred layers in the image each have a layer.tar file in them.

In recent weeks, without any change to the underlying Dockerfile, we’ve noticed that one of those $LAYERNAME/layer.tar files is a symlink to the $OTHERLAYERNAME/layer.tar of an earlier layer. Specifically, the final layer listed in manifest.json has a layer.tar that symlinks to the layer.tar of a layer a few entries before it (in the ordering in manifest.json).

Ex: (output truncated)

$ ls -al $(find . | grep tar)
-rw-r--r--  1 ubuntu  1718279092   2.0K Jun  3 22:32 ./77ff53ee0f3089958822132855087fc29fbf1c55aa87ace208bd4afd9c420898/layer.tar
lrwxr-xr-x  1 ubuntu  1718279092    77B Jun  4 16:03 ./96eace2f03af5a2b3a715dd77590c2cd33e35cb86af84461fd5440a48605cd29/layer.tar@ -> ../77ff53ee0f3089958822132855087fc29fbf1c55aa87ace208bd4afd9c420898/layer.tar
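
For anyone with a similar hand-rolled publisher, here is a minimal Python sketch of how the symlinked entries could be resolved before upload. It assumes the unpacked layout described above, with manifest.json at the top level and a "Layers" list of $LAYERNAME/layer.tar paths. The names resolve_layers and image_dir are made up for illustration and are not part of any Docker tooling.

```python
import json
import os


def resolve_layers(image_dir):
    """Return (listed_path, real_path) pairs for every layer in manifest.json.

    Both paths are relative to image_dir. For a symlinked layer.tar,
    real_path points at the file the link refers to.
    """
    root = os.path.realpath(image_dir)
    with open(os.path.join(root, "manifest.json")) as f:
        manifest = json.load(f)

    resolved = []
    for image in manifest:
        for layer in image["Layers"]:
            # realpath follows the symlink (if any) to the actual tar file.
            real = os.path.realpath(os.path.join(root, layer))
            # Refuse links that point outside the unpacked image directory.
            if os.path.commonpath([root, real]) != root:
                raise ValueError(f"{layer} resolves outside {image_dir}")
            resolved.append((layer, os.path.relpath(real, root)))
    return resolved
```

A publisher could then upload the file at real_path for each manifest entry, and skip the upload when two entries resolve to the same file. Whether that is acceptable depends on how your registry client handles duplicate blobs, which I have not verified.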

The Dockerfile does nothing special: we simply base off ubuntu:16.04, install python and some other libraries, copy in some data files, and finally add an entrypoint script that sets up a server to serve information from the data files. However, we have been unable to write other Dockerfiles following the same pattern that reproduce this symlinking behavior.


  1. Where in the docker source code is it decided whether a layer gets its own layer.tar or a symlink to another layer’s?
  2. How can I write a Dockerfile in such a way that I could replicate this scenario?
  3. Is this effect a facet of layer caching?
  4. Is there a production-ready solution out there for pushing a docker tarball layer-by-layer to a private registry without using the docker daemon?

This feels like a bit of a necro but it’s bitten me in production today.

We use docker save and gpg to sign images before uploading the resulting docker image to AWS ECR.

Normally it works fine: a client pulls the image from the shared ECR via IAM and the docker cli, then uses GPG to verify the tar produced by docker save. If it’s all good they can use docker load on an airgapped network.

Happy times usually.

Today we had a build that unexpectedly contained symlinked layers. Those symlinks make the tar file non-deterministic, so the gpg signature fails to verify later: when you run docker save on the image pulled from ECR, the modification time of the symlinked layer is the time you ran docker save, not the image creation date at the source.

Because of this, the tar no longer has the same hash and the signed image won’t verify.

If there’s a way to stop docker build from creating an image with symlinked layers, or to stop docker save from using symlinks when exporting to tar, I’d be so thankful.
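
In the meantime, one workaround I can imagine (a sketch only, and it changes what gets signed) is to sign a digest that ignores timestamps rather than the raw tar. The Python below hashes each member's name, type, link target, size and contents, but not its mtime. The function name stable_digest is made up, and I am assuming the symlink mtime is the only thing that differs between two docker save runs, as observed above.

```python
import hashlib
import tarfile


def stable_digest(tar_path):
    """SHA-256 over the tar's members, ignoring timestamps and ownership."""
    digest = hashlib.sha256()
    with tarfile.open(tar_path, "r:*") as tar:
        # Sort by name so the order of members in the archive does not matter.
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            header = "\0".join(
                [member.name, member.type.decode("ascii"), member.linkname, str(member.size)]
            ) + "\0"
            digest.update(header.encode("utf-8", "surrogateescape"))
            if member.isfile():
                # Stream regular file contents in 1 MiB chunks.
                stream = tar.extractfile(member)
                for chunk in iter(lambda: stream.read(1 << 20), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

Both sides would run this on their own docker save output and the signature would cover the resulting hex string instead of the tar file. It does not stop the symlinks from being created, so it is not a real answer to the question.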