Docker save images: image tarball has symlinks between layers?

Setup:


Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 516
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-1060-aws
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: ip-10-60-5-65
ID: VJPH:S4RT:BP2J:D4AF:AF7L:ZJVP:QS7Y:YOSL:QPAC:EVMY:GCEW:JVA5
Docker Root Dir: /home/ubuntu/data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Scenario:

We’ve got an automated docker build process that runs every few hours, feeds new data files into the build using --build-arg, creates a new image, tags it, and saves it as a tarball via docker save $IMAGE:$TAG | gzip > image.tgz.

For our use case, we have to publish the image to our own registry without using the docker daemon, so we’ve hand-rolled our own solution that unpacks the tarball and publishes each layer one by one. Each untarred layer directory in the image contains a layer.tar file.

In recent weeks, without any change to the underlying Dockerfile, we’ve noticed that one of those $LAYERNAME/layer.tar files is a symlink to the $OTHERLAYERNAME/layer.tar of an earlier layer. Specifically, the layer.tar of the final layer in manifest.json is a symlink to the layer.tar of a layer a few entries before it (in manifest.json order).

Ex: (output truncated)

$ ls -al $(find . | grep tar)
-rw-r--r--  1 ubuntu  1718279092   2.0K Jun  3 22:32 ./77ff53ee0f3089958822132855087fc29fbf1c55aa87ace208bd4afd9c420898/layer.tar
lrwxr-xr-x  1 ubuntu  1718279092    77B Jun  4 16:03 ./96eace2f03af5a2b3a715dd77590c2cd33e35cb86af84461fd5440a48605cd29/layer.tar@ -> ../77ff53ee0f3089958822132855087fc29fbf1c55aa87ace208bd4afd9c420898/layer.tar
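For anyone unpacking the tarball by hand, here is a minimal sketch of how such a symlinked layer.tar could be resolved before publishing. It assumes the legacy docker save layout described above (a top-level manifest.json whose first entry has a "Layers" list of paths such as $LAYERNAME/layer.tar) and that symlink targets are relative paths inside the archive. The function names resolve_member and layer_digests are hypothetical, not part of any Docker tooling.

```python
import hashlib
import json
import posixpath
import tarfile


def resolve_member(index, name):
    """Follow symlinks inside the archive until a non-symlink member is reached."""
    current = posixpath.normpath(name)
    seen = set()
    while True:
        member = index[current]
        if not member.issym():
            return member
        if current in seen:
            raise ValueError("symlink loop at " + current)
        seen.add(current)
        # A symlink target is relative to the directory holding the symlink.
        current = posixpath.normpath(
            posixpath.join(posixpath.dirname(current), member.linkname)
        )


def layer_digests(image_tar_path):
    """Return [(layer path from manifest.json, sha256 of the resolved layer.tar)]."""
    results = []
    # Mode "r" auto-detects compression, so image.tar and image.tgz both work.
    with tarfile.open(image_tar_path) as tar:
        index = {posixpath.normpath(m.name): m for m in tar.getmembers()}
        manifest = json.load(tar.extractfile(resolve_member(index, "manifest.json")))
        # Assumption: a single image per archive, so only manifest[0] is read.
        for layer_path in manifest[0]["Layers"]:
            member = resolve_member(index, layer_path)
            digest = hashlib.sha256()
            with tar.extractfile(member) as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            results.append((layer_path, "sha256:" + digest.hexdigest()))
    return results
```

With this approach a symlinked layer and its target report the same digest, so a publisher that uploads by digest could skip the duplicate blob rather than fail on the symlink.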

The Dockerfile does nothing special: we simply base off ubuntu:16.04, install python and some other libraries, copy in some data files, and finally set up an entrypoint script that starts a server to serve information from the data files. However, we have been unable to write other Dockerfiles following the same pattern that reproduce this symlinking behavior.

Questions:

  1. Where in the docker source code does it decide if a layer should create a new layer.tar?
  2. How can I write a Dockerfile in such a way that I could replicate this scenario?
  3. Is this effect a facet of layer caching?
  4. Is there a production-ready solution out there for pushing a docker tarball layer-by-layer to a private registry without using the docker daemon?

This feels like a bit of a necro, but it’s bitten me in production today.

We use docker save and gpg to sign images before uploading the resulting docker image to AWS ECR.

Normally it works fine: a client pulls the image from the shared ECR via IAM and the docker CLI, then uses GPG to verify the tar produced by docker save. If it’s all good, they can use docker load on an air-gapped network.

Happy times usually.

Today we had a build that (unexpectedly) has symlinked layers, and those symlinks make the tar file non-deterministic, so the GPG signature fails to verify later. When you run docker save on the image pulled from ECR, the modification time of the symlinked layer is the time you ran the docker save command, not the image creation date at the source.

As a result, the tar no longer has the same hash, and the signed image won’t verify.
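As a possible stopgap, both sides could normalize the archive before signing and verifying. The sketch below copies a docker save tarball and pins the modification time of symlink members to a fixed value, since that is the only field reported as changing here. This is an assumption-heavy illustration: normalize_symlink_mtimes and fixed_mtime are hypothetical names, it assumes nothing else in the archive varies between runs, and whether docker load accepts the rewritten file would need to be checked.

```python
import tarfile


def normalize_symlink_mtimes(src_path, dst_path, fixed_mtime=0):
    """Copy a tar archive, pinning the mtime of every symlink member."""
    # GNU_FORMAT is fixed explicitly so the output does not depend on the
    # default tar format of the Python version in use.
    with tarfile.open(src_path) as src, \
            tarfile.open(dst_path, "w", format=tarfile.GNU_FORMAT) as dst:
        for member in src.getmembers():
            if member.issym():
                # Assumption: only symlink mtimes differ between docker save runs.
                member.mtime = fixed_mtime
            if member.isreg():
                # Regular files need their data copied along with the header.
                dst.addfile(member, src.extractfile(member))
            else:
                # Directories and symlinks are header-only entries.
                dst.addfile(member)
```

The output is written uncompressed on purpose, so the signature covers only the tar bytes. The signer and the client would each run the same function on their own docker save output and sign or verify the normalized file instead of the raw one.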

If there’s a way to stop docker build from creating an image with symlinks, or to stop docker save from using symlinks when exporting to tar, I’d be so thankful.