Docker Community Forums

Docker save images: image tarball has symlinks between layers?

(Pindroposakhi) #1

Setup:


Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 516
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-1060-aws
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: ip-10-60-5-65
ID: VJPH:S4RT:BP2J:D4AF:AF7L:ZJVP:QS7Y:YOSL:QPAC:EVMY:GCEW:JVA5
Docker Root Dir: /home/ubuntu/data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Scenario:

We have an automated Docker build process that runs every few hours, feeds new data files into the build using --build-arg, creates a new image, tags it, and saves it as a tarball via docker save $IMAGE:$TAG | gzip > image.tgz.

For our use case, we have to publish the image to our own registry without using the docker daemon, so we’ve hand-rolled our own solution that unpacks the tarball and publishes each layer one-by-one. The untarred layers in the image each have a layer.tar file in them.
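To illustrate the unpacking step, here is a minimal sketch (not our actual publisher) that reads the layer order from manifest.json inside the gzipped tarball. The function name list_layers is made up for this post, and the code assumes the manifest.json layout that docker save writes (a JSON list whose entries carry a "Layers" array) and a single image per tarball.

```python
import json
import tarfile


def list_layers(image_tarball):
    """Return the layer paths of the first image in a `docker save` tarball,
    in the order given by manifest.json."""
    # "r:*" lets tarfile auto-detect gzip compression (image.tgz) or none.
    with tarfile.open(image_tarball, mode="r:*") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
    # Assumption: one image per tarball, so only the first entry is read.
    return manifest[0]["Layers"]
```

Calling list_layers("image.tgz") would return entries of the form $LAYERNAME/layer.tar, which is the order we publish in.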

In recent weeks, without any change to the underlying Dockerfile, we’ve noticed that one of those $LAYERNAME/layer.tar files is a symlink to $OTHERLAYERNAME/layer.tar in a previous layer. Specifically, the layer.tar of the final layer listed in manifest.json is a symlink to the layer.tar of a layer a few entries earlier (in manifest.json order).

Example (output truncated):

$ ls -al $(find . | grep tar)
-rw-r--r--  1 ubuntu  1718279092   2.0K Jun  3 22:32 ./77ff53ee0f3089958822132855087fc29fbf1c55aa87ace208bd4afd9c420898/layer.tar
lrwxr-xr-x  1 ubuntu  1718279092    77B Jun  4 16:03 ./96eace2f03af5a2b3a715dd77590c2cd33e35cb86af84461fd5440a48605cd29/layer.tar@ -> ../77ff53ee0f3089958822132855087fc29fbf1c55aa87ace208bd4afd9c420898/layer.tar
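For anyone handling such a tarball by hand, a rough sketch of how a publisher could cope with a symlinked layer.tar in the unpacked directory: resolve the link to the real file, then hash that file. The helper names resolve_layer and layer_digest are hypothetical, and the check that the target stays inside the unpacked directory is my own precaution, not something Docker requires.

```python
import hashlib
import os


def resolve_layer(image_dir, layer_path):
    """Return the real file behind a layer path, following symlinks,
    and refuse targets that escape the unpacked image directory."""
    root = os.path.realpath(image_dir)
    real = os.path.realpath(os.path.join(root, layer_path))
    if os.path.commonpath([root, real]) != root:
        raise ValueError("layer path escapes image directory: %s" % layer_path)
    return real


def layer_digest(path, chunk_size=1 << 20):
    """Compute the sha256 digest of a layer file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()
```

Two manifest entries that resolve to the same file necessarily have the same digest, so in principle the blob would only need to be uploaded once.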

The Dockerfile does nothing special: we base the image on ubuntu:16.04, install Python and some other libraries, copy in some data files, and finally set up an entrypoint script that starts a server to serve information from the data files. However, we have been unable to write other Dockerfiles following the same pattern that reproduce this symlinking behavior.

Questions:

  1. Where in the Docker source code is it decided whether a layer gets its own layer.tar?
  2. How can I write a Dockerfile in such a way that I could replicate this scenario?
  3. Is this effect a facet of layer caching?
  4. Is there a production-ready solution out there for pushing a docker tarball layer-by-layer to a private registry without using the docker daemon?