Is image uploading to a (private) registry done "efficiently"?

I have a question about how uploading to a registry works. If I build an image from a base image, does the uploading process use the fact that it’s derived from a well-known base image (or just one it already has in its registry) and only upload (and eventually download) the pieces that are specific to the derived image?

Our group is working towards having an internal private registry, and I want to make sure everyone is clear on the benefits of this. I was under the impression that the image uploading and downloading process is more efficient than just exporting a tar and uploading and downloading that. I hope I’m not wrong.

I’m pretty sure the standard registry v2 setup is that it only uploads layers that haven’t already been uploaded for this (named) image.

If I have a Dockerfile like

FROM ubuntu:16.04
RUN echo 1 > /x

and run docker build -t registry.example.com/image:1 .; docker push registry.example.com/image:1, it will push both the Ubuntu base layers (I think there are actually 5) plus the layer for the RUN command.

If I change the RUN command to RUN echo 2 > /x and run docker build -t registry.example.com/image:2 .; docker push registry.example.com/image:2, the system will figure out that the Ubuntu base layers have already been uploaded and only upload the new tiny layer with the /x file.
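To make the two pushes above concrete, here is a minimal Python sketch of the idea, not the registry's actual code. It assumes layers are identified by a SHA-256 digest of their content; ToyRegistry and the layer contents are hypothetical stand-ins, and the count of 5 base layers is just my guess from above.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Content address of a layer: the SHA-256 of its bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class ToyRegistry:
    """Hypothetical in-memory stand-in for a registry's layer store."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def push(self, layers: List[bytes]) -> List[str]:
        """Store only layers whose digest is missing; return what was uploaded."""
        uploaded = []
        for layer in layers:
            d = digest(layer)
            if d not in self.blobs:
                self.blobs[d] = layer
                uploaded.append(d)
        return uploaded


# Stand-ins for real layer tarballs (assumed contents, illustration only).
ubuntu_layers = [b"ubuntu base layer %d" % i for i in range(5)]
image_1 = ubuntu_layers + [b"RUN echo 1 > /x"]
image_2 = ubuntu_layers + [b"RUN echo 2 > /x"]

registry = ToyRegistry()
first_push = registry.push(image_1)
second_push = registry.push(image_2)

print(len(first_push), "layers uploaded for image:1")   # 6
print(len(second_push), "layers uploaded for image:2")  # 1
```

The second push only sends the one layer whose content changed, because the five base layers hash to digests the store already holds.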

If I build something totally different and run docker build -t registry.example.com/other .; docker push registry.example.com/other, my understanding is that it will push the entire underlying layer chain again under the different name.

Just so we’re clear, if I have 100 Dockerfile-based projects using the same registry, such that all of these images derive from a common base image (like “ubuntu:16.04” as above), the first time each one is built, they will all upload the base image layer to this registry, even though they all use the same base image?

I understand that subsequent builds with the same tag will just upload the layers that change, so the likely quite large base layer would not be uploaded. I just want to be clear on whether any optimization is done with unrelated images in the same registry.

As far as I understand it, the main efficiency benefit of a registry is layer re-use. Say you have a 1GB image and only 5MB of it changes: using docker save / docker load, you’ll be schlepping around the full 1GB on every single image “push” regardless. A registry has other benefits too, such as authorization, auditing, and the possibility of using signed images (I'm not sure of the exact mechanics of how that works).
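A rough back-of-the-envelope version of that comparison, as a Python sketch. The function name and the choice of 10 pushes are my own assumptions; it treats 1GB as 1000MB, assumes the first registry push sends everything, and ignores compression and manifest overhead.

```python
def transfer_mb(image_mb: int, changed_mb: int, pushes: int):
    """Total MB moved over `pushes` releases for each approach."""
    # docker save / docker load: the whole image travels every time.
    save_load = image_mb * pushes
    # Registry: full image once, then only the changed layer each time.
    registry = image_mb + changed_mb * (pushes - 1)
    return save_load, registry


save_load_total, registry_total = transfer_mb(image_mb=1000, changed_mb=5, pushes=10)
print("save/load:", save_load_total, "MB")  # 10000 MB
print("registry: ", registry_total, "MB")   # 1045 MB
```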

There is also the matter of compression. I think docker push gzips layers by default when it uploads them. If you avoid a registry, you’ll end up rolling your own compression step for this.
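To give a feel for what gzipping a layer buys you, here is a small Python sketch that builds a tar archive in memory and compresses it. The file path and contents are made up for illustration, and the real ratio depends entirely on what is in the layer (this example is deliberately very repetitive).

```python
import gzip
import io
import tarfile


def make_layer_tar(path: str, data: bytes) -> bytes:
    """Build an uncompressed tar archive holding a single file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=path)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical layer content: a highly repetitive config file.
layer = make_layer_tar("etc/example.conf", b"key = value\n" * 10_000)
compressed = gzip.compress(layer)

print(len(layer), "bytes uncompressed")
print(len(compressed), "bytes gzipped")
```

Without a registry you would do the equivalent by hand, for example by piping the output of docker save through gzip before copying it anywhere.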

No, they will only upload the layers that are unique to each image. You can see this if you make a Dockerfile like the one below and docker push the result to Docker Hub:

FROM debian

RUN echo foo
RUN echo bar

When you run docker build -t davidmichaelkarr/foobar .; docker push davidmichaelkarr/foobar with this Dockerfile, you should see some messages about “Image layer already exists, skipping” (or “Mounted from library/debian”). That’s because the layers that make up debian already exist on Docker Hub, and layers are matched by their content digest rather than by image name. The local Docker daemon and the registry talk back and forth to figure this out and only send what has changed.
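For the curious, the "talking back and forth" maps onto two endpoints in the Docker Registry HTTP API V2: a HEAD request to check whether a repository already has a blob, and a POST with mount and from parameters to link a blob that exists in another repository. The Python sketch below only builds those requests and does not send them; the host, repository names, and digest are placeholders, and authentication is left out entirely.

```python
from urllib.parse import urlencode


def blob_exists_request(registry: str, repo: str, digest: str):
    """HEAD on a blob: 200 means the repository has it, 404 means it does not."""
    return "HEAD", f"https://{registry}/v2/{repo}/blobs/{digest}"


def cross_repo_mount_request(registry: str, target_repo: str,
                             source_repo: str, digest: str):
    """POST asking the registry to link an existing blob into target_repo.

    A 201 response means the blob was mounted and no upload is needed.
    """
    query = urlencode({"mount": digest, "from": source_repo}, safe=":/")
    return "POST", f"https://{registry}/v2/{target_repo}/blobs/uploads/?{query}"


# Placeholder digest, not a real layer.
EXAMPLE_DIGEST = "sha256:" + "0" * 64

print(blob_exists_request("registry.example.com", "other", EXAMPLE_DIGEST))
print(cross_repo_mount_request("registry.example.com", "other", "image", EXAMPLE_DIGEST))
```

The mount request is what lets a second, unrelated repository on the same registry pick up a base layer without the client sending the bytes again, provided the client has access to the source repository.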

Obviously, the first image you build and push will send the base layer, but after that all subsequent pushes will re-use it.
