I’ve created an image based on an image that is already published on Docker Hub:
estebanmatias92/hhvm 3.7.0-fastcgi 8f2e7309ce69
This image has the following history:
$ docker history estebanmatias92/hhvm:3.7.0-fastcgi
IMAGE               CREATED             CREATED BY                                        SIZE        COMMENT
8f2e7309ce69        3 weeks ago         /bin/sh -c #(nop) CMD ["hhvm" "--mode" "serve     0 B
2116df048ec4        3 weeks ago         /bin/sh -c #(nop) EXPOSE 9000/tcp                 0 B
1b07751941c9        3 weeks ago         /bin/sh -c mkdir /var/run/hhvm                    0 B
2e417278871a        3 weeks ago         /bin/sh -c #(nop) COPY file:15df6ddfe60b3f2c5     500 B
5a0fdf66ef9a        3 weeks ago         /bin/sh -c #(nop) COPY file:7027bf3a1a6376a39     1.925 kB
bcc40f95c372        3 weeks ago         /bin/sh -c /usr/bin/update-alternatives --ins     5.879 kB
ef0eb6a48f67        3 weeks ago         /bin/sh -c set -x && git clone git://gith         105.3 MB
1811eb2b8921        3 weeks ago         /bin/sh -c #(nop) ENV HHVM_VERSION=HHVM-3.7.0     0 B
5a5c97cbfda7        3 weeks ago         /bin/sh -c mkdir $PHP_INI_DIR                     0 B
5df616860635        3 weeks ago         /bin/sh -c #(nop) ENV PHP_INI_DIR=/etc/hhvm/      0 B
3915977dff9c        3 weeks ago         /bin/sh -c apt-get update && apt-get install      24.97 MB
ec59dd9636e9        3 weeks ago         /bin/sh -c apt-get update && apt-get install      791.9 MB
0e3c9f825252        3 weeks ago         /bin/sh -c #(nop) MAINTAINER "Matias Esteban"     0 B
bf84c1d84a8f        3 weeks ago         /bin/sh -c #(nop) CMD ["/bin/bash"]               0 B
64e5325c0d9d        3 weeks ago         /bin/sh -c #(nop) ADD file:085531d120d9b9b091     125.2 MB
My image uses FROM estebanmatias92/hhvm:3.7.0-fastcgi. I’ll leave out the details, as they don’t matter for this case. After I’ve built my image and push it to Docker Hub, it still pushes some of the base image’s layers:
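Schematically, the push output looks something like this (repository name invented, layer IDs taken from the history above; a real push lists layer digests rather than image IDs, but the pattern is the same): my own small layer uploads quickly, while the big base layers are uploaded all over again.

$ docker push myuser/my-hhvm-app:latest
The push refers to a repository [docker.io/myuser/my-hhvm-app]
4a5b6c7d8e9f: Pushed
ef0eb6a48f67: Pushing [=====>                  ]  52.7 MB/105.3 MB
ec59dd9636e9: Pushing [==>                     ]  96.4 MB/791.9 MB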
The reason for this is that images can be private.
So: imagine you eavesdrop on someone and get access to the layer IDs of their private content. If you could push an image to the Hub pointing to these layer IDs, you could then gain access to the layers themselves.
So, in this case, the Hub requires you to upload the actual content in order to prove that you really do have access to it.
Next time you push a different tag of the same image, though, the shared layers will not have to be pushed again.
There are certainly optimizations that can (and will) be worked on in the future, for example not pushing layers that are known to come from public images. But basically the answer to your question is: the first time you push an image under your name, you have to prove that you have the layers you want to link to.
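For example (names invented), a later push to the same repository only uploads the changed top layer; the base layers it already holds report “Layer already exists”:

$ docker push myuser/my-hhvm-app:v2
The push refers to a repository [docker.io/myuser/my-hhvm-app]
4a5b6c7d8e9f: Pushed
ef0eb6a48f67: Layer already exists
ec59dd9636e9: Layer already exists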
This makes me wonder what happens during download: if I base my image on another public image, will users of my image then have to download my copy (if there is one…) of the public base image’s layers again, even if they already have them?
During download you should never have to download a resource you already have (e.g. if you have a layer with digest X, it is usable for any image that links to it, without having to re-download it).
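For example (names invented), pulling a derived image on a machine that already has the base image only fetches the new top layer; the shared layers show up as “Already exists”:

$ docker pull someuser/derived-hhvm:latest
latest: Pulling from someuser/derived-hhvm
64e5325c0d9d: Already exists
ec59dd9636e9: Already exists
4a5b6c7d8e9f: Pull complete
Status: Downloaded newer image for someuser/derived-hhvm:latest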
When are these optimizations going to be implemented?
On some cloud continuous integration systems you end up pushing all the layers again, over and over.
Sometimes the layer being built is 10 MB while the base image is 600 MB…
That is a useless waste of bandwidth and time.
There is no reason why you would push all layers “again over and over”.
Once you have pushed the base layers to a given repository (the first time you push to “user/foo”), you never have to push them again.
There are still some problems with pushing private images to Docker Hub. I have loads of tags, and the only thing I’m changing is the top layer, which is at most 5 MB. But every push still uploads loads of layers.
Let me give an example:
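Schematically, every push looks something like this, even though only the small top layer changed (repository name, layer IDs, and sizes invented):

$ docker push myuser/private-app:tag-42
The push refers to a repository [docker.io/myuser/private-app]
0a1b2c3d4e5f: Pushing [====>                   ]  2.1 MB/4.8 MB
9f8e7d6c5b4a: Pushing [=>                      ]  21.5 MB/312.6 MB
8e7d6c5b4a39: Pushing [>                       ]  9.8 MB/154.2 MB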
The optimizations now appear to be in place. Docker should no longer push base image layers that already exist on the registry. Instead, the client will perform what is referred to as a “cross-repository blob mount”, which makes a blob from a different repository available in the one being pushed.
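As a sketch (names invented), pushing an image built on a public base now reports the shared layers as mounted instead of uploading them:

$ docker push myuser/my-app:latest
The push refers to repository [docker.io/myuser/my-app]
0a1b2c3d4e5f: Pushed
9f8e7d6c5b4a: Mounted from library/debian

Under the hood this is a single request against the registry API rather than a blob upload: the client sends POST /v2/myuser/my-app/blobs/uploads/?mount=&lt;digest&gt;&from=library/debian, and a 201 Created response means the existing blob was linked into the repository without any data transfer.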
I’m developing an image based on microsoft/windowsservercore.
When pushing my image, I see the big, fat layers of microsoft/windowsservercore being uploaded as well.
Initially my repository was private, and when I read this thread I switched it to public.
This is a snapshot from the push:
As you can see, the last two layers are the big ones that it tries to push. As I understand it, that shouldn’t happen even the first time, not just when updating the repository.
The history is:
IMAGE               CREATED             CREATED BY                                        SIZE        COMMENT
482229c83c20        53 minutes ago      cmd /S /C #(nop) CMD ["cmd" "/S" "/C" "po...      41 kB
0a70c1d54c60        53 minutes ago      cmd /S /C #(nop) ENV packagesPath=~/Packages      41 kB
eb4744e51270        53 minutes ago      cmd /S /C #(nop) ENV apikey=mininugetserver       515 MB
4c19c500948d        53 minutes ago      cmd /S /C #(nop) EXPOSE 80/tcp                    281 MB
b63e883f08e3        53 minutes ago      cmd /S /C powershell -NoProfile -NonIntera...     177 kB
ab445b6a0384        About an hour ago   cmd /S /C powershell -NoProfile -NonIntera...     47.4 kB
9e55e4be25c6        About an hour ago   cmd /S /C #(nop) ADD tarsum.v1+sha256:912b...     111 kB
1bb8e1cd2394        About an hour ago   cmd /S /C #(nop) COPY file:1158ad9458093d9...     41 kB
fff7d3803a6a        About an hour ago   cmd /S /C #(nop) ADD dir:d3b205d629ebb7e85...     1.74 GB
6ad1e575a6a8        4 days ago          cmd /S /C #(nop) MAINTAINER Alex Sarafian         7.68 GB