Reducing Docker image size

I have a problem related to Docker image size. We create distributable Docker images using the ‘docker save …’ command. In my case, there is a requirement for multiple Docker images to be based on one base image of mine (itself based on, say, centos:7). Third parties would build their own images on top of my base image and should be able to deploy each of those images independently.

However, every image saved with ‘docker save…’ ends up also containing my base image, which is itself based on the centos:7 image, so each saved image is fat.

Is there a way to package (save) only my own files and exclude the base layers? How does one exclude the base layers from a saved Docker image?

If you run ‘docker save’ on all of the images at once (one command naming every image), the resulting archive will contain only one copy of the shared base layers.
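As a minimal sketch of that approach, kept in Python for consistency with the other examples here, the helper below just builds one ‘docker save -o FILE IMAGE [IMAGE...]’ command covering every image. The function names and the image names are hypothetical; the ‘docker save’ syntax itself is the standard CLI form.

```python
import subprocess
from typing import List, Sequence


def docker_save_argv(output_path: str, images: Sequence[str]) -> List[str]:
    """Build a single `docker save` command line for several images.

    Naming all images in one invocation lets Docker write the layers they
    share (e.g. a common base image) into the archive only once.
    """
    if not images:
        raise ValueError("at least one image name is required")
    # docker save -o FILE IMAGE [IMAGE...]
    return ["docker", "save", "-o", output_path, *images]


def save_bundle(output_path: str, images: Sequence[str]) -> None:
    """Actually run the command; needs a local Docker CLI and daemon."""
    subprocess.run(docker_save_argv(output_path, images), check=True)


if __name__ == "__main__":
    # Hypothetical image names, for illustration only.
    argv = docker_save_argv("bundle.tar", ["mybase:1.0", "vendor-a/app:1.0"])
    print(" ".join(argv))
```

On the receiving side, ‘docker load -i bundle.tar’ restores all of the images from that single archive.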

Otherwise, there’s no standard way to export only the “top half” of an image’s layers. I’ve seen at least one mention of someone who managed to edit the exported tar file by hand; the format is documented and seems to have been reasonably stable across releases. The Docker engine was by and large able to cope with their edited files, though note that the error message when the “bottom half” of the layers is missing isn’t very helpful to the user (it shows the hex ID of the missing base layer, not its tagged name).
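For anyone trying the manual route, the first step is working out which entries in the archive make up the “top half”. The Python sketch below assumes the archive contains the manifest.json index that ‘docker save’ writes (each entry naming the image’s Config file and its Layers, bottom to top, in the same order as the config’s rootfs.diff_ids), and that you already have the base image’s ordered layer diff IDs, for example from ‘docker image inspect --format '{{json .RootFS.Layers}}' BASE’. The function name top_half_layers is hypothetical, and it only identifies the layers; it does not rewrite the tar.

```python
import json
import tarfile
from typing import List, Sequence


def top_half_layers(archive_path: str, image_tag: str,
                    base_diff_ids: Sequence[str]) -> List[str]:
    """Return the archive paths of the layers `image_tag` adds on top of its base.

    `base_diff_ids` is the base image's ordered list of layer diff IDs.
    """
    with tarfile.open(archive_path) as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        entry = next(
            (m for m in manifest if image_tag in (m.get("RepoTags") or [])), None
        )
        if entry is None:
            raise KeyError(f"{image_tag} not found in {archive_path}")
        config = json.load(tar.extractfile(entry["Config"]))

    diff_ids = config["rootfs"]["diff_ids"]
    layers = entry["Layers"]  # same bottom-to-top order as diff_ids
    base = list(base_diff_ids)
    # The base image's layers must be exactly the bottom of this image's stack.
    if diff_ids[:len(base)] != base:
        raise ValueError(f"{image_tag} is not built on the given base image")
    return layers[len(base):]
```

Comparing diff IDs (content digests) rather than layer paths avoids relying on how a particular Docker version names the files inside the archive.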

OK, then this should be considered as an enhancement for Docker. The feature could take some kind of signature or md5sum of the base image, so that only the top layers are saved and their validity can be checked later on re-import. It is just like Maven dependencies: when building my own artifact in Maven, I just declare my dependency with its version, and I can build my own artifact independently.

This would be a much-needed feature, thanks.

The best analog to that specific thing is the FROM line in Dockerfiles, which has exactly this behavior: when you build your image, you specify something like FROM ubuntu:16.04, giving the name and version of a base image, usually a well-known one on the public Docker Hub. What you get out, in effect, is not just the jar file of the thing you’re building but also the jar files of all its direct and indirect dependencies. What you’re asking for is the ability to distribute only that final jar file, even if the target system won’t have its prerequisites and won’t necessarily know how to get them.

This is further complicated by named tags not pointing at fixed content. I know ubuntu:16.04 has changed at least once as a new set of patches went into the base image. If I say my layer depends on 47bcc53f74dc, that’s more specific, but I don’t think the Docker registry API will let you fetch an arbitrary hash without knowing something about where it came from, and there’s no guarantee that the image with that specific hash hasn’t since been removed or replaced.

[To be clear, I’d find a standard way to do this useful as well, but I control a base image that’s several gigabytes in size, so I can usually say pretty definitively that either you have that base image or you don’t have our product installation.]

OK, I just mentioned md5sum as an example; it need not be that. If Docker could take the repository URL, base image name, tag/version, etc. to help me create an image containing only the upper layers of my stuff, I would be fine with that approach as well. Please give it some thought and see whether that is possible. When loading such an upper-layers-only image, Docker must first check that the dependent lower layers are present.
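Docker does not offer this pre-load check today, but its comparison step can be sketched outside Docker. This is a minimal Python sketch under stated assumptions: the upper-layers-only archive is assumed to record its base image’s tag and ordered layer diff IDs, and local_layers and base_problem are hypothetical helper names. The only real interface used is ‘docker image inspect --format '{{json .RootFS.Layers}}' IMAGE’, which prints an image’s layer diff IDs bottom to top.

```python
import json
import subprocess
from typing import List, Optional, Sequence


def local_layers(base_ref: str) -> Optional[List[str]]:
    """Diff IDs of a locally available image, or None if Docker can't find it."""
    # Real CLI call; needs the docker CLI and a daemon. A non-zero exit
    # status is treated as "image not present locally".
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .RootFS.Layers}}", base_ref],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return json.loads(result.stdout)


def base_problem(base_ref: str, required: Sequence[str],
                 available: Optional[Sequence[str]]) -> Optional[str]:
    """Return a readable problem description, or None if the base layers are present.

    `required` is the (assumed recorded) ordered list of base-layer diff IDs the
    upper-layers-only image was built on; `available` is what the engine reports.
    """
    if available is None:
        return f"base image {base_ref} is not installed; pull or load it first"
    # The required layers must form the bottom of the local image's layer stack.
    if list(available[:len(required)]) != list(required):
        return f"{base_ref} is installed, but not the version this image was built on"
    return None


if __name__ == "__main__":
    # Hypothetical base name and layer ID, for illustration only.
    print(base_problem("mybase:1.0", ["sha256:aaa"], None))
```

Comparing SHA-256 diff IDs fills the role the md5sum idea was reaching for, while reporting the base image by its tag addresses the unhelpful hex-ID error message mentioned earlier in the thread.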