Copy entire layer from previous build stage in multi-stage build

Hi to all,

This is my first time posting here, so I’m sorry if this isn’t the relevant category for this post, but I couldn’t find one more suitable for it.

I’m having difficulty reducing the final Docker image size for an application I’m developing. Multi-stage builds resolve some of the issues with the “builder pattern”, but since my application is written in C++ on top of several open-source libraries (most of which don’t play nicely with static compilation), I need to make sure the final image for distribution contains only what is needed.

Initially I just downloaded the source repositories and archives and compiled the libraries, but the image became too large. I added commands to delete the sources afterwards, which made the image smaller, but it could be smaller still if only the relevant files were in it.

For some libraries I need the latest version, which isn’t available in the apt-get repositories, so I have to download the source and install the dependencies’ dev packages from apt-get in order to compile it. Once the libraries are compiled, the dev packages are no longer needed. I could add a command to delete them afterwards, but because of the way layers work that won’t make the image any smaller unless all the commands are in a single RUN instruction, and then I can’t benefit from the layer cache.

What I would like to do is use a multi-stage build in which the first stage has all the development libraries and compiles the open-source libraries I need. A second stage would then install only the runtime versions of the dependencies and copy over the compiled libraries from the first stage.

The problem is that some libraries generate tons of files, so I can’t easily trace them to write a suitable COPY command, and I don’t see a way to copy an entire layer (it might have been possible if this feature were added: https://github.com/moby/moby/issues/32100). A COPY of the entire library or include folders won’t work, since it would copy the development libraries as well (please correct me if I’m mistaken). Maybe a Linux command could be run in the RUN instructions to record the new files in a text file, which could later be used for copying from the previous stage, but I don’t think that is possible, since copying from a previous stage requires a Docker instruction.
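The "record the new files in a text file" idea can be sketched in plain shell. This is only an illustration under stated assumptions: the scratch directory, the marker file, the libfoo names and manifest.txt are all hypothetical, and the echo/ln lines stand in for a real make install. It also assumes the install step gives files a fresh modification time; installers that preserve timestamps would be missed by find -newer.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree standing in for the builder stage's filesystem.
root=$(mktemp -d)
mkdir -p "$root/usr/lib" "$root/usr/include"
echo "preexisting dev library" > "$root/usr/lib/libdev.a"

# Drop a timestamp marker before installing anything.
marker="$root/.before-install"
touch "$marker"
sleep 1   # ensure the files created next are strictly newer than the marker

# Stand-in for "make install": the files we actually want to ship.
echo "shared object" > "$root/usr/lib/libfoo.so.1.0.0"
ln -s libfoo.so.1.0.0 "$root/usr/lib/libfoo.so.1"

# Record every regular file and symlink newer than the marker.
manifest="$root/manifest.txt"
(cd "$root" && find usr \( -type f -o -type l \) -newer "$marker" | sort > manifest.txt)

# Pack exactly the recorded files into one archive with a known path.
tar -C "$root" -czf "$root/new-files.tar.gz" -T "$manifest"

cat "$manifest"
```

In a Dockerfile the same commands would run inside RUN instructions in the builder stage. The archive then has a single known path, so a later stage can fetch it with COPY --from and unpack it there.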

Any tips, comments and feedback would be welcome. I think I’ve reached a dead end, but I would like to make my Docker image as small as possible, and it seems achievable if some of the functionality discussed above were available.

Two suggestions:

  • You can COPY an entire directory tree or glob pattern; or
  • You can RUN tar in the builder image, then ADD that tar file in the runtime image.

A probably-works answer would be to run make install for each package in the builder image, and then in the runtime image COPY /usr/local/lib/*.so* and /usr/local/share and some other things from it (but not /usr/local/lib/*.a or /usr/include, if that’s what you’re worried about excluding).

Many packages support installing into an alternate place (make install DESTDIR=/install PREFIX=/usr will install things into /install/usr/..., and set up absolute paths so they think things are installed in /usr if that’s relevant), which can be helpful; this trick is how many .deb packages get built. (For autotools-based packages the prefix is usually chosen earlier, with ./configure --prefix=/usr, and only DESTDIR is passed to make install.) If you went down the tar path you could manually pick out which files or directories should or shouldn’t be included (it’s been a while, but I remember Debian package creation needing some curation on runtime/development/data packages too).
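A minimal sketch of the staging-plus-tar approach, assuming a staging directory like the /install mentioned above. The directory, the libfoo files and the archive name are hypothetical, and the mkdir/echo/ln lines stand in for a real make install DESTDIR=... run. The exclude patterns are just one possible curation.

```shell
#!/bin/sh
set -eu

# Hypothetical staging directory, standing in for /install.
stage=$(mktemp -d)

# Stand-in for "make install DESTDIR=$stage": a fake install containing
# runtime files, a static library, and a header.
mkdir -p "$stage/usr/lib" "$stage/usr/include/foo" "$stage/usr/share/foo"
echo "shared object" > "$stage/usr/lib/libfoo.so.1.0.0"
ln -s libfoo.so.1.0.0 "$stage/usr/lib/libfoo.so.1"
ln -s libfoo.so.1 "$stage/usr/lib/libfoo.so"
echo "static archive" > "$stage/usr/lib/libfoo.a"
echo "header" > "$stage/usr/include/foo/foo.h"
echo "data" > "$stage/usr/share/foo/data.txt"

# Pack only what the runtime image needs: skip static libraries and headers.
archive="$stage/runtime-files.tar.gz"
tar -C "$stage" -czf "$archive" \
    --exclude='*.a' \
    --exclude='usr/include' \
    usr

# Show what ended up in the archive.
tar -tzf "$archive"
```

Because tar stores symlinks as symlinks, the libfoo.so and libfoo.so.1 links travel with the archive instead of becoming extra copies of the library.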

Are you worried about the installed header files and static libraries, or the original source tree? The original source tree is usually much larger, but if you use make install it’s easy to just not include it, and you’ll get most of the size benefit.

Thanks @dmaze

Yes, I was actually worried about all the bloat files that don’t get used, since there would be package files from apt-get as well as my installed libraries.

Filtering the files to be copied will definitely get rid of some that aren’t needed, and installing the libraries into separate folders sounds like it would definitely work.

I’ll give it a shot and post a sample snippet if it works so that other people in my situation might use it if it is helpful.

I managed to shave a lot from the size of the docker image with the above suggestions. Some things to note are:

  1. Put your delete commands in the same RUN instruction as the apt-get commands, so that the deleted files never end up in a layer.

  2. Find additional folders that can be cleared, such as doc, tmp, etc.

  3. When copying over libraries, note that a library usually has three .so files: for example .so.4.2.0.13, .so.4 and .so. The second and third are symlinks to the first but are copied as regular files, so if you find out the exact names you can copy just the real file and recreate the other two as symlinks to it, cutting the space those libraries take to roughly a third.
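The symlink point in item 3 can be reproduced outside Docker with a small shell sketch. The library name libfoo and the directory names are hypothetical, and cp -L is used here only to imitate a copy that follows symlinks.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
mkdir -p "$work/builder" "$work/as-files" "$work/as-links"

# A fake 1 MiB versioned library with the usual symlink chain.
head -c 1048576 /dev/zero > "$work/builder/libfoo.so.4.2.0.13"
ln -s libfoo.so.4.2.0.13 "$work/builder/libfoo.so.4"
ln -s libfoo.so.4 "$work/builder/libfoo.so"

# Copying with symlinks followed yields three full copies of the file.
cp -L "$work/builder"/libfoo.so* "$work/as-files/"

# Copying only the real file and recreating the links keeps a single copy.
cp "$work/builder/libfoo.so.4.2.0.13" "$work/as-links/"
ln -s libfoo.so.4.2.0.13 "$work/as-links/libfoo.so.4"
ln -s libfoo.so.4 "$work/as-links/libfoo.so"

# Compare: three regular files on one side, one file plus two links on the other.
ls -l "$work/as-files" "$work/as-links"
```

In a Dockerfile the ln -s commands would go in a RUN instruction in the runtime stage, after the COPY of the real file. The bare .so link is normally only needed when linking against the library, so a runtime-only image may be able to leave it out.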