Relative impact of combining multiple RUN commands?

I noticed in “Docker in Practice”, they suggest that you might get more compact images if you selectively combine multiple “RUN” commands into single commands. For instance, if you’re installing 20 packages through apt-get, instead of doing each on its own RUN line, just specify them all on a single “RUN”. This saves space because Docker creates a separate layer for each “RUN”.

I can see that this is possible, but how significant are these savings? Obviously, there is value in creating those layers, for various reasons, but this might be a pragmatic option if storing and sending numerous image files is having an impact on overall system performance.

However, if the space savings are negligible, then it’s moot. Does anyone have an idea of the relative space savings from this kind of change?

Although this question was asked a while ago, I am still looking for a precise answer.

How much space you save by using a single RUN instruction instead of multiple RUN instructions depends on the exact case. It could be a little or a lot. It is one way among others to optimize the image size. Also keep in mind that each RUN instruction runs in a separate temporary container, so if you set a variable inside the command instead of using the ENV or ARG instructions, that variable will not be available in the next RUN instruction.
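
To illustrate the variable point, here is a rough analogy in Python, not a real Docker build. It assumes a POSIX shell at `/bin/sh` and treats each separate `/bin/sh -c` process as one RUN instruction (on Linux, shell-form RUN is executed with `/bin/sh -c`, each time in a fresh container). The `run_step` helper is a hypothetical name made up for this sketch.

```python
import os
import subprocess
from typing import Dict, Optional


def run_step(command: str, extra_env: Optional[Dict[str, str]] = None) -> str:
    """Run one command in a fresh /bin/sh process, loosely like one RUN instruction."""
    # Start from a minimal environment so nothing leaks in from the calling shell.
    env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    env.update(extra_env or {})
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Two separate steps: the variable set by the first shell dies with that process.
first = run_step("GREETING=hello")
second = run_step('echo "${GREETING:-unset}"')

# One combined step: the variable is still set when echo runs.
combined = run_step('GREETING=hello && echo "${GREETING:-unset}"')

# ENV-like: the value is injected into the environment of every step.
env_like = run_step('echo "${GREETING:-unset}"', extra_env={"GREETING": "hello"})

print(second, combined, env_like)  # prints: unset hello hello
```

Only filesystem changes survive from one RUN instruction to the next, which is why a value that later steps need has to come from ENV (or ARG during the build) rather than from a shell variable.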

Also, the number of possible layers is limited. A lot of RUN instructions means a lot of layers. If someone uses your image as a base image, and their image is in turn used by someone else as a base image, new layers keep being added until no more can be created because of the limits of the overlay filesystem.

You could also have some files generated in a layer at build time that you don’t want to distribute. If you run a delete command in a new RUN instruction, the file will not actually be deleted, just hidden by the new layer. It still exists in the earlier layer, so anyone who is really looking for it could find it, and someone could also stumble upon it by accident when searching for something else with a pattern that happens to match that file.
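
A toy model makes both the hidden-file point and the space question concrete. This is not how Docker's storage driver is implemented: `Layer`, `run` and `image_size` are hypothetical names, the sizes are made up, and a deletion is modelled as a whiteout entry that hides the path in the merged view without removing any bytes from the lower layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Layer:
    """Simplified layer: files it adds (path -> size) plus whiteouts for deleted paths."""
    added: Dict[str, int] = field(default_factory=dict)
    whiteouts: Set[str] = field(default_factory=set)


def run(fs: Dict[str, int], ops: List[Tuple]) -> Tuple[Dict[str, int], Layer]:
    """Apply one RUN's operations to a copy of the filesystem; commit only the diff."""
    new_fs = dict(fs)
    for op, path, *rest in ops:
        if op == "write":
            new_fs[path] = rest[0]
        elif op == "delete":
            new_fs.pop(path, None)
    # Crude diff: a path counts as added if it is new or its recorded size changed.
    added = {p: s for p, s in new_fs.items() if fs.get(p) != s}
    whiteouts = {p for p in fs if p not in new_fs}
    return new_fs, Layer(added, whiteouts)


def image_size(layers: List[Layer]) -> int:
    """Every layer ships in full; a whiteout hides a file but frees nothing below it."""
    return sum(sum(layer.added.values()) for layer in layers)


build = [("write", "/tmp/build-cache.tar", 300), ("write", "/usr/local/bin/app", 20)]
cleanup = [("delete", "/tmp/build-cache.tar")]

# Two RUN instructions: build, then clean up in a separate layer.
fs1, layer1 = run({}, build)
fs2, layer2 = run(fs1, cleanup)
separate_layers = [layer1, layer2]

# One RUN instruction: build and clean up before the layer is committed.
fs_combined, combined_layer = run({}, build + cleanup)

print(image_size(separate_layers), image_size([combined_layer]))  # prints: 320 20
print("/tmp/build-cache.tar" in layer1.added)  # prints: True
```

Both versions show the same final filesystem, but in the separate-layer version the build leftovers are still shipped inside the first layer and can be pulled out of it, while cleaning up within the same RUN means they never reach a layer at all.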

So it is not just about saving space and optimizing the image size, although a smaller image also means you can pull and push it faster, so your CI/CD pipelines can be faster, which could also mean paying less for the service.

So using a single RUN instruction is often a good practice, and you generally want to apply every optimization you get for free, without downsides. That doesn’t mean you should put everything into a single RUN instruction to end up with a single layer. Some layers you very rarely rebuild, while others you rebuild almost every time, or actually every time. Keeping those in separate RUN instructions can make builds faster, since the lower layers can be loaded from the image build cache.
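
The cache trade-off can be sketched the same way. This assumes a simplified cache in which each step's key is a hash of its parent's key plus the instruction text (Docker's real cache also compares file checksums for COPY and ADD, among other details). `cache_keys`, `rebuilt_steps` and the `make -C /app VERSION=1` / `VERSION=2` commands are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import List


def cache_keys(instructions: List[str]) -> List[str]:
    """Chain each step's key to its parent's key, so one change invalidates every later step."""
    keys: List[str] = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(previous: List[str], current: List[str]) -> List[int]:
    """Indices of steps in `current` whose cache key was not produced by `previous`."""
    cached = set(cache_keys(previous))
    return [i for i, key in enumerate(cache_keys(current)) if key not in cached]


BASE = "FROM debian:bookworm"
PACKAGES = "apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*"

# Rarely changing package install and frequently changing app build in separate RUNs.
split_v1 = [BASE, f"RUN {PACKAGES}", "RUN make -C /app VERSION=1"]
split_v2 = [BASE, f"RUN {PACKAGES}", "RUN make -C /app VERSION=2"]

# The same work merged into one RUN instruction.
merged_v1 = [BASE, f"RUN {PACKAGES} && make -C /app VERSION=1"]
merged_v2 = [BASE, f"RUN {PACKAGES} && make -C /app VERSION=2"]

print(rebuilt_steps(split_v1, split_v2))    # [2]: only the app build reruns
print(rebuilt_steps(merged_v1, merged_v2))  # [1]: the package install reruns too
```

With the package installation (and its cleanup) in its own RUN instruction, bumping the app version rebuilds only the last step; when both are merged, the same change forces the package installation to run again.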

So use a single RUN instruction for a single task, which includes both doing something and cleaning up after it, so that nothing unneeded ends up in your layers, but start a new RUN instruction when you need one to take advantage of the image build cache.