Docker Community Forums

Replacing Large Files Without Increasing Image Size

I have a large file in my Docker image that gets replaced with each new version of the image. Say image_a is built from a base image, image_b is built from image_a, and image_c is built from image_b. Various changes are made in each image, but the one of particular interest is a large file that gets replaced each time.

What ultimately happens is that each time I replace the file, the size of the new file is added to the new image, and this repeats with every successive image I create. The simplest way I can show this is with three Dockerfiles:

#Dockerfile_a
FROM ubuntu:latest
WORKDIR /mydata
RUN dd if=/dev/urandom of=data.dat bs=1M count=100

#Dockerfile_b
FROM image_a:latest
WORKDIR /mydata
RUN dd if=/dev/urandom of=data.dat bs=1M count=100

#Dockerfile_c
FROM image_b:latest
WORKDIR /mydata
RUN dd if=/dev/urandom of=data.dat bs=1M count=100

As you can see, each Dockerfile builds on the previous image, and each effectively replaces a large file. I don’t need to keep the previous version of the file. If you then run the following:

docker build -t image_a -f Dockerfile_a .
docker build -t image_b -f Dockerfile_b .
docker build -t image_c -f Dockerfile_c .

Then run “docker images” and you will see that each image is more than 100MB larger than the previous one. Clearly the overlay filesystem is just adding the new file (and its size) to each image. In many scenarios that is probably what you want, but in this case I don’t. I don’t want each version to grow in size like this. I really want to replace the file so that the images stay roughly the same size.

Is there a way to solve this problem? Each version of my image is growing much more than I want it to, and this isn’t going to be sustainable long-term. Maybe there is a better way to handle this. Any advice would be appreciated.

Hmm, can’t you just add a “RUN rm -f /mydata/image_a” to your Dockerfile to get rid of the old stuff?
Also, overwriting should not lead to “piling up” if the same file name is used.
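
For what it’s worth, as far as I understand image layers, each RUN instruction creates a new layer on top of the existing ones, and deleting or overwriting a file in a later layer only hides the copy stored in the lower layers; it does not free that space. That would explain the growth seen in “docker images”. One possible workaround is a multi-stage build that flattens the previous image before the new file is written. The following is only a rough sketch: the stage name “previous” is made up for illustration, and it assumes image_a was built as shown in the question.

```dockerfile
# Dockerfile_b (sketch): flatten image_a, then write the new file.
# "previous" is a hypothetical stage name chosen for this example.

# Stage 1: start from the previous image and delete the old large file.
# The old copy still exists in image_a's layers, but not in this stage's
# final filesystem view.
FROM image_a:latest AS previous
RUN rm -f /mydata/data.dat

# Stage 2: copy that filesystem into an empty image as a single layer,
# so the old data.dat from the earlier layers is not carried along.
FROM scratch
COPY --from=previous / /
WORKDIR /mydata
RUN dd if=/dev/urandom of=data.dat bs=1M count=100
```

The trade-offs, as I understand them: the scratch stage does not inherit settings such as ENV, CMD or ENTRYPOINT from image_a, so anything you rely on would have to be declared again, and the flattened image no longer shares layers with its parent, so every version is pulled in full. Running “docker history image_b” before and after the change should show whether the old 100MB layer is still part of the image.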