How to keep a Docker image that downloads data small?

I have a Dockerfile that downloads a large 2GB tar via curl and untars it. The result is linked to a volume in docker-compose, and this volume is then consumed by another service in the same compose file.

However, because of this, the image built from the Dockerfile is 2GB, and the volume is also 2GB. If I’m understanding it correctly, that doubles the space taken on my filesystem.

Is it possible to have a downloader service that stores data directly in a volume without persisting on the image?


Excerpt of my project

Dockerfile

FROM ubuntu:22.04
WORKDIR /project
...
RUN ...
    && curl FILENAME -o "$(basename "$FILE_NAME")" \
    && tar -xf "$(basename "$FILE_NAME")" \
    && rm "$(basename "$FILE_NAME")"
    ...

docker-compose.yml

  myservice:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - myvol:/project

  myotherservice:
    ...
    volumes:
      - myvol:/otherproject

volumes:
  myvol:

Your Dockerfile is already optimized to not retain the downloaded file in an image layer.

Sure, modify your entrypoint script to do it, and you should be good.

I have a languagetool image that downloads huge language models based on environment variables. You can check out the entrypoint script here: https://github.com/meyayl/docker-languagetool/blob/main/entrypoint.sh


Thanks meyay! I got it!

Instead of downloading in the Dockerfile, we download from the shell script used as the ENTRYPOINT.
The point is that the ENTRYPOINT only executes when a container is created, not when the image is built. This means that the size of the image is independent of the download!

This does mean that the ENTRYPOINT executes again when a container is created, starting another download. We can avoid that by checking if the file already exists (which it should, given that the same volume is used).

You are on the right track!

Minor correction:
This does mean that the ENTRYPOINT executes every time a container based on the image is started, starting another download. We can avoid that by checking if the file already exists (which it should, given that the same volume is used).

This is true regardless of whether a container is freshly created and started, or an existing container is stopped and restarted.
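
For illustration, here is a minimal sketch of an entrypoint that skips the download when the volume already holds the data. It is not taken from the linked languagetool script. The names `fetch_archive`, `ensure_data`, `FILE_URL` and the `.download-complete` marker file are all made up for this example, and it assumes curl and tar are installed in the image.

```shell
#!/bin/sh
# Hypothetical entrypoint helpers. All names here are assumptions for
# illustration, not part of the original project.

# fetch_archive URL DEST: download URL to the file DEST.
fetch_archive() {
    curl -fSL "$1" -o "$2"
}

# ensure_data URL DIR: download and unpack URL into DIR, unless a marker
# file shows that an earlier container start already did so.
ensure_data() {
    url=$1
    dir=$2
    marker="$dir/.download-complete"

    if [ -f "$marker" ]; then
        echo "Data already present in $dir, skipping download."
        return 0
    fi

    mkdir -p "$dir"
    archive="$dir/$(basename "$url")"

    # The marker is only written after every previous step succeeded, so
    # an interrupted download is retried on the next start.
    fetch_archive "$url" "$archive" \
        && tar -xf "$archive" -C "$dir" \
        && rm -f "$archive" \
        && touch "$marker"
}

# A real entrypoint would end with something like:
#   ensure_data "$FILE_URL" /project && exec "$@"
```

Checking for a marker file instead of the extracted files themselves means a half-finished extraction is not mistaken for a complete one. Since the marker lives in the volume, it survives both restarts and freshly created containers that mount the same volume.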

I see! I didn’t know that, thanks for the correction and insight!