Reduce Docker Image Size

Expected behavior

A smaller image

Actual behavior

The resulting image is 2.65 GB

Additional Information

---- Dockerfile

FROM python:3.8-slim-buster
WORKDIR /app
COPY . .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
RUN rm -rf /root/.cache
CMD ["python", "src/test_script.py"]

---- requirements.txt

transformers
torch

---- test_script.py (minimal)

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from typing import List
import torch
import re

Steps to reproduce the behavior

  1. docker build (either locally or in AWS ECR)
  2. docker images (the size was 2.65 GB)

This is painful during cdk deploy: the docker push stage takes a very long time to complete, and only then is the stack created.

Is there any way to reduce the size of the image, which would in turn speed up the deployment? I even tried multi-stage Docker builds, but could not make them work. Any suggestions would be appreciated!

You copy the whole build context into the image, and we don’t know anything about that context. To actually be able to reproduce this, we need to know what is in it. My guess is that you do something before docker build that creates a large amount of data in the context, which then gets copied into the image.

If this is what happens, I suggest you use a .dockerignore file (similar to .gitignore), so that

COPY . .

will not mean “everything”, only everything except what is listed in the .dockerignore file. Or you can exclude everything except what you list in that file:

*
!/want/this/folder1/
!/want/this/folder2/

I also edited your post to use code blocks, which makes your message clearer for us. Please use code blocks next time (the </> button).

Thank you, and sorry I did not include the context before.

Please find it here!

Some questions:

  • Are there any hidden files or folders?
  • How exactly do you run docker build? Could you share the command?
  • How large was the content of the removed /root/.cache folder? Did you know that this content remains in your image? It is only hidden from the following layers.

You can try something like this:

FROM python:3.8-slim-buster
WORKDIR /app
COPY . .
RUN pip install --upgrade pip \
 && pip install -r requirements.txt \
 && rm -rf /root/.cache
CMD ["python", "src/test_script.py"]

This way the installation and the deletion of the cache happen in the same layer, so the cache is actually, physically removed. Of course, it slows the build down when you change the requirements and rebuild, since it upgrades pip every time. By the way, I would not upgrade pip this way, since you have no control over which pip version you end up with. This is not really related to your issue, but it can lead to other problems, so I would do it this way:

FROM python:3.8-slim-buster
WORKDIR /app
COPY . .
RUN pip install pip==22.1.1 \
 && pip install -r requirements.txt \
 && rm -rf /root/.cache
CMD ["python", "src/test_script.py"]

Use your preferred pip version of course so you always know what to expect from the build.
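
A further variation, as a sketch only: pip's --no-cache-dir option avoids writing the download cache in the first place, and copying requirements.txt before the rest of the context lets Docker reuse the dependency layer when only the source code changes. This assumes requirements.txt sits at the root of the build context, as the Dockerfiles above imply.

```dockerfile
FROM python:3.8-slim-buster
WORKDIR /app

# Copy only the dependency list first, so the install layer below is
# rebuilt only when requirements.txt changes (assumed to be at the context root).
COPY requirements.txt .

# --no-cache-dir keeps pip from storing downloaded packages under /root/.cache,
# so there is nothing to delete afterwards.
RUN pip install --no-cache-dir pip==22.1.1 \
 && pip install --no-cache-dir -r requirements.txt

# Copy the application code last; changes here do not invalidate the layers above.
COPY . .
CMD ["python", "src/test_script.py"]
```

The image size should be about the same as with rm -rf /root/.cache in the same RUN step; the main difference is that rebuilds after a code change can skip the package installation.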

  • No files or folders are hidden.
  • docker build -t test . (from container/)
  • I did not check the size of the /root/.cache folder.

Sure, I will try it the way you described and see if it decreases the image size.

I think it worked, and the image is now noticeably smaller.

  • In local docker: 1.87 GB
  • In ECR: 845.22 MB

This is nice, thank you. Could we reduce it even further by using multi-stage builds? When I tried that, I could not use the pip dependencies from one stage in another.

I’ve not read all the details, but how come things are different on your local machine and on AWS ECR? Are you building the image in two locations? (That’s where a registry like Docker Hub comes in: build once, deploy wherever you want. But maybe this is just testing?)

Also:

If you want to know, use your first Dockerfile and add something like RUN du -sh /root/.cache. (Then watch the output of that command on the first run, when the image is created.)
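
A minimal sketch of that diagnostic, based on the first Dockerfile in this thread. The only addition is the du step, placed before the cache is deleted:

```dockerfile
# Diagnostic variant of the original Dockerfile, not meant for deployment.
FROM python:3.8-slim-buster
WORKDIR /app
COPY . .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
# Print the total size of pip's cache directory before it is removed.
RUN du -sh /root/.cache
RUN rm -rf /root/.cache
CMD ["python", "src/test_script.py"]
```

If the build output is collapsed and the du result does not show up, running the build with docker build --progress=plain --no-cache -t test . should print the output of every RUN step.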

And finally:

What exactly did you try, what exact error messages did you get? Please don’t expect us to repeat instructions that you may already have tried.

I guess it is just because the local version is still the one where the cache was removed in a separate layer, so it is only there for us to compare the sizes. Am I right, @smanjil?

I agree, but I think common mistakes could be copying to the wrong folder, not copying everything, using a different Python version in the target stage, or using a Debian-based image in the source stage and an Alpine-based one in the target stage. The solution is not always the same, so we need to know more about what you tried…
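
For reference, one common multi-stage pattern that avoids those mistakes is to install into a virtual environment and copy that whole directory into a final stage built from the same base image. This is only a sketch under assumptions: the /opt/venv path and the stage name builder are arbitrary choices, not something from your setup.

```dockerfile
# Build stage: install the dependencies into an isolated virtual environment.
FROM python:3.8-slim-buster AS builder
# /opt/venv is a hypothetical location; any path works as long as both stages agree.
RUN python -m venv /opt/venv
# Put the venv first on PATH so "pip" and "python" refer to it.
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Final stage: same base image, so the Python version and libc match.
FROM python:3.8-slim-buster
WORKDIR /app
# Copy the complete environment, not individual package folders.
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY . .
CMD ["python", "src/test_script.py"]
```

Since no compilers or build tools are installed in the first stage here, the final image will probably not be much smaller than a single-stage build without the pip cache.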

However, I don’t think you can get a much smaller image, since you only install Python packages and you have already removed the cache. I would rather check the size of the base image, build the new image, and check the size of the folder containing the Python packages. If that size is not as big as you would expect, you can try a multi-stage build.
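
One way to measure the packages folder, as a rough sketch: add a du step after the installation. The path below assumes the default site-packages location of the official python:3.8 images.

```dockerfile
FROM python:3.8-slim-buster
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
# Print the total size of the installed Python packages.
# The path is an assumption based on the official python:3.8 image layout.
RUN du -sh /usr/local/lib/python3.8/site-packages
CMD ["python", "src/test_script.py"]
```

Comparing that number with the size docker images reports for python:3.8-slim-buster and for the built image shows how much of the total comes from the packages themselves. docker history test also lists the size of each layer.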