FROM python:3.8-slim-buster
WORKDIR /app
COPY . .
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
RUN rm -rf /root/.cache
CMD ["python", "src/test_script.py"]
---- requirements.txt
transformers
torch
---- test_script.py (minimal)
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from typing import List
import torch
import re
Steps to reproduce the behavior
docker build (either locally or in AWS ECR)
docker images (the size was 2.65 GB)
It's frustrating when you run cdk deploy and the docker push stage takes forever to complete, and only then does the stack get created.
Is there any way to minimize the size of the image, which would then speed up the deployment? I even tried multi-stage Docker builds, but could not make them work. Any suggestions would be appreciated!
You copy the whole build context into the image, and we don't know anything about that context. To actually be able to reproduce this, we need to know what is in your build context. My guess is that you do something before docker build which creates a large amount of data in the context, which you then copy into the image.
If this is what happens, I suggest you use a .dockerignore file (similar to .gitignore) so that
COPY . .
will not mean "everything", only everything except what is listed in the .dockerignore file. Or you can exclude everything and include only what you list in that file:
*
!/want/this/folder1/
!/want/this/folder2/
I also edited your post to use code blocks. This way your message is clearer for us. Please use code blocks next time (the </> button).
How do you run docker build exactly? Could you share the command?
How large was the removed /root/.cache folder's content? Do you know that its content will remain in your image? It is just hidden from the following layers.
This way the installation and the deletion of the cache would be in the same layer, so the cache would actually, physically be removed. Of course, it would slow the build process down when you change the requirements and rebuild, since it would always upgrade pip as well. By the way, I would not upgrade pip like that, since you have no control over which pip version you end up with. This is not really related to your issue, but it can lead to other problems, so I would do it this way:
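A minimal sketch of what that could look like (the pip version below is only a placeholder; pin whatever version you have actually tested):

FROM python:3.8-slim-buster
WORKDIR /app
# Copy only the requirements first, so this layer can be cached between builds
COPY requirements.txt .
# Pin pip instead of blindly upgrading it, then install the requirements
# and delete the cache in one and the same layer
RUN pip install pip==21.3.1 && \
    pip install -r requirements.txt && \
    rm -rf /root/.cache
COPY . .
CMD ["python", "src/test_script.py"]

(You could also pass --no-cache-dir to pip install so the cache is never written in the first place.)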
This is nice, thank you. Could we decrease it even more by using multi-stage builds? Because when I tried that, I could not carry the pip dependencies over from one stage to the other.
I've not read all the details, but how come things are different on your local machine and on AWS ECR? Are you building the image in two locations? (That's where a registry like Docker Hub comes in: build once, deploy wherever you want. But maybe this is just testing?)
Also:
If you want to know, then take your first Dockerfile and add something like RUN du -sh /root/.cache right after the install step. (Then watch the output of that command in the build log when creating the image.)
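For example (just an illustration, inserted into the original Dockerfile):

RUN pip install -r requirements.txt
# Prints the size of pip's cache in the build output
RUN du -sh /root/.cache
RUN rm -rf /root/.cache

(If you build with BuildKit, you may need docker build --progress=plain to actually see that output.)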
And finally:
What exactly did you try, and what exact error messages did you get? Please don't expect us to repeat instructions that you may already have tried.
I guess it is just because the local version is still the one where the cache was removed in a separate layer, so it is just there for us to compare the sizes. Am I right, @smanjil?
I agree, but I think common mistakes could be copying to the wrong folder, not copying everything, using a different Python version in the target stage, or using a Debian-based image in the source stage and an Alpine-based one in the target. The solution is not always the same, so we need to know more about what you tried…
However, I don't think you could get a much smaller image, since you only install Python packages and you have already removed the cache. I would rather check the size of the base image, build the new image, and check the size of the folder containing the Python packages. If that size is not as big as you would expect, you can try a multi-stage build (a rough sketch follows below).
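For reference, a rough multi-stage sketch, assuming you keep the same python:3.8-slim-buster base in both stages (one of the common pitfalls mentioned above) and install with pip's --prefix option so the whole package tree can be copied across:

FROM python:3.8-slim-buster AS builder
WORKDIR /app
COPY requirements.txt .
# Install into a separate prefix so it can be copied to the next stage
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.8-slim-buster
WORKDIR /app
# Merge the installed packages and console scripts into /usr/local
COPY --from=builder /install /usr/local
COPY . .
CMD ["python", "src/test_script.py"]

Since only pip packages are involved here, most of the 2.65 GB is probably torch itself, so don't expect a dramatic reduction from the multi-stage build alone.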