Is it redundant to write COPY twice in a Dockerfile?

I am a beginner with Docker and Python. I saw this example in Try Docker Compose, Step 2:
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
EXPOSE 5000
COPY . .
CMD ["flask", "run"]
It first copies requirements.txt (COPY requirements.txt requirements.txt), installs the packages, and then copies everything in the current host directory into the container (COPY . .). Does the second COPY duplicate the first one? Would it be better to remove the first COPY of requirements.txt and use only COPY . . instead? If not, why?

In a Dockerfile, you can use the COPY instruction multiple times to copy different files or directories into the container. However, you should be careful about copying the same file or directory multiple times: each COPY adds the file to a new layer, and a later copy to the same path overwrites the earlier one.

For example, if you copy a file into the container with the COPY instruction and then copy it again with a different name, the file will be duplicated in the container. This can cause problems if you are relying on the file being unique or if you are trying to keep the size of the container image small.

On the other hand, if you are copying different files or directories into the container and they happen to have the same name, this will not cause any issues as long as they are located in different directories.

In general, it is a good idea to avoid duplication in a Dockerfile to keep the image size small and to ensure that you are not inadvertently overwriting or modifying important files in the container.

Yes, requirements.txt would be copied twice, but that’s fine. It will be on two different layers, but the first copy is only for the Python requirements. When any other file in the build context changes, but not the requirements, that layer will not be invalidated, so the build will run faster. Running pip install can take time, which is completely unnecessary when you have only changed a shell script in your project. One duplicated txt file is not a big problem when you can build the project much faster and use the build cache, which also lets you push and pull your image faster.

You could move the files from the root of the build context to a new folder, except requirements.txt, so you can still copy everything else in one step without duplication, but I don’t think it is necessary in this case.
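
A rough sketch of that alternative layout, in case it helps. The base image, the WORKDIR, and the src/ folder name are assumptions made for illustration; they are not part of the original example.

```dockerfile
# Sketch only: base image, WORKDIR, and the src/ folder name are assumptions.
FROM python:3.12-slim
WORKDIR /code

# requirements.txt stays in the root of the build context.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
EXPOSE 5000

# Application files live in src/ (hypothetical), so this step
# copies everything else without copying requirements.txt again.
COPY src/ .
CMD ["flask", "run"]
```

The cache behaviour is the same as in the original example; the only difference is that requirements.txt ends up in a single layer.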


I’m a bit confused by “so when any other file changes in the build context, but not the requirements, that layer will not be invalidated so the build will run faster” and “use the build cache so you can also push and pull your image faster”. Can you explain these a little more? Or is there any documentation I can refer to about the build cache and the procedure of building an image? Thank you.

When you build an image, every instruction (RUN, COPY, CMD…) starts a container in the background and saves the result as an image. The next instruction uses the previous instruction’s image as its base image, and so on.

docker build . -t imagename

means the same as

docker build "$(pwd)" -t imagename

where $(pwd) expands to the current directory, which becomes the build context.

COPY file.txt /dir/file.txt

means you will copy file.txt from the build context. Docker detects when a file has changed and reruns every instruction affected by that change, along with the instructions that follow it. Since the first COPY copies only one file, requirements.txt, Docker will not run pip install again as long as you don’t change that file. Copying files is usually much faster than running pip install.
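
To make that concrete, here is the fragment from the question again with comments describing, as I understand the cache rules, what happens on a rebuild after only application code has changed. It is a fragment, not a complete Dockerfile (the FROM line and other setup are omitted, as in the question). If there were a single COPY . . placed before pip install instead, any file change would also rerun pip install.

```dockerfile
# Invalidated only when requirements.txt itself changes.
COPY requirements.txt requirements.txt
# Reused from the cache as long as the step above is unchanged.
RUN pip install -r requirements.txt
EXPOSE 5000
# Invalidated whenever any file copied from the build context changes
# (including app code), so only this step and the ones after it rerun.
COPY . .
CMD ["flask", "run"]
```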

If you want to understand better how docker build works, I have a tutorial:

The tutorial covers docker build without BuildKit. BuildKit makes the build even faster, but the tutorial can still help you understand what happens and what the layers are. You can also read the documentation

and the best practices
