Unexpected ADD and COPY behavior during build - Doesn't detect existing files

Edit 2016-07-06: This behavior was previously reported in this GitHub issue: Subsequent COPY instructions re-adds all files in every layer instead of only the files that have changed

I want to use Docker for continuous integration and deployment for a Ruby on Rails project.

Here is the situation, illustrated with an example. I have a Git repository with the project source files. I build the first working image for my project and name it webproject:base.

Later, I find a bug in my code and update the repository with a fix (only a few files are changed), and I want to rebuild and redeploy the corrected image. So I write a new Dockerfile based on the webproject:base image, which contains the original project files, add the project files again and perform some build steps (install gems, precompile assets, run database migrations). This image is tagged as webproject:latest.

Obtained behavior:
The webproject:latest image has the size of the webproject:base image plus the size of all project files (modified or not), plus the modified assets and gems.

Expected behavior:
The webproject:latest image should have the size of the webproject:base image plus the size of only the files modified since the last build (changed source files plus modified assets and gems).

Previous experience:
This same procedure was used before on a Vagrant machine with Docker 1.6 and worked as expected. Sadly, I have no way to reproduce this because that VM was deleted. It seems that in the current version, files added to a new layer are not compared against the ones already present in the existing image layers. I thought that the copy-on-write strategy was implemented to make “smart” decisions about which files to copy before writing changes into the new layer.

Steps to reproduce
You can use this script:

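#!/bin/sh
# Reproduction script: build a base image, change one file, then compare
# re-adding the whole directory against adding only the changed file.

# Base image: busybox plus the whole build context.
echo "FROM busybox:latest
ADD . /root" > Dockerfile-base

# Rebuild on top of the base image, adding the whole context again.
echo "FROM webproject:base
ADD . /root" > Dockerfile-latest

# Expected result: add only the file that changed.
echo "FROM webproject:base
ADD a.txt /root" > Dockerfile-expected

# Create three small sample files with identical content.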
echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890" > a.txt
echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890" > b.txt
echo "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890" > c.txt
echo "### Building webproject:base ###"
echo "################################"
docker build -t webproject:base -f Dockerfile-base .
echo
# Simulate a bug fix: change only a.txt before rebuilding.
echo "0987654321098765432109876543210987654321098765432109876543210987654321098765432109876543210987654321" > a.txt
echo "### Building webproject:latest ###"
echo "##################################"
docker build -t webproject:latest -f Dockerfile-latest .
echo
echo "### Building webproject:expected ###"
echo "####################################"
docker build -t webproject:expected -f Dockerfile-expected .
echo
echo "### history webproject:latest ###"
echo "#################################"
docker history webproject:latest
echo
echo "### history webproject:expected ###"
echo "#################################"
docker history webproject:expected

Script output

### Building webproject:base ###
################################
Sending build context to Docker daemon 9.216 kB
Step 1 : FROM busybox:latest
 ---> 47bcc53f74dc
Step 2 : VOLUME /root
 ---> Running in 13e3c09ae419
 ---> bd5c997e48c2
Removing intermediate container 13e3c09ae419
Step 3 : ADD . /root
 ---> 73a1f5ac0a52
Removing intermediate container 12b19fde3be8
Successfully built 73a1f5ac0a52

### Building webproject:latest ###
##################################
Sending build context to Docker daemon 9.216 kB
Step 1 : FROM webproject:base
 ---> 73a1f5ac0a52
Step 2 : ADD . /root
 ---> dfb2ebe37522
Removing intermediate container 3c610e6d258e
Successfully built dfb2ebe37522

### Building webproject:expected ###
####################################
Sending build context to Docker daemon 9.216 kB
Step 1 : FROM webproject:base
 ---> 73a1f5ac0a52
Step 2 : ADD a.txt /root
 ---> b01d50e31d36
Removing intermediate container 19fa4da7a371
Successfully built b01d50e31d36

### history webproject:latest ###
#################################
IMAGE               CREATED                  CREATED BY                                      SIZE                COMMENT
dfb2ebe37522        Less than a second ago   /bin/sh -c #(nop) ADD dir:103e9f2a9707ebbee79   1.771 kB            
73a1f5ac0a52        1 seconds ago            /bin/sh -c #(nop) ADD dir:4da2fc6eb9d2d27fde5   1.771 kB            
bd5c997e48c2        1 seconds ago            /bin/sh -c #(nop) VOLUME [/root]                0 B                 
47bcc53f74dc        10 weeks ago             /bin/sh -c #(nop) CMD ["sh"]                    0 B                 
<missing>           10 weeks ago             /bin/sh -c #(nop) ADD file:47ca6e777c36a4cfff   1.113 MB            

### history webproject:expected ###
#################################
IMAGE               CREATED                  CREATED BY                                      SIZE                COMMENT
b01d50e31d36        Less than a second ago   /bin/sh -c #(nop) ADD file:16db2646b53a0e9c58   101 B               
73a1f5ac0a52        1 seconds ago            /bin/sh -c #(nop) ADD dir:4da2fc6eb9d2d27fde5   1.771 kB            
bd5c997e48c2        1 seconds ago            /bin/sh -c #(nop) VOLUME [/root]                0 B                 
47bcc53f74dc        10 weeks ago             /bin/sh -c #(nop) CMD ["sh"]                    0 B                 
<missing>           10 weeks ago             /bin/sh -c #(nop) ADD file:47ca6e777c36a4cfff   1.113 MB

Docker version:

Client:
 Version:      1.11.1-rc1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   c90c70c
 Built:        Tue Apr 26 05:21:04 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.1-rc1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   c90c70c
 Built:        Tue Apr 26 05:21:04 2016
 OS/Arch:      linux/amd64

lsb_release -a

Distributor ID:	Ubuntu
Description:	Ubuntu 16.04 LTS
Release:	16.04
Codename:	xenial

docker info

Containers: 4
 Running: 2
 Paused: 0
 Stopped: 2
Images: 83
Server Version: 1.11.1-rc1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 362
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 4.4.0-22-generic
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.842 GiB
Name: gbisheimer-ubuntu
ID: UTLF:TQRB:3HTK:4BAV:6SDR:SZ7J:V2W4:UXJA:NTKR:AWNH:ZHCH:ZC27
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Username: gbisheimer
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

How are you verifying these images’ actual size? I don’t see any details about that. The docker images command shows “VIRTUAL SIZE”, which is the size as if CoW didn’t exist. In most cases the images actually take up a fair bit less space than that, thanks to CoW, as you mention.

BTW, CoW doesn’t do anything “smart”. It’s just a property that some union filesystems have which allows them to, well, copy on write. With it, a section of the filesystem can appear to be its own independent copy until someone (a container) actually attempts to change it. At that point the old version of the file is copied into a new mutable layer that is “layered” on top, and the requested modification is applied to that copy.
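
The copy-up step described here can be illustrated with a toy model in plain shell. This is only a sketch: the lower and upper directories and the cow_read and cow_append helpers are hypothetical names made up for the illustration, and a real union filesystem such as AUFS does this work inside the kernel rather than with cp.

```sh
#!/bin/sh
# Toy model of copy-on-write with two directories (hypothetical layout):
#   lower/  stands in for a read-only image layer
#   upper/  stands in for the writable layer stacked on top
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/lower" "$work/upper"
echo "original" > "$work/lower/a.txt"
echo "untouched" > "$work/lower/b.txt"

# Read: the upper layer wins; otherwise fall through to the lower layer.
cow_read() {
    if [ -e "$work/upper/$1" ]; then
        cat "$work/upper/$1"
    else
        cat "$work/lower/$1"
    fi
}

# Write: copy the whole file up on first modification, then change the copy.
cow_append() {
    [ -e "$work/upper/$1" ] || cp "$work/lower/$1" "$work/upper/$1"
    echo "$2" >> "$work/upper/$1"
}

cow_append a.txt "changed"
cow_read a.txt            # merged view: the original line plus the new one
cat "$work/lower/a.txt"   # the lower layer still holds only the original line
ls "$work/upper"          # only a.txt was copied up; b.txt never was
```

Nothing in this model compares file contents: the copy is triggered by the write itself, which is why CoW alone does not deduplicate files that are added again during a build.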

I used the docker history command, as shown in the script output. I also used dockviz, but it gives the same information as the history command, just presented in a fancier way.

I’ve also browsed the Docker folder where the image layers are stored and found the files duplicated in there too.

The images have the size shown by the docker history command. There is no doubt about that, because I see the layer sizes when I push the image to the registry.
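
To compare two images by total layer size rather than reading the history table by eye, the per-layer sizes can be summed. The sketch below is not from the thread: sum_layer_bytes is a hypothetical helper, and the commented usage assumes a Docker CLI recent enough for docker history to accept --format together with --human=false (the 1.11 client shown above predates that option).

```sh
#!/bin/sh
# Sum byte counts given one per line on stdin (hypothetical helper).
sum_layer_bytes() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Demo with the two topmost layers of webproject:expected from the history
# output, assuming "1.771 kB" means 1771 bytes: prints 1872.
printf '101\n1771\n' | sum_layer_bytes

# Assumed usage against a real daemon (needs --format support):
#   docker history --human=false --format '{{.Size}}' webproject:latest | sum_layer_bytes
#   docker history --human=false --format '{{.Size}}' webproject:expected | sum_layer_bytes
```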

Have you looked at the GitHub issue I mentioned? Apparently this “BUG” happens because I use the AUFS storage driver and is not present with, for example, the OVERLAY driver, but I haven’t tested that yet.

I’ve read about the CoW feature, but it seems that it doesn’t apply here.

Didn’t see the linked issue, good call out.

As noted by Tonis in the issue, this seems to be intended behavior in AUFS’s case, not necessarily a bug. Making AUFS compute layer diffs the way the other storage drivers do would cost it some of its other major benefits.

Why not split your build up into a few sub-components and add things piecemeal instead of in one large COPY operation?
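
One possible reading of this suggestion, in the spirit of Dockerfile-expected from the reproduction script, is to stage only the files that changed since the base build and use that directory as the build context. The following is a sketch under assumptions, not something proposed in the thread: stage_changed and its three directory arguments are hypothetical, it relies on a saved copy of the sources as they were when webproject:base was built, it ignores deleted files, and it does not handle file names containing newlines.

```sh
#!/bin/sh
# stage_changed SNAPSHOT SRC CONTEXT   (hypothetical helper)
# Copy into CONTEXT every regular file under SRC that is new or whose content
# differs from the file at the same relative path under SNAPSHOT.
stage_changed() {
    snapshot=$1 src=$2 context=$3
    (cd "$src" && find . -type f) | while IFS= read -r rel; do
        # cmp -s fails when the files differ or the snapshot copy is missing.
        if ! cmp -s "$src/$rel" "$snapshot/$rel" 2>/dev/null; then
            mkdir -p "$context/$(dirname "$rel")"
            cp "$src/$rel" "$context/$rel"
        fi
    done
}

# Assumed usage (directory names are placeholders; CONTEXT must be outside SRC):
#   stage_changed ../snapshot-of-base . ../delta-context
#   printf 'FROM webproject:base\nADD . /root\n' > ../delta-context/Dockerfile
#   docker build -t webproject:latest ../delta-context
```

Because the context then holds only changed files, the resulting layer should contain only those files regardless of how the storage driver computes its diff. As in the reproduction script, the Dockerfile itself also ends up in /root.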

I thought that the different storage drivers Docker uses had pros and cons regarding performance and portability to other systems. I never thought that they would have an impact on the image build process.

Given the same input files and Dockerfile, shouldn’t Docker output the same image layers during the build process (same size, same SHA, etc.) so that you can share images built on different computers? Otherwise, when working with a CI system, all the machines used as CI workers to build the images for testing, deployment, etc. would need the same storage driver configuration. Is that correct?

In the end the final image / filesystem you get should be the same between storage drivers. You can easily pull an image built with aufs and run it on overlay. It’s how the diff is calculated between the layers that differs from storage driver to storage driver.
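
Since layer sizes can differ by storage driver, it may help for each CI worker to log which driver its daemon uses. A minimal sketch, assuming the "Storage Driver:" line that appears in the docker info output earlier in this thread; storage_driver is a hypothetical helper name.

```sh
#!/bin/sh
# Print the value of the first "Storage Driver:" line found in `docker info`
# text output read from stdin (leading indentation is tolerated).
storage_driver() {
    sed -n 's/^ *Storage Driver: *//p' | head -n 1
}

# Only query the daemon when a docker client is installed on this machine.
if command -v docker >/dev/null 2>&1; then
    docker info 2>/dev/null | storage_driver
fi
```

On the machine described in this thread, this would be expected to print aufs.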