Our compiled app is large (> 1GB). I am using a command as follows to copy the application into the container:
COPY ./dist ./app
This is great, but the layer created is large.
I decided to create two Dockerfiles: a base Dockerfile that gets rebuilt weekly, and a main Dockerfile that uses the base image in its FROM line.
So I have the COPY line above in both Dockerfiles.
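For reference, the setup looks roughly like this (the image names and tags are illustrative, not my real ones):

```dockerfile
# base.dockerfile - rebuilt weekly
FROM ubuntu:22.04                  # illustrative base image
COPY ./dist ./app

# Dockerfile - main build, run on every change
FROM myregistry/myapp-base:latest  # the weekly base image built above
COPY ./dist ./app
```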
I was under the impression that this would result in an incremental copy of only the files that have changed between the base image and the current build, which would result in much smaller layers.
But this doesn’t seem to be the case. Am I missing something or is it not supposed to work like this?
COPY uses the build context on the host as its source. In your case Docker will compute a hash for the contents of the dist directory. If there's a match for that hash in the layer cache, the layer gets re-used and the copy doesn't need to happen. That just saves time, not disk space - ultimately you will have a layer with the full contents of dist, whether it came from the layer cache or the build process.
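You can confirm this with `docker history`, which lists every layer in an image together with its size (the image name here is illustrative):

```shell
# The COPY layer shows its full size whether it was rebuilt
# or pulled from the build cache.
docker history myregistry/myapp:latest
```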
You can make better use of the cache by breaking that one COPY line into multiple lines - copying the least-changed files first, so subsequent builds will come mostly from the cache. But even if you break your COPY across ten layers to speed up the build, the combined storage of those ten layers will be the same as your one layer.
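For example, if your dist folder splits into rarely-changing and frequently-changing parts (these subdirectory names are hypothetical), you could order the copies like this:

```dockerfile
# Rarely-changing content first, so these layers usually come from cache
COPY ./dist/vendor ./app/vendor
COPY ./dist/assets ./app/assets
# Frequently-changing binaries last - on a typical change,
# only this layer needs to be rebuilt
COPY ./dist/bin ./app/bin
```

Note that any change to an earlier layer's input invalidates all the layers after it, which is why the most volatile files should go last.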
Splitting across two Docker images doesn’t work because the layers only get reused if there’s a cache hit for all the input to that layer - i.e. if your two Dockerfiles differ before the COPY command, then they can’t share the COPY layer from the cache.
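To illustrate: in the sketch below the two Dockerfiles diverge at the RUN line, so everything after it - including the COPY - is a cache miss for the second build (the base image and packages are just examples):

```dockerfile
# Dockerfile A
FROM alpine:3.19
RUN apk add --no-cache curl
COPY ./dist ./app             # cacheable on rebuilds of A

# Dockerfile B
FROM alpine:3.19
RUN apk add --no-cache wget   # differs from A here...
COPY ./dist ./app             # ...so this COPY layer cannot be shared with A
```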