RUN Command Invalidates Cache Between CI Builds and Local Development Builds

Hi all,

I have placed all of the OS and app version details below, but I would like to describe my issue first, as someone might have already experienced what I am seeing and be able to provide a solution or an explanation.

TL;DR: locally, Docker seems to be recording the RUN command as `|0 /bin/sh -c cp -f ...`, but the build server records it as `/bin/sh -c cp -f ...` (notice there is no `|0` prefix), which I believe is what invalidates the RUN cache.

We have a CI build system (TeamCity) which pulls a Dockerfile and resources onto a remote server (TeamCity agent) when a new commit is pushed.

The build server executes docker build as expected and creates layers which are eventually pushed to a local Artifactory Docker Registry. This CI build server uses Docker 18.09.2. I know this is an old version, but as a risk-averse enterprise any new technology goes through strict security scanning before being rolled out (not much I can do).

Our local development environments can be anything we want them to be; in this case I am using Manjaro (Arch Linux) with Docker version 19.03.12 installed.

When I pull down the CI-built images from the internal Artifactory Docker Registry and attempt to simply rebuild, or build against the latest release in order to make use of Docker's caching, I get some very unexpected behaviour.

Steps

  1. Pull latest release from registry to local dev
  2. docker build --cache-from VALUE -t VALUE:new_version
  3. It says it is using the cache all the way down, including the COPY step (using cache), until it gets to the RUN command (which comes right after the COPY), which suddenly invalidates the cache and re-runs - see Step 4/5 in the build output below; a rough sketch of the exact commands I run follows the output
Step 1/5 : FROM dea-titan.docker.internal.cba/app/etl:1.0.0
 ---> 2f3fe2b77ff9
Step 2/5 : ENV PYTHONPATH=/backend
 ---> Using cache
 ---> 475c396f2919
Step 3/5 : COPY . /backend/ada
 ---> Using cache
 ---> 3d96697ff485
Step 4/5 : RUN cp -f /backend/ada/docker_resources/log4j.properties.template /opt/titan/spark/conf/log4j.properties     && rm -rf /backend/ada/docker_resources
 ---> Running in 0736291d7a24
Removing intermediate container 0736291d7a24
 ---> 9bed932cd57b
Step 5/5 : WORKDIR /backend/ada
 ---> Running in dacb031ae57b
Removing intermediate container dacb031ae57b
 ---> db7dbb128e5e
Successfully built db7dbb128e5e
Successfully tagged dea-ada.docker.internal.cba/ada/worker:0.0.0.0-dev
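
For completeness, this is roughly what I run locally for steps 1 and 2 above. The previous release tag is a placeholder (I have swapped the real version number for <previous_release>), and the build context is the repository root:

docker pull dea-ada.docker.internal.cba/ada/worker:<previous_release>
docker build \
    --cache-from dea-ada.docker.internal.cba/ada/worker:<previous_release> \
    -t dea-ada.docker.internal.cba/ada/worker:0.0.0.0-dev \
    .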

I have read lots of issues about COPY and ADD invalidating the cache, BUT I cannot find anything that would indicate a RUN command invalidating the cache.

On further investigation, I compared the history of the cache-from image against the locally built one, and it seems Docker is recording the RUN command in a different format, which causes it to think the command has changed (when it hasn't).

Locally, Docker seems to be recording `|0 /bin/sh -c cp -f ...`, whereas the build server records `/bin/sh -c cp -f ...`. That is the only thing that seems to be different.
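
In case it helps, this is roughly how I compared the two (same image names as in the build output above, with the previous release tag as a placeholder; --no-trunc is needed to see the full layer commands):

docker history --no-trunc dea-ada.docker.internal.cba/ada/worker:<previous_release> > ci_history.txt
docker history --no-trunc dea-ada.docker.internal.cba/ada/worker:0.0.0.0-dev > local_history.txt
diff ci_history.txt local_history.txt

The RUN layer is the only line that differs: the local one starts with |0 /bin/sh -c and the CI one with plain /bin/sh -c.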

The following is the Dockerfile causing the issue:

# ENV VALUES INJECTED VIA docker_build in MakeFile
FROM artifactory/docker/registry/app/etl:1.0.0

ENV PYTHONPATH=/backend

# Utilises .dockerignore - please check the blacklist and whitelist entries when making updates
COPY . /backend/ada

RUN cp -f /backend/ada/docker_resources/log4j.properties.template /opt/titan/spark/conf/log4j.properties \
    && rm -rf /backend/ada/docker_resources

WORKDIR /backend/ada

Issue Type: Bug / Clarification

OS Version/Build:
5.7.14-1-MANJARO
NAME="Manjaro Linux"
ID=manjaro
ID_LIKE=arch
BUILD_ID=rolling
PRETTY_NAME="Manjaro Linux"
ANSI_COLOR="32;2;24;144;200"
LOGO=manjarolinux

CI Build App Version (Docker):
Client:
Version: 18.09.2-ce

Local Build App Version (Docker):
Client:
Version: 19.03.12-ce
API version: 1.40
Go version: go1.14.5
Git commit: 48a66213fe
Built: Sat Jul 18 01:33:21 2020
OS/Arch: linux/amd64
Experimental: false

Server:
Engine:
Version: 19.03.12-ce
API version: 1.40 (minimum version 1.12)
Go version: go1.14.5
Git commit: 48a66213fe
Built: Sat Jul 18 01:32:59 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.3.4.m
GitCommit: d76c121f76a5fc8a462dc64594aea72fe18e1178.m
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.18.0
GitCommit: fec3683

Docker-Compose - docker-compose version 1.26.2

The Dockerfile above is mine; what you see in the history are other images I am inheriting from. Yes, some of those have been poorly written, or are themselves inheriting from other images, making it look like I have a lot of layers when in fact all I have are 3.

If you look at my screenshot, the last 3-4 layers are of interest, and one in particular is causing my issue.

Has anyone got any ideas?

Just to be clear, I am only referring to the last 3 layers (we use a lot of inheritance).
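
If it helps, those are just the newest layers, i.e. roughly what the first few lines of docker history show (it lists layers newest-first):

docker history dea-ada.docker.internal.cba/ada/worker:0.0.0.0-dev | head -n 5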

Cheers

Ajay