Best practices for git clone, make, etc. via dockerfile RUN?

I am making an image with a large amount of git cloned tools. One long example is this

RUN git clone git://github.com/samtools/samtools.git
RUN cd samtools
RUN autoheader && autoconf -Wno-syntax && ./configure
RUN make
RUN make install
RUN cd /

To what extent should I shorten it further with `&&` and multi-line `\`? I want a good compromise between human readability and image size/performance, but I am also worried that some formats may break the commands. An example would be greatly appreciated.

Every RUN command adds a new intermediate layer, so splitting the build across many RUNs will increase the size of the image. Chaining the commands with `&&` and line continuations (`\`) combines them into a single layer.
Another point is the git clone command. At a minimum you should use `git clone --depth 1`, otherwise you download the whole history of more than 2000 commits from samtools. Or you could download one of the release tarballs instead. If you clone the repo you always get the most recent code at the time you build the image and have no control over which version it contains.
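As a sketch of the combined form (this simply mirrors the commands from your original Dockerfile, with the shallow-clone flag mentioned above; removing the source tree at the end is an extra step so it never lands in the committed layer):

```dockerfile
# One layer: shallow clone, build, install, then clean up the
# source tree inside the same RUN so it is not committed.
RUN git clone --depth 1 https://github.com/samtools/samtools.git && \
    cd samtools && \
    autoheader && \
    autoconf -Wno-syntax && \
    ./configure && \
    make && \
    make install && \
    cd / && \
    rm -rf samtools
```

Because everything happens in one shell invocation, the `cd` works as expected and the deleted sources never add to the image size.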


In addition to layering concerns, it’s also important to understand that each intermediate layer is normally (i.e. by most builders, as far as I know) created by running a dedicated intermediate container and committing its topmost layer.

The context of those intermediate containers matters; relevant to your example, the working directory is reset when each of these containers is created, meaning a `cd` is not retained across multiple RUN directives.

Tested with both the Docker builder and podman/buildah:

$ podman build .
STEP 1: FROM alpine
STEP 2: RUN pwd
/
ac36f4543fc151e9eeef30a9864f692281df22c7e2e7b709148423fb5a8dc157
STEP 3: RUN cd /tmp
4608e642cb1020836ab9d60e0f107c7ba3cd5fd7b1a92db19d052a17c2b708fb
STEP 4: RUN pwd
/
STEP 5: COMMIT
bc2d71134f5fb6ecb061ddbb53fb13051b8dc8ca87a7abb43f6800a87fd43f92
$ docker build .
Sending build context to Docker daemon  2.048kB
Step 1/4 : FROM alpine
 ---> b7b28af77ffe
Step 2/4 : RUN pwd
 ---> Running in 08d24b65bffd
/
Removing intermediate container 08d24b65bffd
 ---> c682655eb3d2
Step 3/4 : RUN cd /tmp
 ---> Running in 463d071f12a4
Removing intermediate container 463d071f12a4
 ---> 416fc500d873
Step 4/4 : RUN pwd
 ---> Running in 52339084ede3
/
Removing intermediate container 52339084ede3
 ---> 612d915ffdb2
Successfully built 612d915ffdb2

So, I don’t believe your example Dockerfile builds what you intend: the configure/make steps are not actually running with samtools as the working directory.
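A minimal sketch of the fix: the WORKDIR directive, unlike a `cd` inside a RUN, is recorded in the image config and therefore does persist across subsequent RUN directives:

```dockerfile
FROM alpine
WORKDIR /tmp
# This RUN starts in /tmp because WORKDIR is part of the image config,
# not shell state in a throwaway container.
RUN pwd
```

Running a build with this Dockerfile should print `/tmp` at the `RUN pwd` step, in contrast to the transcripts above.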


The “best practice” is to not create Docker images that contain build-time tools like compilers and source code. The way to avoid this is multi-stage builds: you build whatever you need in a temporary Docker image and then copy only what you need for production into your final image.

It looks something like this:

FROM ubuntu:latest as builder

RUN apt-get update && apt-get install -y \
    build-essential \
    zlib1g-dev \
    libbz2-dev \
    liblzma-dev \
    autoconf \
    git \
    wget

WORKDIR /tmp

# Install HTSLIB
RUN wget https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2 && \
    tar -vxjf htslib-1.9.tar.bz2 && \
    cd htslib-1.9 && \
    make && \
    make install

# Create production image
FROM ubuntu:latest

# Copy the compiled assets that we need to run from the builder image
COPY --from=builder /usr/local/lib/libhts.so /usr/local/lib/libhts.so
COPY --from=builder /usr/local/lib/libhts.so.2 /usr/local/lib/libhts.so.2
COPY --from=builder /usr/local/lib/libhts.a /usr/local/lib/libhts.a
COPY --from=builder /usr/local/bin/htsfile /usr/local/bin/htsfile
COPY --from=builder /usr/local/bin/tabix /usr/local/bin/tabix
COPY --from=builder /usr/local/bin/bgzip /usr/local/bin/bgzip

WORKDIR /app
COPY myapp/ .

CMD ["./myapp"]

You can read more about it here: https://docs.docker.com/develop/develop-images/multistage-build/

~jr
