I am new to creating Docker images. I have put together a Dockerfile that builds a working image, but the final image is over 600 MB, and I would like someone more advanced than I am to advise whether there is anything I can do to reduce this. I have read many blog posts about various strategies and have gone the Python virtual environment route. I am not really concerned about build times, as I will not be building often, but I would like the final image to be a bit leaner than it is.
I am building an image with the Python application fava, a web GUI front-end for the accounting program beancount. These two Python applications on their own are easy enough to package, and the fava team even provides an Alpine-based Dockerfile that builds a light image. However, I want to extend this with an extension to fava (smart_importer) that adds machine-learning features to automate parts of the transaction-import process. smart_importer depends on numpy, scipy, and scikit-learn, and this is where the extra weight comes from. I originally tried to extend the fava team's Alpine approach, but installing scipy on Alpine fails badly and I could not resolve it. Using python:slim, I can build a fairly small final image (under 200 MB) with just fava and beancount, but, as I said, it balloons once smart_importer's dependencies are added.
Here is my current Dockerfile. Are there any easy changes I am missing that would bring the final image size down? I would greatly appreciate any pointers.
```dockerfile
FROM python:slim AS base

# Build stage: creates a virtualenv and installs the Python requirements into it.
FROM base AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY ./library-dependencies.txt /tmp/library-dependencies.txt
COPY ./requirements.txt /tmp/requirements.txt
# Install the compilers plus the system packages listed in library-dependencies.txt
# (blank and comment lines are skipped), then install the requirements.
# CFLAGS: -g0 drops debug info, -Wl,--strip-all strips symbols at link time.
# --global-option passes "build_ext -j 6" to setup.py (6 parallel compile jobs);
# pip disables prebuilt wheels when this option is used, so the numeric
# packages are compiled from source.
RUN buildDeps='build-essential gcc gfortran python3-dev' \
    && apt-get update \
    && apt-get install -y $buildDeps --no-install-recommends \
    && grep -Ev '^\s*(#|$)' /tmp/library-dependencies.txt | xargs apt-get install -y \
    && pip3 install --upgrade pip \
    && CFLAGS="-g0 -Wl,--strip-all -I/usr/include -I/usr/local/include -L/usr/lib -L/usr/local/lib" \
       pip3 install \
         --no-cache-dir \
         --compile \
         --global-option=build_ext \
         --global-option="-j 6" \
         -r /tmp/requirements.txt \
    && apt-get purge -y --auto-remove $buildDeps \
    && rm -rf /var/lib/apt/lists/* \
    && rm -r \
         /tmp/requirements.txt \
         /tmp/library-dependencies.txt

# Final stage: copies only the finished virtualenv, then installs the same
# system packages from library-dependencies.txt plus libgomp1 (GNU OpenMP runtime).
FROM base
COPY --from=builder /opt/venv /opt/venv
COPY ./library-dependencies.txt /tmp/library-dependencies.txt
RUN apt-get update \
    && grep -Ev '^\s*(#|$)' /tmp/library-dependencies.txt | xargs apt-get install -y \
    && apt-get install -y libgomp1 --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*
ENV PATH="/opt/venv/bin:$PATH"
ENV BEANCOUNT_FILE=""
ENV FAVA_OPTIONS=""
EXPOSE 5000
CMD fava --host 0.0.0.0 $FAVA_OPTIONS $BEANCOUNT_FILE
```
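The `^\s*(#|$)` filter applied to library-dependencies.txt in both stages drops empty lines, whitespace-only lines and lines whose first non-blank character is `#`; `xargs` then splits whatever survives on whitespace and hands the words to `apt-get install`. A small Python sketch of that same pipeline, for illustration only (`apt_arguments` is a hypothetical helper, and xargs quote handling is not reproduced):

```python
import re

# Same pattern as the grep filter: matches empty or whitespace-only lines,
# and lines whose first non-blank character is '#'.
SKIP_LINE = re.compile(r"^\s*(#|$)")


def apt_arguments(text: str) -> list[str]:
    """Hypothetical helper mimicking the grep -v filter piped into xargs apt-get.

    Returns the package arguments apt-get would receive. xargs quote handling
    is not reproduced; words are split on plain whitespace only.
    """
    kept = [line for line in text.splitlines() if not SKIP_LINE.match(line)]
    # xargs splits the surviving lines on whitespace into separate arguments.
    return [word for line in kept for word in line.split()]


if __name__ == "__main__":
    sample = "# BLAS/LAPACK\nlibopenblas-dev\nliblapack-dev\n\n   \nlibxml2-dev libxslt1-dev\nzlib1g-dev\n"
    print(apt_arguments(sample))
    # prints ['libopenblas-dev', 'liblapack-dev', 'libxml2-dev', 'libxslt1-dev', 'zlib1g-dev']
```

One consequence the sketch makes visible: the filter only skips whole-line comments, so a trailing comment such as `zlib1g-dev # compression` would reach apt-get as three separate arguments.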
requirements.txt:

```
# numeric packages needed for smart_importer
Cython==0.28.5
numpy==1.15.1
scipy==1.1.0
scikit-learn
#fava
fava
smart_importer
```
library-dependencies.txt:

```
libopenblas-dev
liblapack-dev
libxml2-dev
libxslt1-dev
zlib1g-dev
```
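To check how much of the 600 MB+ each installed Python package contributes, one could list the largest top-level entries in the virtualenv's site-packages directory. A minimal Python sketch, assuming the `/opt/venv` location used in the Dockerfile (the script and its function names are hypothetical); sizes are apparent file sizes, summed without following symlinks:

```python
import os
import sys
from pathlib import Path


def tree_size(path: Path) -> int:
    """Apparent size in bytes of a file, or of every regular file below a directory."""
    if path.is_symlink():
        return 0  # symlinks are neither followed nor counted
    if path.is_file():
        return path.stat().st_size
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def largest_entries(site_packages: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return (name, bytes) for the `top` largest entries in site-packages."""
    sizes = [(entry.name, tree_size(entry)) for entry in site_packages.iterdir()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


def main(argv: list[str]) -> None:
    if argv:
        target = Path(argv[0])
    else:
        # Assumption: the virtualenv lives at /opt/venv, as in the Dockerfile.
        target = next(Path("/opt/venv/lib").glob("python*/site-packages"), None)
    if target is None or not target.is_dir():
        print("site-packages not found; pass its path as an argument", file=sys.stderr)
        return
    for name, size in largest_entries(target):
        print(f"{size / 1_000_000:8.1f} MB  {name}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the image's PATH puts `/opt/venv/bin` first, something like `docker run --rm -i <image> python - < site_sizes.py` would run it with the virtualenv's interpreter (`site_sizes.py` being whatever name the script is saved under). This only measures the virtualenv; the apt packages installed in the final stage add to the image size separately.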