Docker Community Forums

Reduce image size of image that contains numpy, scipy, scikit-learn

I am new to creating Docker images and have put together a Dockerfile that produces a working image, but the final image is over 600 MB. I would like somebody more advanced than I am to advise whether there is anything I can do to reduce this. I have read many blogs about various strategies and have gone the Python virtual environment route. I am not really concerned about build times, as I will not be building often, but I would like the final image to be leaner than it is.

What I am doing is building an image with a Python application, fava, which is a web GUI front end to the accounting program beancount. These two Python applications alone are easy enough, and the fava team even provides a Dockerfile based on Alpine to build a light image. However, I want to extend this with an extension (smart_importer) that provides machine learning features to automate parts of the transaction import process. This extension depends on numpy, scipy, and scikit-learn, and this is where the extra weight comes from. I originally tried to extend the Alpine approach that the fava team uses, but installing scipy on Alpine fails and I have not been able to resolve it. Using python:slim I can build a fairly small (<200 MB) final image with just fava and beancount, but as I said, this balloons once the dependencies of smart_importer are added.
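
To see where that weight actually sits, it can help to rank the installed packages by size on disk. The following is only a minimal sketch: `largest_entries` is a hypothetical helper name, and the site-packages path in the usage comment assumes the virtual environment lives at /opt/venv, as in the Dockerfile in this post.

```shell
# Hypothetical helper: list the largest entries directly under a directory.
# Sizes are reported in KiB, with the biggest entry printed last.
largest_entries() {
    dir="$1"
    count="${2:-10}"   # number of entries to show, default 10
    du -sk "$dir"/* | sort -n | tail -n "$count"
}

# Example (run inside the image; the path is an assumption):
# largest_entries /opt/venv/lib/python3.*/site-packages 15
```

Running this in a container started from the final image should show whether numpy, scipy, and scikit-learn really dominate, or whether something else is contributing.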

Here is my current Dockerfile. Are there any changes I can easily make to reduce the final image size that I am not seeing? I would greatly appreciate any pointers.

FROM python:slim AS base

FROM base AS builder

RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

#WORKDIR /install
COPY ./library-dependencies.txt /tmp/library-dependencies.txt
COPY ./requirements.txt /tmp/requirements.txt
#ENV PATH="/install:${PATH}"

RUN buildDeps='build-essential gcc gfortran python3-dev' \
    && apt-get update \
    && apt-get install -y $buildDeps --no-install-recommends \
    && cat /tmp/library-dependencies.txt | egrep -v "^\s*(#|$)" | xargs apt-get install -y \
    && pip3 install --upgrade pip \
    && CFLAGS="-g0 -Wl,--strip-all -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib" \
        pip3 install \
#       --prefix="/install" \
        --no-cache-dir \
        --compile \
        --global-option=build_ext \
        --global-option="-j 6" \
        -r /tmp/requirements.txt \
    && apt-get purge -y --auto-remove $buildDeps \
    && rm -rf /var/lib/apt/lists/* \
    && rm -r \
        /tmp/requirements.txt \
        /tmp/library-dependencies.txt

FROM base
COPY --from=builder /opt/venv /opt/venv
COPY ./library-dependencies.txt /tmp/library-dependencies.txt

RUN apt-get update \
    && cat /tmp/library-dependencies.txt | egrep -v "^\s*(#|$)" | xargs apt-get install -y \
    && apt-get install -y libgomp1 --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/venv/bin:$PATH"
ENV BEANCOUNT_FILE=""
ENV FAVA_OPTIONS=""
EXPOSE 5000
CMD fava --host 0.0.0.0 $FAVA_OPTIONS $BEANCOUNT_FILE
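
One idea that sometimes comes up for a copied virtual environment is to trim it in the builder stage before the `COPY --from=builder` step. The sketch below is not part of the Dockerfile above and is untested against this image. `slim_venv` is a hypothetical helper name, and it assumes `strip` from binutils is still installed when it runs. Removing bytecode caches also undoes the effect of pip's `--compile` flag, so it trades some start-up time for space.

```shell
# Hypothetical helper: shrink a virtual environment in place.
slim_venv() {
    venv="$1"
    # Delete __pycache__ directories; Python regenerates bytecode on demand
    # when the directory is writable, otherwise it just imports more slowly.
    find "$venv" -type d -name '__pycache__' -prune -exec rm -rf {} +
    # Strip symbols from files ending in .so, but only if strip is available.
    # Failures (for example on non-ELF files) are ignored.
    if command -v strip >/dev/null 2>&1; then
        find "$venv" -type f -name '*.so' \
            -exec strip --strip-unneeded {} + 2>/dev/null || true
    fi
}

# Example (the path is an assumption taken from the Dockerfile above):
# slim_venv /opt/venv
```

If something like this is tried, it would be worth confirming afterwards that fava and smart_importer still import and run, since stripping prebuilt libraries is not guaranteed to be harmless.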

requirements.txt

# numeric packages needed for smart_importer
Cython==0.28.5
numpy==1.15.1
scipy==1.1.0
scikit-learn

#fava
fava
smart_importer

library-dependencies.txt

libopenblas-dev
liblapack-dev

libxml2-dev
libxslt1-dev
zlib1g-dev
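
For anyone reading the Dockerfile, the pipeline that consumes this file drops comment lines and blank lines before handing the package names to apt-get. Here is a minimal sketch of that filter on its own. `list_packages` is a hypothetical name, and the sketch uses `grep -E` with a POSIX character class instead of `egrep` with `\s`, which should behave the same way for a file like this one.

```shell
# Hypothetical helper: print the package names from a dependency list,
# skipping blank lines and lines whose first non-space character is '#'.
list_packages() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example (file name taken from this post):
# list_packages library-dependencies.txt | xargs apt-get install -y --no-install-recommends
```

The usage comment adds `--no-install-recommends`, a flag the Dockerfile already passes to its other apt-get calls. Whether it makes a measurable difference for these particular packages is something to check rather than assume.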