Dependency issues: Docker container run from another Python script (in Win 10)

OK, this may take a bit to explain. I have some Python code, agl_main.py, that uses two libraries called elevation and rasterio. Some complicated dependencies have pushed me to run this code in Docker under Linux (see the Dockerfile and requirements_agl.txt below).

This will all be integrated into another set of scripts running on Windows (ultimately on Azure). It would be difficult to pull all of that into Docker. So, from the Windows script I call agl_docker_start, which uses Python on Whales (python_on_whales) to run the container. elevation can be pip installed on Windows, but I'm pretty sure rasterio cannot.

If I leave the rasterio import intact in agl_main.py, I get the first error/traceback below. That is on the Windows side. I assume this is the Python dependencies checker doing its job. But agl_main does not actually run on Windows. It runs in the Linux container.
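For illustration, here is a minimal sketch of what seems to be happening on the Windows side. It assumes a throwaway module named demo_main and a made-up package name, both hypothetical. Importing anything from a module runs all of that module's top-level statements first, so a missing package at the top of the file fails the import even if the imported function never uses it.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical stand-in for agl_main.py: a top-level import of a package
# that is not installed, followed by a function that does not need it.
source = (
    "import package_that_is_not_installed_here\n"
    "\n"
    "def launcher():\n"
    "    return 'started'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "demo_main.py").write_text(source)
    sys.path.insert(0, tmp)
    try:
        # Equivalent to the first step of "from demo_main import launcher":
        # the whole module body executes before any name is handed back.
        importlib.import_module("demo_main")
        import_failed = False
        missing_name = None
    except ModuleNotFoundError as exc:
        import_failed = True
        missing_name = exc.name
    finally:
        sys.path.remove(tmp)

print(import_failed, missing_name)
```

If that reading is right, the first traceback is not a separate dependency check. It is just the import statement executing agl_main.py on Windows.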

If I comment out the rasterio import, the code on Windows runs but the code in the container errors out (second error/traceback).

Any ideas how to resolve this? Is there a way to tell Python to ignore the dependencies on the Windows side? This may not be a Docker issue, per se, but I'm hoping someone has experience in this area and can make some suggestions.
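One possible way to make Python tolerate the missing package is sketched below, as an assumption rather than a tested fix. The helper optional_import and the reduced function open_raster are hypothetical names, not part of agl_main.py. The idea is to guard the import so the module can still be loaded where rasterio is absent, and to fail only when raster code is actually called.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of aborting the whole import.
        return None


# Expected to be None on the Windows side and the real module in the container.
rasterio = optional_import("rasterio")


def open_raster(geotif):
    """Hypothetical reduced stand-in for the raster-reading code."""
    if rasterio is None:
        raise RuntimeError("rasterio is not installed; run this in the container")
    return rasterio.open(geotif)
```

The trade-off is that a genuinely broken install inside the container would surface later, at the first call, rather than at startup.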

Thanks.


Dockerfile

FROM python:3.9-buster

WORKDIR ./

RUN apt-get update
RUN apt-get install -y software-properties-common
RUN apt-get install -y gdal-bin
RUN apt-get install -y libgdal-dev
ENV CPLUS_INCLUDE_PATH=/usr/include/gdal
ENV C_INCLUDE_PATH=/usr/include/gdal
RUN apt-get install -y python-gdal

COPY requirements_agl.txt ./
RUN pip install -r requirements_agl.txt

COPY agl_main.py ./
COPY utilities.py ./

CMD ["/bin/sh", "-c", "python agl_main.py > /tmp/shareddata/output.log 2>&1"]

End of Dockerfile

requirements_agl.txt

DateTime
make
setuptools
numpy
pygdal==2.4.0.10
elevation
rasterio
pandas
typing
docker

End of requirements_agl.txt

agl_main imports

import os
import logging
import pandas as pd
import math
from typing import List
import datetime as dt
from itertools import takewhile
import elevation
import rasterio

End of agl_main imports

agl_docker_start function

def agl_docker_start(win_path, docker_path):
    from python_on_whales import docker
    logging.info('%s %s', win_path, docker_path)
    docker.run(
        "agl_image",
        name="agl_container",
        volumes=[(docker_path, win_path)],
        detach=True,
    )

End of agl_docker_start function
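Another option, sketched here under the assumption that the launcher can live apart from the processing code, is to move agl_docker_start into its own small module. The file name agl_launcher.py is hypothetical. The Windows script would then import from that module, and agl_main.py, with its rasterio import intact, would only ever be loaded inside the container.

```python
# agl_launcher.py (hypothetical file name): Windows-side launcher only.
import logging


def agl_docker_start(win_path, docker_path):
    """Start the AGL container; same body as the function shown above."""
    # Imported lazily, so importing this module needs only the standard library.
    from python_on_whales import docker

    logging.info("%s %s", win_path, docker_path)
    docker.run(
        "agl_image",
        name="agl_container",
        volumes=[(docker_path, win_path)],
        detach=True,
    )
```

With this layout the Windows script would use "from agl_launcher import agl_docker_start" in place of the import from agl_main.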

Error message when elevation and rasterio uncommented in agl_main.py

Traceback (most recent call last):
  File "C:\Users\TracyTD\OneDrive - Truth Data Insights\TD\Software\AGL\AGL\g1000_preprocessing.py", line 13, in <module>
    from agl_main import agl_docker_start
  File "C:\Users\TracyTD\OneDrive - Truth Data Insights\TD\Software\AGL\AGL\agl_main.py", line 10, in <module>
    import rasterio
ModuleNotFoundError: No module named 'rasterio'

End of error message

Error message when rasterio import commented out

Traceback (most recent call last):
  File "//agl_main.py", line 529, in <module>
    agl_main()
  File "//agl_main.py", line 464, in agl_main
    elev_list_rect, agl_list_rect = compute_agl(clipfile, points_df_list[index])
  File "//agl_main.py", line 253, in compute_agl
    top, bottom, left, right, height, width, band1 = get_raster_data(c_file)
  File "//agl_main.py", line 204, in get_raster_data
    dataset = rasterio.open(geotif)
NameError: name 'rasterio' is not defined

End of error message