Docker Community Forums


Run tika-python with Java 8 in Docker

I have a Django site that parses PDFs using tika-python and stores the parsed content in an Elasticsearch index. It works fine on my local machine, and I want to run the same setup in Docker. However, tika-python does not work inside the container, because it needs Java 8 to run the Tika REST server in the background.

My Dockerfile:

FROM python:3.6.5

WORKDIR /site
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
EXPOSE 9200
ENV PATH="/site/poppler/bin:${PATH}"
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
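A quick way to confirm the diagnosis is to check, from inside the running container (for example via `docker exec`), whether any `java` executable is on the PATH; the python:3.6.5 image does not appear to include one. The sketch below assumes tika-python starts its server through the `java` command found on PATH by default. `java_version` is a hypothetical helper for illustration, not part of tika-python, and it sticks to Python 3.6-compatible `subprocess.run` arguments.

```python
import shutil
import subprocess


def java_version():
    """Return the first line of `java -version` output, or None if java is not on PATH.

    Assumption: tika-python launches the Tika server via the `java` command
    on PATH, so a missing java here would explain the failure in the container.
    """
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` prints its banner to stderr, not stdout.
    # stdout/stderr=PIPE plus universal_newlines keeps this Python 3.6 compatible.
    result = subprocess.run(
        [java, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = java_version()
    if version is None:
        print("java not found on PATH; tika-python cannot start its server")
    else:
        print("found:", version)
```

If it reports that java is missing, one option is to install a JRE in the image (for example an OpenJDK 8 package from the base image's Debian repositories, if it is still available there).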

My requirements.txt:

django==2.2
beautifulsoup4==4.6.0
json5==0.8.4
jsonschema==2.6.0
django-elasticsearch-dsl==0.5.1
tika==1.19
scikit-learn
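An alternative that avoids installing Java in the Django image is to run the Tika server in its own container (for example the apache/tika image, which listens on Tika's default port 9998) and use tika-python only as a REST client. A minimal sketch, assuming the Tika container is reachable at the hypothetical hostname `tika`, and assuming the installed tika-python version honors the `TIKA_CLIENT_ONLY` and `TIKA_SERVER_ENDPOINT` environment variables described in its README. `DEFAULT_TIKA_ENDPOINT`, `tika_endpoint`, and `parse_pdf` are illustrative names, not tika-python APIs; only `parser.from_file(..., serverEndpoint=...)` comes from the library.

```python
import os

# Hypothetical address of a separate Tika server container; 9998 is the
# Tika server's default port. Adjust the hostname to your own setup.
DEFAULT_TIKA_ENDPOINT = "http://tika:9998"


def tika_endpoint():
    """Return the Tika server URL, preferring TIKA_SERVER_ENDPOINT if it is set."""
    return os.environ.get("TIKA_SERVER_ENDPOINT", DEFAULT_TIKA_ENDPOINT)


def parse_pdf(path):
    """Send a PDF to the remote Tika server and return the extracted text (or None)."""
    # Assumption: tika-python reads these variables when it is first imported,
    # so they only take effect if nothing has imported tika earlier.
    os.environ.setdefault("TIKA_CLIENT_ONLY", "True")  # don't launch a local jar
    os.environ.setdefault("TIKA_SERVER_ENDPOINT", DEFAULT_TIKA_ENDPOINT)

    # Imported lazily so this module loads even where tika is not installed.
    from tika import parser

    # serverEndpoint is also passed explicitly, so the request goes to the
    # remote server regardless of module-level defaults.
    parsed = parser.from_file(path, serverEndpoint=tika_endpoint())
    return parsed.get("content")
```

Usage would be something like `text = parse_pdf("/site/media/example.pdf")` (hypothetical path). Setting the two variables with ENV lines in the Dockerfile instead would guarantee they are in place before any module imports tika.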