Ocrmypdf docker compose

thequicksphinxrises · August 15, 2023, 11:48pm

I am having trouble finding the best way to integrate ocrmypdf into an existing docker compose project.

I currently have a docker compose project with several containers: nginx, php fpm, mariadb, redis, tika, and gotenberg. Each of these pulls from the official Docker Hub image. I like having each service in its own container for development, management, and logging.

I tried using the officially recommended ocrmypdf Docker image, “jbarlow83/ocrmypdf”, and calling its web service wrapper, which uses a flask wsgi server, but the performance was not good. The ocrmypdf container was only able to handle one request at a time from the php fpm container. The official documentation suggests this web service wrapper is only for demonstration or development purposes.

How should I call ocrmypdf from my php fpm container?

As another workaround, I was also able to mount the docker.sock file in the php fpm container and then use exec() in php to call docker exec ocrmypdf, but this feels very sloppy, and I think it’s unsafe to expose the docker.sock file to the php fpm container.

Should I try to combine the ocrmypdf and php fpm containers by installing ocrmypdf into the php fpm container? Or should I build yet another web front-end for ocrmypdf by installing nginx and php fpm inside of the ocrmypdf container?

Has anyone successfully integrated an ocrmypdf container into their existing docker compose project?

I also checked the largest project that I’m aware of that uses ocrmypdf, paperless-ngx. It looks like paperless puts everything into one “webserver” container, including python, nodejs, and ocrmypdf. Paperless then manually installs the dependencies for all of its processing. I would prefer to keep ocrmypdf in a separate container and to use the official docker image.

meyay · August 16, 2023, 6:48am

Coincidentally, I know ocrmypdf. As far as I remember, the container is meant to be used for a one-shot operation and then terminates. There is no long-running backend process.

You would need to create your own API service on top of ocrmypdf in order to call it from your application.
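As a rough illustration of what such an API service could look like, here is a minimal sketch using only the Python standard library. It assumes the service runs inside an image where the `ocrmypdf` command is on the PATH (for example a custom image built on top of the official one, where the entrypoint would likely need to be overridden). The names `build_command`, `run_ocr`, `OcrHandler`, and `serve`, and port 5000, are made up for this example. Each request is handled in its own thread, so several uploads can be processed at the same time.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def build_command(src: Path, dst: Path) -> list[str]:
    """Return the argv for one OCR run: `ocrmypdf <input> <output>`."""
    return ["ocrmypdf", str(src), str(dst)]


def run_ocr(pdf_bytes: bytes, runner=subprocess.run) -> bytes:
    """Write the upload to a scratch dir, run ocrmypdf, return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "in.pdf"), Path(tmp, "out.pdf")
        src.write_bytes(pdf_bytes)
        # check=True raises CalledProcessError on a non-zero exit code
        runner(build_command(src, dst), check=True)
        return dst.read_bytes()


class OcrHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: POST a PDF body, receive the OCRed PDF back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            result = run_ocr(body)
        except (subprocess.CalledProcessError, OSError):
            self.send_error(500, "OCR failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


def serve(port: int = 5000) -> None:
    """Start the server; ThreadingHTTPServer uses one thread per request."""
    ThreadingHTTPServer(("0.0.0.0", port), OcrHandler).serve_forever()
```

Calling `serve()` as the container's command would start the service, and the php fpm container could then POST the PDF to it over the compose network. This sketch has no authentication, upload size limit, or cap on concurrent OCR jobs, all of which a real deployment would probably want.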

Otherwise, there would be no way around either embedding ocrmypdf in your image or binding the docker.sock into your php fpm container.

I have an ocrmypdf batch image that keeps the container running. It requires volume binds instead of exposing the docker.sock. It uses inotify watches to process files in a specific folder and stores the processed files in another: https://hub.docker.com/r/meyay/ocrmypdf-batch

Note: I just created the image out of curiosity; I don’t actively maintain it. You can still check the GitHub project used to build that image, in case you want to adopt it: meyayl/ocrmypdf-batch (OCRmyPDF Docker image with batch processing based on iNotify).
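For readers who want to see the watch-folder idea in code, here is a simplified sketch. It is not taken from the ocrmypdf-batch image: it swaps inotify for plain polling so that it needs only the Python standard library, and the names `process_pending`, `ocrmypdf_cli`, and `watch` are invented for the example. It assumes the `ocrmypdf` command is available in the container and that the input and output folders are volume binds shared with the php fpm container.

```python
import subprocess
import time
from pathlib import Path


def process_pending(inbox: Path, outbox: Path, ocr) -> list[Path]:
    """Run `ocr(src, dst)` for each PDF in inbox; return the outputs written."""
    outbox.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / src.name
        ocr(src, dst)
        src.unlink()  # remove the input so it is not processed twice
        written.append(dst)
    return written


def ocrmypdf_cli(src: Path, dst: Path) -> None:
    """OCR one file by calling the CLI: `ocrmypdf <input> <output>`."""
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=True)


def watch(inbox: Path, outbox: Path, interval: float = 2.0) -> None:
    """Poll the inbox forever (polling stands in for inotify here)."""
    while True:
        process_pending(inbox, outbox, ocrmypdf_cli)
        time.sleep(interval)
```

With this approach the application would write the PDF into the shared input folder and wait for a file of the same name to appear in the output folder. To avoid picking up half-written uploads, the writer could save under a temporary name and rename to `.pdf` once the file is complete. A failed OCR run raises an exception and stops the loop in this sketch; a real service would probably move such files aside and continue.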
