OCRmyPDF with Docker Compose

I am having trouble finding the best way to integrate OCRmyPDF into an existing Docker Compose project.

I currently have a Docker Compose project with several containers: nginx, PHP-FPM, MariaDB, Redis, Tika, and Gotenberg. Each of them uses the official image from Docker Hub. I like having each service in its own container for development, management, and logging.

I tried the officially recommended OCRmyPDF Docker image, “jbarlow83/ocrmypdf”, and called its web service wrapper, which runs on a Flask WSGI server, but performance was poor: the OCRmyPDF container could only handle one request at a time from the PHP-FPM container. The official documentation also says this web service wrapper is intended only for demonstration or development purposes.

How should I call OCRmyPDF from my PHP-FPM container?

As another workaround, I mounted the docker.sock file into the PHP-FPM container and used exec() in PHP to run docker exec ocrmypdf, but this feels very sloppy, and I think it’s unsafe to expose docker.sock to the PHP-FPM container.

Should I combine the OCRmyPDF and PHP-FPM containers by installing OCRmyPDF into the PHP-FPM container? Or should I build yet another web front end for OCRmyPDF by installing nginx and PHP-FPM inside the OCRmyPDF container?

Has anyone successfully integrated an OCRmyPDF container into their existing Docker Compose project?

I also checked the largest project I’m aware of that uses OCRmyPDF, paperless-ngx. It looks like paperless puts everything into one “webserver” container, including Python, Node.js, and OCRmyPDF, and then manually installs the dependencies for all of its processing. I would prefer to keep OCRmyPDF in a separate container and to use the official Docker image.

Coincidentally, I know OCRmyPDF. As far as I remember, the container is meant for one-shot operations: it processes its input and then terminates. There is no long-running backend process.
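For illustration, a one-shot run streams a PDF through the container, which exits afterwards. A minimal sketch in Python, assuming the image's entrypoint is the ocrmypdf CLI and that passing "-" "-" makes it read the input from stdin and write the result to stdout, as described in the OCRmyPDF Docker documentation; the function names are hypothetical. Whatever runs this still needs access to the Docker daemon, which is the docker.sock problem from the question.

```python
import subprocess
from pathlib import Path

# Image name taken from the question; adjust if you use a different tag.
IMAGE = "jbarlow83/ocrmypdf"


def one_shot_command(*extra_args: str, image: str = IMAGE) -> list[str]:
    """Build a `docker run` command that OCRs stdin to stdout and then exits."""
    # --rm removes the container once ocrmypdf finishes; -i keeps stdin open
    # so the PDF can be streamed in. The trailing "-" "-" means
    # "read input from stdin, write output to stdout".
    return ["docker", "run", "--rm", "-i", image, *extra_args, "-", "-"]


def ocr_once(src: Path, dst: Path, *extra_args: str) -> None:
    """Hypothetical helper: one container per file, started and discarded."""
    with src.open("rb") as fin, dst.open("wb") as fout:
        subprocess.run(one_shot_command(*extra_args), stdin=fin, stdout=fout, check=True)
```

Every call starts a fresh container, so there is nothing long-running to send requests to; that is why a separate API service or a folder watcher is needed.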

You would need to create your own API service on top of OCRmyPDF in order to call it from your application.
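A minimal sketch of such an API service, using only the Python standard library. Assumptions: it runs in a custom image built FROM the official one with the entrypoint overridden, so both Python 3 and the ocrmypdf CLI are available; OCR_CMD, MAX_JOBS, ocr_bytes, OcrHandler, serve and port 8080 are hypothetical choices. ThreadingHTTPServer handles each request in its own thread, and a semaphore caps how many OCR jobs run at the same time.

```python
import subprocess
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

# Assumption: ocrmypdf is on PATH in this container. Add flags as needed,
# e.g. ["ocrmypdf", "--skip-text"].
OCR_CMD = ["ocrmypdf"]
# Hypothetical limit: OCR is CPU-heavy, so cap concurrent conversions.
MAX_JOBS = 2
_slots = threading.BoundedSemaphore(MAX_JOBS)


def ocr_bytes(pdf: bytes, cmd: list[str] = OCR_CMD) -> bytes:
    """Run `cmd in.pdf out.pdf` in a temp dir and return the output bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "in.pdf"), Path(tmp, "out.pdf")
        src.write_bytes(pdf)
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run([*cmd, str(src), str(dst)], check=True, capture_output=True)
        return dst.read_bytes()


class OcrHandler(BaseHTTPRequestHandler):
    """POST a raw PDF body; the response body is the OCRed PDF."""

    def _reply(self, status: int, content_type: str, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            with _slots:  # waits while MAX_JOBS conversions are running
                result = ocr_bytes(body)
        except subprocess.CalledProcessError as exc:
            self._reply(500, "text/plain", exc.stderr or b"ocrmypdf failed")
            return
        self._reply(200, "application/pdf", result)


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Start the service; each request is handled in its own thread."""
    ThreadingHTTPServer((host, port), OcrHandler).serve_forever()
```

The container command would call serve(), and the PHP-FPM container would POST the raw PDF bytes to something like http://ocrmypdf-api:8080/ (hypothetical Compose service name) and read the OCRed PDF from the response body. Capping concurrency instead of starting one process per request without bound is deliberate: OCRmyPDF already spreads a single file across several cores (see its --jobs option), so too many parallel conversions mostly add contention.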

Otherwise, there is no way around either embedding OCRmyPDF in your image or binding docker.sock into your PHP-FPM container.

I have an OCRmyPDF batch image that keeps the container running. It requires bind-mounted volumes instead of exposing docker.sock. It uses inotify watches to process files in a specific folder and stores the processed files in another: https://hub.docker.com/r/meyay/ocrmypdf-batch
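The folder-based approach can be sketched roughly as follows. This is not the code behind that image: it polls the folder instead of using inotify, and the paths /input and /output, the failed subfolder and the function names are hypothetical. Output is first written under a hidden temporary name and renamed once complete, so whatever reads the output folder never sees a half-written PDF.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable

# Hypothetical mount points, shared with the PHP-FPM container via volumes.
INBOX = Path("/input")
OUTBOX = Path("/output")


def run_ocrmypdf(src: Path, dst: Path) -> None:
    """Default OCR step: call the ocrmypdf CLI installed in this container."""
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=True)


def process_inbox(inbox: Path, outbox: Path,
                  ocr: Callable[[Path, Path], None] = run_ocrmypdf) -> list[Path]:
    """OCR every *.pdf in `inbox` into `outbox`; return the finished paths."""
    done = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / src.name
        tmp = outbox / f".tmp-{src.name}"  # hidden name while still being written
        try:
            ocr(src, tmp)
        except subprocess.CalledProcessError:
            # Park failures in a subfolder so they are not retried forever.
            tmp.unlink(missing_ok=True)
            failed = inbox / "failed"
            failed.mkdir(exist_ok=True)
            src.replace(failed / src.name)
            continue
        tmp.replace(dst)  # atomic rename within the same filesystem
        src.unlink()      # remove the input so it is not processed again
        done.append(dst)
    return done


def watch_forever(inbox: Path = INBOX, outbox: Path = OUTBOX,
                  interval: float = 5.0) -> None:
    """Polling loop; inotify would react to new files without the delay."""
    while True:
        process_inbox(inbox, outbox)
        time.sleep(interval)
```

One caveat of polling is that it can pick up a file that is still being copied into the inbox; inotify's close-write events avoid that, and with polling the writer can copy under a temporary name and rename the file to .pdf once it is complete.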

Note: I just created the image out of curiosity; I don’t actively maintain it. You can, however, check the GitHub project used to build the image in case you want to adopt it: meyayl/ocrmypdf-batch (“OCRmyPDF Docker image with batch processing based on iNotify”).