I am having trouble finding the best way to integrate ocrmypdf into an existing docker compose project.
I currently have a docker compose project with several containers: nginx, php fpm, mariadb, redis, tika, and gotenberg. Each of these uses the official Docker Hub image. I like having each service in its own container for development, management, and logging.
I tried using the officially recommended ocrmypdf Docker image, “jbarlow83/ocrmypdf”, and calling its web service wrapper, which uses a Flask WSGI server, but the performance was poor: the ocrmypdf container could only handle one request at a time from the php fpm container. The official documentation suggests this web service wrapper is intended only for demonstration or development purposes.
How should I call ocrmypdf from my php fpm container?
As another workaround, I mounted the docker.sock file in the php fpm container and used exec() in PHP to call docker exec on the ocrmypdf container. This works, but it feels very sloppy, and I think it’s unsafe to expose docker.sock to the php fpm container.
Should I combine the two by installing ocrmypdf into the php fpm container? Or should I build yet another web front-end for ocrmypdf by installing nginx and php fpm inside the ocrmypdf container?
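As a point of reference for the "separate front-end" option, a thin concurrent wrapper running inside the ocrmypdf container might look roughly like the sketch below. It is only a sketch under stated assumptions: it uses Python's standard library threaded HTTP server rather than the demo Flask wrapper, it assumes the ocrmypdf CLI is on the PATH and is called in its basic form (ocrmypdf input.pdf output.pdf), and the port and the names build_command, run_ocr, and OcrHandler are hypothetical. It has no authentication, size limits, or job queue.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

# Hypothetical settings; adjust to the actual compose setup.
OCR_BINARY = "ocrmypdf"
LISTEN_ADDR = ("0.0.0.0", 5000)


def build_command(src: Path, dst: Path) -> list[str]:
    """Return the argv for one OCR run: ocrmypdf <input> <output>."""
    return [OCR_BINARY, str(src), str(dst)]


def run_ocr(pdf_bytes: bytes, command_builder=build_command) -> bytes:
    """Write the upload to a temp dir, run the CLI, return the output PDF."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.pdf"
        dst = Path(tmp) / "out.pdf"
        src.write_bytes(pdf_bytes)
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command_builder(src, dst), check=True, capture_output=True)
        return dst.read_bytes()


class OcrHandler(BaseHTTPRequestHandler):
    """Accepts a raw PDF in the POST body and responds with the OCRed PDF."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            result = run_ocr(body)
        except (subprocess.CalledProcessError, OSError) as exc:
            message = str(exc).encode()
            self.send_response(500)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(message)))
            self.end_headers()
            self.wfile.write(message)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


def main():
    # ThreadingHTTPServer handles each request in its own thread, so one
    # slow OCR job does not block the next request.
    ThreadingHTTPServer(LISTEN_ADDR, OcrHandler).serve_forever()
```

Calling main() starts the server; the php fpm container would then POST the PDF bytes to the service over the compose network and read the processed PDF from the response body. Each request spawns its own ocrmypdf process, so some limit on concurrent jobs would probably still be needed to avoid overloading the host.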
Has anyone successfully integrated an ocrmypdf container into their existing docker compose project?
I also checked the largest project I’m aware of that uses ocrmypdf, paperless-ngx. It looks like paperless puts everything into one “webserver” container, including python, nodejs, and ocrmypdf, and then manually installs the dependencies for all of its processing. I would prefer to keep ocrmypdf in a separate container and to use the official docker image.