I have a more conceptual question on how to transfer an existing project into a docker environment.
My application currently runs periodically every 1-2 hours via cron: it fetches files from an FTP server and then runs an application that processes/imports the files.
Question: when migrating this app to a docker container, should I:
1.) split the ftp-sync bash script and the application code that processes the data into separate docker containers?
2.) trigger the cronjob on the host system to spin up the docker container, then shut it down after the import finishes? Or launch a continuously running docker container that itself has a crontab dependency, so the cronjob is defined inside the container?
Thanks for your insight!
Generally speaking, containers should be self-contained.
Though, it depends on the technical details and your design objective:
Questions to ask yourself about the technical details:
- Does the script require further tools to perform its action?
- Are those tools part of the main application?
- If the script and application are in different containers, how is the downloaded data going to get to the application? How will the script trigger the application for post-processing?
Questions to ask yourself about the design objective:
- Is it more important that the solution has a minimal resource footprint, or that it is self-contained?
- Is it important to make the solution portable (setting it up with ease on a different host/another environment)?
Depending on those answers the suggested outcome might be different.
Let me answer some of the questions:
- The script does not require further tools, apart from the ftp/lftp dependency to download the files.
- I would download the files to a shared /tmp folder on the host anyway. Thus, sharing files between a “script container” and an “application container” would not be a problem.
- Those tools are not required for the main application (which only processes the files and converts them to another format).
- It’s not important to scale the container. It should run as a singleton: non-clustered, no swarm. Once running, I do not plan to move the container to another host on a frequent basis.
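For what it’s worth, the shared-folder setup described above could be sketched with bind mounts; the host path and image names here are assumptions, not your actual setup:

```shell
# Hypothetical sketch: both containers bind-mount the same host folder,
# so files downloaded by the script container are visible to the
# application container. /tmp/ftp-data is an assumed host path.

# Script container downloads into the shared folder:
docker run --rm -v /tmp/ftp-data:/data ftp-sync-image

# Application container picks the files up from the same folder:
docker run --rm -v /tmp/ftp-data:/data import-app-image
```

Both containers see the same `/data` directory, so no extra copying or networking between them is needed.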
To me it feels better to include both the script and the application in one container: they belong together, downloading the files without processing them makes no sense, and the application is not functional without previously downloaded files.
Regarding the cronjob topic: I’m still not sure whether it’s better to include the crontab in a constantly running container, or to only spin up the container from the host.
The advantage of placing the crontab inside the container would be that the scheduled cronjob definition is also documented in code (I’m deploying through gitlab pipelines). I see no further advantages here, so the question is whether that alone is reason enough to also include cron in the container.
The disadvantage is that I’d have to add the crontab dependency to the container, along with a more complicated “crontab service” entrypoint script. Plus the container would always run idle in the background, while it only actively runs the import for a few minutes every 2 hours.
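To make the comparison concrete, the host-side variant would boil down to a single crontab entry on the host (image name and paths are assumptions):

```shell
# Host crontab entry (edit with `crontab -e`): every 2 hours, start a
# throwaway container that downloads and imports, then removes itself
# (--rm) once the import process exits.
0 */2 * * * docker run --rm -v /tmp/ftp-data:/data ftp-import-image
```

No long-running container, no cron inside the image; the trade-off is that the schedule lives in the host’s crontab rather than in version control.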
So it’s probably best to include both the ftp script and the processing application in one container, as I never want to scale them and neither is useful without the other.
Regarding crontab inside/outside: the only advantage I can see of including the crontab inside the container is that the cronjob would then be documented in git (I use gitlab-ci for deployment and also for version control).
The disadvantage would be that I’d have to additionally include the crontab dependency in my image, and launch a more complex entrypoint script that keeps the crontab background service in the foreground of the container.
Moreover, the container would always be running and consuming memory, even though the real application only executes for a few minutes every 2 hours.
Sounds like a single container it is, and everything else is a design decision. Technically, either approach is possible.
A container is more or less nothing other than one or more fenced processes in an isolated execution environment; it’s the processes that consume cpu/ram. The resource overhead of keeping the container running is barely noticeable. Yep, the entrypoint script will be more complex.
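Though “more complex” is relative: on an Alpine-based image, busybox already ships a crond, so the entrypoint can stay tiny. A minimal sketch (all paths and the crontab location are assumptions for an Alpine image):

```shell
#!/bin/sh
# Hypothetical entrypoint for an Alpine-based image. Assumes the
# schedule was copied into /etc/crontabs/root at build time, e.g.:
#   0 */2 * * * /usr/local/bin/import.sh

# Run busybox crond in the foreground (-f) so it becomes the
# container's main process and keeps the container alive;
# -l 2 raises the log level so job runs show up in `docker logs`.
exec crond -f -l 2
```

The `exec` replaces the shell with crond as PID 1, so signals from `docker stop` reach the daemon directly.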