Strategies for building Dockerfiles to be reused across environments

Hey guys,

One of the benefits of Docker is building environments that are similar across testing, development, and production. One issue is obvious: many build artifacts and dependencies differ between those environments. This leads to the likely necessity of building a separate image for testing/development/production.

Question:

  • What strategies are people using to decide what goes into the base image?
  • How much, and in what languages, can be done with runtime environment variables?
  • What patterns have emerged? (It can of course be language dependent.)

Observations:

  • Maybe the production container is a good base image? It will likely be as minimal as possible (with the exception of having a real web server vs. something more basic), and things can be added on to an image, whereas parts can't really be taken away (see the sketch after this list).
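For illustration, a dev image could just layer tooling on top of a hypothetical production image (the image name and packages here are made up):

```dockerfile
# Hypothetical sketch: reuse the minimal production image as the base
FROM myapp:prod

# Layer development/test extras on top; image layers can add, never remove
RUN apt-get update && apt-get install -y --no-install-recommends \
        vim curl strace \
    && rm -rf /var/lib/apt/lists/*
```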

What do you mean by "as minimal as possible" and "parts taken away"? A Docker container should be a wrapper around one single process! So for instance, if you have nginx/apache, nodejs and a database, you'd run 3 containers :slight_smile:

Thanks for the response. I wasn't very clear :slight_smile: . I'm fully in agreement with the one-process-per-container philosophy. However, as an engineer whose responsibilities involve both development and operations, I'm working on the whole development/test/deploy pipeline.

For a single application you would likely have to build 3 different (but overlapping) Dockerfiles to represent the different dependencies of test, development, and production.

I'm not convinced that most people containerize their application server, just their web server. Anyways :wink:

Most applications have different environment-specific dependencies: lib files, gem files, tar files, libraries, etc. A production container should be secure and small, which means only the dependencies that are necessary. Test needs more: testing-framework dependencies, maybe extra parts of the application for better debugging, and so on.

I don't know if I'm explaining this well, but I'm asking IF people are making different Dockerfiles for each environment, HOW they are doing it, and if not, WHY they aren't. :smiley:

The alternative I see to having different Dockerfiles is pulling all your dependencies in at runtime based on environment variables. The problem with that is that it's provisioning, and provisioning is slow when you would rather boot a container and have it ready to run.

If I'm constructing a continuous delivery pipeline, I see rebuilding Docker images for each environment as an unnecessary risk.

I don't want any hard-coded, environment-specific data going into my application or container, but instead that stuff should be abstracted and pulled at deploy/run time based on the context.

Can you give more concrete examples of the setup you have and the issues you're facing?

I get that, but there are issues with that. I'll give you an example.

I have a Rails app; it has dependencies A, B, and C in all environments. In test it has an additional dependency D. In development it has dependencies A, B, C, D, but also installs documentation for those dependencies. In production Rails uses Passenger/Unicorn, but in development I use Rack.

I'm asking how you manage this with Dockerfiles. I'll tell you the ways I see it could be done.

I get that, but that would keep me from downloading dependencies A, B, C, D, or at least their documentation. The app dependencies should be built with the environment in mind, which leads to the second option.

Download dependencies at runtime based on variables set at docker run time, e.g. -e RAILS_ENV=test. But that means downloading and installing every time we create a new container for this app. That slowdown makes it pretty difficult to do orchestration. So what's the solution?
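To make that concrete, here's a rough sketch of what I mean, assuming a Bundler-based app and a hypothetical entrypoint.sh baked into the image:

```sh
#!/bin/sh
# entrypoint.sh (hypothetical): install environment-specific gems at container start
set -e

if [ "$RAILS_ENV" = "production" ]; then
  # production: skip the development/test gem groups
  bundle install --without development test
else
  # test/development: install everything, docs and all
  bundle install
fi

# hand off to whatever command the container was started with
exec "$@"
```

Launched with e.g. `docker run -e RAILS_ENV=test myapp`, the bundle install runs on every container start, which is exactly the slowdown I'm worried about.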

I was thinking you could make a base Dockerfile, D1, that creates a mostly standardized image of your app, then a Dockerfile D2 that inherits from D1 for testing, and a D3 inheriting from D1 for production.
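Roughly like this (image names are hypothetical, and the Bundler flags assume the Rails example above):

```dockerfile
# D1 (Dockerfile.base) -- build and tag as myapp-base
FROM ruby:2.2
WORKDIR /app
COPY Gemfile Gemfile.lock ./
# only the gems common to every environment (A, B, C)
RUN bundle install --without development test
COPY . .
```

```dockerfile
# D2 (Dockerfile.test) -- inherits everything from D1, adds dependency D
FROM myapp-base
ENV RAILS_ENV=test
RUN bundle install --with test
```

```dockerfile
# D3 (Dockerfile.production) -- inherits from D1, runs a real app server
FROM myapp-base
ENV RAILS_ENV=production
CMD ["bundle", "exec", "unicorn", "-c", "config/unicorn.rb"]
```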

I suppose each image could have EVERYTHING needed for all environments, but that seems wrong.

I'm not being concise, sorry about that! Is it making sense?

You could mount environment-specific dependencies as a volume. Your application just knows that the dependencies are located in, let's say, "/home/myapp/dependencies", but depending on your environment you can change what's in there.
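Something like this (the host paths here are made up):

```sh
# same image in every environment; only the mounted dependency dir differs
docker run -v /srv/deps/test:/home/myapp/dependencies myapp
docker run -v /srv/deps/production:/home/myapp/dependencies myapp
```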

I'm not convinced that most people containerize their application server, just their web server. Anyways

Well, application servers are on their way out anyway, right? I've worked with Tomcat once and I don't want to do that again :smiley:

(I typed quite a bit, then lost it all when Docker Hub timed out on me! I'll be brief this time. Yeah, I'll write in an editor first, but I've wasted some time already…)

I've iterated over a hierarchy of Python images, like

base --> data --> modeling --> visualization --> extra

each image adds some more libs (both Python and Linux) on top of the previous image.

'base' contains (in addition to the things you'd expect) vim, a debugger, a profiler, and py.test (for running unit tests).

'extra' contains documentation generation systems (Sphinx) and other things that do not really pertain to running the programs.
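Each step in the chain is just a short Dockerfile that builds FROM the previous tag; roughly (tag names and packages here are illustrative):

```dockerfile
# the 'data' image: start from the 'base' tag and add data libs
FROM mystack/base
RUN pip install numpy pandas
# 'modeling' would then start with FROM mystack/data, and so on up the chain
```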

Say project A needs image 'modeling'. I define a script to launch this image for development. Important: 'docker run' has many options you can use to customize the environment without building a customized physical image. I map the source code volume into the container, so I get the benefit of interactive editing both inside and outside the container (using an IDE and other heavyweight tools) and can test right away.
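The dev launch script is essentially this (paths and image name are illustrative):

```sh
#!/bin/sh
# dev launcher (illustrative): map the host source tree into the container,
# so edits made outside (in an IDE) show up inside immediately
docker run -it --rm \
  -v "$(pwd)/src:/home/dev/project" \
  mystack/modeling \
  /bin/bash
```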

Meanwhile, I define a script to build a production image, create production launch scripts, and so on. The production image starts with 'modeling', but also freezes my source code into the image (in the case of Python, I don't need an install step). Paths, directory structures, and so on guarantee the dev and production environments are really identical, except that in dev the source code is mapped in from outside, whereas in production the source code is embedded inside.
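The production build then copies the source to the same path the dev script mounts, so the two layouts match (again, names are illustrative):

```dockerfile
# production image (illustrative): same base, source frozen inside
FROM mystack/modeling
# identical path to the dev volume mount keeps dev and prod layouts the same
COPY src /home/dev/project
WORKDIR /home/dev/project
CMD ["python", "main.py"]
```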

So, 'base' enters production images whereas 'extra' does not.

I may have project B based on image 'data', project C based on image 'visualization', and so on.

Of course this single lineage of images can't serve all my projects, but it supports quite a few.

I value a smaller number of images much more than a large number of individually customized and optimized ones. I tweak the images often, so the number matters a lot, I guess. While trying to avoid bloat, I don't obsess over removing a couple hundred megabytes of extra size if that would necessitate maintaining a separate image.