Newbie conceptual question regarding installing on run vs pre-installed packages

Very new user here, so sorry if this has been asked before. I couldn’t find previous posts (and I’m not 100% sure which keywords to search for). I have three related conceptual questions that I hope someone can confirm or correct.

1: It appears to me that Docker images on the Hub are essentially Debian systems with pre-installed packages.
For example, the Python image is functionally similar to someone running

sudo apt install python3-pip

on a fresh Debian system and then ‘pushing’ it to the Hub.

2: Alternatively, one can also ‘pull’ a clean Debian base image and add

RUN apt-get update && apt-get install -y python3-pip

to the Dockerfile. This should then be functionally identical to case 1 (apart from some miscellaneous steps for installing and setting up Python environments).
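A Dockerfile RUN step already executes as root, and the official Debian base image ships without sudo and with empty apt package lists, so the install needs an `apt-get update` and a non-interactive `-y`. A minimal sketch of approach 2, assuming the `debian:bookworm-slim` tag and the hypothetical image name `my-debian-python` (neither comes from this thread); the Dockerfile is passed to `docker build` on stdin so no build context directory is needed:

```bash
#!/usr/bin/env bash
# Sketch of approach 2: start from a clean Debian base and install pip with apt.
# Assumptions: the base tag and the default image name are illustrative only.

build_debian_python() {
  local image="${1:-my-debian-python}"   # hypothetical image name
  # "docker build -" reads the Dockerfile from stdin, without a build context.
  docker build -t "$image" - <<'EOF'
FROM debian:bookworm-slim
# RUN steps execute as root by default, so no sudo is needed (or installed).
# apt-get update fills the empty package lists; -y answers the install prompt.
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3-pip \
 && rm -rf /var/lib/apt/lists/*
CMD ["python3", "--version"]
EOF
}
```

Running `build_debian_python && docker run --rm my-debian-python` should print the version of Debian’s packaged Python. The result is still not the same as the official python image, which builds CPython from source on top of its base.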

3: So one can either pull and rebuild the image each time the container is run (with FROM python:3 in the Dockerfile), so that every instance has the most up-to-date packages, OR save the image in which python3 is already installed and always run containers from that image. The packages will eventually be outdated, but “interior” compatibility is guaranteed forever?

This almost sounds like you believe every image is based on Debian. Some are, most aren’t.

Images are a point-in-time snapshot of an application, its dependencies and, if required, configuration files and entrypoint scripts.

It’s up to you whether you want to

  • pull the latest python3 image whenever you feel the urge, and live with the fact that the packages it contains are from the image’s build date, or

  • pull the latest debian image whenever you feel the urge and build your own custom image so that everything runs at the latest possible version (you are aware that the python3 Dockerfile does way more than your simple command, aren’t you?).
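The two options above roughly map to the commands below. This is a sketch: `python:3` is the tag from the question, while the function names, the default image name and the assumption that a Dockerfile sits in the current directory are hypothetical. `docker image inspect` shows when the pulled image was built, i.e. how old its packages are; `--pull` makes `docker build` check for a newer base image, and `--no-cache` forces steps such as `RUN apt-get ...` to run again instead of reusing cached layers.

```bash
#!/usr/bin/env bash
# Option 1: use the official image as-is; its packages date from its build.
refresh_official_python() {
  docker pull python:3
  # Print the image's creation timestamp, i.e. the snapshot date of its packages.
  docker image inspect --format '{{.Created}}' python:3
}

# Option 2: rebuild a custom image from the Dockerfile in the current directory.
rebuild_custom_image() {
  local image="${1:-my-custom-python}"   # hypothetical image name
  # --pull: always try to fetch a newer version of the FROM image.
  # --no-cache: rerun every step (e.g. apt-get install) instead of reusing layers.
  docker build --pull --no-cache -t "$image" .
}
```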

If you let docker run pull the image implicitly on first use, updates for the image won’t be pulled automatically. An explicit docker pull, however, does pull updates if they exist: for an image whose tag is updated weekly (latest!?), an explicit pull every week would download the new image in addition to the old one and move the tag to point to the new image. An existing container, though, will still depend on the old image. If you want the new image to be used, the old container needs to be destroyed and recreated from the new image.
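A sketch of that explicit-pull-and-recreate cycle. The function names are hypothetical, and the `docker run` line is a placeholder: a real container would be recreated with whatever options (ports, volumes, environment) it was originally started with. `docker container inspect --format '{{.Image}}'` reports the image ID a container was created from, which makes it visible that an existing container keeps using the old image after the tag has moved.

```bash
#!/usr/bin/env bash
# Return 0 if container $1 was created from the image that tag $2 points to now.
container_uses_current_image() {
  local container_image tag_image
  container_image=$(docker container inspect --format '{{.Image}}' "$1") || return 2
  tag_image=$(docker image inspect --format '{{.Id}}' "$2") || return 2
  [ "$container_image" = "$tag_image" ]
}

# Pull updates for image $2 and, if the tag moved, recreate container $1 from it.
refresh_container() {
  local name="$1" image="$2"
  docker pull "$image"   # explicit pull: fetches a newer image if the tag moved
  if container_uses_current_image "$name" "$image"; then
    echo "$name is already running the current $image"
    return 0
  fi
  # The old container still references the old image: destroy and recreate it.
  docker rm -f "$name"
  docker run -d --name "$name" "$image"   # placeholder: add the original run options
}
```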


A good question, actually. It took me some time to wrap my head around the Docker image concept too.

1: Well… yes and no. Developers like Debian as a base since they are used to it. But we want our images to be small, and most of the time we don’t need the full power of Debian. That is why Python, for example, comes in flavours like slim, alpine and windowsservercore. Other popular bases are busybox, and quite a lot of images use scratch, which is basically an empty base where you can add whatever you want.
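To see the size difference between flavours for yourself, a sketch like the following pulls a few variants of one Python version and lists them. The version and tags are assumptions (check Docker Hub for the current ones), the function name is hypothetical, and the Windows variants are left out because they only run on Windows hosts.

```bash
#!/usr/bin/env bash
# Pull several flavours of the same Python version and compare their sizes.
compare_python_flavours() {
  local version="${1:-3.12}"   # assumed example version
  local tag
  for tag in "$version" "$version-slim" "$version-alpine"; do
    docker pull "python:$tag"
  done
  # List repository:tag and size for every local image of the python repository.
  docker image ls --format '{{.Repository}}:{{.Tag}} {{.Size}}' python
}
```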

Conceptually, you are right. You run the apt command, but not on a fresh Debian system: you run it in a container and then snapshot the result, as meyay mentions. There is no direct way to take a “physical” Debian instance, like one running in AWS or Azure, and push it as an image.
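The “run apt in a container, then snapshot it” idea can be done by hand with `docker commit`, sketched below with a hypothetical function, container name and image name. It is only meant to illustrate the concept: a Dockerfile (approach 2) records every step so the image can be rebuilt and reviewed, while a committed container is an undocumented one-off.

```bash
#!/usr/bin/env bash
# Manually snapshot a container after running apt in it (illustration only).
snapshot_with_commit() {
  local work="apt-scratch"                 # hypothetical container name
  local image="${1:-my-committed-python}"  # hypothetical image name
  # Run apt inside a fresh Debian container; the container stops when sh exits.
  docker run --name "$work" debian:bookworm-slim \
    sh -c 'apt-get update && apt-get install -y python3-pip'
  # Snapshot the stopped container's filesystem as a new image.
  docker commit "$work" "$image"
  # The container itself is no longer needed once the image exists.
  docker rm "$work"
}
```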

So 2 is how you do it.

3: The up-to-dateness… Many developers have NIBM or NBBM and don’t trust other developers to know what they are doing. Taking the Python case again: the Docker images (all flavours) were up on Docker Hub about one hour BEFORE the official release of the next version of Python. This is often the case with the official Docker images. So somewhere, someone is making sure that the latest and greatest version of the application is pushed to the Hub. Oh, they also monitor the upstream images, like debian or Windows Server Core, so that if an urgent security issue is raised, they make sure to rebuild. It has happened that images or layers were removed from the Hub because they contained insecure code.

The “I always need the latest version” approach is not good practice during development. Consider a change in the underlying infrastructure to be the same as any change in the code. If I push something to a Git repo, rebuild it, run the tests and get errors, I do NOT want to investigate whether it was my code change or the image change that caused the error. I only update the image version on a project that has no build or test errors, and then rebuild it. In the extreme, I even use the sha256 digest to lock down a specific image, since version tags can be moved.
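A sketch of locking a base image to a digest. The tag `python:3.12-slim` and both function names are assumptions for illustration. `resolve_digest` asks the Docker CLI which `name@sha256:...` reference a locally pulled tag currently corresponds to; that value can go into a Dockerfile as `FROM python@sha256:...`. `unpinned_from_lines` needs only `grep` (case-insensitive, since Dockerfile instructions are) and could serve as a simple check that every `FROM` line is pinned.

```bash
#!/usr/bin/env bash
# Print the digest reference (e.g. python@sha256:...) that a locally pulled tag
# such as python:3.12-slim currently points to. Requires the Docker CLI.
resolve_digest() {
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Print every FROM line in Dockerfile $1 that is NOT pinned to a sha256 digest.
# Returns 0 if all FROM lines are pinned, 1 otherwise.
unpinned_from_lines() {
  local found
  found=$(grep -iE '^[[:space:]]*FROM[[:space:]]' "$1" | grep -v '@sha256:' || true)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
    return 1
  fi
  return 0
}
```

Updating the base image then becomes a deliberate change: pull the new tag, resolve its digest, edit the FROM line, and commit that like any other code change.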


Thank you for the explanations, meyay and ovelindstrom! I have a bit more confidence about what I’m doing now.