How do non-OS containers work?

I have been using Docker for some time now, but I haven’t got past the “how the hell does it do its magic” stage. Mostly I run a container from, say, an ubuntu or debian image, and then happily install all kinds of servers and tools in the running container. Typically my container exposes some port to the host (a database or a web server) and shares a directory with the host, either as a place to exchange files or as a way to keep project files (e.g. specific web app resources like large files) directly accessible on the host.

But I feel I have the wrong mental model of how Docker does its magic. I use containers somewhat like virtual machines. Yet “Operating Systems” is just one of the image categories! So when it comes to images in categories like “Programming Languages”, for example “python”, I am at a loss how to think about those environments. I have all kinds of questions, like: is it just like virtualenv? Can I update the Python version that’s in the container, or is that not how it should be done? Can I get shell access to a container running the python image, or is there simply no shell to run?

Currently I am working on a Node.js + nginx project, so I created this Dockerfile to build an image:

FROM debian

WORKDIR /home

RUN apt-get update

COPY init.sh /home/
RUN /home/init.sh

COPY run.sh /home/

CMD ./run.sh

Here’s init.sh:
#!/bin/bash

# Install the packages needed inside the container
apt-get -y install wget nginx

# Install nvm, load it into this shell, and install the Node.js version we need
wget -qO- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash
source /root/.bashrc
nvm install 0.10

# Enable the nginx site configuration that lives in the shared host directory
cd /etc/nginx/sites-enabled/
ln -s /home/host/NodeNginx.config

And here’s run.sh:
#!/bin/bash

# Start nginx, load nvm so that node is on the PATH, then start the Node.js app
service nginx start
source /root/.bashrc
cd /home/host/nodejs-server/
node server.js

I have built an image named node-nginx, and I run the container with this command from a directory that has a “shared” subdirectory containing my project files:
$ docker run -dit --name=NodeNginx -p 8887:8887 -v "$(pwd)/shared":/home/host node-nginx

This setup works nicely, but I have no idea whether I am using Docker the way it is meant to be used. Maybe there is a non-OS image with Node.js and a non-OS image with nginx, and I should be running two containers, one for the Node.js app and one for nginx? Perhaps there is some guide on “how to do Docker properly” where these kinds of topics are covered?

There is not that much “magic” to it :-), but I am not quite sure what you mean by “non-OS containers”. The host provides, to the Docker engine, an API that is sort of like a hardware abstraction layer. If you care to work at that level, interacting directly with the (abstract) devices, you may in principle do so, but almost everybody prefers to put a layer at the bottom of the stack that provides a more familiar, higher-level OS environment as the base layer (“FROM debian” in your case).
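To make the “no OS layer” option concrete: you can in principle build an image with no OS base at all, containing nothing but a single statically linked executable. A minimal sketch follows; the names hello, Dockerfile.scratch and hello-scratch are made up for the example, and “hello” is assumed to have been statically compiled on the host beforehand:

# Dockerfile.scratch -- an image with no OS base layer
FROM scratch
COPY hello /hello
CMD ["/hello"]

$ docker build -t hello-scratch -f Dockerfile.scratch .
$ docker run --rm hello-scratch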
When I explain this to Docker newbies, my audience is usually familiar with Windows, so I refer to the layers as being very similar to DLLs. There is one restriction, which is essential to the isolation mechanism: whatever you put into a layer can call functions ONLY in the layers below itself (beyond what it provides itself). No mechanism is provided for referencing any other DLL or executable: any reference to an entry point must be resolved within the layers of the stack.
Furthermore: when you create a new layer, the SHA (i.e. a checksum) of its contents is calculated. The contents are not only the code and data; they include a table of the layers below, with their SHAs. If any of the lower layers were modified, its SHA would change and it would no longer be recognized as the one required by your new layer. A stack of layers is therefore absolutely and forever immutable; it cannot be updated. So you cannot update the Python in your container; you have to create a new stack, a new image, with the new Python, and that new image will have a different ID.
The only absolute ID of an image is the SHA of its top layer. The “tag” is like a symbolic link to that SHA, and like any symbolic link it may be moved. So you can move your tag to a new image to make it appear like the old one, but that should be considered bad practice, especially in a Docker environment where people expect immutability: a given image should always give exactly the same output for the same input. The “latest” tag is an explicit exception to this.
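You can see all of this with standard docker CLI commands, here using the image names from this thread:

# The layers of your image, and the Dockerfile steps that created them
$ docker history node-nginx

# The content-addressed digest of the debian base image you pulled
$ docker images --digests debian

# The immutable per-layer SHAs that the image is made of
$ docker image inspect -f '{{.RootFS.Layers}}' node-nginx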
If you make a new image with the new Python, but based on the same debian base layer, that “DLL” is common to the two images. (In your case, where you have not specified a tag for debian, “latest” is assumed, so the base could have been updated in the meantime, but let us assume that it hasn’t.) Then no more disk space is needed for that debian base layer. When you retrieve images from a registry, each layer is cached by your local Docker engine and exists in a single copy, no matter how many of your (different) images are based on it. Even in RAM there is a single copy when you are running multiple containers: lower layers are strictly read-only and can be shared by multiple active containers.
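A small sketch of that sharing (the directory layout and the tags base-python and base-gcc are invented for the example):

# base-python/Dockerfile
FROM debian
RUN apt-get update && apt-get -y install python3

# base-gcc/Dockerfile
FROM debian
RUN apt-get update && apt-get -y install gcc

$ docker build -t base-python base-python/
$ docker build -t base-gcc base-gcc/

# Both lists start with the same debian layer digests: that layer is
# stored on disk (and mapped into RAM) only once.
$ docker image inspect -f '{{.RootFS.Layers}}' base-python
$ docker image inspect -f '{{.RootFS.Layers}}' base-gcc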
Even if a layer has data structures that may be modified, these are moved up to the top layer when you create the image. Or, to be more precise: the data segment in any layer, all the way up to the top, accumulates the data structures from the layers below, but in a “lazy”, copy-on-write manner. As long as two or more simultaneously running containers just read the data structures, they all refer to the same copy in RAM. As soon as a container tries to modify any data, a copy of that RAM page is made before the modification is done, and the corresponding entry in the container’s page table is updated to point to that private copy.
The file system is essentially treated the same way. When you start making a new layer, the entire file system, index pages and all, is initially inherited “as is”. When the build process adds files (or otherwise changes the file system), copies private to the new layer are made of all modified file system pages, while all unmodified pages are shared with the layer(s) below. Even at runtime, the container inherits the entire file system from the image, initially sharing all of it with all other running containers based on the same image. If it writes to this file system, private copies are made, so that one container’s write to a file is not visible to another container, even one running from the same image. When the container terminates, the private file system page copies are discarded; the next run of the same image will see a “virgin” file system.
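You can watch that copy-on-write behaviour with a couple of throw-away containers (the names a and b are arbitrary):

# Two containers from the same image; one writes a file, the other does not
$ docker run -d --name a debian sh -c 'echo from-a > /tmp/x; sleep 600'
$ docker run -d --name b debian sh -c 'sleep 600'

# "docker diff" lists only what a container has changed on top of the
# read-only image layers; b shows no /tmp/x
$ docker diff a
$ docker diff b

# After "docker rm -f a b" the private copies are gone; a fresh run of
# the image starts again from the unmodified file system.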
The Docker engine does provide “volumes”, which can be shared among arbitrary containers and which persist across runs, but those are explicitly declared as volumes. Any “anonymous” file system use is private and non-persistent.
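For completeness, a minimal volume example (the volume name mydata is made up):

$ docker volume create mydata
$ docker run --rm -v mydata:/data debian sh -c 'echo hello > /data/greeting'
# A later container, even from another image, sees the same data:
$ docker run --rm -v mydata:/data ubuntu cat /data/greeting
hello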
Back to image building:
We have decided on a fairly strict discipline with regard to images: we do not build every image from the bottom up, but define a local “higher level” base image with the OS and a set of common, basic tools for software development. This is a common base layer for “everybody”. On top of this we build e.g. a gcc layer with a compiler, stable libraries, build tools etc. That makes a base layer for any gcc-based activity.
We have a tree of base images: One branch with Python tools, one with C/C++ tools, one with documentation tools… When some project requires, say, a newer gcc version, we must make a new branch, but whatever is below gcc in the tree is unmodified.
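Expressed as Dockerfiles, such a tree might look roughly like this: three separate Dockerfiles, each building on the tag produced by the previous one. The image names under mycompany/ and the libfoo-dev package are of course invented for the sketch:

# mycompany/base -- the OS plus common development basics
FROM ubuntu
RUN apt-get update && apt-get -y install git make curl

# mycompany/gcc -- one branch of the tree, built on the common base
FROM mycompany/base
RUN apt-get -y install gcc g++ gdb

# a project image -- only the project-specific extras go on top
FROM mycompany/gcc
RUN apt-get -y install libfoo-dev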
We really have a “forest”, not a tree: our standard base layer is Ubuntu, but some projects require CentOS. Even then, all CentOS projects share the same CentOS base layer, and possibly even higher layers.
The sharing of (higher level) base layers is obviously good for disk, cache and RAM usage, but the essential thing for us is that it is a strong force towards homogenizing our tool sets. When a project-specific tool is added on top of an already defined Ubuntu + general tools + gcc + general development tools + … stack, the extras are installed and the new top layer is created while the developer waits, in a matter of seconds or minutes.
Furthermore, it creates a gentle pressure towards common versions: it is OK to build on that gcc version from 2018q3, isn’t it? If you need a newer version and have to rebuild another branch, and maybe a lot of sub-branches, it will take a lot more time before your image is in place! … We never refuse version updates, but when there is no real reason for a version change, developers prefer to get the new image now and stick with the familiar version.

I guess it shines through: we are explicitly using Docker as a mechanism for managing “version hell” and for homogenizing our tool sets across projects. We also need to be able to re-establish a tool set when a support request comes in for a two-year-old software release; then we cannot let every developer create his own new images whenever he feels like it. Other uses of Docker may have very different requirements, where our goals of having well defined, managed tool sets that persist over several years are not relevant at all.


Thank you for the elaborate explanation. It really helps me get my mind around the way Docker is built, and it shows me where deeper learning is required.

So, if I understand you correctly, at the base of basically every Docker image there is an OS image. I was wondering how, for instance, the “Hello world” image works (how does it get access to the host’s standard output?), but now I see it has a Git repo where the first line of Dockerfile.build reads FROM debian:stretch-slim. And the same is true for all the python images: they are all based on one OS image or another. So this gives me some peace of mind now.