Docker images and containerd

Looking at the lower levels beneath dockerd, I can use tools like ctr and nerdctl to interact with Docker’s containerd and see its containers. I can also interact directly with runc and see the containers there.

But I cannot see images. It’s as if images were handled by dockerd outside of containerd.

$ sudo nerdctl -a /var/run/docker/containerd/containerd.sock namespace list
NAME    CONTAINERS    IMAGES    VOLUMES    LABELS
moby    22            0         0
$ docker image list | wc -l
41

Outside Docker, I can use ctr to pull an image, and that one is visible. But I cannot see Docker’s images with ctr.

It isn’t clear to me how it works. Where are the images?

containerd is just a container daemon; it doesn’t care about images. nerdctl gives you an interface to manage containerd, and it has its own data folder. If you want to know where it is, you can just guess: try /var/lib/containerd, since Docker uses /var/lib/docker by default, and you would be right. The other, more interesting solution is searching for the image ID on the filesystem:

# take the image ID without the "sha256:" prefix and search for it on disk
id=$(nerdctl image inspect nginx --format '{{ .ID }}' | cut -d: -f2)
find / -name "$id"
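
If containerd uses its default data root, that find should turn up the image blob in containerd’s content store, most likely under a path like:

/var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/<id>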

When you run nerdctl against the containerd socket, you just talk to containerd. Docker not only has its own folder for images, it also supports multiple storage drivers. When you change the storage driver, Docker can’t see the images stored by the old one.

You could say: “Wait… `docker container inspect` can tell me what the storage driver is, so nerdctl should be able to get that information from the container.”
In fact containerd does not need that information, and it is not part of the “isolated environment”. Docker needs the storage driver because the way it creates the filesystem for a container (for example, merging multiple layers) depends on it. If you already have a filesystem, you don’t actually need Docker or containerd to create a container. You could just use unshare to isolate your process using Linux kernel namespaces; this is essentially what a container is. Of course, Docker does much more, and using Docker is much easier than using unshare.
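
As a minimal sketch of that idea (assuming a root filesystem unpacked at ./rootfs, which is a placeholder path, and util-linux installed):

# isolate a shell in new namespaces using only the kernel; no Docker or containerd involved
sudo unshare --mount --uts --ipc --net --pid --fork chroot ./rootfs /bin/sh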

So I guess containerd knows only about containers and nothing about volumes and images. nerdctl does, but it only knows about the images it created, because it knows where it stores its own metadata.
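
You can see that namespace separation with ctr too. A sketch, assuming containerd’s default socket (on some setups Docker runs its own containerd at /var/run/docker/containerd/containerd.sock, as in your first command):

# images pulled through ctr or nerdctl live in the "default" namespace
ctr --address /run/containerd/containerd.sock --namespace default images list
# Docker's namespace is "moby" and contains no images, matching the namespace list above
ctr --address /run/containerd/containerd.sock --namespace moby images list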

If you want to understand more, list the containerd-shim processes when you have one Docker container and one nerdctl container running.

ps -axo command | grep containerd-shim
/usr/bin/containerd-shim-runc-v2 -namespace moby -id 18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f -address /run/containerd/containerd.sock
/usr/bin/containerd-shim-runc-v2 -namespace default -id d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7 -address /run/containerd/containerd.sock

Then use the “search for the id” approach:

find / -name 18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f | grep -v cgroup
/var/lib/containerd/io.containerd.runtime.v2.task/moby/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f
/var/lib/docker/image/overlay2/layerdb/mounts/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f
/var/lib/docker/containers/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f
/run/docker/runtime-runc/moby/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f
/run/docker/containerd/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f
/run/containerd/io.containerd.runtime.v2.task/moby/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f
find / -name d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7 | grep -v cgroup
/var/lib/nerdctl/1935db59/containers/default/d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7
/var/lib/containerd/io.containerd.runtime.v2.task/default/d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7
/run/containerd/runc/default/d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7
/run/containerd/io.containerd.runtime.v2.task/default/d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7

Note: I filtered out the cgroup folders from the results.

The last line of each result is where containerd keeps its runtime metadata (this is containerd itself, not nerdctl, which used /var/lib/containerd).
Notice the namespaces in the paths: moby and default.

Docker:

/run/containerd/io.containerd.runtime.v2.task/moby/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f

Nerdctl:

/run/containerd/io.containerd.runtime.v2.task/default/d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7

You can look into the files with cat or anything you like to see what they contain, but the most important part for us is the root path, which is the merged path of the Docker container’s filesystem. If you have jq on your machine, you can get that path:

Docker:

jq --raw-output .root.path  /run/containerd/io.containerd.runtime.v2.task/moby/18c1ee585f073d7cd7a224903e195e9d087ad7dc49398d9c573ce39d1cebf24f/config.json

Path:

/var/lib/docker/overlay2/f93847579465f15f1595d55107004e3f5a4da5cd864fef3a73d9d552b3600a56/merged

Nerdctl:

jq --raw-output .root.path /run/containerd/io.containerd.runtime.v2.task/default/d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7/config.json

Path:

rootfs

Now it shows only “rootfs”, because the path is relative to the directory of config.json. You can list the files there:

ls /run/containerd/io.containerd.runtime.v2.task/default/d3a64d9a57e2424c0e3e183cd4f27b774a69594eddc1c490f4d58e46b2ef8ec7/rootfs/
bin  boot  dev  etc  home  lib  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
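
The Docker container’s merged directory should be listable the same way while the container is running:

ls /var/lib/docker/overlay2/f93847579465f15f1595d55107004e3f5a4da5cd864fef3a73d9d552b3600a56/merged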

This was actually the first time I compared Docker and containerd (and nerdctl), so almost everything I wrote here was new to me too, but I hope I could explain how it works.

This is useful, thank you. I have been trying to absorb it. I think what this illustrates is that Docker images are managed by the storage driver of dockerd and not by the “snapshotter” of containerd.
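
For reference, the storage driver in use can be confirmed with:

docker info --format '{{ .Driver }}'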

One piece I can’t seem to tie together is the relationship between an image ID and its storage. In my case I am using zfs. I have an image (an example chosen because it has 2 layers):

"grafana/alpine:3.15.4": "sha256:704e787900f7abf272c8646b254e00bae2eebde1636fc0a0eded994fb2899835",

And its image JSON

/var/lib/docker/image/zfs/imagedb/content/sha256/704e787900f7abf272c8646b254e00bae2eebde1636fc0a0eded994fb2899835

which has the layer details

"diff_ids": [
   "sha256:4fc242d58285699eca05db3cc7c7122a2b8e014d9481f323bd9277baacfa0628",
   "sha256:7cc603b59fd5d5e4b223c2f5201791692443aff16f493fe2a2a0fa43279cdb85"
]
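
For reference, those can be pulled straight out of that file with jq (the array lives under .rootfs in the image config):

jq -r '.rootfs.diff_ids[]' /var/lib/docker/image/zfs/imagedb/content/sha256/704e787900f7abf272c8646b254e00bae2eebde1636fc0a0eded994fb2899835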

I have not been able to locate the image datasets for those layers - I can’t find any reference from those layer shasums pointing to datasets.

From working backwards from a container, I can find the image dataset. First, inspect the container to get the dataset:

"GraphDriver": {
    "Data": {
        "Dataset": "dockerpool/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731",
        "Mountpoint": "/var/lib/docker/zfs/graph/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731"
    },
    "Name": "zfs"
},
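
The same value comes out as a one-liner (with <container> as a placeholder for the container name or ID):

docker container inspect <container> --format '{{ .GraphDriver.Data.Dataset }}'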

Then check the datasets’ origins (excerpted from zfs get origin <dataset>)

# zfs get origin dockerpool/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731
dockerpool/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731  origin  dockerpool/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731-init@163923693  -

# zfs get origin dockerpool/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731-init
dockerpool/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731-init  origin  dockerpool/5b4528453764f0516b41da51d05ce14e0aac0c3922d312c1e49d970ca8d15fd1@46025618  -
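
A small sketch to walk the whole origin chain in one go, using zfs get’s scripted-output flags (-H, -o value):

ds=dockerpool/8947b34454ee07a033f3a2bb6bb0c06c51e5025b0e0db34b359879526d66e731
while [ "$ds" != "-" ]; do
  echo "$ds"
  # print only the origin snapshot of the dataset (strip any @snapshot suffix first)
  ds=$(zfs get -H -o value origin "${ds%%@*}")
done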

So the image dataset is:

dockerpool/5b4528453764f0516b41da51d05ce14e0aac0c3922d312c1e49d970ca8d15fd1 

But I cannot find anything that ties the sha of the image to this dataset. I’ve been finding/grepping in /run and in /var/lib/docker.

I’m missing some detail but not sure what. When I run a container off that image, how does it locate the image data from its shasum? Putting it another way, given an image shasum, how do I find its image dataset?

Sorry, I haven’t read the whole post yet, but to answer the question at the end (since that is all I have time for right now):

In the case of Docker, the image hash is the sha256 hash of the metadata file. I wrote about it here:

Until I can respond to your question in detail, I hope this helps.

The image ID is indeed the sha256 hash of the metadata file. I proved that to myself like this:

$ docker image inspect grafana/alpine:3.15.4 | jq -r '.[].Id'
sha256:704e787900f7abf272c8646b254e00bae2eebde1636fc0a0eded994fb2899835

$ TOKEN=$(curl -s --user '<user>:<pass>' 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:grafana/alpine:pull' | jq -r '.token')
$ curl -sL "https://registry-1.docker.io/v2/grafana/alpine/blobs/sha256:704e787900f7abf272c8646b254e00bae2eebde1636fc0a0eded994fb2899835" -H "Authorization: Bearer $TOKEN" | sha256sum
704e787900f7abf272c8646b254e00bae2eebde1636fc0a0eded994fb2899835  -
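
For completeness, the same digest can also be read out of the image manifest, so you don’t need to know the image ID in advance; a sketch reusing the same $TOKEN (assuming the registry serves a v2 schema 2 manifest):

curl -sL "https://registry-1.docker.io/v2/grafana/alpine/manifests/3.15.4" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" | jq -r '.config.digest'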

I know from backtracking from the container’s dataset (previous post) that the image’s dataset is

dockerpool/5b4528453764f0516b41da51d05ce14e0aac0c3922d312c1e49d970ca8d15fd1

What I am unable to work out is how to map image 704e787900f7abf272c8646b254e00bae2eebde1636fc0a0eded994fb2899835 to dataset 5b4528453764f0516b41da51d05ce14e0aac0c3922d312c1e49d970ca8d15fd1.

You can see it in docker image inspect:

        "GraphDriver": {
            "Data": {
                "Dataset": "dockerpool/5b4528453764f0516b41da51d05ce14e0aac0c3922d312c1e49d970ca8d15fd1",
                "Mountpoint": "/var/lib/docker/zfs/graph/5b4528453764f0516b41da51d05ce14e0aac0c3922d312c1e49d970ca8d15fd1"
            },
            "Name": "zfs"
        },

But where is that relationship in the filesystem (i.e. under /var/lib/docker)?

(edit) I can find the relationship between a container and its mount at /var/lib/docker/image/zfs/layerdb/mounts/<container id>/mount-id, but I can’t find a similar mapping from an image ID to its dataset.

It seems I didn’t understand your question, because I missed the fact that you were using ZFS, which you clearly stated before. I configured ZFS for Docker and tried to find the relation between the image and the dataset. This is what I found:

docker image inspect nginx --format '{{ .GraphDriver.Data.Dataset }}'
zpool-docker/5cb914513dc1d32f7aa6534889c6fc3a15162c4048d4e572e89343c71918a442
grep -r "5cb914513dc1d32f7aa6534889c6fc3a15162c4048d4e572e89343c71918a442" /var/lib/docker
/var/lib/docker/image/zfs/layerdb/sha256/5447829daae2b442a3fca68bd8302ccf47579abdc05d70330c7d606aa8276dab/cache-id:5cb914513dc1d32f7aa6534889c6fc3a15162c4048d4e572e89343c71918a442

So that dataset hash is in:

/var/lib/docker/image/zfs/layerdb/sha256/5447829daae2b442a3fca68bd8302ccf47579abdc05d70330c7d606aa8276dab/cache-id
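
Listing that layerdb directory shows the related metadata files; on my setup I would expect entries like cache-id, diff, size, and parent (when the layer has one):

ls /var/lib/docker/image/zfs/layerdb/sha256/5447829daae2b442a3fca68bd8302ccf47579abdc05d70330c7d606aa8276dab/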

I am not sure how that hash was calculated.

Was I finally able to understand you? :slight_smile: I hope this is still relevant to you.

Yes, you understood perfectly :slight_smile: And you also ran into the same break as me…

Following your nginx example:

  • image id = fa5269854a5e615e51a72b17ad3fd1e01268f278a6684c8ed3c5f0cdce3f230b
  • dataset = fdd2151ccf704475e4604e3172b96c4787ef395f613f83c25af31831a9132ddf
  • layerdb = 5a730c5c122e844d0933a20b7b2fda1fc3bd33987869dbd2b59396d80b0bcfca

The layerdb name is not the image-id; the file is in /var/lib/docker/image/zfs/layerdb/sha256

Contrast with a container

  • container id = 86e94db298c05f4028f7160bc4132a79397ffe1b7f5861635f44069f225ab256
  • dataset = 13489c445bea602d3762a74e5aacf97739277aab10f59f804644731ff2772af1
  • layerdb = 86e94db298c05f4028f7160bc4132a79397ffe1b7f5861635f44069f225ab256

The layerdb name is the container-id; the file is in /var/lib/docker/image/zfs/layerdb/mounts.

So, like you, I also don’t understand what the image layerdb hash 5a730c5c122e844d0933a20b7b2fda1fc3bd33987869dbd2b59396d80b0bcfca is. I would have expected it to be the image ID; that would make sense to me. Whatever it is, it’s the same each time the image is created: I tried it on 2 hosts and it’s the same (yours is different, so I guess your nginx image is different from mine; I just pulled latest to do a quick test and got image ID fa5269854a5e615e51a72b17ad3fd1e01268f278a6684c8ed3c5f0cdce3f230b).

If I can work this out I’ll post here to let you know.

It doesn’t have to be. An image can have multiple layers, and a container has only one layer besides the image layers. The image ID comes from the content of the metadata file of the image. You could change the metadata file, which would change the hash, and then rename every file or folder that contained that hash, but the filesystem layers would not change. I believe the image layer IDs are calculated from the content of that layer. Some layers have parent layers; their IDs are in the “parent” file. The “cache-id” is the ID which can be found in the folder of the storage driver. In the case of overlay2, that is /var/lib/docker/overlay2. In the case of zfs, it can be found in the result of

zfs list
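
As for how that layerdb hash is calculated: I haven’t verified it here, but my understanding is that it is the layer’s “chain ID” from the OCI image spec, derived from the diff_ids: the first layer’s chain ID is its diff_id, and each next one is the sha256 of “<parent chain ID> <diff_id>” joined by a single space. A sketch using the two diff_ids from your grafana/alpine example; if this is right, the output should match the layerdb directory name of that image’s top layer:

# chain ID of layer 1 is its diff_id; layer 2 chains the parent chain ID and its own diff_id
diff_id_1=sha256:4fc242d58285699eca05db3cc7c7122a2b8e014d9481f323bd9277baacfa0628
diff_id_2=sha256:7cc603b59fd5d5e4b223c2f5201791692443aff16f493fe2a2a0fa43279cdb85
printf '%s %s' "$diff_id_1" "$diff_id_2" | sha256sum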

If you are interested in the structure of the Docker filesystem more generally, I can recommend this article, which explains it more deeply than my GitHub repo:

It is not about zfs, though.