DockerRegistry internals questions

This is my first post to the Docker forums.
My name is Brian Fennell.
I am a DevOps engineer. I am new to Docker, but I am very familiar with Linux and with programming in many different languages.

Currently I am trying to understand DockerRegistry. We have a GoCD process which bakes a new Docker image for each software build, and each image is placed in our DockerRegistry, which we host ourselves. The good news is that our developers love the modularity. The bad news is that the size of the DockerRegistry keeps growing, and we are trying to identify retention and purging policies.
I am new to Docker and DockerRegistry, so I am using Google and direct examination to understand the interfaces and internals. Docker and DockerRegistry were installed before I joined the team.

Here is what I have so far:
My boss asked me if I can identify all the files in the Docker registry and look at the size and creation date of each. I have determined that the API does not offer this level of detail, but the raw file system under the registry does. The trick now is understanding enough of the internals to link the raw “blob” files back to the relevant metadata (so we don’t try to purge something that is in use or is needed for production).

Any help from the experts on this forum would be greatly appreciated.
If there is another Forum I should be using for these types of questions please direct me there.

My findings so far:

Version:
server01.example.net ::: Red Hat 7.2 ::: Docker Server Version 1.12.6 ::: Docker Registry 2.6.1
server02.example.net ::: Red Hat 7.2 ::: Docker Server Version 1.12.1 ::: Docker Registry 2.6.1

Open Question: where is the source code for DockerRegistry 2.6.1?

server01 is the primary and server02 is a cold fail-over, so I continued digging on server01.

Research on server01.example.net

ScriptA
ls -1trd /nfs/dir1/dir2/registry/*
ls -1trd /nfs/dir1/dir2/registry/*/*
ls -1trd /nfs/dir1/dir2/registry/*/*/*

OutputA
/nfs/dir1/dir2/registry/v2
/nfs/dir1/dir2/registry/v2/blobs
/nfs/dir1/dir2/registry/v2/blobs/sha256
/nfs/dir1/dir2/registry/v2/repositories
/nfs/dir1/dir2/registry/v2/repositories/aaaa01
/nfs/dir1/dir2/registry/v2/repositories/aaaa02
[ . . . ]
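
Given the layout in OutputA, the per-file size and date can be read straight from the file system. The following is a minimal sketch, assuming GNU find (for -printf) and assuming each blob is a file named "data" three levels below blobs/sha256; list_blobs is a hypothetical helper name. Linux file systems do not reliably record a creation date, so the modification time is used as a stand-in.

```shell
#!/bin/bash
# list_blobs is a hypothetical helper, not part of Docker Registry.
# Argument: the registry root (the directory that contains v2/).
# Output: one line per blob: size in bytes, mtime (YYYY-MM-DD), digest.
list_blobs() {
    # Assumed layout: <root>/v2/blobs/sha256/<2 hex>/<digest>/data
    # %s = size in bytes, %T... = modification time, %h = parent directory.
    find "$1/v2/blobs/sha256" -mindepth 3 -maxdepth 3 -type f -name data \
        -printf '%s\t%TY-%Tm-%Td\t%h\n' |
        # Keep only the last path component of the parent directory (the digest).
        awk -F'\t' '{ n = split($3, p, "/"); print $1 "\t" $2 "\t" p[n] }'
}
```

For example, `list_blobs /nfs/dir1/dir2/registry | sort -n` would list the smallest blobs first.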

ScriptB
ls -1d /nfs/dir1/dir2/registry/*/r*/*/*/*

OutputB
/nfs/dir1/dir2/registry/v2/repositories/aaaa01/bbbb01/_layers
/nfs/dir1/dir2/registry/v2/repositories/aaaa01/bbbb01/_manifests
/nfs/dir1/dir2/registry/v2/repositories/aaaa01/bbbb01/_uploads
/nfs/dir1/dir2/registry/v2/repositories/aaaa01/bbbb02/_layers
/nfs/dir1/dir2/registry/v2/repositories/aaaa01/bbbb02/_manifests
/nfs/dir1/dir2/registry/v2/repositories/aaaa01/bbbb02/_uploads
/nfs/dir1/dir2/registry/v2/repositories/aaaa02/bbbb01/_layers
/nfs/dir1/dir2/registry/v2/repositories/aaaa02/bbbb01/_manifests
/nfs/dir1/dir2/registry/v2/repositories/aaaa02/bbbb01/_uploads
/nfs/dir1/dir2/registry/v2/repositories/aaaa02/bbbb02/_layers
/nfs/dir1/dir2/registry/v2/repositories/aaaa02/bbbb02/_manifests
/nfs/dir1/dir2/registry/v2/repositories/aaaa02/bbbb02/_uploads
[ . . . ]

Observations:

Observation 1:
The Linux executable sha256sum produces a cryptographic checksum
(also called a digest or hash) of 64 hexadecimal digits for any file
of any length (number of bytes).
The Linux documentation manual page (man page) on sha256sum says:
"sha256sum - compute and check SHA256 message digest [ . . . ]
The sums are computed as described in FIPS-180-2 [ . . . ]"
Docker Registry uses a “content addressable storage” approach,
where files are stored and retrieved directly under their
hexadecimal sha256 digest.
Experimentally it can be shown that the following script
successfully converts a sha256 digest to a full-path-filename.
The “data” file located under this name has the same hexadecimal
sha256sum as its address.

$ cat blob-full-filename.bash
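#!/bin/bash
# Reads bare hexadecimal sha256 digests on stdin, one per line, and prints
# the blob path for each: .../blobs/sha256/<first 2 hex digits>/<digest>/data
sed 's#^\(..\)\(.*\)$#/nfs/dir1/dir2/registry/v2/blobs/sha256/\1/\1\2/data#'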
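
The claim that the “data” file hashes to its own address can be checked per blob. This is a rough sketch under the same path assumption as the sed script above; blob_path and verify_blob are hypothetical helper names, and the registry root is passed in as an argument rather than hard-coded.

```shell
#!/bin/bash
# blob_path and verify_blob are hypothetical helpers for illustration.

# Map a digest (with or without a leading "sha256:") to its blob path.
blob_path() {
    local root="$1"
    local digest="${2#sha256:}"
    local prefix
    prefix=$(printf '%.2s' "$digest")   # first two hex digits
    printf '%s/v2/blobs/sha256/%s/%s/data\n' "$root" "$prefix" "$digest"
}

# Succeed only if the file's sha256sum equals the digest in its address.
verify_blob() {
    local root="$1"
    local digest="${2#sha256:}"
    local actual
    actual=$(sha256sum "$(blob_path "$root" "$digest")" | cut -d' ' -f1)
    [ "$actual" = "$digest" ]
}
```

A call such as `verify_blob /nfs/dir1/dir2/registry "$digest" || echo "mismatch: $digest"` would flag any blob whose content does not match its name.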

Observation 2:
The Linux “file” command (Copyright Ian F. Darwin, Toronto, Canada, 1986-1999)
can be used to identify the type of “data” using either rules-of-thumb
(for text of an identifiable type or format) or an embedded file
type identification number (also called a “magic-number”). Both use
the first 512 bytes of a file and a list of rules to identify the file type.

Random sampling shows primarily two types of files stored in the
content-addressable store: 1) tar archives 2) JSON
The tar archives contain one or more files with both file data and
file-metadata (such as creation date, file permissions, file ownership).
The tar archives may be treated as a serialized filesystem.
The JSON files contain structured data serialized as text, represented as
it would be in JavaScript, usually nested JavaScript Objects
(similar to Java Maps or Python dicts or Perl hashes or Awk
associative arrays) and arrays, containing string and numeric
primitives. The JSON Objects may be treated as a set of
Key/Value pairs. The JSON Arrays may be treated as an ordered
set of values.
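
The random sampling described above could be tallied along these lines. This is only a sketch: blob_type and tally_types are hypothetical helper names, and the exact type strings depend on the installed version of file. If a blob is stored compressed, file will report the compression format rather than tar.

```shell
#!/bin/bash
# blob_type and tally_types are hypothetical helpers for illustration.

# Print the MIME type that file(1) reports for one blob.
blob_type() {
    file -b --mime-type "$1"
}

# Read blob paths on stdin (one per line) and count how often each type occurs.
tally_types() {
    while IFS= read -r path; do
        blob_type "$path"
    done | sort | uniq -c | sort -rn
}
```

Feeding it a sample of “data” paths (for example the output of the blob-full-filename.bash script) gives a count per detected type.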

Observation 3:
The number of entries in directories named like:
/nfs/dir1/dir2/registry/v2/repositories/aaann/bbbbmm/_manifests/tags/latest/index/sha256
ranges from 1 to 10 (in the data set I have to analyze, our internal corporate Docker Registry).
This may be treated as a Set of 1 to N elements.
Open Question: what is the index in terms of Docker and Docker Registry use cases?
Open Question: Is the Set Ordered or Unordered - if Ordered, how is the order determined?
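
One way to gather evidence on the ordering question is to list the entries of such an index directory oldest first. This is a sketch assuming GNU find; index_entries_by_mtime is a hypothetical helper name. It only shows when each entry was last written on disk, not whether the registry itself treats the set as ordered.

```shell
#!/bin/bash
# index_entries_by_mtime is a hypothetical helper for illustration.
# Argument: a directory such as .../_manifests/tags/latest/index/sha256
# Output: the names of its direct entries, oldest modification time first.
index_entries_by_mtime() {
    # %T@ = mtime as seconds since the epoch, %f = entry name.
    find "$1" -mindepth 1 -maxdepth 1 -printf '%T@\t%f\n' | sort -n | cut -f2
}
```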

Observation 4:
JavaScript Objects may or may not contain a reference to a named Class
(perhaps more correctly called a Prototype in JavaScript).
JSON Objects do not usually have a Class Name contained within.
These Objects can, however, sometimes be identified using duck-typing.
Treating the Object as a set of Key/Value pairs, converting this to a Set of Keys,
and enforcing an order on the Set of Keys (by sorting) can yield a kind of
fingerprint of the Object Prototype - an Object with a given set of Keys can be
said to be of the same type as another Object with the same set of Keys.
If the Object can get and/or set the Values associated with these Keys then other
Objects with the same Keys can exhibit the same getting and setting behaviors.
If it walks like a duck and it talks like a duck then it is a duck.
If it can get and/or set all the same Values for all the same Keys then
it is of the same type (can be treated like it has the Class Name or
Prototype Name).

Using duck-typing:
I have discovered at least two types of JSON objects in the “blobs” Object Store:
1) architecture,config,container,container_config,created,docker_version,history,os,rootfs
2) config,layers,mediaType,schemaVersion
Question: What are the proper names for these types?
Question: What are these objects used for in terms of Docker and Docker Registry use cases?
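
The sorted-key fingerprint described in Observation 4 can be sketched as follows. This assumes the jq command-line tool is installed, which is an assumption and not something mentioned above; key_fingerprint is a hypothetical helper name. jq's keys function returns the top-level keys of an object in sorted order.

```shell
#!/bin/bash
# key_fingerprint is a hypothetical helper for illustration; it requires jq.
# Argument: path to a JSON file.
# Output: the sorted top-level keys joined by commas, or "not-an-object"
# when the top-level JSON value is not an object.
key_fingerprint() {
    jq -r 'if type == "object" then keys | join(",") else "not-an-object" end' "$1"
}
```

Running this over every JSON blob and piping the results through `sort | uniq -c` would show how many distinct key sets exist and how common each one is.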

I have some more but perhaps you can see where I am going with this.
I hope someone can clarify for me some of these questions.

Brian Fennell

Still haven’t found the DockerRegistry source code. I would be very grateful if someone could point me in the right direction for that. Perhaps this would be an easier question to answer: how do I upgrade Docker in place on Red Hat 7.2 in production? We are running Docker Server Version 1.12.1. After we upgrade Docker we will try to upgrade DockerRegistry from 2.6.1 to the current version.

I found the answer to my own question: