What are the best practices (if any) for coupling a Docker image registry with an SCM service (such as Bitbucket)?
I know a Docker registry can live in Artifactory, but how can I ensure that the two are as tightly coupled as possible? With hub.docker.com you can view the Dockerfile right there, which is very nice.
For example, I am worried that without due diligence from developers, the latest Docker image in the registry will not reflect the current state of the Dockerfile in SCM, or vice versa. I know it can be done manually without much effort, but is there a de facto way of doing this (maybe Jenkins or similar)?
Also, I would think it is imperative that we can always trace any image in the registry back to the base Dockerfile (that we wrote).
Set up some sort of automated build system (“continuous integration”, to use the current trendy term). Docker is sufficiently mainstream at this point that any of the cloud-based or locally-installed CI systems can do it.
In your Dockerfile, use a LABEL to record the source of the build. That probably includes the commit hash from distributed source control (git, Mercurial), the branch name if relevant, any release tags if present, and possibly details like the timestamp of the last commit. docker history and docker inspect should be able to show these.
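For example, a minimal sketch of what that could look like (the base image and the ARG/LABEL names here are illustrative, not a standard):

```
# Dockerfile
FROM alpine:3.19

# Values the CI system supplies at build time via --build-arg.
ARG GIT_COMMIT=unknown
ARG GIT_BRANCH=unknown
LABEL org.example.git-commit=$GIT_COMMIT \
      org.example.git-branch=$GIT_BRANCH
```

Then the build step passes the values in, and docker inspect can read them back out:

```
docker build \
  --build-arg GIT_COMMIT=$(git rev-parse --short HEAD) \
  --build-arg GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD) \
  -t quay.io/mycorp/imagename:latest .

docker inspect \
  --format '{{ index .Config.Labels "org.example.git-commit" }}' \
  quay.io/mycorp/imagename:latest
```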
When you docker push your images, push them at least twice, with the commit hash and with the branch name as the “version” part (quay.io/mycorp/imagename:123abc7, quay.io/mycorp/imagename:dmaze-test). If release tags are readily available, the CI system should push images with these tags too.
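A rough sketch of that step, reusing the hypothetical build arguments above:

```
# Derive the tags from the current checkout; the repository name is illustrative.
IMAGE=quay.io/mycorp/imagename
COMMIT=$(git rev-parse --short HEAD)
BRANCH=$(git rev-parse --abbrev-ref HEAD)

docker build --build-arg GIT_COMMIT="$COMMIT" -t "$IMAGE:$COMMIT" .
docker tag "$IMAGE:$COMMIT" "$IMAGE:$BRANCH"
docker push "$IMAGE:$COMMIT"
docker push "$IMAGE:$BRANCH"

# If this commit carries a release tag, push under that name too.
RELEASE=$(git describe --tags --exact-match 2>/dev/null || true)
if [ -n "$RELEASE" ]; then
  docker tag "$IMAGE:$COMMIT" "$IMAGE:$RELEASE"
  docker push "$IMAGE:$RELEASE"
fi
```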
Make sure the Dockerfile is committed to source control, of course, and try to ensure there is a stable way to fetch any external dependencies the build needs.
Now you can go both ways: given an arbitrary commit, if your CI system built it, you can docker run the image it built; and if you have an image, you can find where exactly in source history it came from, and git checkout or hg up to that specific version, and docker build a near-identical copy of it yourself.
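Concretely, the round trip might look like this, assuming the illustrative label key from earlier:

```
# Image -> source: read the commit hash back out of an image...
COMMIT=$(docker inspect \
  --format '{{ index .Config.Labels "org.example.git-commit" }}' \
  quay.io/mycorp/imagename:dmaze-test)

# ...check out that exact revision, and rebuild a near-identical copy.
git checkout "$COMMIT"
docker build -t imagename:rebuilt .

# Source -> image: any commit the CI system built is directly runnable.
docker run --rm "quay.io/mycorp/imagename:$COMMIT"
```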
So, going off what you said:
Use LABEL as it provides a way to connect the image to the source commit and build.
We will connect our Dockerfile pull requests to Jenkins, which will kick off a docker build and then run tests against the new image to verify the changes. From there, the rest of the CI system (maybe Jenkins again) will docker push the new images to the registry.
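A hedged sketch of what that Jenkins job’s shell step might run (the repository name, test command, and build argument are all placeholders):

```
#!/bin/sh
set -e  # abort on the first failing command, so a failed test blocks the push

IMAGE=quay.io/mycorp/imagename
COMMIT=$(git rev-parse --short HEAD)

# Build the candidate image from the pull request's Dockerfile.
docker build --build-arg GIT_COMMIT="$COMMIT" -t "$IMAGE:$COMMIT" .

# Run the test suite inside the freshly built image.
docker run --rm "$IMAGE:$COMMIT" ./run-tests.sh

# Only reached if the tests passed.
docker push "$IMAGE:$COMMIT"
```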
Could you elaborate more on why we should push the image with multiple names? My first thought is how much clutter that would create. I get the reasoning for tracking the hash/branch, but there must be a better way to do that, or I must be missing something.
Mostly just because commit hashes are great for reproducibility and horrible for everything else: you can’t tell whether commit 1a2b3c4 is newer or older than commit 9876543. The registry shares (or can be configured to share) layers across images (especially those with the same owner/name), so this doesn’t really cost you space, just an extra name, and the “master” or “default” build will move along with the current mainline source.
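You can see this locally: an extra tag is just an extra name for the same image, e.g.:

```
# Both tags point at the same image ID, so the layers are only stored once.
docker tag quay.io/mycorp/imagename:123abc7 quay.io/mycorp/imagename:dmaze-test
docker images quay.io/mycorp/imagename
```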
The other end of this, which you hint at and I don’t have a good answer to, is that this does produce a lot of images, and cleaning them up might take some cleverness. I can readily imagine a policy like “keep only release builds once images are more than 30 days old”, but I don’t know how you’d implement it.
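For what it’s worth, a rough starting point, assuming a self-hosted registry that exposes the v2 HTTP API with deletes enabled (REGISTRY_STORAGE_DELETE_ENABLED=true), might look like:

```
# Illustrative only: list tags, then delete one image by manifest digest.
REGISTRY=https://registry.example.com
REPO=mycorp/imagename
TAG=123abc7

curl -s "$REGISTRY/v2/$REPO/tags/list"

# Look up the manifest digest for the tag we want to drop...
DIGEST=$(curl -sI \
  -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
  "$REGISTRY/v2/$REPO/manifests/$TAG" \
  | awk 'tolower($1) == "docker-content-digest:" { print $2 }' | tr -d '\r')

# ...and delete it; a registry garbage-collection pass then reclaims the space.
curl -X DELETE "$REGISTRY/v2/$REPO/manifests/$DIGEST"
```

Deciding which tags count as release builds, and which are more than 30 days old, is the part you’d still have to script around this.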
I think I understand your argument now. You’re thinking along the lines of release versioning. I think what I need is one tag for the hash, one tag for latest, and one tag for the build number (or some other way to tell the version), and maybe a branch tag if we decide we need it.
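A final sketch of that tag set, assuming a CI-provided build number such as Jenkins’s $BUILD_NUMBER:

```
IMAGE=quay.io/mycorp/imagename
COMMIT=$(git rev-parse --short HEAD)
docker build -t "$IMAGE:$COMMIT" .

# One image, several names: the hash for traceability, the build number for
# ordering, latest for convenience; a branch tag could be added the same way.
for TAG in "$COMMIT" "build-$BUILD_NUMBER" latest; do
  docker tag "$IMAGE:$COMMIT" "$IMAGE:$TAG"
  docker push "$IMAGE:$TAG"
done
```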