Private registry not working after deleting an image

I have the following setup / workflow:

  1. My private registry with my own self-signed SSL certificate
  2. I build an image and push it to the registry
  3. I pull the image on another host
  4. I delete the image from the registry using the API
  5. I call the registry's garbage collector
  6. I try to pull the image from the registry
  7. I try to push the image (not changed) to the registry again
  8. I try to pull the image on the other host again

Steps 1 - 6 are working fine (step 4 only kind of, I guess), but steps 7 - 8 are not working as expected.

Expected results:
4. Image is completely removed from the registry
7. I push the image to the registry, and during this process the layers I deleted in step 4 get pushed again.
8. I can pull the image again

Actual results:
4. The image is still present in the registry but has no tags. It seems like this is the way the registry handles image deletion…
7. I can push the image but it says ‘Layer already exists’ for every layer
8. I can't pull the image anymore, docker says: Error response from daemon: manifest for registry.swarm/waterfall:latest not found: manifest unknown: manifest unknown

Notes:

After step 8 failed, checking the registry shows me that my image has a tag called “latest”, but I can't retrieve the manifest for it using the API (basically the same error as ‘docker pull’ returned).

Pushing the image won’t solve the problem.

Workaround:
The only way to fix the image in the registry is rebuilding the image on the host and pushing the new version.

Second case:
If I skip step 5, the pull in step 8 kind of works. It starts to pull the image, but at some layer docker fails and says “retrying in x seconds”. After a few minutes docker gets through the whole image, but then fails with “unexpected EOF” (nothing more).

Is this the intended behavior or am I doing something wrong?


Managing a local private registry via the API, especially deleting image layers and metadata, is … problematic; even Docker's own documentation hints at that. You haven't shown exactly how you deleted the image, and I suspect you may have deleted only the image manifest and not the actual layers (blobs). I assume you are aware that you need to delete both, using the hash (sha256) references returned for each of them? When you ran the garbage collection, did it clearly show the manifest and layers as candidates for deletion? (As you know, these are only ‘marked’ for deletion by the API and aren't actually removed until the GC completes, and the GC will only remove layers and manifests if no references to them exist.) Also, in my experience the API continues to report that the image is present even after a successful deletion. I'm not sure why; as I said above, this whole process is a bit sketchy at best.
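
For reference, a minimal sketch of resolving a tag to its manifest digest and then deleting by that digest. It assumes the registry and repository from the question (registry.swarm, waterfall), HTTPS with a self-signed certificate, no authentication, and that deletion is enabled in the registry config (storage.delete.enabled: true). The function names are made up for illustration. According to the registry HTTP API spec, a delete must reference a digest, and the digest comes back in the Docker-Content-Digest header when the manifest is requested with the v2 Accept header.

```shell
# Assumed values; replace with your own registry and repository.
REGISTRY="https://registry.swarm"
REPO="waterfall"
ACCEPT="Accept: application/vnd.docker.distribution.manifest.v2+json"

# Read raw HTTP response headers on stdin and print the
# Docker-Content-Digest value (header name matched case-insensitively).
digest_from_headers() {
    tr -d '\r' | awk -F': ' 'tolower($1) == "docker-content-digest" { print $2 }'
}

# Resolve a tag (e.g. "latest") to its manifest digest with a HEAD request.
manifest_digest() {
    curl -ksI -H "$ACCEPT" "$REGISTRY/v2/$REPO/manifests/$1" | digest_from_headers
}

# Delete a manifest by digest and print the HTTP status code.
delete_manifest() {
    curl -ks -o /dev/null -w '%{http_code}\n' -X DELETE \
        "$REGISTRY/v2/$REPO/manifests/$1"
}

# Usage (not executed here):
#   digest=$(manifest_digest latest)
#   delete_manifest "$digest"
```

A 202 status from the delete means the registry accepted it; a 405 usually means deletion is not enabled in the config.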


I face the same problem:

In my case, I run the following API command to delete the image:

curl -k -u username:password -X DELETE "https://private.docker.regsitry.domain/v2/repo_name/image_name/manifests/layer_digest" -H "Accept: application/vnd.docker.distribution.manifest.v2+json"

The command removes the link between the layer and its associated blob (but will not delete the layer data). That is why the command needs to be run as many times as the number of layers plus one more time for the manifest (since each one has a different digest).

To remove the blob data, the garbage collection command needs to be run afterward:
docker exec -it registry-container-name /bin/registry garbage-collect --delete-untagged=true --dry-run=false /etc/docker/registry/config.yml
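
It can help to do a dry run first and check what the garbage collector would remove. A rough sketch, assuming a container named registry (hypothetical) and the config path from the command above; -it is dropped so the output can be piped. The "eligible for deletion" wording is what 2.x registries print as far as I know, but it may differ between versions.

```shell
# Assumed names; adjust to your setup.
CONTAINER="registry"                        # hypothetical container name
CONFIG="/etc/docker/registry/config.yml"    # config path used above

# Run the garbage collector; pass "true" for a dry run, "false" to delete.
run_gc() {
    docker exec "$CONTAINER" /bin/registry garbage-collect \
        --delete-untagged=true --dry-run="$1" "$CONFIG"
}

# Count the lines of GC output (on stdin) that mark a blob or manifest
# as eligible for deletion.
count_eligible() {
    grep -c 'eligible for deletion'
}

# Usage (not executed here):
#   run_gc true 2>&1 | count_eligible    # how many objects would go
#   run_gc false                         # actually delete them
```

If the dry run reports nothing eligible, the manifest is most likely still referenced and the real run will not free anything either.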

Even then, the image name will be available in the registry catalogue when you run the following command:
curl -k -X GET -u username:password https://private.docker.regsitry.domain/v2/_catalog

To clean the catalogue, both the blobs and image directories must be manually deleted from within the registry container.
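
A sketch of that manual cleanup, assuming the official image's default storage root /var/lib/registry and a container named registry (both are assumptions; check storage.filesystem.rootdirectory in your config). Removing the repository directory leaves its blobs unreferenced, so a following garbage collection run should be able to remove them. This is destructive, so try it on a throwaway registry first.

```shell
# Assumed values; adjust to your setup.
CONTAINER="registry"        # hypothetical container name
ROOT="/var/lib/registry"    # assumed storage root inside the container

# Print the directory that holds one repository's metadata.
repo_dir() {
    printf '%s/docker/registry/v2/repositories/%s\n' "$ROOT" "$1"
}

# Remove a repository's directory inside the container.
# Refuses to run without a name so it never wipes every repository.
remove_repo() {
    if [ -z "$1" ]; then
        echo "usage: remove_repo <repository>" >&2
        return 1
    fi
    docker exec "$CONTAINER" rm -rf "$(repo_dir "$1")"
}

# Usage (not executed here):
#   remove_repo waterfall
```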

But for some reason, some data related to the layer or its blob persists in the registry container, and that's why pushing the image again is impossible unless the registry container volume is purged.
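
One commonly reported cause of this is the registry's in-memory blob descriptor cache (cache: blobdescriptor: inmemory in the config): after garbage collection the running registry may still believe the layers exist, so it answers ‘Layer already exists’ on push and then cannot serve them. If that is what is happening here, restarting the registry after the GC may be enough, without purging the volume. A minimal sketch, assuming a container named registry (hypothetical):

```shell
# Assumed names; adjust to your setup.
CONTAINER="registry"                        # hypothetical container name
CONFIG="/etc/docker/registry/config.yml"

# Run the real garbage collection and, only if it succeeds, restart the
# container so that any in-memory cache is dropped.
gc_and_restart() {
    docker exec "$CONTAINER" /bin/registry garbage-collect \
        --delete-untagged=true "$CONFIG" &&
    docker restart "$CONTAINER"
}

# Usage (not executed here):
#   gc_and_restart
#   docker push registry.swarm/waterfall:latest
```

If the push after the restart uploads the layers again, the stale cache was the likely culprit.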