Can't push docker manifest (complains about two other image digests being required prior to pushing)

I have what I would consider a relatively simple GitHub workflow file:

  create_manifest_docker_files:
    # needs: [build_amd64_dockerfile, build_arm64_dockerfile]
    env:
      IMAGE_REPO: ${{ secrets.aws_ecr_image_repo }}
      AWS_ACCOUNT_ID: ${{ secrets.aws_account_id }}
      AWS_REGION: ${{ secrets.aws_region }}
      AWS_ACCESS_KEY_ID: ${{ secrets.aws_access_key_id }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.aws_secret_access_key }}
    runs-on: self-hosted
    steps:
     - uses: actions/checkout@v2
     - name: Login to Amazon ECR
       id: login-ecr
       uses: aws-actions/amazon-ecr-login@v1
     - name: Create docker manifest
       run: docker manifest create $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:latest --amend $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:latest-amd64 --amend $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:latest-arm64
     - name: Push the new manifest file to Amazon ECR
       run: docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO

Whenever this workflow runs via GitHub Actions, I see the following error:

Run docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO
  docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO
  shell: /usr/bin/bash -e {0}
  env:
    IMAGE_REPO: ***
    AWS_ACCOUNT_ID: ***
    AWS_REGION: ***
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
failed to put manifest ***.dkr.ecr.***.amazonaws.com/***:latest: manifest blob unknown: Images with digests '[sha256:a1a4efe0c3d0e7e26398e522e14037acb659be47059cb92be119d176750d3b56, sha256:5d1b00451c1cbf910910d951896f45b69a9186c16e5a92aab74dcc5dc6944c60]' required for pushing image into repository with name '***' in the registry with id '***' do not exist
Error: Process completed with exit code 1.

I’m not quite sure I actually understand the problem here. The previous step, “Create docker manifest” completes successfully with no problem, but the “Push the new manifest file to AWS ECR” step fails with the error above.

When looking in AWS ECR, I only have two images – latest-amd64 and latest-arm64. Neither of their Digests are the values that the error message above is putting out.

When exporting those same environment variables to my CLI session and running those commands manually, everything works fine:

root@github-runner:/home/ubuntu/docker-runner# docker manifest create $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:latest --amend $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:latest-amd64 --amend $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:latest-arm64
Created manifest list [obfuscated-from-stackoverflow].dkr.ecr.us-east-1.amazonaws.com/[obfuscated-from-stackoverflow]:latest
root@github-runner:/home/ubuntu/docker-runner# docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO
sha256:e4b5cc4cfafca560724fa6c6a5f41a2720a4ccfd3a9d18f90c3091866061a88d

My question is – why would this work from the CLI itself but not from the GitHub Actions workflow? I have some previous runs that show this working perfectly fine with the workflow contents above, but now it’s failing for some reason. Not quite sure if the issue here is within my ECR repository or if it’s something locally messed up on the GitHub runner.

When just simply switching the runner to an arm64 version (I have two EC2 instances that runs docker build), then it works perfectly fine. So it’s definitely got to be something going on with the local docker configuration, but just not sure what.

Any help would be greatly appreciated.

This is the point I guess. I don’t use manufest push, but I almost used it instead of buildx, and if I a remember correctly, you need to push the images first to the registry and after that you can push a manifest. If you get different digests than you can find in the registry, your manifst must be wrong, but I don’t know how that could happen. Can this be some kind of cache on GitHub?

update:

A new idea: If docker manifest create uses your local tags to identify digests and you deleted the old image from the registry and pushed a new version which has the same digest as your local version from command line, that could explain the issue. How do you push the image to the registry? Is it in your GitHub workflow?

Thanks for the reply. Yep, I push it through the GItHub workflow. Is there any way to “purge” this process? Considering it works on my arm64 runner, I’m wondering if there’s a way to just start from a clean slate other than blowing than ec2 instance and redeploying a new one.

Basically, the arm64 runner builds and pushes the image with the latest-arm64 tag, and then the latest-amd64 one pushes. Finally, once they’re both pushed, the amd64 ec2 instance is basically creating and pushing the manifest.

Since I don’t use GitHub for CI/CD, I don’t know. From docker’s point of view, maybe deleting the build cache could help, but if I was right with method detecting digests, you just need to download the latest image before creating manifests.

Wait, don’t you build an image for the amd64 CPU? Are you saying you build only for arm64 and try to push that image as amd64? That would not work. You need to build for both architectures.

Nope I’m building two images – one for amd64 and one for arm64. They are both being built on the EC2 instance that has that specific architecture. After they are both finished building and pushing, the AMD64 EC2 instance is the one responsible for appending the latest-amd64 and latest-arm64 manifest to the latest tag and pushing out the latest tag.

We’ve been doing it for that way for a few years without any issues, but just lately it seems the amd64 EC2 instance doesn’t like the appending/creating manifest and pushing it. pretty odd.

I just tried a docker system prune but still got the same error

Run docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO
failed to put manifest ***.dkr.ecr.***.amazonaws.com/***:latest: manifest blob unknown: Images with digests '[sha256:a1a4efe0c3d0e7e26398e522e14037acb659be47059cb92be119d176750d3b56, sha256:5d1b00451c1cbf910910d951896f45b69a9186c16e5a92aab74dcc5dc6944c60]' required for pushing image into repository with name '***' in the registry with id '***' do not exist
Error: Process completed with exit code 1.

The docker system prune command did free up 1.55GB of space and seems like everything was pruned/removed correctly, but still this lingering error.

You do bring up a good point though. Maybe this is something within the workflow itself since the same exact commands work just perfectly fine from the CLI, but not when running via the GitHub Workflow.

Definitely something with the runner itself. i just blew the entire github runner folder away and reprovisioned the runner. working fine now.

Really appreciate your help and replying to assist!

Thank you for the explanation, I understand now.

It looks like there is no perfect solution. At least I have an idea what to expect when I try GitHub too. If you find out why it happened, please share :slight_smile: I am glad I could help at least a little.