Multi-platform build fails

I have two dockerfiles:

dockerfile.amd

FROM amd64/ubuntu:22.04
RUN apt update
RUN apt full-upgrade -y
RUN apt install -y <BUNCH OF PACKAGES>

dockerfile.arm

FROM arm64v8/ubuntu:22.04
RUN apt update
RUN apt full-upgrade -y
RUN apt install -y <BUNCH OF PACKAGES>   # Same packages

On its own, each dockerfile works for the platform it runs on. There is one difference, an addition to the arm dockerfile, but it is irrelevant to the problem at hand and only affects later layers.

I’m trying to combine those into a single dockerfile:

FROM ubuntu:22.04
RUN apt update
RUN apt full-upgrade -y
RUN apt install -y <BUNCH OF PACKAGES>   # Same packages
...

compose.yml

services:
  prep: 
    build: 
      platforms:
        - linux/amd64
        - linux/arm64/v8


I have two macOS machines, one with an Intel chip and another with an M1, both running Docker Desktop with the default multi-platform builder enabled.

I have confirmed the multi-platform builder works with a small test build:

FROM alpine
RUN echo "$(uname -m)" > arch
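
For reference, a test like that can exercise both platforms in one invocation; this exact command is not from the original post and the tag is just a placeholder:

# Build the throwaway test image for both platforms at once
docker buildx build --platform linux/amd64,linux/arm64/v8 -t arch-test .

Without --push or --load the result only lands in the build cache, which is enough to confirm that the emulation itself works.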

When running docker compose build, each machine fails when building for the opposite platform:

  • The AMD64 machine fails at the apt install -y ... step (when building for the arm64v8 platform)

    On the AMD machine, this step takes 90s when building for amd, but runs for around 400s before crashing when building for arm.
    Meanwhile, the ARM machine builds this step in 54s (for arm).

    370.0 done.
    370.0 Processing triggers for libc-bin (2.35-0ubuntu3.8) ...
    370.1 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
    370.1 Segmentation fault
    370.2 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
    370.2 Segmentation fault
    370.2 dpkg: error processing package libc-bin (--configure):
    370.2  installed libc-bin package post-installation script subprocess returned error exit status 139
    370.2 Processing triggers for ca-certificates (20240203~22.04.1) ...
    370.3 Updating certificates in /etc/ssl/certs...
    385.7 0 added, 0 removed; done.
    385.7 Running hooks in /etc/ca-certificates/update.d...
    385.8 
    390.5 done.
    390.5 done.
    390.6 Processing triggers for dbus (1.12.20-2ubuntu4.1) ...
    390.6 Errors were encountered while processing:
    390.6  libc-bin
    390.8 E: Sub-process /usr/bin/dpkg returned an error code (1)
    
  • The ARM machine fails at the apt update step, when trying to build for the amd platform, with the error message:

    0.086 exec /bin/sh: exec format error
    

Why could it be failing?

My guess was that some packages like libc-bin weren’t compatible with the qemu-emulated build (I used the virtualization framework first), but I couldn’t reproduce it with qemu either.

On the other hand, what I noticed is that you run apt update in a separate instruction, which creates a separate layer, so the cached package index could be outdated and you could end up installing older packages.

I also never run any upgrade in a container unless the base image is no longer maintained. But even if you do it, it doesn’t help much as a separate instruction, which would upgrade only once, unless of course you build with --no-cache.

What happens when you merge the layers and install only a single package? Since libc-bin is the one in the error message, try this:

FROM ubuntu:22.04
RUN apt update \
 && apt full-upgrade -y \
 && apt install -y libc-bin

but I would prefer this:

FROM ubuntu:22.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends libc-bin
  • apt always shows a warning that it doesn’t have a stable CLI interface and should be used “with caution in scripts”. So apt-get is still preferred in scripts, including in a Dockerfile.
  • With --no-install-recommends you don’t install recommended packages. Sometimes you need some of them, and then you can add those to the package list explicitly, but otherwise they just make the image bigger, and I can imagine other issues caused by installing unwanted packages.
  • Again, I would not use upgrade commands in a Dockerfile. When you know there is a specific vulnerability in a package, you can install a newer version of that package. A blanket upgrade can also introduce vulnerabilities, and at the very least it upgrades irrelevant packages with unpredictable results. A combined version of these points is sketched below.
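
Putting those points together, the install step could look roughly like this (a sketch only; the package list is still your placeholder, and removing the apt lists at the end just keeps the layer smaller):

FROM ubuntu:22.04
# One layer: refresh the index, install without recommended packages, drop the index
RUN apt-get update \
 && apt-get install -y --no-install-recommends <BUNCH OF PACKAGES> \
 && rm -rf /var/lib/apt/lists/*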

I thought it may have been the cache, so I made sure to run docker system prune beforehand.
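
One caveat, assuming the multi-platform builder is a separate docker-container builder (the builder type is not shown in the thread): its build cache lives inside the BuildKit container and is not touched by docker system prune; it has its own prune command:

# Clear all build cache of the currently selected buildx builder
docker buildx prune --all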

I’ll try removing the upgrade step. The dockerfile was not entirely created by me, so I am not sure about the necessity or exact purpose of each step (update, upgrade, etc.).

I figured it’s probably the builder’s virtualization layer :confused:
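
One way to sanity-check that layer (a sketch, not taken from the thread): list which platforms each builder reports, and if amd64 is missing on the Apple Silicon machine, re-register the QEMU binfmt handlers. The binfmt image below is the commonly used installer, named here only as an example:

# Show all builders and the platforms each one advertises
docker buildx ls

# (Re)register QEMU emulators for foreign architectures
docker run --privileged --rm tonistiigi/binfmt --install all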

Until the multi-platform build issue is resolved, I am using an alternative method:

docker manifest create deanayalon/fms-prep:u22 \
  --amend deanayalon/fms-prep:u22-amd \
  --amend deanayalon/fms-prep:u22-arm

docker manifest push -p deanayalon/fms-prep:u22
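
A possible one-step alternative, not part of the original workflow: docker buildx imagetools create assembles the same kind of manifest list from the two existing tags and pushes it in a single command:

docker buildx imagetools create -t deanayalon/fms-prep:u22 \
  deanayalon/fms-prep:u22-amd \
  deanayalon/fms-prep:u22-arm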

What I am wondering is how one can list the existing manifests. I see no docker manifest list command; do they appear anywhere?


Edit: Never mind, found it: ~/.docker/manifests
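
For completeness, two ways to look at them, assuming the combined tag has already been pushed:

# Manifest lists created locally with docker manifest create are stored here
ls ~/.docker/manifests

# A pushed manifest list can be inspected straight from the registry
docker manifest inspect deanayalon/fms-prep:u22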


Tried that, and the results are pretty weird.

When building on the AMD machine:

  • Installing only libc-bin works!
  • Adding any other package crashes QEMU

The ARM machine still crashes with “exec format error”.


I have found this though

Not sure how/if that may help me, but I’ll keep on hacking at it.