Better leak detection and prevention for qemu in buildx

buildx allows qemu to consume more memory than is available on the host. When this happens, the Docker Go system can fail in surprising ways, triggering all manner of process panics and errors.

I’d like to see Docker do a few things to mitigate this problem:

Maintain a tighter grip on how much memory qemu is allowed to consume. This relates to batching, because batching builds into smaller concurrent groups relieves most of the need for qemu to grow so big in the first place.
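To illustrate the batching idea, here is a minimal Go sketch that splits a list of target platforms into smaller groups, so that only one group would be emulated at a time. The function name `batches` and the group size are my own assumptions for illustration; this is not how buildx schedules builds today.

```go
package main

import "fmt"

// batches splits platforms into consecutive groups of at most size entries.
// Hypothetical helper, not part of buildx. A size below 1 is treated as 1.
func batches(platforms []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(platforms); start += size {
		end := start + size
		if end > len(platforms) {
			end = len(platforms)
		}
		out = append(out, platforms[start:end])
	}
	return out
}

func main() {
	platforms := []string{
		"linux/amd64", "linux/arm64", "linux/arm/v7",
		"linux/ppc64le", "linux/s390x",
	}
	// Each group could be built to completion before the next one starts.
	for i, group := range batches(platforms, 2) {
		fmt.Printf("batch %d: %v\n", i+1, group)
	}
}
```

A smaller group size trades build time for a lower peak in emulated memory use.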

Enforce a cap on the maximum amount of RAM qemu can consume. For example, 90% of the host RAM.
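As a rough sketch of how such a cap could be computed, the Go program below reads `MemTotal` from `/proc/meminfo` (Linux only) and takes 90% of it. The helper names `parseMemTotal` and `capBytes` are hypothetical, and the sketch only calculates the limit. Enforcing it would need some other mechanism, for example a cgroup memory limit, which is not shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemTotal extracts the MemTotal value from /proc/meminfo-style text
// and returns it in bytes. Hypothetical helper for illustration.
func parseMemTotal(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected line shape: "MemTotal:       16384000 kB"
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kib, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, fmt.Errorf("parse MemTotal: %w", err)
			}
			return kib * 1024, nil
		}
	}
	return 0, fmt.Errorf("MemTotal not found")
}

// capBytes returns the given percentage of total (integer arithmetic).
func capBytes(total, percent uint64) uint64 {
	return total / 100 * percent
}

func main() {
	data, err := os.ReadFile("/proc/meminfo") // assumes a Linux host
	if err != nil {
		fmt.Fprintln(os.Stderr, "read meminfo:", err)
		os.Exit(1)
	}
	total, err := parseMemTotal(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("host RAM: %d bytes, proposed qemu cap (90%%): %d bytes\n",
		total, capBytes(total, 90))
}
```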

Restart the qemu process more often, such as when no Docker containers are running on qemu.

Detect when qemu may be corrupt, such as when it has requested more RAM than the host provides. Perform integrity checks early, so that the Go system remains intact and RAM errors are not masked by other kinds of errors.
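One way such an early check might look in Go is sketched below: compare the emulator's memory figure against host RAM and return a dedicated error, so callers can tell a memory problem apart from other failures. `ErrMemoryExceeded` and `checkEmulatorMemory` are names I made up for this example, and how the two byte counts are obtained is left open.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMemoryExceeded is a hypothetical sentinel error that marks a failure
// as memory-related, so it is not confused with other kinds of errors.
var ErrMemoryExceeded = errors.New("emulator memory exceeds host RAM")

// checkEmulatorMemory returns an error wrapping ErrMemoryExceeded when the
// emulator's memory figure is larger than the host's total RAM.
func checkEmulatorMemory(usedBytes, hostBytes uint64) error {
	if usedBytes > hostBytes {
		return fmt.Errorf("%w: emulator at %d bytes, host has %d bytes",
			ErrMemoryExceeded, usedBytes, hostBytes)
	}
	return nil
}

func main() {
	// Example values only: a 20 GiB request on a 16 GiB host.
	err := checkEmulatorMemory(20<<30, 16<<30)
	if errors.Is(err, ErrMemoryExceeded) {
		fmt.Println("memory check failed:", err)
	}
}
```

Running a check like this before each build step would let the failure be reported as a memory problem up front, rather than as an unrelated panic later.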

Implementing these sorts of techniques will improve the user experience for buildx enjoyers.

Have you raised this as an issue or submitted a PR?