Copying OS disk with Docker install - odd behavior, suggestions?


We have an embedded Linux application that runs on ARM hardware. We're in the process of upgrading from a base system running Ubuntu 18 to one running Ubuntu 20, and we're upgrading Docker (old version 19.03.6, new version 24.0.5) at the same time.

We've got everything working as expected, EXCEPT the deployment piece - the way we make multiple copies of the boot disk for new hardware has hit a ...weird... difference. The way we've been doing things (successfully) is: do the full Docker install, pull down the initial container, etc, then shut down the system and copy the boot partition onto the new disk (pack all the files into a tar archive, make a new filesystem on the new disk, and untar them there with correct permissions, etc). The hardware we're running on has some quirks about booting that we address in our install procedure as well.

Here's the part I can't figure out: on the new system, when we copy the full root partition, the Docker container seems to be incomplete - we consistently get an error that certain Python libraries (needed by our application) are missing from the container. The container image looks fine (according to docker image ls), but the app won't run. If I then delete and re-download the container (I have to remove the container and run docker system prune -a to force the re-download), it starts up without issue.
This would be fine except that the container image is several gigabytes and the systems it would be downloaded on have a VERY slow internet connection.

I want to stress: on the old system (Ubuntu 18, Docker 19) doing it this way works fine. Also, when it fails on the new system, it's always the same Python libs that are missing, which makes me think it's not random corruption.

Is there a better way of doing this? Is there some better way to copy a Docker image with OS tools (i.e., not using "docker")? If I revert to an older version of Docker, are things likely to start working again?

I’m pretty confused.

Here’s a little more data, from interesting stuff I’ve found:

  • if I copy the data onto the boot disk using rsync -axHAWX (from the original disk), the clone works fine.
  • if I make a tar file, then extract from the tar file (which would be preferred!), then I get this python error INSIDE the container:
2024-03-05T02:23:02.759733080Z   File "/usr/local/lib/python3.8/dist-packages/charset_normalizer/", line 10, in <module>
2024-03-05T02:23:02.760442136Z     from .cd import (
2024-03-05T02:23:02.760476216Z   File "/usr/local/lib/python3.8/dist-packages/charset_normalizer/", line 9, in <module>
2024-03-05T02:23:02.761441400Z     from .md import is_suspiciously_successive_range
2024-03-05T02:23:02.761491800Z AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import)

…the tarfile is created like this:
tar --xattrs --format=posix -cvzf ${ROOTFS}-xattr-posix.tgz -C $ROOTFS .
…and extracted like this:
tar --xattrs -xpf /home/riz/testing/rootfs_r35_4-xattr-posix.tgz --checkpoint=10000 --warning=no-timestamp --numeric-owner -C /mnt
(For the record, I tried the tar with both POSIX and GNU formats - no change.)
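One hunch I still need to check (this is an assumption, not something I've verified): Docker's overlay2 driver marks whiteouts/opaque directories with xattrs in the trusted.* namespace, and I'm not sure my tar invocation is actually carrying those across. GNU tar has --xattrs-include/--xattrs-exclude masks that control which namespaces get archived and restored, so a variant like this (same paths as above) might behave differently:

```shell
# Explicitly include every xattr namespace when creating the archive
# (reading trusted.* xattrs requires root, and I don't know whether
# a bare --xattrs picks them up by default):
sudo tar --xattrs --xattrs-include='*' -cpzf ${ROOTFS}-xattr-all.tgz -C $ROOTFS .

# ...and again when extracting:
sudo tar --xattrs --xattrs-include='*' -xpf ${ROOTFS}-xattr-all.tgz \
    --numeric-owner --warning=no-timestamp -C /mnt
```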

Since I can get it to work with rsync, I at least have a path forward - but it would be MUCH more convenient if I could turn the file system into a single file, then extract it again…