I run a distributed volunteer computing project, www.cosmologyathome.org, where the code we send to the various volunteer hosts is packaged in Docker containers. Currently each host runs a "docker pull" to get the images. We'd like to instead deliver the images through our own internal file transfer system, which is better suited for this: it shows users a download progress indicator, automatically retries failed downloads, and can run in the background while other jobs are running. The question is how to take a Docker image and smartly break it up into files which we can deliver via this system.
A simple solution is to "docker save" the image and send the resulting tar, which is basically perfect except that we lose the ability to skip layers the hosts already have. Since connectivity is poor for many hosts, avoiding transferring unnecessary data is important. So instead we could "docker save" the image, open up the tar, and send the individual layer folders separately; that way, if a host already has a layer we don't resend it. On the host, the image can then be put back together and "docker load"ed. This seems OK to me, although I'm worried it depends on an internal format that may change in the future and break this process. Is there any cleaner way to do what we're trying to do?
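For what it's worth, the split-and-reassemble step I have in mind can be sketched with Python's tarfile module, assuming the current "docker save" layout (a top-level manifest.json plus per-layer directories each containing a layer.tar). The layer IDs and contents below are synthetic stand-ins, not real image data:

```python
import io
import json
import tarfile

def add_bytes(tar, name, data):
    """Add an in-memory file to an open tarfile."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

def make_fake_image_tar(layers):
    """Build a tiny stand-in for a 'docker save' tar: a manifest.json plus
    per-layer directories each holding a layer.tar. (Synthetic layer IDs;
    a real tar also contains a config.json and layer metadata.)"""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        manifest = [{"Config": "config.json",
                     "RepoTags": ["example:latest"],
                     "Layers": [f"{lid}/layer.tar" for lid in layers]}]
        add_bytes(tar, "manifest.json", json.dumps(manifest).encode())
        for lid, content in layers.items():
            add_bytes(tar, f"{lid}/layer.tar", content)
    return buf.getvalue()

def split_layers(image_tar_bytes):
    """Split a saved image into {member_name: bytes}, so each piece can be
    shipped (and cached) independently by the transfer system."""
    pieces = {}
    with tarfile.open(fileobj=io.BytesIO(image_tar_bytes)) as tar:
        for member in tar.getmembers():
            if member.isfile():
                pieces[member.name] = tar.extractfile(member).read()
    return pieces

def reassemble(pieces):
    """Rebuild, on the host, a tar that 'docker load' should accept."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in pieces.items():
            add_bytes(tar, name, data)
    return buf.getvalue()
```

The server side would send only the pieces whose layer directory the host doesn't already have; the host merges the new pieces with its cached ones and feeds the reassembled tar to "docker load". This is exactly the part that worries me, since it relies on the tar layout staying stable.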