Docker Checkpoint & Restore on another host?

Docker recently incorporated CRIU (Checkpoint/Restore In Userspace) into its experimental build (v1.13). Checkpoint and restore work well on a single host.

Is there any way to checkpoint a container on one host and restore it on another? Is this possible using an overlay network?

It looks like you have to create the container on the target system first. You don't need to run it, just create it. So if you were following the CRIU examples, you'd have a busybox container named cr on the source system. Once you create the checkpoint on the source system, copy the checkpoint directory to your target system (by default checkpoints are stored under /var/lib/docker/containers/; if you used --checkpoint-dir, copy it to the same directory on the target). On the target system, run sudo docker create --name cr busybox and then restore with sudo docker start --checkpoint checkpoint1 --checkpoint-dir /tmp cr.
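A minimal sketch of the steps above as two shell functions, one per host. The function names, the RUN dry-run switch and the user@target-host SSH destination are hypothetical. It assumes scp access between the hosts, Docker experimental mode and CRIU on both, and that Docker writes the checkpoint to <checkpoint-dir>/<checkpoint-name>; check the actual path with ls on the source host, since it may differ between Docker versions. By default the commands are only printed; set RUN to an empty string (RUN=) to execute them.

```shell
#!/bin/sh
# Hypothetical helpers for moving a checkpoint between hosts (a sketch, not
# an official Docker workflow). Commands are printed, not run, unless RUN
# is set to an empty string.
RUN=${RUN-echo}

# Run on the source host: dump the container, then copy the checkpoint.
# Usage: checkpoint_and_copy NAME CHECKPOINT DIR SSH_TARGET
checkpoint_and_copy() {
    name=$1 ckpt=$2 dir=$3 target=$4
    # Without --leave-running this also stops the container.
    $RUN docker checkpoint create --checkpoint-dir="$dir" "$name" "$ckpt" || return
    # Assumption: the checkpoint lives in DIR/CHECKPOINT; verify on your version.
    $RUN scp -r "$dir/$ckpt" "$target:$dir/"
}

# Run on the target host: create (but do not start) a container, to be safe
# with the same image and command as the original, then start it from the
# copied checkpoint instead of from scratch.
# Usage: create_and_restore NAME CHECKPOINT DIR IMAGE [COMMAND...]
create_and_restore() {
    name=$1 ckpt=$2 dir=$3
    shift 3
    $RUN docker create --name "$name" "$@" || return
    $RUN docker start --checkpoint-dir="$dir" --checkpoint="$ckpt" "$name"
}
```

For the busybox example, run checkpoint_and_copy cr checkpoint1 /tmp user@target-host on the source host, then create_and_restore cr checkpoint1 /tmp busybox followed by the original container's command on the target host.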

Hope this helps! I was able to test this on two identical CentOS 7.2 machines (kernel 3.10.0-327.13.1.el7.x86_64) using Docker 17.03.1-ce with experimental mode enabled and CRIU 2.3, both installed with yum.

Alas, not all is well in Dockerland when it comes to CRIU. First, if you're using a 3.10 kernel like I did, you'll find that a lot of CRIU functionality doesn't work. I had a hard time with "ghost files" and this specific issue.

So I started using an Ubuntu 16.04 LTS host (kernel 4.4.0-75-generic). This allowed me to install CRIU 2.6, which got me past the ghost file issue, and I was able to successfully dump my container. However, I then ran into issues when attempting to restore a checkpointed container. My container has mounted filesystems, which CRIU, and Docker in particular, were unable to restore. This is documented in Docker issue 32227. The last two comments on that issue are pretty enlightening, and if I knew Go, I'd fix it.

Hi,

I was able to use CRIU with Docker on a single host (Ubuntu 16.04). But when I migrate the application to another host, it starts counting from the beginning. I am following the guide at https://criu.org/Docker.

First I ran an application:
$ docker run -d --name looper2 --security-opt seccomp:unconfined busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

Then I created a checkpoint:
$ docker checkpoint create --checkpoint-dir=/tmp looper2 checkpoint2
and copied the checkpoint to another host.

On that host, I created a new container:
$ docker create --name looper-clone --security-opt seccomp:unconfined busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

This leaves the new container in a stopped state. If I then try to restore it, I get an error because the container has gone into the exited state. If I start the container instead, the process starts counting from the beginning rather than resuming from the checkpointed state of the previous container, even when the command 'do echo $i; i=$(expr $i + 1)' is not run.
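One possible explanation for the restart, going by the steps in the looper example above: a plain docker start looper-clone runs the loop from i=0, because the container is only restored when it is started with --checkpoint and --checkpoint-dir. A minimal sketch for the target host follows. The function names are hypothetical; it assumes the checkpoint was copied to /tmp/checkpoint2, that looper-clone has been created but never started, and that docker logs on the freshly created clone only contains output written after the restore.

```shell
#!/bin/sh
# Sketch for the target host in the looper example (function names are
# hypothetical). Assumes the checkpoint was copied to /tmp/checkpoint2 and
# that looper-clone was created, with the same image and command as
# looper2, but never started.

# Start the clone from the checkpoint. A plain `docker start looper-clone`
# would instead run the loop from scratch, printing 0, 1, 2, ...
restore_clone() {
    docker start --checkpoint-dir=/tmp --checkpoint=checkpoint2 looper-clone
}

# Print the first value the container logged and succeed only if it is not 0.
# Assumes the log of the freshly created clone only holds post-restore output.
check_resumed() {
    first=$(docker logs "$1" 2>&1 | head -n 1)
    echo "first value logged by $1: $first"
    [ -n "$first" ] && [ "$first" != "0" ]
}
```

Run restore_clone, wait a few seconds, then run check_resumed looper-clone and compare the printed value with the last number shown by docker logs looper2 on the source host.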

Thank you.

Hi,
I am experimenting with live migration in Docker. I print a series of numbers on one machine, create a checkpoint, and then resume printing the rest of the numbers on another machine. I need to find out how long it actually takes to transfer the checkpoint files to the destination.
Is there any way to find out how much time (delay) the migration took? Thanks.
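One simple way to measure this is to time each phase (checkpoint, transfer, restore) separately from a script. A minimal sketch, assuming GNU date (for the %N nanosecond field) and awk are available; the timed helper is hypothetical. It reports wall-clock time only, so a transfer or restore run over SSH also includes connection setup, and it may help to record the checkpoint size with du -sh alongside.

```shell
#!/bin/sh
# Sketch: time each phase of a manual migration separately.
# Assumes GNU date (for %N nanoseconds) and awk.

# timed LABEL COMMAND [ARGS...]: run COMMAND, then report its wall-clock
# duration on stderr as "LABEL: <seconds> s". Returns COMMAND's exit status.
timed() {
    timed_label=$1
    shift
    timed_start=$(date +%s.%N)
    "$@"
    timed_status=$?
    timed_end=$(date +%s.%N)
    awk -v l="$timed_label" -v s="$timed_start" -v e="$timed_end" \
        'BEGIN { printf "%s: %.3f s\n", l, e - s }' >&2
    return "$timed_status"
}
```

After sourcing the function, wrap each step, for example (user@target-host is a placeholder):
$ timed checkpoint docker checkpoint create --checkpoint-dir=/tmp looper2 checkpoint2
$ timed transfer scp -r /tmp/checkpoint2 user@target-host:/tmp/
$ timed restore ssh user@target-host docker start --checkpoint-dir=/tmp --checkpoint=checkpoint2 looper-clone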

Hi Anibar,

I am trying to simulate the same environment. How are you transferring the files from one machine to another over the network, and did you manage to measure the time taken while transferring the files at run time? Thanks.

I am currently trying the same thing. Could you explain how you did the migration?