Upgrade data within data-container

I have a data container that holds persistent data within a volume (say /var/data). That container provides the persistent data for another container that runs the software.

For a new version of the software, the persistent data needs to be upgraded (its structure or layout changes, etc.). As a result, I want another data container with the upgraded data at the same location (/var/data), while still keeping the old data container with its data unchanged.

This way, I can use the old data-container with the old version of the software, in case something goes wrong.

But how can I do that? The steps required to achieve the desired result are not obvious to me.

I can run an upgrade container like

    docker run -i -t --name temp --volumes-from data -v /upgraded image /upgrade_script.sh

but then, how do I move the upgraded data back to the original location without overwriting the old data? If I do

    docker run -i -t --volumes-from temp image cp /upgraded /var/data

it will overwrite my old data. Do I have to use a host-mounted volume for the upgraded data, or is there a more elegant solution?

Just guessing here, since in general I prefer to use straight host-mounted volumes and am struggling to find the utility of data containers.

But … could you not commit your data container and then save that image, etc.?

There is some interesting work going on in adding volume management tools to Docker - I think they’re heading for 1.4; I’ll make some enquiries. (There’ll be a docker volumes list and manipulation commands.)

I’d probably make a backup_data volume container, then run a data-migration image attached to both data and backup_data - the first thing it might do is copy everything from data to backup_data, and then it would do the data migration (sketched below).

And then you could run both old and new versions, each attached to its respective data backend (with the backup possibly attached read-only?).

Doing this should be pretty much the same if you use host mounts, either directly or via data-container-style ambassadors.
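A rough sketch of that flow, assuming the data container’s volume is at /var/data as in your question (the migrator image and /migrate.sh script are hypothetical names):

    # empty volume container to hold the backup
    docker run -v /backup_data --name backup_data busybox true
    # copy the current data into the backup before touching anything
    docker run --rm --volumes-from data --volumes-from backup_data busybox \
        cp -a /var/data/. /backup_data/
    # then run the (hypothetical) migration image against the live volume
    docker run --volumes-from data migrator /migrate.sh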

Oh, and do consider the awesomeness of @sam’s suggestion of snapshotting a container using docker commit.
:paw_prints:

Your suggestion is in line with my first thoughts, but it does not fulfill my expectations: after the procedure, the migrated data and the original data will be under different paths, and I see no way to change that, because non-host volumes cannot be re-mounted to a different path. The path of a volume from a data container is static - even for containers that inherit the volumes via --volumes-from.
This is different for host volumes, since I can change the mount location of those on each docker run invocation.

I think the volume management tools you mention are very much needed. For me, this data-container Docker idiom feels more like a workaround.

Could you elaborate on the “awesomeness of docker commit”? I cannot see it yet, at least for the use case at hand. As far as I know, docker commit gives me a new image which contains the current state of a container - so that would include all the OS data in addition to the persistent data I am interested in.

Oh crud, you’re right - the volume path is currently static. So you’d need one more step:

  1. you have the existing data container at /data
  2. migrate into a temporary data container at /migration (as you have the original mounted)
  3. migrate the /migration data into a new upgraded-data container mounted at /data (this second migration image would not need the original data volume mounted)

@cpuguy83 might be able to tell you more about the new tools :smile:

Regarding docker commit: when you commit, you are not making a single image layer containing everything; you’re making a new image layer containing only the changes made in the container (relative to the image the container was started with).

So if you use the container filesystem - not volumes - for your persistent data, and move things like logs off into volumes, you could use docker commit to snapshot/backup just your persistent data - and docker export might let you dump that state out too.
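For example, with a hypothetical container app whose persistent state lives in its filesystem rather than in volumes:

    # snapshot the container's changes as a new image layer
    docker commit app app-snapshot:v1
    # dump the container's whole filesystem to a tarball
    # (note: docker export produces a flattened filesystem; docker save is
    # the command that preserves an image's layers)
    docker export app > app-snapshot.tar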

I actually use data containers like Unix pipes; I find they fit more naturally into that paradigm:

docker run --name some_pipe_storage some_container_which_generates_data

docker run --volumes-from some_pipe_storage something_that_operates_on_data

Syntactically it’s pretty cumbersome. Very powerful as a primitive, though.
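A concrete toy version (names hypothetical; the “pipe” is an anonymous volume at /pipe):

    # producer: generates data into the volume
    docker run -v /pipe --name some_pipe_storage busybox sh -c 'echo hello > /pipe/msg'
    # consumer: operates on the data via the inherited volume
    docker run --rm --volumes-from some_pipe_storage busybox cat /pipe/msg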


Yeah, I wouldn’t rely on commit, because you are going to be limited to 127 commits unless you flatten the images out.
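For what it’s worth, one way to flatten is to pipe docker export into docker import, which discards the layer history (names hypothetical):

    docker export some_container | docker import - myapp:flattened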

@matlehmann see github.com/cpuguy83/docker-volumes
This is a far from perfect solution, but works fairly well in the meantime.


@sven thanks for your reply and the additional information. I still do not understand step 3, “migrate the /migration data into a new upgraded-data container mounted at /data (this second migration image would not need the original data volume mounted)”. As it stands (with Docker 1.2 and without special volume commands), I do not see how I can have a container with volumes from another container where only some of them are mounted. As far as I can see, it is all or nothing: either with --volumes-from other_container or without. So if the migration container from step 2 has the original data mounted, the container in step 3 has it mounted too, and therefore the copy operation from /migration to /data would overwrite the original data. Or am I missing something here?
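To illustrate the all-or-nothing behaviour I mean (hypothetical names):

    docker run -v /a -v /b --name src busybox true
    # the second container inherits *both* volumes; there is no flag to pick just /a
    docker run --rm --volumes-from src busybox ls /a /b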

Thanks for your tip concerning the “commit” command, I need to ponder the possibilities of this a bit more.

@keeb this is a nice pattern, but as far as I can see, it does not solve the problem I am talking about. All these “piped” containers are still bound to the volumes of some_pipe_storage and cannot produce a different container with different data at a given path without overwriting the original data. Or maybe I’m missing your point?

Mmm, let’s see if I can make an example that explains it:

Presuming that someone has already created the Docker images webappv10, webappv11 and webapp_migratorv10_to_v11.

Initially, you would have been running your 1.0-based system as:

docker run -v /data --name datav10 busybox true
docker run -p 80:80 --volumes-from datav10 --name webv10 webappv10

and then to upgrade, which requires your data to be upgraded too, you would do step 2 (as you noted, we can’t have both volumes in the same dir):

docker run -v /migration --name datav10-to-v11 busybox true

docker run --volumes-from datav10-to-v11 --volumes-from datav10 --name migration webapp_migratorv10_to_v11

and then step 3, to copy the migrated data into a new data container with the data in the /data dir, ready for use:

docker run -v /data --volumes-from datav10-to-v11 --name datav11 busybox cp -r /migration/. /data

and then run the v1.1 webapp:

docker run -p 80:80 --volumes-from datav11 --name webv11 webappv11

and for extra credit, you’d script it all.

updated to add the datav10-to-v11 volume container step due to discussion below
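For reference, the scripted version might look something like this (note it also stops the old web container before reusing port 80; names as above):

    #!/bin/sh
    set -e
    # step 2: temporary container holding the migrated copy at /migration
    docker run -v /migration --name datav10-to-v11 busybox true
    docker run --volumes-from datav10-to-v11 --volumes-from datav10 \
        --name migration webapp_migratorv10_to_v11
    # step 3: new data container with the migrated data back at /data
    docker run -v /data --volumes-from datav10-to-v11 --name datav11 \
        busybox cp -r /migration/. /data
    # swap the app over to the new data container
    docker stop webv10
    docker run -d -p 80:80 --volumes-from datav11 --name webv11 webappv11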


@sven thanks again for your detailed answer. I appreciate your thoughts and time.

However, the procedure you outline does not work. It assumes that a volume mounted with -v overrules a volume at the same path inherited via --volumes-from. I just tested it again to make sure, but this is not the case. That is why docker run -v /data --volumes-from migration --name datav11 busybox cp -r /migration /data overwrites my original data in container datav10.
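A minimal repro of what I am seeing (hypothetical names):

    # original data container with a marker file in its /data volume
    docker run -v /data --name orig busybox sh -c 'echo old > /data/f'
    # -v /data does not shadow the inherited volume: this still prints "old"
    docker run --rm -v /data --volumes-from orig busybox cat /data/f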

Is there a particular reason you are preferring a data container over a simple volume (which is far easier to grok and deal with)?

I’m confused. The steps are carefully constructed so that there are never two --volumes-from statements that both contain /data dirs. We’re using the migration container to buffer the data, and then we’re copying it into the new datav11 container.

@sam - there’s very little conceptual difference between a bind-mounted volume and a volume container - you still do the same 3 steps. For me the biggest difference is that a bind mount only works locally and assumes you have the disk space for it (which I do not), whereas the volume container approach only assumes your Docker data partition is big enough, and works the same locally as remotely.

If you change the docker run -v /data ... lines to docker run -v /local/data:/data ..., then you’re using bind mounts, ambassadored.
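For example, step 1 of the walkthrough above would become (host path hypothetical):

    docker run -v /local/data:/data --name datav10 busybox true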

@sam I have now switched to using bind-mounted volumes (or whatever the official term is for “-v /host:/container”) instead of data containers, because of the shortcoming outlined in this thread. I started using data containers because that idiom is used and recommended all over the internet and seems to be the “official way”.


@sven

  • “datav10” has
    • “/data” (via -v /data)
  • “migration” has
    • “/data” (via --volumes-from datav10)
    • “/migration” (via -v /migration)
  • “datav11” has
    • “/data” (via -v /data)
    • “/migration” (via --volumes-from migration)
    • “/data” (via --volumes-from migration)

So we have two volume definitions for “/data” in container “datav11” - and for me it looks like the one from --volumes-from wins.

@sven I guess I am just struggling to figure out the reason you would be storing data in AUFS; it seems to be the wrong kind of filesystem for this problem. Btrfs would be OK, but AUFS seems an odd choice for log files, Postgres databases and so on. Am I misunderstanding the mechanics of data containers?

@matlehmann we use “volumes” extensively:

  • We store all logs outside containers in host-mounted volumes, for easy rotation and persistence. (Another option here would be to stream them out of the container, but the mechanics are non-trivial - think of, say, an NGINX container: what do you do with its logs? See the sketch after this list.)
  • We store some config on a mounted GlusterFS volume so we can sync stuff across the farm.
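A minimal sketch of the first point (paths hypothetical):

    # nginx writes its logs to a host directory, so rotation can happen on
    # the host and the logs survive container replacement
    docker run -d -v /var/log/docker/nginx:/var/log/nginx --name web nginx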

@sam This would be a topic for another thread: I am really interested to hear about your GlusterFS setup. Are there ready-to-use images available that can be used as something like an ambassador for volumes within GlusterFS, or how do you do that?

@supermathie would be best for details on our GlusterFS setup; however, it is all set up in a very traditional way - we don’t use a trusted Docker container to power up Gluster.

OH (*&^, you’re right.

And I thought I was being so clever removing one step.

You need to put the /migration folder into its own data container, so that you avoid the problem you noted in the last step.

I’ve updated the step-by-step example to reflect this.