I am completely new to Docker. I've been following the tutorials online and everything went fairly well, until I got an unpleasant surprise: Docker loses some of my data when I restart the container.
I know I can commit changes just before I restart the container, but I find this tedious. There's also the possibility that the host OS restarts, giving me no time to save the state of my Docker image. My question is: can this be done automatically?
I also read about creating volumes, which persist your data under /var/lib/docker, which is good. But that only covers specific directories of the image. I want a solution that persists the entire Docker image automatically. Thanks.
Right, and this is important. When you docker run a container, it gets a temporary filesystem that's a copy-on-write layer on top of the filesystem in its underlying image; anything the container writes there disappears when the container is removed.
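To see this concretely, here's a minimal sketch (any small image will do; ubuntu is just an example):

    docker run --name demo ubuntu sh -c 'echo hello > /data.txt'
    docker rm demo
    docker run --rm ubuntu cat /data.txt
    # cat: /data.txt: No such file or directory --
    # the second container starts from the image,
    # not from the first container's writable layer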
It is totally normal and expected to need to restart a container occasionally, for whatever reason: maybe the host system dies; I've had some trouble with the Docker daemon hiccuping; in multi-host environments you might need to move a container to a different host. The container should be able to reconstruct whatever state it needs on startup.
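On the "automatically" part of your question: Docker won't snapshot a container's filesystem for you, but a restart policy will at least bring the container back up on its own after a crash or a daemon/host restart. It still starts from the image plus its volumes, which is exactly why the state belongs in volumes:

    # myapp is a hypothetical image name; --restart is a standard docker run flag
    docker run -d --restart unless-stopped myapp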
Usually a container does only one thing, so you can say pretty concretely where its state lives. If a container runs, say, Elasticsearch, the underlying data is in /usr/share/elasticsearch/data, and that directory should be in a volume; besides the application and its configuration, there's nothing else in the container.
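For example, a sketch of running Elasticsearch with its data directory on a named volume (the image tag and the single-node setting are illustrative, not prescriptive):

    docker volume create esdata
    docker run -d --name elasticsearch \
      -e discovery.type=single-node \
      -v esdata:/usr/share/elasticsearch/data \
      docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    # the container can now be removed and re-created freely;
    # the index data survives in the esdata volume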
One other important part of the story is docker build and the Dockerfile system. Re-running docker build is usually pretty efficient, because Docker caches each build step and won't repeat work it's already done. So the best way to set things up is to write a Dockerfile that installs a single application, possibly with an ENTRYPOINT script to do startup-time setup, and if you ever need to change this setup, update the Dockerfile (and commit it to source control) and rebuild the image.
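A minimal sketch of that layout, assuming a Python application (all file names here are hypothetical):

    FROM python:3.11-slim
    WORKDIR /app
    # this layer is cached until requirements.txt changes
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    # entrypoint.sh must be executable and should end with: exec "$@"
    ENTRYPOINT ["./entrypoint.sh"]
    CMD ["python", "app.py"]

Rebuilding after a change is then just docker build -t myapp . again, and the cached layers keep it fast.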
Finally, there's an option to map parts of the host filesystem into the container (a bind mount), using the same docker run -v option. This should probably be used sparingly, but I find it an extremely convenient way to inject configuration files into the container and get log files back out. (I do sort of have the impression that using host directories at all is considered not a best practice, though.)
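A sketch of that pattern (the host and container paths are hypothetical):

    # inject configuration read-only; collect log files on the host
    docker run -d --name myapp \
      -v /srv/myapp/config:/etc/myapp:ro \
      -v /srv/myapp/logs:/var/log/myapp \
      myapp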
So, in summary (a combined sketch follows this list):

- docker build an image out of your application.
- Commit the Dockerfile and everything it needs to source control, so you can easily rebuild the image later.
- Use environment variables or a host directory to inject configuration; use named volumes to store data across container runs.
- Now there is nothing “precious” in the container itself and you can freely docker rm it without losing anything.
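Putting it all together, with hypothetical names throughout (a myapp image, an APP_ENV variable, data under /var/lib/myapp):

    docker build -t myapp .
    docker run -d --name myapp \
      -e APP_ENV=production \
      -v myapp-data:/var/lib/myapp \
      myapp

    # the container itself is disposable:
    docker rm -f myapp
    docker run -d --name myapp \
      -e APP_ENV=production \
      -v myapp-data:/var/lib/myapp \
      myapp
    # same data, fresh container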