This question comes from a place pretty remote from any particular knowledge about Docker, filesystems, or data corruption, so if it is silly or naive I'll blame it on that remoteness.
There are two different basic concepts for data storage (maybe even more, but these two are the ones I'm curious about here; rough examples below):
Docker volumes or data containers
mounting a folder from the host into a container
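For concreteness, I mean something like the following (the volume name, host path, and image are just placeholder examples):

    docker run --rm -v mydata:/data debian:8 ls /data          # named Docker volume
    docker run --rm -v /home/me/files:/data debian:8 ls /data  # host folder mounted into the container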
Let's say I store 100 files. From my host's point of view these are either 100 actual files (host folder mounted into the container) or just 1 file (a data container/Docker volume, which contains 100 files when seen from the container's perspective).
Now, I assume data corruption happens at the filesystem level. So can you say that the likelihood of a data container being corrupted (practically causing all 100 files stored inside it to be lost) is the same as that of a single file being corrupted when stored in a host folder mounted into a container?
Is that too simplistic a way to look at this, or is it practically true?
If the above were true, then I guess mounting a host folder into a container would be safer in this regard, wouldn't it?
I don't think there's any difference between "normal" volumes and host volumes. They are, in fact, both backed by directories on a host drive. The difference is that Docker controls the volume folder for normal volumes.
If you look in /var/lib/docker/volumes you’ll see a load of directories named by volume id. Inside each one is a _data directory and that contains all of the files. A volume is not a single file.
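You can check this yourself (assuming a standard Linux setup where Docker keeps its data under /var/lib/docker; the volume name is made up for the example):

    docker volume create mydata
    docker run --rm -v mydata:/data debian:8 touch /data/one /data/two
    sudo ls /var/lib/docker/volumes/mydata/_data
    # one  two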
How about a data container (if that's the right term)? I mean something like docker create -v /container/mountpoint1 -v /container/mountpoint2 --name datacontainer debian:8, for example. Is that the same thing?
Yep, it's still a container with non-host-mapped volumes, so the volumes for mountpoint1 and mountpoint2 are normal volumes, found under /var/lib/docker/volumes just like any other.
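If you want to see exactly where they ended up, docker inspect will print the host-side paths (using the datacontainer from your example; output trimmed, with the real volume IDs replaced by a placeholder):

    docker inspect -f '{{range .Mounts}}{{.Destination}} -> {{.Source}}{{println}}{{end}}' datacontainer
    # /container/mountpoint1 -> /var/lib/docker/volumes/<volume-id>/_data
    # /container/mountpoint2 -> /var/lib/docker/volumes/<volume-id>/_data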
I would think about the problem this way: say a cosmic ray flips a single bit on your hard disk and the error goes undetected. The actual data is the same size in any configuration (unless it's being stored compressed, but I don't think any readily available options do that), so your risk there is equal. Similarly, the amount of metadata (filenames, permissions, timestamps, …) is more or less the same in any configuration. There are details in exactly how much risk there is, but for the most part it's probably pretty similar.
Yet another thing to consider is that the directory itself is a point of failure, even if you're storing the files directly on the host system. That's probably the best analogy, in fact: if Docker stored local volume content in a single-file container, what would be the risk of that file's header getting corrupted vs. the directory object on the host getting corrupted? (In either case all of the files are intact, but in the worst case you don't know which files they are.) (I've had HDFS fail this way. Multiple times. Not fun.)
IME the actual risk of this sort of thing is negligible, and the best mitigation against it is the usual recommendations around backups and reproducible environments.
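As a sketch of that mitigation: one common pattern for backing up a named volume is to tar its contents out to the host through a throwaway container (the volume and archive names here are just examples):

    docker run --rm \
      -v mydata:/data:ro \
      -v "$(pwd)":/backup \
      debian:8 \
      tar czf /backup/mydata-backup.tar.gz -C /data .

Restoring is the same idea in reverse: mount the volume read-write and untar into it.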