This question comes from a place pretty remote from any particular knowledge about Docker, filesystems, or data corruption, so if it is silly or naive I'll blame it on that remoteness.
There are two different basic concepts for data storage (maybe even more, but these two are the ones I'm curious about here; rough examples below):
Docker volumes or data containers
mounting a folder from the host into a container
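For concreteness, I mean something like the following (the volume name, host path, and image are just placeholder examples):

    docker run --rm -v mydata:/data debian:8 ls /data          # named Docker volume
    docker run --rm -v /home/me/files:/data debian:8 ls /data  # host folder mounted into the container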
Let's say I store 100 files. From my host's point of view these are either 100 actual files (host folder mounted into the container) or just 1 file (a data container/Docker volume, which contains 100 files when seen from the container's perspective).
Now, I assume data corruption happens at the filesystem level. So can you say that the likelihood of a data container being corrupted (practically causing all 100 files stored inside it to be lost) is the same as that of a single file being corrupted when stored in a host folder mounted into a container?
Is that too simplistic a way to look at this, or is it practically true?
If the above were true, then I guess mounting a host folder into a container would be safer in this regard, wouldn't it?
I don't think there's any difference between "normal" volumes and host volumes. They are, in fact, both backed by directories on a host drive. The difference is that Docker controls the volume folder for normal volumes.
If you look in /var/lib/docker/volumes you’ll see a load of directories named by volume id. Inside each one is a _data directory and that contains all of the files. A volume is not a single file.
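You can check this yourself (assuming a standard Linux setup where Docker keeps its data under /var/lib/docker; the volume name is made up for the example):

    docker volume create mydata
    docker run --rm -v mydata:/data debian:8 touch /data/one /data/two
    sudo ls /var/lib/docker/volumes/mydata/_data
    # one  two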
How about a data container (if that's the right term)? I mean something like docker create -v /container/mountpoint1 -v /container/mountpoint2 --name datacontainer debian:8, for example. Is that the same thing?
Yep, it's still a container with non-host-mapped volumes, so the volumes for mountpoint1 and mountpoint2 are normal volumes, found under /var/lib/docker/volumes just like any other.
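If you want to see exactly where they ended up, docker inspect will print the host-side paths (using the datacontainer from your example; output trimmed, with the real volume IDs replaced by a placeholder):

    docker inspect -f '{{range .Mounts}}{{.Destination}} -> {{.Source}}{{println}}{{end}}' datacontainer
    # /container/mountpoint1 -> /var/lib/docker/volumes/<volume-id>/_data
    # /container/mountpoint2 -> /var/lib/docker/volumes/<volume-id>/_data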
I would think about the problem this way: say a cosmic ray flips a single bit on your hard disk and the error goes undetected. The actual data is the same size in any configuration (unless it's being stored compressed, but I don't think any readily available options do that), so your risk there is equal. Similarly, the amount of metadata (filenames, permissions, timestamps, …) is more or less the same in any configuration. There are details in exactly how much risk there is, but for the most part it's probably pretty similar.
Yet another thing to consider is that the directory itself is a point of failure, even if you're storing the files directly on the host system. That's probably the best analogy, in fact: if Docker stored local volume content in a single-file container, what would be the risk of that file's header getting corrupted vs. the directory object on the host getting corrupted? (In either case all of the files are intact, but in the worst case you don't know which files they are.) (I've had HDFS fail this way. Multiple times. Not fun.)
IME the actual risk of this sort of thing is negligible, and the best mitigation against it is the usual recommendations around backups and reproducible environments.
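As a sketch of that mitigation: one common pattern for backing up a named volume is to tar its contents out to the host through a throwaway container (the volume and archive names here are just examples):

    docker run --rm \
      -v mydata:/data:ro \
      -v "$(pwd)":/backup \
      debian:8 \
      tar czf /backup/mydata-backup.tar.gz -C /data .

Restoring is the same idea in reverse: mount the volume read-write and untar into it.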