This is a weird problem we are having that I’d like to understand better. We have a MySQL image with a populated database. The image itself is about 100 GB. I understand that passing around an image of that size is likely an anti-pattern, but for the sake of this discussion I’m interested in understanding the differences we are seeing between Docker environments.
What we are seeing is that in some environments, we can create a container from this image and have MySQL started and ready in about 1-2 seconds. Every database and table is queryable instantly. This is very desirable behaviour, as a user of the image can discard any changes and get back to a clean state in a matter of seconds.
However, in other environments MySQL often takes 30-60 seconds to start, and then when the database is queried the container hangs for anywhere between 2 and 15 minutes. During this time we see spikes in I/O and an increase in disk usage on the host system.
What I suspect is happening is that Docker, for whatever reason, decides that large files in the image need to be copied to the container before they can be read. This would explain both the high I/O and the increase in disk usage.
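One way to check this suspicion would be to measure the container's writable layer before and after the first query. The script below is only a rough sketch, assuming the storage driver is overlay2 (so that `docker inspect` reports a `GraphDriver.Data.UpperDir` path) and that it is run as root on the Docker host. The helper names are mine, not anything Docker provides. If the data files from the image show up in the upper directory at close to their full size, that would point to whole files being copied up.

```python
import os
import subprocess
import sys


def upper_dir(container: str) -> str:
    """Ask Docker for the container's writable (upper) directory on the host.

    Assumption: the overlay2 storage driver is in use; other drivers do not
    expose an UpperDir field.
    """
    result = subprocess.run(
        ["docker", "inspect", "--format",
         "{{ .GraphDriver.Data.UpperDir }}", container],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def file_sizes(path: str) -> list:
    """Return (size_in_bytes, relative_path) for every regular file below path."""
    entries = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks and special files (e.g. overlay whiteout devices).
            if os.path.isfile(full) and not os.path.islink(full):
                entries.append((os.lstat(full).st_size, os.path.relpath(full, path)))
    return entries


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files below path."""
    return sum(size for size, _name in file_sizes(path))


def largest_files(path: str, limit: int = 10) -> list:
    """Return the biggest files below path, largest first."""
    return sorted(file_sizes(path), reverse=True)[:limit]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python upper_size.py <container>
    upper = upper_dir(sys.argv[1])
    print(f"upper dir: {upper}")
    print(f"total bytes in writable layer: {dir_size_bytes(upper)}")
    for size, name in largest_files(upper):
        print(f"{size:>15}  {name}")
```

Running it once right after the container starts and again after the first slow query should show whether the disk usage increase lands in the container's writable layer, and which files account for it.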
Sadly, we have not been able to work out which factors make the environments behave differently. In particular, I would expect details like the storage driver, the backing file system and “Native overlay diff” to matter, but they are all the same. So is the Docker version itself (we use the latest). This is not a new problem; we have seen it for literally years.
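To make sure nothing in the comparison is being overlooked, the full `docker info` output from a fast and a slow machine could be compared mechanically rather than by eye. This is a minimal sketch, assuming each machine's output has been saved with `docker info --format '{{json .}}' > info.json`. The function names and file names are made up for illustration.

```python
import json
import sys


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single {dotted.path: value} dict."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            flat.update(flatten(item, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


def diff_info(a, b):
    """Return {path: (value_in_a, value_in_b)} for every path that differs."""
    flat_a, flat_b = flatten(a), flatten(b)
    return {
        key: (flat_a.get(key), flat_b.get(key))
        for key in sorted(set(flat_a) | set(flat_b))
        if flat_a.get(key) != flat_b.get(key)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python diff_info.py fast.json slow.json
    with open(sys.argv[1]) as fast, open(sys.argv[2]) as slow:
        differences = diff_info(json.load(fast), json.load(slow))
    for path, (left, right) in differences.items():
        print(f"{path}: {left!r} != {right!r}")
```

Fields such as host names and IDs will always differ, so the interesting lines are the ones about the kernel, the storage driver and its status entries.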
I don’t have this problem myself. I suspect that it might actually be related to the kernel in use. My Docker environment for the time being is a CentOS 7 install in VirtualBox (with Windows as the host). In fact, CentOS 7 is the only environment so far where we have observed the “fast” behaviour. A VirtualBox VM with a newer distro? High I/O and increased disk usage. CentOS 7 is very old, and so is its kernel.
Does anyone have any tips on how we can troubleshoot this? Preferably, we would like to find the set of settings to change to get the desired behaviour in all environments. Environments in this context means developer machines, so there is no need to worry about production issues.