Docker Community Forums


Real world File Sync requirements - Magento 2 example

(Garyamort) #1

Reviewing the current Docker for Mac file-syncing strategies, I am struck by how little they relate to a number of real-world situations. They focus on keeping a volume “immediately” in sync with another. In my case, I can wait 2 whole seconds before they are in sync.

Take Magento 2 development. I am developing on my host and testing in a container. The file structure is all under one volume:


For the volume there is a general rule:
Any changes I make on the host must appear inside the container within 2 seconds. Two seconds is the fastest I can switch from the editor where I made a code change to reloading the page in the browser. So don’t aim for millisecond response times; a second or two is enough. If my system crashes before a change syncs, I do not care: I have to reboot and start the container all over again anyway, and the sync can happen at that point.
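The 2-second rule amounts to deadline-based batching rather than instant propagation. The following is a minimal Python sketch of that idea, assuming a hypothetical `DeadlineBatcher` that is fed file-change events and polled on a timer; it is not how Docker for Mac's file sharing works internally. Pending changes are held only in memory, matching the tolerance for losing them in a crash.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeadlineBatcher:
    """Hypothetical host-to-container syncer: buffer changed paths and push
    them as one batch once the oldest pending change has waited
    `target_latency` seconds."""

    target_latency: float
    pending: set[str] = field(default_factory=set)
    oldest_change_at: Optional[float] = None

    def record(self, path: str, now: float) -> None:
        # The first unsynced change starts the clock; later edits, including
        # repeat edits of the same file, simply join the current batch.
        if self.oldest_change_at is None:
            self.oldest_change_at = now
        self.pending.add(path)

    def poll(self, now: float) -> list[str]:
        # Return the batch to sync, or [] if the deadline has not arrived yet.
        # Pending changes live only in memory, so a crash simply drops them.
        if self.oldest_change_at is None:
            return []
        if now - self.oldest_change_at < self.target_latency:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.oldest_change_at = None
        return batch


if __name__ == "__main__":
    batcher = DeadlineBatcher(target_latency=2.0)
    batcher.record("app/Example.php", now=0.0)  # hypothetical paths
    batcher.record("app/Other.php", now=0.5)
    print(batcher.poll(now=1.0))  # → []
    print(batcher.poll(now=2.0))  # → ['app/Example.php', 'app/Other.php']
```

Because a batch is pushed only when `poll` runs, the worst-case delay is `target_latency` plus one polling interval; to honor a hard 2-second limit, something like `target_latency=1.5` with a 0.5-second poll would fit.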

After the initial container creation, almost all changes to these images will be made inside the container. The images are generated dynamically during page load whenever they are missing. I really have no need for them to be synced back to the host immediately; I can wait 5 minutes for that sync. The only reason I want them on my host at all is that 3 days from now, when I start working on a new feature and start up a new container, not having to regenerate the images on the first page load will save a lot of dev time.

As a general rule of thumb, these files will be edited on my Host and need to sync from the host to the container within 5 seconds.

I have no desire to have any of these files on my host at all. They are written to constantly, and I am more than content to access them directly in the container if I need to.
$HOME/Projects/mysite/var/sessions - see the comment regarding logs.
$HOME/Projects/mysite/var/cache - see the comment regarding logs.

These files are generated statically from the command line. I can create them on the host or in the container. If I generate them on the host, I need a 2-second sync, as per the general rule, so they are there when I load a page. If I generate them in the container, I need a 5-second sync so that I can review the files in my editor. I am more than happy to pick a single rule, for example cached, and then generate them only in the container.

Based on the above, I need one new rule: never sync for directories such as var/log, var/sessions, and var/cache

I also need one additional parameter for cached, delegated, and consistent: a target sync time, so that I can tell Docker how urgent each sync is.
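To make the proposal concrete, here is a minimal Python sketch of what a per-directory policy with a target sync time might look like. Everything in it is hypothetical: `Mode`, `SyncPolicy`, the `never` mode, and `target_latency` are names for the proposal, not Docker settings. Docker for Mac's bind-mount options are `consistent`, `cached`, and `delegated` (appended as in `host_dir:container_dir:cached`), and none of them takes a latency. The latency values map the numbers given above onto the existing modes, which is an interpretation rather than anything Docker defines.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    CONSISTENT = "consistent"  # two-way sync; Docker's version has no lag at all
    CACHED = "cached"          # host is authoritative; the container may lag
    DELEGATED = "delegated"    # container is authoritative; the host may lag
    NEVER = "never"            # proposed: the directory stays in the container only


@dataclass(frozen=True)
class SyncPolicy:
    mode: Mode
    # Proposed parameter: the longest acceptable sync delay, in seconds.
    target_latency: Optional[float] = None

    def __post_init__(self) -> None:
        # A "never" policy has nothing to sync, so a latency makes no sense;
        # every other mode needs one.
        if self.mode is Mode.NEVER and self.target_latency is not None:
            raise ValueError("a 'never' policy takes no target latency")
        if self.mode is not Mode.NEVER and self.target_latency is None:
            raise ValueError(f"{self.mode.value} needs a target latency")


# Hypothetical values taken from the requirements in this post.
GENERAL_RULE = SyncPolicy(Mode.CONSISTENT, 2.0)       # a couple of seconds either way
HOST_EDITED_FILES = SyncPolicy(Mode.CACHED, 5.0)      # edited on the host, visible within 5 s
GENERATED_IMAGES = SyncPolicy(Mode.DELEGATED, 300.0)  # written in the container, host may wait 5 min
NEVER_SYNC = {d: SyncPolicy(Mode.NEVER) for d in ("var/log", "var/sessions", "var/cache")}
```

Validating in `__post_init__` keeps contradictory combinations, such as a "never" directory with a latency, from being expressible at all.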

I also need to be able to declare these policies for subdirectories. On my host, it is not really practical to maintain an entirely separate directory and symlink subdirectories into the correct place. Moreover, it is unclear what a reasonable symlink policy would be on the host side versus the container side.

Nor is it clear how I would define those volumes to begin with. Given my magical priority numbers, let’s look at the directories. Because my assumptions about what I need may be wrong for some directories, I will use consistent as my general rule, but I can live with a couple of seconds of delay in either direction.

It is unclear under the “shared volumes overlap rules” what happens here. We have delegated volumes nested under cached volumes, which are in turn nested under consistent volumes. Based on those rules, it seems that I am stuck with forced consistent sync across everything. In this real-world case, I want the most specific rule to be honored.
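The “most specific rule wins” behavior is essentially longest-prefix matching on path components. Here is a minimal Python sketch under that assumption; `resolve_policy` and the directory names `outer` and `outer/inner` are hypothetical placeholders, and this is the behavior being requested here, not Docker's documented overlap handling.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def resolve_policy(path: str, rules: dict[str, str]) -> str:
    """Return the policy of the most specific rule covering `path`.

    `rules` maps directories relative to the project root ("." is the root)
    to policy names. This is the requested behavior, not Docker's.
    """
    target = PurePosixPath(path)
    best_dir: str | None = None
    best_depth = -1
    for directory in rules:
        rule_dir = PurePosixPath(directory)
        # A rule covers the directory itself and everything below it.
        # Comparing whole path components keeps "var/cache" from matching
        # "var/cache2".
        if rule_dir == target or rule_dir in target.parents:
            depth = len(rule_dir.parts)  # "." has zero parts, so it ranks last
            if depth > best_depth:
                best_dir, best_depth = directory, depth
    if best_dir is None:
        raise KeyError(f"no rule covers {path!r}")
    return rules[best_dir]


if __name__ == "__main__":
    rules = {  # hypothetical nesting: consistent > cached > delegated
        ".": "consistent",
        "outer": "cached",
        "outer/inner": "delegated",
        "var/cache": "never",
    }
    print(resolve_policy("outer/inner/photo.jpg", rules))  # → delegated
    print(resolve_policy("outer/page.php", rules))         # → cached
    print(resolve_policy("index.php", rules))              # → consistent
```

Matching on whole components rather than raw string prefixes means a rule for `var/cache` does not accidentally capture a sibling such as `var/cache2`.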

Heck, this problem is so bad that if I had access to the source code, I would happily take a WEEK of my own time and try to add 3 new rules. Call them sconsistent, scached, and sdelegated, for “sorta …”, and give them the “most specific” guidelines and, if at all possible, some sort of priority mechanism. I’ve got a team of developers who keep saying they’re blocked by performance issues. I managed to appease them by limiting how much of the system CPU hyperkit can use, but give them 4 more weeks and they will be back to complaining about how badly their systems are running.