File access in mounted volumes extremely slow, CPU bound

Any news on the issue, does Docker team know about it?

Still just as slow in rc4-beta20 :(, I have to keep using rsync and other workarounds.

This issue also doesn’t seem to be in the list of known bugs/issues, so sadly I don’t think the Docker team is looking into it.

I pretty much rsync the files to a tmp folder, build the Java project there, and copy the targets back to the volume. This reduces the build time from 30 minutes to just 5 :slight_smile:
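Roughly, the idea looks like this (just a sketch; the Maven image tag, paths and the volume name are placeholders, my actual setup differs a bit):

    # copy sources from the slow osxfs mount into a fast, VM-local named volume,
    # build there, then copy only the build output back to the mount
    docker volume create build-cache

    docker run --rm \
      -v "$PWD":/src \
      -v build-cache:/work \
      maven:3-jdk-8 \
      sh -c 'cp -a /src/. /work/ && cd /work && mvn -q package && cp -a /work/target /src/'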

I am also very interested in a native solution. I’m using Docker to run a Drupal installation. A page load takes more than a minute :frowning:

Please, Docker team, take notice of this sad issue!

2 Likes

Maybe @dockersupport can confirm whether the issue has been noticed? It kind of surprises me that there’s apparently no bug report to be found yet. So if it’s unknown, we might need to create a ticket and link to this thread :slight_smile:

@radmiraal Several weeks ago, I had read through the entire thread and had been keeping up with updates to this thread.

Buried in the beginning was a promise (that is, a statement of intent, not a statement of obligation) by a representative of the Docker team that they are working on this issue; that they intend to get this fixed for the first full release of Docker for Mac; and that their performance goal was to do better than shared volumes in either VMware Fusion or VirtualBox.

It also concerns me that this has been dropped from the official bug list, and it has been very quiet on this front. I get that this is probably a much bigger, more complex technical issue than it appears. However, it feels like a cover-up. If the Docker team doesn’t intend to get this performance issue fixed by release time (or maybe, right now, there are no easy fixes), I’d rather have something said about it. A lot of people are eager to use this functionality, yet we need some sort of predictability because we’re going to base our own tech and development flow on this.

Someone else had also suggested that Docker bundle rsync or Unison with fswatch into Docker for Mac as an acceptable solution. I’m not sure about the headaches of bundling software compiled from OCaml source, but it does fulfill the main objectives: ease of use, and faster performance than shared folders on VMs. This seems like an acceptable solution to me – and obviously to a lot of other people, considering the great interest in using these alternatives. If they were bundled, they would not be alternatives.

It seems like a case of “not invented here” or maybe “tunnel vision” – getting fixated on some objective that is no longer as relevant. Or maybe, given alternatives like rsync and unison, this performance issue was deemed not critical enough to fix before the release. But what do I know? I can’t read minds or get inside the head of Docker. I mean, after all, that’s what communication is for.

Dear Docker Team,

Please let us know what’s going on with this bug and what you do or do not intend to do about it, so that uncertainty in the community can be reduced.

Thank you,
Just Some Guy Who Uses Docker

6 Likes

I’m not sure if this is a real big issue.

Is it not the case that if you leave your Docker volumes in the VM, I/O is very fast?

Isn’t the only problem that when you mount a Mac OS folder as a volume in the container, I/O is very slow?

Or, am I missing the issue?

For my containers, I choose to keep the data on the Mac in my home folder. This means that initialization of my container is slow (it takes around 30 seconds when the nginx container does a “chown -R nginx:nginx /www”, which has around 8,000 files in it). Loading a page in my browser is slower too because my app is written in PHP, but I use the PHP OPcache to speed things up a bit. I do have the cache check for changed PHP files, so it isn’t as fast as it could be if the /www directory with my source files were on a local volume, but I like the convenience of editing my files on the Mac and not in a container.

Anyway, I would think a solution for most people who don’t need changes made in the Mac filesystem to be reflected immediately within the container would be to leave the volumes in the VM. That is, use “docker volume” to create the persistent volumes and run a “sync” container to sync the contents of the corresponding Mac folder to the volume in the VM before running your app containers/services (which use those Docker volumes). You can then run the “sync” container again whenever you want to re-sync the volumes back to the Mac.
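A minimal sketch of what I mean (the volume name, paths, and the use of an alpine image for the sync step are placeholders, not my actual setup):

    # create a persistent volume that lives inside the VM
    docker volume create www-data

    # "sync" container: copy the Mac folder into the VM-local volume
    docker run --rm \
      -v "$HOME/projects/www":/host:ro \
      -v www-data:/data \
      alpine sh -c 'apk add --no-cache rsync && rsync -a --delete /host/ /data/'

    # app containers mount the fast, VM-local volume instead of the Mac folder
    docker run -d --name web -v www-data:/www nginx

    # re-run the sync container whenever you change files on the Mac,
    # or reverse the rsync direction to copy the volume back for backup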

This is the way I used Docker Toolbox, which ran a VirtualBox VM. I left the volumes in the VM and “sync’d” to the Mac for backup and to make sure my changes survived the VM going away (being reinitialized). With VirtualBox, I had problems with permissions within the container and the ownership of the files on the Mac (the MySQL database always had problems running in a shared Mac folder within VirtualBox, so I regularly did backups to the Mac and left the database in VirtualBox).

I’m not sure, but aren’t shared folders in VirtualBox slower than folders in the VM too?

Maybe not as slow as shared folders in Docker for Mac, but there was probably some slowness.

I think that it will be possible to speed up shared folders in Docker for Mac over time, but I am happy to have the choice of mounting Mac folders inside my containers with the permission problems solved, and of using Docker local volumes in the VM when I need faster filesystem access.

Of course, I am only using Docker for Mac as my development environment and not as a staging/production swarm cluster, so slower page loading, while somewhat annoying, does not really affect my productivity.

Or, did I miss the point of this thread?

The point is simple: there is only one scenario for running Docker under OS X, and that is development. There is no way you would run a production stack or anything persistent under a Darwin kernel.

Development means, for a lot of people, changing code, obviously. And that means using shares.

If you want to offer a Docker environment for OS X, you have to support development, or it’s basically useless.

@hosh very, very nice writeup. You totally nailed it.
I also think that fixing osxfs is by far more complicated than the Docker for Mac team expected, which also reflects why neither VMware (not a very small company) nor Sun/Oracle nor Parallels ever managed to solve this.
Being realistic, there is only a small chance the Docker team will manage to solve this in the near future. This is not disrespectful to the Docker team, not at all. It’s just a really complex and complicated problem.

That said, this is why so many alternatives based on rsync/unison/NFS or similar have been spawned: people no longer expect this issue to be solved within a timeframe where it makes sense to wait.

4 Likes

How is Docker for Mac worse than Docker Toolbox with VirtualBox?

What is the performance difference with shared Mac folders between the two?

For my development, Docker for Mac has been a big step forward and not backwards. Would NFS mounts in the Moby VM be faster? If so, maybe have the Moby VM mount all Mac shared folders using NFS instead of the FUSE-based osxfs, as a user preference option (the way we can change the size of the VM and restart it)?

Or mount the shared folders under /nfs too so we could have the option of specifying the shared Mac folder in the -v option either with the full Mac path or its path under /nfs.

Just had an idea…

What if Docker for Mac had a Preference Setting for creating/mounting/deleting extra virtual disks to the Moby VM (.qcow2 files)?

The Preference Panel could have a setting for mounting such a virtual disk to a mount point in the VM wherever I wanted (e.g., to /data).

The virtual disk would only be used for Docker volumes mounted by “-v /data/db:/var/lib/mysql” type volumes. Alternatively, you might have a checkbox to place volumes created by “docker volume” command on this virtual disk too. I/O should be fast within a docker container. The disk space for the virtual disk on the Mac side would be only as big as needed by the persistent data that I keep in it, so it can easily be backed up by Time Machine.

The main problem I have with keeping my persistent data in the Moby VM is that I lose it if I ever need to do a “Reset to factory defaults” which I have been doing once a week or so during the public beta process. A secondary problem I have is that the Moby VM .qcow2 file currently grows to many GBs so I don’t back it up to Time Machine.

With this feature, I would be able to have persistent data stored in an image file that is reasonably small, easily backed up (as a single file), and with fast I/O to it inside my app. I wouldn’t be worried that the .qcow2 file and my data just get wiped out when I need to optimize the main .qcow2 file or re-install Docker for Mac to get back to a working installation.

Further down the road, maybe the Moby VM could NFS export the mounted virtual disk so the filesystem could be accessed on the Mac side. For my development, I only use vim to edit my files, so I can edit them from inside a container running in the VM. But, maybe if I could mount the .qcow2 using NFS on the Mac side, I could use other Mac based tools to access my data files on the Mac side. These tools don’t usually need superfast I/O on the Mac side, so accessing them over a network connection would be fine, in most use cases, I would think.

Ideally every developer on the team would not have to fiddle with settings parameters to ensure adequate performance.

A vanilla install of just a few tools and running a script straight from a repository checkout is ideal.

I’m all for a CLI tool I could run on the Mac side to configure the Moby VM, making it easy to install/reset/set up Docker for Mac. But the Docker for Mac team seems to have decided that it is best to have the user configure the Moby VM from within the Preferences panel in the Moby menu. I always have to remember to bump the size of the VM to 8 GB after a reset, since building my images currently takes a bit more than 2 GB of virtual memory.

If you scroll up towards the beginning of this thread you will see countless posts comparing performance between VirtualBox and Docker for Mac. The difference is usually one to two orders of magnitude (slower in Docker for Mac). In other words, unacceptable.

2 Likes

Has somebody solved this problem?

1 Like

The problem can only be solved by the Docker team, as they haven’t released the code of Docker for Mac as open source yet. I’m sure they are working on it.

For the moment the only workaround for local development on Mac OS X seems to be https://github.com/EugenMayer/docker-sync . I’m using it and it’s pretty stable and very fast. A lot of work was done in the past days and full two-way synchronization is almost there: https://github.com/EugenMayer/docker-sync/pull/64
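Rough outline of how it is used (check the project README for the exact and current syntax; this is from memory and the sync names are placeholders):

    # docker-sync is a Ruby gem
    gem install docker-sync

    # a docker-sync.yml in the project root defines which host folders are
    # synced into named Docker volumes; your compose services then mount
    # those volumes instead of the slow osxfs bind mounts

    docker-sync start          # start the sync container(s) and begin syncing
    docker-compose up          # run the app against the synced volumes

    # or start both in one step:
    docker-sync-stack start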

3 Likes

Hello everyone,

I would like to tell you a bit about the status of this issue, how/why it exists, what we are doing about it, what you can do to help us, and what you can expect in the future.

An apology

First, I’d like to apologize that we have been fairly quiet about this issue. Shared file system performance is a top priority for the Docker for Mac file system team but we will always prioritize fixing severe defects over performance improvement, in order to deliver high quality software to you.

Understanding performance

Perhaps the most important thing to understand is that shared file system performance is multi-dimensional. This means that, depending on your workload, you may experience exceptional, adequate, or poor performance with osxfs, the file system server in Docker for Mac. File system APIs are very wide (20-40 message types) with many intricate semantics involving on-disk state, in-memory cache state, and concurrent access by multiple processes. Additionally, osxfs integrates a mapping between OS X’s FSEvents API and Linux’s inotify API which is implemented inside of the file system itself complicating matters further (cache behavior in particular).

At the highest level, there are two dimensions to file system performance: throughput (read/write IO) and latency (roundtrip time). In a traditional file system on a modern SSD, applications can generally expect throughput of a few GB/s. With large sequential IO operations, osxfs can achieve throughput of around 250 MB/s which, while not native speed, will not be the bottleneck for most applications which perform acceptably on HDDs.

Latency is the time it takes for a file system system call to complete. For instance, the time between a thread issuing write in a container and resuming with the number of bytes written. With a classical block-based file system, this latency is typically under 10μs (microseconds). With osxfs, latency is presently around 200μs for most operations or 20x slower. For workloads which demand many sequential roundtrips, this results in significant observable slow down. To reduce the latency, we need to shorten the data path from a Linux system call to OS X and back again. This requires tuning each component in the data path in turn – some of which require significant engineering effort. Even if we achieve a huge latency reduction of 100μs/roundtrip, we will still “only” see a doubling of performance. This is typical of performance engineering, which requires significant effort to analyze slowdowns and develop optimized components. We know how we can likely halve the roundtrip time but we haven’t implemented those improvements yet (more on this below in What you can do).
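As a rough illustration of these two dimensions that anyone can try from a terminal (the alpine image, paths, and sizes here are placeholders and the absolute numbers will vary from machine to machine):

    # throughput: one large sequential write to an osxfs mount vs. a VM-local path
    docker run --rm -v "$PWD":/mnt alpine sh -c '
      time dd if=/dev/zero of=/mnt/bigfile bs=1M count=512
      time dd if=/dev/zero of=/tmp/bigfile bs=1M count=512
      rm -f /mnt/bigfile /tmp/bigfile
    '

    # latency: many small operations, each paying a full roundtrip
    docker run --rm -v "$PWD":/mnt alpine sh -c '
      mkdir -p /mnt/lat /tmp/lat
      time sh -c "for i in \$(seq 1 500); do echo x > /mnt/lat/\$i; done"
      time sh -c "for i in \$(seq 1 500); do echo x > /tmp/lat/\$i; done"
      rm -rf /mnt/lat /tmp/lat
    '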

There is hope for significant performance improvement in the near term despite these fundamental communication channel properties, which are difficult to overcome (latency in particular). This hope comes in the form of increased caching (storing “recent” values closer to their use to prevent roundtrips completely). The Linux kernel’s VFS layer contains a number of caches which can be used to greatly improve performance by reducing the required communication with the file system. Using this caching comes with a number of trade-offs:

  1. It requires understanding the cache behavior in detail in order to write correct, stateful functionality on top of those caches.

  2. It harms the coherence or consistency of the file system as observed from Linux containers and the OS X file system interfaces.

What we are doing

We are actively working on both increasing caching while mitigating the associated issues and on reducing the file system data path latency. This requires significant analysis of file system traces and speculative development of system improvements to try to address specific performance issues. Perhaps surprisingly, application workload can have a huge effect on performance. I will describe two different use cases and how their performance differs and suffers due to latency, caching, and coherence:

  1. The rake example (mentioned upthread) appears to attempt to access 37000+ different files that don’t exist on the shared volume. We can work very hard to speed up all use cases by 2x via latency reduction but this use case will still seem “slow”. The ultimate solution for rake is to use a “negative dcache” that keeps track of, in the Linux kernel itself, the files that do not exist. Unfortunately, even this is not sufficient for the first time rake is run on a shared directory. To handle that case, we actually need to develop a Linux kernel module or patch which negatively caches all directory entries not in a specified set – and this cache must be kept up-to-date in real-time with the OS X file system state even in the presence of missing OS X FSEvents messages and so must be invalidated if OS X ever reports an event delivery failure.

  2. Running ember build in a shared file system results in ember creating many different temporary directories and performing lots of intermediate activity within them. An empty ember project is over 300MB. This usage pattern does not require coherence between Linux and OS X but, because we cannot distinguish this fact at run-time, we maintain coherence during its hundreds of thousands of file system accesses to manipulate temporary state. There is no “correct” solution in this case. Either ember needs to change, the volume mount needs to have coherence properties specified on it somehow, some heuristic needs to be introduced to detect this access pattern and compensate, or the behavior needs to be indicated via, e.g., extended attributes in the OS X file system.
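Not part of our official tooling, but a quick way to check whether your own workload exhibits a pattern like the rake case above (many lookups of files that do not exist) is to count file-related system calls with strace inside the container. The image, the way strace is installed, and the build command below are placeholders:

    docker run --rm --cap-add SYS_PTRACE -v "$PWD":/app -w /app alpine sh -c '
      apk add --no-cache strace >/dev/null
      strace -f -c -e trace=file your-build-command
    '
    # a large "errors" count (mostly ENOENT) for stat/open/access suggests
    # exactly the kind of workload a negative dcache would help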

These two examples come from performance use cases contributed by users and they are incredibly helpful in prioritizing aspects of file system performance to improve. I am personally developing statistical file system trace analysis tools to characterize slow-performing workloads more easily in order to decide what to work on next.

Under development, we have:

  1. A Linux kernel module to reduce data path latency by 2/7 copies and 2/5 context switches

  2. Increased OS X integration to reduce the latency between the hypervisor and the file system server

  3. A server-side directory read cache to speed up traversal of large directories

  4. User-facing file system tracing capabilities so that you can send us recordings of slow workloads for analysis

  5. A growing performance test suite of real world use cases (more on this below in What you can do)

  6. Experimental support for using Linux’s inode, writeback, and page caches

  7. End-user controls to configure the coherence of subsets of cross-OS bind mounts without exposing all of the underlying complexity

What you can do

When you report shared file system performance issues, it is most helpful to include a minimal Real World reproduction test case that demonstrates poor performance.

Without a reproduction, it is very difficult for us to analyze your use case and determine what improvements would speed it up. When you don’t provide a reproduction, one of us has to take the time to figure out the specific software you are using and guess and hope that we have configured it in a typical way or a way that has poor performance. That usually takes 1-4 hours depending on your use case and once it is done, we must then determine what regular performance is like and what kind of slow-down your use case is experiencing. In some cases, it is not obvious what operation is even slow in your specific development workflow. The additional set-up to reproduce the problem means we have less time to fix bugs, develop analysis tools, or improve performance. So, please include simple, immediate performance issue reproduction test cases. The rake reproduction case by @hirowatari above is a great example (as are other contributions, thank you!) because it has:

  1. A version-controlled repository so any changes/improvements to the test case can be easily tracked

  2. A Dockerfile which constructs the exact image to run

  3. A command-line invocation of how to start the container

  4. A straight-forward way to measure the performance of the use case

  5. A clear explanation (README) of how to run the test case
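To make those points concrete, a skeleton of such a repro repository might look like this (the image, paths, and commands are placeholders, not any specific project):

    # Dockerfile -- constructs the exact image to run
    #   FROM ruby:2.3
    #   WORKDIR /app
    #   COPY Gemfile Gemfile.lock ./
    #   RUN bundle install

    # run.sh -- the command-line invocation plus a straightforward measurement
    #   #!/bin/sh
    #   docker build -t perf-repro .
    #   time docker run --rm -v "$PWD":/app perf-repro bundle exec rake -T

    # README.md -- explains how to run it ("./run.sh") and roughly what timing
    # to expect on a native file system vs. an osxfs-mounted volume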

What you can expect

Docker for Mac will be leaving Beta in the near future. It is unlikely that it will include major shared file system performance improvements before that time. However, we will continue to work toward an optimized shared file system implementation on the Beta channel of Docker for Mac. We have put a note about shared file system performance in the “Known Issues” section of the upcoming release notes and the Docker for Mac documentation.

You can expect some of the performance improvement work mentioned above to reach the Beta channel in the coming release cycles.

In due course, we will open source all of our shared file system components. At that time, we would be very happy to collaborate with you on improving the implementation of osxfs and related software.

Finally, the nitty gritty details of shared file system performance analysis and improvement will be written up in more detail and published on the Docker blog. Do look out for those articles in the coming months as they will serve as a good jumping off point for understanding the system and, perhaps, measuring it or contributing to it.

Wrapping Up

I hope this has given you a rough idea of where osxfs performance is and where it’s going. We are treating good performance as a top priority feature of the file system sharing component and we are actively working on improving it through a number of different avenues. The osxfs project started in December 2015 (~7 months ago). Since the first integration into Docker for Mac in February 2016 (~5 months ago), we’ve improved performance by 50x or more for many workloads while achieving nearly complete POSIX compliance and without compromising coherence (it is shared and not simply synced). Of course, in the beginning there was lots of low-hanging fruit and now many of the remaining performance improvements require significant engineering work on custom low-level components.

I’d like to thank you for your understanding as we continue development of the product and work on all dimensions of performance. I’m excited to work with many of you as you report issues and, soon, collaborate on the source code itself.

Thanks for participating in the Docker for Mac Beta!

Best regards,

David Sheets

32 Likes

Thanks @dsheets for the very detailed wrap-up.
Do you have a schedule for the next releases? And when Docker for Mac exits the beta state, I guess Docker Toolbox will be deprecated, is that correct?

Thanks for the very thoughtful and thorough response, @dsheets! If you’re able, I have a couple follow-up questions:

  • The performance of NFS across a wide range of even bi-directional use cases with the Docker VM is astounding, as reported by many folks here. Has the osxfs team looked into implementing any of the core design behind NFS, or even piggybacking on NFS itself, to achieve more order-of-magnitude speed ups?

  • It was noticed earlier in the thread that an update of Docker For Mac included a note about enabling NFS in the changelog. Will there be a user-friendly way to perform volume mounting through this method, so that complex vm configurations don’t need to happen every time the vm is rebuilt for every member of a dev team? This would be a really fantastic alternative while osxfs is being worked on.

Thanks again for your time and attention!

NFS is still, for bigger projects, extremely slow on reads. I did several benchmarks and tested several projects like those listed here https://github.com/EugenMayer/docker-sync/wiki/Alternatives-to-docker-sync#nfs , and for projects like Rails / Symfony / Drupal and other bigger stacks like Spring (Java) it becomes just as unusable as osxfs.

NFS performing that badly was the reason I created the alternative https://docker-sync.io and, to achieve the same two-way transparent sync, we are working on unison+unox: https://github.com/EugenMayer/docker-sync/pull/64

As you can see, we have different strategies there, from one-way rsync, to one-way unison, to two-way unison based on osxfs (currently blocked because inotify events do not work / stop working from host to container, by the way), and finally the current unison+unox, which stands out as the best solution. You can try and benchmark them yourself to see the differences.

So, bottom line: please do not suggest NFS as a “built-in solution”, I basically beg you. Its read performance is just too bad to consider it. For smaller projects the performance of NFS is currently even similar to osxfs, so osxfs might already be enough there. Since NFS is not suitable for bigger projects, choose something else like unison+unox, IMHO.

2 Likes

I’ve been using Docker volumes over NFS (dlite) for full-time development of quite big Drupal projects for at least half a year now. It’s slower than native I/O, but it is very far from useless. It is actually fast enough that I’ve used it exclusively for all my development work for the last couple of months.

An option to use NFS would quite possibly be enough for me to switch to Docker for Mac.

I guess I cannot disagree, since we both just have opinions, but here are some facts that probably help explain why our results differ (I am a Drupal developer myself, btw):

  • you use OPcache in an FPM environment to speed reads up (avoid them) – fair enough, as long as you do not use timestamps=0, which would not suit development anyway
  • you most probably are not using drush a lot (being bold here)

The latter is the issue: you might be OK during development with FPM, with OPcache reducing the reads, but bigger environments require tooling on the CLI like drush, where OPcache is not in place. And there, NFS is really painful. In my case, drush cache-clear all takes 45 seconds (I bet you know what that is), while at native speed it runs in 7 seconds at most. So there is a huge gap.

UPDATE: just to add, those 45 seconds were benchmarked with dlite.
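If you want to reproduce that comparison on your own project, something along these lines works (the image name, paths and volume name are placeholders; drush has to be available in the image and the site has to be able to bootstrap, i.e. the database must be reachable):

    # against the code on the shared / NFS mount
    time docker run --rm -v "$PWD":/var/www/html my-drupal-image \
        drush --root=/var/www/html cache-clear all

    # against a copy of the same code in a VM-local named volume
    # (assuming the volume was populated beforehand, e.g. with a sync container)
    time docker run --rm -v drupal-code:/var/www/html my-drupal-image \
        drush --root=/var/www/html cache-clear all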

1 Like