File access in mounted volumes extremely slow, CPU bound

Docker team, please HELP :slight_smile:

@dsheets Thanks for your informative post. But it didn’t mention anything about the host being CPU bound during file system access at all. I can live with osxfs file system access being somewhat slower than native, but when filesystem access locks a CPU core at 100% on the host it makes it difficult to run other applications on the host during development or other containers. Even if file system access was fast (near-native), if the host is locked at 100% to do it, that’s far from ideal.

For what it’s worth, after upgrading to the latest stable version of docker for mac, I’m still experiencing the slowness.

Can anyone else confirm?

I just tried the released “Docker for Mac” bits. Everything else is fine but I am finding that just a simple UNIX find operation on a mounted file system is 10-15 times slower. This is a showstopper for me, back to my previous solution…
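For anyone who wants to put a number on the find slowdown, here is a minimal sketch of how the walk could be timed. The function name and the paths in the usage note are made up for illustration, and the timer only has whole-second resolution, so it needs a reasonably large tree to show a difference.

```shell
# walk_seconds DIR: walk DIR with find and print the elapsed time in whole seconds.
# Hypothetical helper, not part of Docker or any tool mentioned in this thread.
walk_seconds() {
  walk_start=$(date +%s)
  # Visit every regular file; the listing itself is discarded.
  find "$1" -type f > /dev/null
  walk_end=$(date +%s)
  echo $((walk_end - walk_start))
}
```

Inside a container you might run `walk_seconds /path/to/mounted/volume` and then the same call against a copy of that tree on the container's own filesystem, and compare the two numbers.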

Are you using a volume in your dockerfile? When I do that, it’s incredibly slow. I’m using other solutions to fix it (docker-sync mostly)

Many of the performance improvements that you’re talking about have been implemented a long time ago by NFS. While I understand that NFS certainly has its drawbacks, would it be possible to offer NFS as an alternative that users can switch to while the issues in osxfs are ironed out? Despite @eugenmayer’s assertion that NFS is too slow to be useful, I’m quite happy with it in most of my Vagrant environments. I work on large Drupal sites, so there’s some slowness, sure, but it’s certainly not intolerable, and I’d consider it fast compared to osxfs right now. No offense intended - I know you’re working on it - but that’s what I’ve observed.

More broadly, I’m curious about why NFS (or some other existing/proven project) wasn’t chosen for the base to build on here. If that were the case, the only custom bits that would need to be built would be the event propagation from host -> vm and maybe some caching trickery to speed things up in the VM.

The biggest problem with NFS (for me) is that you don’t get fs events over the mount. You’d still need something else that can propagate those. (As your second paragraph says when I read it again…)

Personally, I use Dinghy with great success. It uses NFS, and has a daemon that watches for events on the host and sends them into the docker VM which simulates them so containers see them.

Could you tell more about your solution? I am testing Docker for Mac now and ended up using cp to a tmp folder to build my project, which is annoying :slight_smile:

To steer away people looking for a solution for shares, no matter if it is osxfs or something else (at least it should work), I created this discussion/solution here: Alternatives to OSXFS / performant shares under OSX - so this topic can stay on “when/how to fix osxfs specifically”. @olat this is also for you (docker-sync)

Hopefully this is the last attempt to keep it on topic. I am aware that I am not free of guilt here, sorry.

In regards to easily repeatable performance testing I found a pretty simple case that demonstrates the large performance gap.

tl;dr
In general, virtualbox volume mounts were about 4x faster than Docker for Mac. We consistently saw around 20 MB/s write throughput under virtualbox but around 4.5 MB/s using Docker for Mac.

############# Docker for Mac

Current Docker Engine Version

→ docker -v
Docker version 1.12.0, build 8eab29e

Run the stock ubuntu:14.04.4 container

docker run --rm -v /tmp:/code -it ubuntu:14.04.4 bash

Volume mount from /tmp on macbook pro to /code in container. Seeing 4.5 MB/s!!

root@0da16bd185e9:/code# pwd
/code
root@0da16bd185e9:/code# time dd if=/dev/zero of=test.dat bs=1024 count=100000

100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 22.5756 s, 4.5 MB/s

real 0m22.587s
user 0m0.100s
sys 0m1.060s
root@0da16bd185e9:/code#

Writing the same file in container

414MB/s!

root@0da16bd185e9:/code# cd ~
root@0da16bd185e9:~# pwd
/root
root@0da16bd185e9:~# time dd if=/dev/zero of=test.dat bs=1024 count=100000
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 0.24733 s, 414 MB/s

real 0m0.249s
user 0m0.020s
sys 0m0.250s

########## Falling back to docker toolbox with virtualbox VM

root@93391eaa7a20:/code# time dd if=/dev/zero of=test.dat bs=1024 count=100000
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 4.91868 s, 20.8 MB/s

real 0m4.923s
user 0m0.000s
sys 0m2.380s

Not great, but certainly a lot faster.
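To make the comparison easier to repeat, here is a rough sketch that recomputes the MB/s figures from the byte counts and timings above and wraps the same dd call in a helper. The names `mbps` and `bench_write` are hypothetical, and the last line of dd's output differs between GNU, BusyBox, and BSD dd, so treat the helper as a starting point only.

```shell
# mbps BYTES SECONDS: print throughput in MB/s, with 1 MB = 10^6 bytes as in dd's report.
mbps() {
  LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# bench_write DIR [COUNT]: write COUNT blocks of 1024 zero bytes into DIR,
# print the last line of dd's status output, then remove the test file.
bench_write() {
  bench_dir=$1
  bench_count=${2:-100000}
  dd if=/dev/zero of="$bench_dir/test.dat" bs=1024 count="$bench_count" 2>&1 | tail -n 1
  rm -f "$bench_dir/test.dat"
}

# Figures taken from the three runs above (bytes copied, seconds reported by dd).
mbps 102400000 22.5756   # Docker for Mac, mounted volume    -> 4.5
mbps 102400000 0.24733   # inside the container              -> 414.0
mbps 102400000 4.91868   # Docker Toolbox with VirtualBox    -> 20.8
```

Running `bench_write /code` inside each container should reproduce the measurement, assuming the volume is mounted at /code as in the runs above.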

While various suggestions have been made for alternative network filesystems or file syncing, has anyone considered the possibility of syncing/sharing at the block level rather than the filesystem level? If you imagine the host and the docker vm both as devices accessing shared storage and you treat the shared storage as a block device (think partition mount) then maybe solutions such as GFS2 would work. No idea how the block device mounting would work but I thought I’d mention it :slight_smile:

I’ve gotten around this by setting up a mirror folder and syncing from my volume to that using Unison.

Not a full working example, I’ve just pulled out the relevant parts. Here it syncs from /var/www/mirror to /var/www/html

Example docker-compose.yml:

web:
  build: ./docker/web
  ports:
   - "80"
  volumes:
   - .:/var/www/mirror

Dockerfile for web:

FROM php:5.6.20-apache

RUN apt-get update && apt-get install -y \
        supervisor \
        ocaml

RUN mkdir -p /var/lock/apache2 /var/run/apache2 /var/run/sshd /var/log/supervisor
RUN mkdir -p /var/www/html

RUN mkdir -p /root/unison
COPY unison-2.48.4 /root/unison
WORKDIR /root/unison
RUN  make UISTYLE=text
WORKDIR /root

COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf

VOLUME /var/www/html

CMD ["/usr/bin/supervisord"]

supervisord.conf:

[supervisord]
nodaemon=true

[program:unison]
command=/bin/bash -c "cd /root/unison && ./unison /var/www/mirror /var/www/html -auto -batch -repeat=watch -retry=5 -ignore=\"Name {.git,*.swp}\""
stdout_events_enabled=true
stderr_events_enabled=true

[program:apache2]
command=/bin/bash -c "apache2-foreground"
stdout_events_enabled=true
stderr_events_enabled=true

Tried unison for dealing with volumes for Magento 2 and I'm seeing issues with syncing large volumes. It's not an ideal solution, but we’ve been able to get acceptable load times if we pre-build images and avoid mapping certain core library folders.

Looking forward to this issue being solved. Otherwise, keep up the good work, Docker!

Thank you @dsheets for detailed explanation, very helpful.

I’ve noticed an improvement in sync with the alpine image:

$ docker run --rm -ti -v ~/Junk:/mnt alpine sh

And running this command to generate a file:

time dd if=/dev/zero of=test.dat bs=1024 count=100000
  • from docker inside non-volume dir - fast (~0.3 sec)
  • from docker inside volume dir - slow (30 sec)
  • from host dir to docker mounted volume - fast (~0.3 sec)

For me it's OK, as I don’t generate large files inside docker that need to be synced back to the host.

Writing a log file inside the volume is OK-ish: 10 writes of 10 MB take approx. 3.7 sec.
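In case someone wants to try the log-style test themselves, here is a minimal sketch of one way to do it. It assumes the test means appending ten 10 MB chunks to a single file, and the function name and log path are made up for the example.

```shell
# append_chunks FILE COUNT CHUNK_BYTES: append COUNT chunks of zero bytes to FILE.
# Hypothetical helper; the chunk size is given in bytes so the same call works
# with GNU, BusyBox, and BSD dd.
append_chunks() {
  chunk_file=$1
  chunk_count=$2
  chunk_bytes=$3
  i=0
  while [ "$i" -lt "$chunk_count" ]; do
    # One dd call per chunk, appended to the file like a logger would do.
    dd if=/dev/zero bs="$chunk_bytes" count=1 2>/dev/null >> "$chunk_file"
    i=$((i + 1))
  done
}
```

In bash, `time append_chunks /mnt/app.log 10 10485760` would append ten 10 MiB chunks to a file on the mounted volume and print how long it took.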

I don’t see why host -> docker writing would ever be slow - or at least the writing to the native disk part. I may be missing something though. The slowness with reading is still there sadly, so anything that reads a lot of files is still slow.

Hi all, I think I’m hitting on the same problem. Filed an issue https://github.com/docker/docker/issues/25656 and was redirected to https://github.com/docker/docker/issues/21485 which is probably the best technical recap of these types of issues I’ve read so far.

Would love an update on where this is going with the Docker for Mac roadmap. We use a dozen or so containers when developing locally, 2 of them being Gulp watchers. The filesystem scanning simply tears into the host’s CPU when it runs.

Edit: I just spotted @dsheets’ reply from a few weeks ago. Great response and thanks for the update! I can’t give as much detail as the post requested, but know that the two biggest pain points on my stack have been:

  • gulp 3 watch processes, which are based on the standard Node 5-slim Hub image. These hurt us the most because they sit idly in the background during development but still peg the CPU due to their filesystem monitoring.
  • Validation routines on our code using PHP Code Sniffer. This is an on-demand process which tokenizes entire directories of PHP files before analyzing them.

Created a test case for Gulp.
https://github.com/den-t/docker-for-mac-gulp-test/
Currently hard to test in measurable terms. It just shows prolonged periods of CPU load, after npm install finishes.

@taiidani Could you suggest a watcher that will showcase the problem better than what is there?
PRs welcome. :slight_smile:

I recommend using a package like gulp-watch or gulp-nodemon or similar that takes advantage of inotify file system events rather than constantly polling the file system.
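To show why polling hurts so much on a slow mount, here is a rough shell sketch of what a polling watcher does on every interval. It is not how gulp or any of these packages is implemented, and the function names are made up. Real pollers usually compare modification times, but this version checksums file contents to stay portable.

```shell
# snapshot DIR: print one checksum summarising every regular file under DIR.
# Each call walks the whole tree and reads every file, which is the price a
# polling watcher pays on every interval, even when nothing has changed.
snapshot() {
  find "$1" -type f -exec cksum {} + | sort | cksum
}

# poll_for_change DIR INTERVAL: block until the tree differs from its first snapshot.
poll_for_change() {
  poll_before=$(snapshot "$1")
  while [ "$(snapshot "$1")" = "$poll_before" ]; do
    sleep "$2"
  done
  echo "change detected in $1"
}
```

An event-based watcher skips the repeated walk entirely. On a Linux filesystem with inotify-tools installed, something like `inotifywait -m -r -e modify,create,delete DIR` sits idle until the kernel reports a change.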

I’ve got an example repo using gulp-watch over gulp.watch here for the answer to this question
