New file sharing UID/GID permissions break image portability

Expected behavior

I should be able to run any image I ran previously under Docker Toolbox, under Docker for Mac.

Actual behavior

I have images whose entrypoint scripts examine the UID and GID of my source directory. When they differ from those of the unprivileged user my app runs as inside the container (e.g. because the directory is bind mounted from the host), the script modifies the unprivileged user’s UID and GID to match.

Now, Docker for Mac changes the UID and GID of all bind mounted volumes to match the container user. So on startup, all my bind mounted volumes appear to be owned by root. If I gosu foo, they will all appear to be owned by foo.

So my entrypoint script, which calls stat -c '%u' "${DIR}" to figure out whether and how to modify the unprivileged user, thinks the bind mounted volume is owned by root, and usermod fails.

I can work around this, but now my script needs to account for two possible runtime behaviours: Docker for Mac, or regular Docker. If the behaviour is going to change to this new method, it should do so across all platforms.
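
For reference, here is a minimal sketch of the kind of check my entrypoint does (simplified: the "app" user name, the /srv/app path, and the same-named group are placeholders, and it assumes shadow-utils and gosu are available in the image):

#!/bin/sh
# Simplified entrypoint: if the bind mounted app directory is owned by
# someone other than our unprivileged user, move the user (and its
# same-named group) onto the host's UID/GID so the app can write there.
APP_USER=app
APP_DIR=/srv/app

HOST_UID=$(stat -c '%u' "${APP_DIR}")
HOST_GID=$(stat -c '%g' "${APP_DIR}")

if [ "${HOST_UID}" != "$(id -u "${APP_USER}")" ]; then
    groupmod -o -g "${HOST_GID}" "${APP_USER}"
    usermod -o -u "${HOST_UID}" "${APP_USER}"
fi

exec gosu "${APP_USER}" "$@"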

Hi Tai Lee,

We totally agree and have been waiting for a report like yours to pursue a more faithful UID/GID mapping scheme. Currently, Docker for Mac deviates from the Linux-native behavior in a way that is mostly OK for typical applications but isn’t good enough for file systems (or really any systems software).

A more correct osxfs permissions model is under development and will be released in an upcoming beta. We care very deeply about portability between Docker on Linux and Docker for Mac, and your feedback is extremely valuable. Thank you.

If it’s possible, could you rename your issue to something like “New file sharing UID/GID permissions break image portability”? This will help users find the specific issue and help us keep track of which file system improvements correspond to which forum threads.

Thank you,

David Sheets

Done. I’ve updated the title.

Can you share details on the new model? Will you store per-file uid/gid in e.g. extended attributes, will the uid/gid for a subtree be specifiable with -v, or something else?

After trying the Docker beta for Mac, this was a major stumbling block for some work I have to do, and I had to shelve my project. That is, I’m getting permission issues with /tmp folders inside my container:

  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/tempfile.py", line 275, in gettempdir
    tempdir = _get_default_tempdir()
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/tempfile.py", line 217, in _get_default_tempdir
    ("No usable temporary directory found in %s" % dirlist))
IOError: [Errno 2] No usable temporary directory found in ['/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/var/spool/cwl']
Exception while running job

Also eager to know the details of the new permissions model for osxfs, let me know if I can contribute/help with something in that regard.

Hi Roman,

Do you have any volume mounts that are overriding /tmp in your setup? With beta8, the default tempdirs work fine with Python for me:

$ docker run -it python python
Python 3.5.1 (default, Apr 21 2016, 17:54:56) 
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tempfile.gettempdir()
'/tmp'
>>> 

Thanks avsm, in the end it was an application-specific issue, not on the Docker side.

Now though, I’m facing a lot of repeated log messages:

2016-04-26 12:30:46,640 Docker[50121]: Docker is not responding: waiting 0.5s

Nothing to do with permissions, so moving on :wink:

Hello!

Just my observations/thoughts/ideas here; I might be wrong, and I don’t have access to further source code under this closed beta Docker program.

I think I found the culprit: the container dies mid-execution, and so do the associated volume(s):

2016-04-26 13:35:03,106 Docker[50370]: eventCallback container die: bf92eda7f8f08e6f216fcdf2e8814d7620ff9bc4c5de2a37c20e4200202c0a39 (#watchers: 1)
2016-04-26 13:35:03,106 com.docker.osxfs[50367]: Volume.stop bf92eda7f8f08e6f216fcdf2e8814d7620ff9bc4c5de2a37c20e4200202c0a39 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam])
2016-04-26 13:35:04,021 Docker[50370]: eventCallback container start: a65cbc67ac66a341424ec8739b772a71c13d90b97d60260c56e5e210a49b6003 (#watchers: 0)
2016-04-26 13:35:10,875 Docker[50370]: eventCallback container die: a65cbc67ac66a341424ec8739b772a71c13d90b97d60260c56e5e210a49b6003 (#watchers: 1)

Earlier, it seems that “com.docker.drive” (the storage/osxfs layer?) has some trouble keeping up with I/O, perhaps:

2016-04-26 13:21:00,000 kernel[0]: process com.docker.drive[50373] caught causing excessive wakeups. Observed wakeups rate (per sec): 1010; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 117223

There’s also a stacktrace in the logs:

Powerstats for:  com.docker.drive [50373]
Start time:      2016-04-26 13:20:41 +0200
End time:        2016-04-26 13:20:56 +0200
Parent:          com.docker.drive
Responsible:     Docker
Microstackshots: 11 samples (23%)
Primary state:   9 samples Non-Frontmost App, Kernel mode, Thread QoS Legacy
User Activity:   0 samples Idle, 11 samples Active
Power Source:    0 samples on Battery, 11 samples on AC
  10 thread_start + 13 (libsystem_pthread.dylib) [0x7fff9a04f3ed]
    10 _pthread_start + 176 (libsystem_pthread.dylib) [0x7fff9a051fd7]
      10 _pthread_body + 131 (libsystem_pthread.dylib) [0x7fff9a05205a]
        8  vcpu_add + 647 (com.docker.driver.amd64-linux) [0x43e4fd7]
          8  xh_vm_run + 57 (com.docker.driver.amd64-linux) [0x43da879]
            6  vm_run + 523 (com.docker.driver.amd64-linux) [0x43d8a9b]
              6  vmx_fix_cr4 + 2129 (com.docker.driver.amd64-linux) [0x43e0471]
                6  hv_vcpu_run + 16 (Hypervisor) [0x4a760a9]
            1  vm_run + 1670 (com.docker.driver.amd64-linux) [0x43d8f16]
              1  vmm_fetch_instruction + 102 (com.docker.driver.amd64-linux) [0x43de446]
                1  vm_copy_setup + 249 (com.docker.driver.amd64-linux) [0x43d9fc9]
                  1  <User mode>
            1  vm_run + 1046 (com.docker.driver.amd64-linux) [0x43d8ca6]
              1  __psynch_cvwait + 10 (libsystem_kernel.dylib) [0x7fff941d9136]
        1  lpc_pirq_routed + 13107 (com.docker.driver.amd64-linux) [0x43c7503]
          1  <User mode>
        1  callout_system_init + 376 (com.docker.driver.amd64-linux) [0x43dc0b8]
          1  vlapic_icrtmr_write_handler + 364 (com.docker.driver.amd64-linux) [0x43d5c1c]
            1  vlapic_lvt_write_handler + 686 (com.docker.driver.amd64-linux) [0x43d598e]
              1  vcpu_notify_event + 89 (com.docker.driver.amd64-linux) [0x43d87c9]
                1  __psynch_cvsignal + 10 (libsystem_kernel.dylib) [0x7fff941d911e]
  1  .L1166 + 19 (mirage-block.so) [0x4acc847]
    1  lwt_unix_recv_notifications + 40 (mirage-block.so) [0x4b5d558]
      1  __pthread_sigmask + 10 (libsystem_kernel.dylib) [0x7fff941d92b6]

  Binary Images:
           0x4000000 -          0x4801ff3  com.docker.driver.amd64-linux (0) <CDE0FB10-3C88-33F4-8127-2D61845AAC90> /Applications/Docker.app/Contents/MacOS/com.docker.driver.amd64-linux
           0x4a75000 -          0x4a79fff  com.apple.Hypervisor 1.0 (1) <7960B1D3-1EC1-379D-8CF4-F478BA67CD31> /System/Library/Frameworks/Hypervisor.framework/Versions/A/Hypervisor
           0x4a87000 -          0x4bcdfff  mirage-block.so (0) <00D4FC77-75C5-388B-82E3-A1D1FB72A14A> /Applications/Docker.app/Contents/Resources/lib/mirage-block.so
      0x7fff941c3000 -     0x7fff941e0fff  libsystem_kernel.dylib (2782.50.2) <FAA95C7E-5A59-35FD-9ED5-80BFB27BF3C7> /usr/lib/system/libsystem_kernel.dylib
      0x7fff9a04e000 -     0x7fff9a057fff  libsystem_pthread.dylib (105.40.1) <ACE90967-ECD0-3251-AEEB-461E3C6414F7> /usr/lib/system/libsystem_pthread.dylib

After that, Docker repeatedly tries to bring those containers and volumes back up as they die:

2016-04-26 13:34:28,197 com.docker.osxfs[50367]: Volume.start 590e3cfb796d28342da5aebe3ab4ba4c5ff295b8013414777adb96dac7c44920 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])
2016-04-26 13:34:28,613 Docker[50370]: eventCallback container start: 590e3cfb796d28342da5aebe3ab4ba4c5ff295b8013414777adb96dac7c44920 (#watchers: 0)
2016-04-26 13:34:41,965 Docker[50370]: eventCallback container die: 590e3cfb796d28342da5aebe3ab4ba4c5ff295b8013414777adb96dac7c44920 (#watchers: 1)
2016-04-26 13:34:41,966 com.docker.osxfs[50367]: Volume.stop 590e3cfb796d28342da5aebe3ab4ba4c5ff295b8013414777adb96dac7c44920 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])
2016-04-26 13:34:43,029 com.docker.osxfs[50367]: Volume.start e0faa0b18377c86e6f4e7b88a8366f2c5b8700d0dfae462522ea7ef1b9c8cf75 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])
2016-04-26 13:34:43,314 Docker[50370]: eventCallback container start: e0faa0b18377c86e6f4e7b88a8366f2c5b8700d0dfae462522ea7ef1b9c8cf75 (#watchers: 0)
2016-04-26 13:34:51,967 Docker[50370]: eventCallback container die: e0faa0b18377c86e6f4e7b88a8366f2c5b8700d0dfae462522ea7ef1b9c8cf75 (#watchers: 1)
2016-04-26 13:34:51,967 com.docker.osxfs[50367]: Volume.stop e0faa0b18377c86e6f4e7b88a8366f2c5b8700d0dfae462522ea7ef1b9c8cf75 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])
2016-04-26 13:34:52,548 com.docker.osxfs[50367]: Volume.start bf92eda7f8f08e6f216fcdf2e8814d7620ff9bc4c5de2a37c20e4200202c0a39 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam])
2016-04-26 13:34:52,781 Docker[50370]: eventCallback container start: bf92eda7f8f08e6f216fcdf2e8814d7620ff9bc4c5de2a37c20e4200202c0a39 (#watchers: 0)
2016-04-26 13:35:03,106 Docker[50370]: eventCallback container die: bf92eda7f8f08e6f216fcdf2e8814d7620ff9bc4c5de2a37c20e4200202c0a39 (#watchers: 1)
2016-04-26 13:35:03,106 com.docker.osxfs[50367]: Volume.stop bf92eda7f8f08e6f216fcdf2e8814d7620ff9bc4c5de2a37c20e4200202c0a39 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam])
2016-04-26 13:35:04,021 Docker[50370]: eventCallback container start: a65cbc67ac66a341424ec8739b772a71c13d90b97d60260c56e5e210a49b6003 (#watchers: 0)
2016-04-26 13:35:10,875 Docker[50370]: eventCallback container die: a65cbc67ac66a341424ec8739b772a71c13d90b97d60260c56e5e210a49b6003 (#watchers: 1)

So, still some rough edges to polish, but good job anyhow. I love being able to run Docker right away without all the VirtualBox/boot2docker/eval docker-machine cruft.

Hi Jaka,

A new ownership model was introduced in Beta 11. Please see the documentation for more details. We’d appreciate your trying it out and letting us know if it satisfies your use case.

Thanks,

David Sheets

Hi David,

Already posted my thoughts on day 1: Osxfs ownership model is bizarre :slight_smile:

What is the reason for not just doing it right and doing away with the current UID/GID magic?

My problem is that unless my build process, running as the “build” user, chowns after itself on OS X, the “run” user can modify things that it shouldn’t have permission to.

Jaka

The updated behaviour is still different from existing platforms (Linux, and Docker with Docker Machine on OS X or Windows). I still need to alter my entrypoint scripts to account for two possible scenarios:

  1. a bind mounted volume is owned by whatever UID is checking ownership (new Docker for Mac behaviour) – which makes checking the ownership of a folder that could be bind mounted effectively useless

  2. a bind mounted volume is owned by whatever UID it has on the host (current Docker + Docker Machine behaviour) – which is useful to determine whether or not a folder has been bind mounted

The reason I need to do this is that, inside my containers, I run my app as an unprivileged user that is created in my Dockerfile. It happens to have UID 104, but I don’t particularly care what its UID is. When I bind mount a volume that this user needs to write to, normally it will have a UID from the host system.

Normally, I can’t chown the directory, but I can change the unprivileged user’s UID with usermod.

On Docker for Mac, the UID of the directory is undefined. It might be root, if root checks it, or it might be the unprivileged user. My workaround is to check (as root) if it is owned by root, and if so, chown it to the unprivileged user’s UID.

But now I have to support both of these possibilities in my entrypoint script, as sketched below. I don’t care which way it ends up; I just want it to be consistent, so I can use the same simple mechanism to deal with permissions of bind mounted directories in my entrypoint script, without having to know whether the container will be run by Docker + Docker Machine or Docker for Mac.
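
In practice that means something like this in the entrypoint (a sketch; the "app" user and /srv/app path are placeholders, and it assumes the script runs as root before dropping privileges):

APP_USER=app
APP_DIR=/srv/app

DIR_UID=$(stat -c '%u' "${APP_DIR}")

if [ "${DIR_UID}" = "0" ]; then
    # Docker for Mac: the mount reports the caller's UID (root here),
    # but a chown persists, so hand the tree to the unprivileged user.
    chown -R "${APP_USER}:${APP_USER}" "${APP_DIR}"
elif [ "${DIR_UID}" != "$(id -u "${APP_USER}")" ]; then
    # Linux / Docker Machine: the mount keeps the host UID, which we
    # cannot chown away, so move the user onto it instead.
    usermod -o -u "${DIR_UID}" "${APP_USER}"
fi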

As far as I can tell, on standard Docker, bind mounts seem to take on the permissions of the folder they are linked to in the container (if they weren’t owned by root on the host machine). Is that what you guys are aiming for? On Docker for Mac beta 15, I’m seeing every bind mount owned by root. The same problems seem to apply to docker cp as well.

Standard Docker on OS X and Windows will have the UID and GID from the host system for bind mounts. Docker for Mac will have the UID and GID of the user who checks the permissions, but a chown will persist, unlike with bind mounts on regular Docker.
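
A quick way to observe the difference (the alpine image and UID 1000 here are just illustrative choices):

# Compare the ownership a container reports for the same bind mount
# when stat runs as root vs. as an arbitrary unprivileged UID.
docker run --rm -v "$PWD":/data alpine stat -c '%u:%g' /data
docker run --rm -u 1000:1000 -v "$PWD":/data alpine stat -c '%u:%g' /data
# Linux / Docker Machine: both commands print the host owner of $PWD.
# Docker for Mac (current model): each prints the UID/GID of the calling user.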

I don’t care which way it works. I just want it to be consistent.

But being able to fake/adopt the UID and GID of a folder inside the image when bind mounting a host folder (instead of the UID and GID of the user who is checking permissions) would be best.

This would allow us to create a system user inside the image to run our apps as, without having to fix permissions by changing either the user (usermod) or the files (chown) to ensure the user owns the bind mounted folder.

A chown on a large bind mounted folder on regular Docker can take a long time to run, even though it is basically a pointless no-op because the owner can’t be changed. And usermod can implicitly trigger a chown, too.

If I had a user foo with UID 104 inside my image whose home directory was /opt/foo, and I then bind mounted a host directory to /opt/foo, being able to fake/adopt 104 as the UID would mean I wouldn’t need to do anything special to run the app as foo with the bind mounted folder.
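
To make that concrete (a sketch; the useradd flags and the example host path and image name are assumptions, not from any existing setup):

# Image-side setup: an unprivileged user with UID 104 whose home is /opt/foo.
useradd --uid 104 --create-home --home-dir /opt/foo foo

# Under the proposed behaviour, this bind mount would surface inside the
# container already owned by UID 104 (foo), with no usermod/chown needed:
#   docker run -v /Users/me/src:/opt/foo myimage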

@dsheets Now that Docker for Mac is out of beta, does that mean that Docker is now committed to supporting and maintaining the current permissions model in Docker for Mac, which differs from other platforms (Linux and Docker Toolbox) in a way that can easily break image portability for any container that checks ownership permissions on bind mounted files and directories?

A reliable way to detect whether a container is running on Docker for Mac (and then fix permissions using a chown) is to test whether the volume is mounted with the type fuse.osxfs.

For instance, in the API Platform’s image we use:

# On Docker for Mac, /app is mounted via osxfs, so the check below matches;
# on other platforms it doesn't, and the chown is skipped.
if [[ -n $(mount -t fuse.osxfs | grep /app) ]]; then
    chown -R www-data:www-data /app 2> /dev/null
fi

@dunglas this is still a workaround for the fact that “build once, run anywhere” is no longer true.