Docker Volumes flapping between `start` and `stop` states while the container is running

Expected behavior

Volumes defined at the cmdline with -v should live during all run/session. The normal behavior and operation from a regular Ubuntu installation is to have a successful execution (exit code 0) from the pipeline detailed below in “Steps to reproduce this behavior”.

Actual behavior

Volumes flapping between start and stop states leave the container without access to those shared resources:

2016-05-02 00:38:00,647 Docker[2142]: Write offset=0 data=[d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai] to file
2016-05-02 00:38:00,647 Docker[2142]: Volume.start d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai])
2016-05-02 00:38:00,648 Docker[2142]: Created instance d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e
2016-05-02 00:38:02,326 Docker[2142]: Volume.stop d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai])

Information

mba-3:test_bcbio_cwl romanvg$ pinata diagnose -u
OS X: version 10.10.5 (build: 14F1713)
Docker.app: version v1.11.0-beta9
Running diagnostic tests:
[OK] docker-cli
[OK] Moby booted
[OK] driver.amd64-linux
[OK] vmnetd
[OK] osxfs
[OK] db
[OK] slirp
[OK] menubar
[OK] environment
[OK] Docker
[OK] VT-x
Docker logs are being collected into /tmp/20160502-005113.tar.gz
Most specific failure is: No error was detected
Your unique id is: 08E838E6-D885-4699-8399-033F430D4849
Please quote this in all correspondence.

Steps to reproduce the behavior

  1. curl https://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh | bash
  2. conda install -c bioconda cwltool
  3. wget https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz && tar xvfz test_bcbio_cwl.tar.gz && cd test_bcbio_cwl
  4. sed -ie ‘s/docker_vm_uid() or os.geteuid()/0/g’ $HOME/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/job.py
  5. chmod +x *.sh && ./run_cwltool.sh

I’ve run the same steps and the error is repeated for me :thumbsup:

I’m on a MacBook Pro, running OS X 10.11.3 and the Docker 1.11.0-beta9

I can reproduce the exact same issue on beta 11:

Version 1.11.1-beta10 (build: 6662)
b4d942b304b7d7d0ee48d96e3b424ccfbcceca89

@brainstorm Thanks for the report. I’ve opened issue (#3096) on our internal tracker and one of engineers will look in to this soon. Thanks for trying Docker for Mac!

Great, thanks! Let me know if you need some more info/feedback regarding the bug, happy to help!

After several beta releases, it seems that the execution goes a bit further, until this step and related Docker logs (scroll past the application backtrace):

[job merge_split_alignments] /var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmpKt_NlM$ docker run -i --volume=/var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmpKt_NlM:/var/spool/cwl:rw --volume=/var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmppsSS6i:/tmp:rw --workdir=/var/spool/cwl --read-only=true --user=0 --rm --env=TMPDIR=/tmp --env=MPLCONFIGDIR=. bcbio/bcbio bcbio_nextgen.py runfn merge_split_alignments cwl 'sentinel-runtime={"cores": 4, "outdir": "/var/spool/cwl", "outdirSize": 1024, "ram": 4096, "tmpdir": "/tmp", "tmpdirSize": 9}' sentinel-parallel=single-merge 'sentinel-outputs=["align_bam","work_bam-plus__disc","work_bam-plus__sr","hla__fastq"]' work_bam= align_bam= work_bam-plus__disc= work_bam-plus__sr= hla__fastq= description=Test1
Error while running job: Error validating output record, could not validate field `align_bam` because
  `[]`
   is not a dict

could not validate field `work_bam-plus__disc` because
  `[]`
   is not a dict

could not validate field `work_bam-plus__sr` because
  `[]`
   is not a dict

could not validate field `hla__fastq` because
  the value `[]` is not a valid type in the union, expected one of:
  - File, but
     `[]`
     is not a dict
  - null, but
     the value `[]` is not null

 in {
    "work_bam-plus__disc": [],
    "align_bam": [],
    "hla__fastq": [],
    "work_bam-plus__sr": []
}
[job merge_split_alignments] completed permanentFail
[job merge_split_alignments] {}
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.align_bam
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.work_bam-plus__disc
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.work_bam-plus__sr
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.hla__fastq
[step merge_split_alignments] produced output {}
[step merge_split_alignments] completion status is permanentFail
[job merge_split_alignments] Removing temporary directory /var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmppsSS6i
[job merge_split_alignments] Removing empty output directory /var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmpKt_NlM
Exception on step 'alignment'
Workflow error, try again with --debug for more information:
  Output for workflow not available
Traceback (most recent call last):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/main.py", line 612, in main
    eval_timeout=args.eval_timeout
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/main.py", line 179, in single_job_executor
    for r in jobiter:
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 397, in job
    for w in wj.job(builder.job, basedir, output_callback, **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 312, in job
    for newjob in step.iterable:
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 268, in try_make_job
    for j in jobs:
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 571, in dotproduct_scatter
    for j in process.job(jo, basedir, functools.partial(rc.receive_scatter_output, n), **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 151, in job
    for j in self.step.job(joborder, basedir, output_callback, **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 508, in job
    **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 397, in job
    for w in wj.job(builder.job, basedir, output_callback, **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 331, in job
    raise WorkflowException("Output for workflow not available")
WorkflowException: Output for workflow not available

Here are the docker console logs:

2016-05-30 15:38:37,791 Docker[71437]: Volume.start 1ec6ff2513d2ea06450b617fdb45f84078253d5e2967312b64dad6da47b33e63 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])

2016-05-30 15:38:37,794 Docker[71437]: Created instance 1ec6ff2513d2ea06450b617fdb45f84078253d5e2967312b64dad6da47b33e63

2016-05-30 15:38:46,385 Docker[71437]: Volume.stop 1ec6ff2513d2ea06450b617fdb45f84078253d5e2967312b64dad6da47b33e63 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])

2016-05-30 15:38:46,958 Docker[71437]: Creating resource Entry(6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151)

2016-05-30 15:38:46,960 Docker[71437]: Write offset=0 data=[6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam] to file

2016-05-30 15:38:46,960 Docker[71437]: Volume.start 6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam])

2016-05-30 15:38:46,961 Docker[71437]: Created instance 6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151

2016-05-30 15:38:54,987 Docker[71437]: Volume.stop 6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam])

Dave, please, have a look at the (new) steps, I forgot to include the “sed” step. Now the Docker team should be able to reproduce it correctly as it happens (fails) in my machine.

Sorry for the inconvenience.

Hi Roman,

I cannot run reproduction step 4:

sed -ie 's/docker_vm_uid() or os.geteuid()/0/g' $HOME/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/job.py

as I have no ~/.anaconda directory.

Could you provide a simpler reproduction example?

It’s also possible that since you reported this the underlying issue has been fixed in a subsequent beta release. If you could check again and let us know if the issue appears to be fixed, that would be great.

Thanks,

David

Yes indeed it was resolved somewhere between beta 9 and the current one and I managed to run “the payload application”, see here:

1 Like