Docker Volumes flapping between `start` and `stop` states while the container is running

brainstorm · May 1, 2016, 10:59pm

Expected behavior

Volumes defined at the cmdline with -v should live during all run/session. The normal behavior and operation from a regular Ubuntu installation is to have a successful execution (exit code 0) from the pipeline detailed below in “Steps to reproduce this behavior”.

Actual behavior

Volumes flapping between start and stop states leave the container without access to those shared resources:

2016-05-02 00:38:00,647 Docker[2142]: Write offset=0 data=[d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai] to file
2016-05-02 00:38:00,647 Docker[2142]: Volume.start d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai])
2016-05-02 00:38:00,648 Docker[2142]: Created instance d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e
2016-05-02 00:38:02,326 Docker[2142]: Volume.stop d50ff7181f790741700feef1cff57e25554c43553b10a30d96a18e985e1b142e (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai])

Information

mba-3:test_bcbio_cwl romanvg$ pinata diagnose -u
OS X: version 10.10.5 (build: 14F1713)
Docker.app: version v1.11.0-beta9
Running diagnostic tests:
[OK] docker-cli
[OK] Moby booted
[OK] driver.amd64-linux
[OK] vmnetd
[OK] osxfs
[OK] db
[OK] slirp
[OK] menubar
[OK] environment
[OK] Docker
[OK] VT-x
Docker logs are being collected into /tmp/20160502-005113.tar.gz
Most specific failure is: No error was detected
Your unique id is: 08E838E6-D885-4699-8399-033F430D4849
Please quote this in all correspondence.

Steps to reproduce the behavior

curl https://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh | bash
conda install -c bioconda cwltool
wget https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz && tar xvfz test_bcbio_cwl.tar.gz && cd test_bcbio_cwl
sed -ie ‘s/docker_vm_uid() or os.geteuid()/0/g’ $HOME/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/job.py
chmod +x *.sh && ./run_cwltool.sh

robinandeer · May 3, 2016, 6:45am

I’ve run the same steps and the error is repeated for me

I’m on a MacBook Pro, running OS X 10.11.3 and the Docker 1.11.0-beta9

brainstorm · May 6, 2016, 3:54pm

I can reproduce the exact same issue on beta 11:

Version 1.11.1-beta10 (build: 6662)
b4d942b304b7d7d0ee48d96e3b424ccfbcceca89

davetucker · May 11, 2016, 1:41pm

@brainstorm Thanks for the report. I’ve opened issue (#3096) on our internal tracker and one of engineers will look in to this soon. Thanks for trying Docker for Mac!

brainstorm · May 18, 2016, 5:03pm

Great, thanks! Let me know if you need some more info/feedback regarding the bug, happy to help!

brainstorm · May 30, 2016, 1:44pm

After several beta releases, it seems that the execution goes a bit further, until this step and related Docker logs (scroll past the application backtrace):

[job merge_split_alignments] /var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmpKt_NlM$ docker run -i --volume=/var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmpKt_NlM:/var/spool/cwl:rw --volume=/var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmppsSS6i:/tmp:rw --workdir=/var/spool/cwl --read-only=true --user=0 --rm --env=TMPDIR=/tmp --env=MPLCONFIGDIR=. bcbio/bcbio bcbio_nextgen.py runfn merge_split_alignments cwl 'sentinel-runtime={"cores": 4, "outdir": "/var/spool/cwl", "outdirSize": 1024, "ram": 4096, "tmpdir": "/tmp", "tmpdirSize": 9}' sentinel-parallel=single-merge 'sentinel-outputs=["align_bam","work_bam-plus__disc","work_bam-plus__sr","hla__fastq"]' work_bam= align_bam= work_bam-plus__disc= work_bam-plus__sr= hla__fastq= description=Test1
Error while running job: Error validating output record, could not validate field `align_bam` because
  `[]`
   is not a dict

could not validate field `work_bam-plus__disc` because
  `[]`
   is not a dict

could not validate field `work_bam-plus__sr` because
  `[]`
   is not a dict

could not validate field `hla__fastq` because
  the value `[]` is not a valid type in the union, expected one of:
  - File, but
     `[]`
     is not a dict
  - null, but
     the value `[]` is not null

 in {
    "work_bam-plus__disc": [],
    "align_bam": [],
    "hla__fastq": [],
    "work_bam-plus__sr": []
}
[job merge_split_alignments] completed permanentFail
[job merge_split_alignments] {}
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.align_bam
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.work_bam-plus__disc
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.work_bam-plus__sr
Output is missing expected field file:///Users/romanvg/tmp/test_bcbio_cwl/run_info-cwl-workflow/wf-alignment.cwl#merge_split_alignments.hla__fastq
[step merge_split_alignments] produced output {}
[step merge_split_alignments] completion status is permanentFail
[job merge_split_alignments] Removing temporary directory /var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmppsSS6i
[job merge_split_alignments] Removing empty output directory /var/folders/rb/k3q8j6j53_d5fcns670ntm380000gn/T/tmpKt_NlM
Exception on step 'alignment'
Workflow error, try again with --debug for more information:
  Output for workflow not available
Traceback (most recent call last):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/main.py", line 612, in main
    eval_timeout=args.eval_timeout
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/main.py", line 179, in single_job_executor
    for r in jobiter:
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 397, in job
    for w in wj.job(builder.job, basedir, output_callback, **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 312, in job
    for newjob in step.iterable:
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 268, in try_make_job
    for j in jobs:
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 571, in dotproduct_scatter
    for j in process.job(jo, basedir, functools.partial(rc.receive_scatter_output, n), **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 151, in job
    for j in self.step.job(joborder, basedir, output_callback, **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 508, in job
    **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 397, in job
    for w in wj.job(builder.job, basedir, output_callback, **kwargs):
  File "/Users/romanvg/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/workflow.py", line 331, in job
    raise WorkflowException("Output for workflow not available")
WorkflowException: Output for workflow not available

Here are the docker console logs:

2016-05-30 15:38:37,791 Docker[71437]: Volume.start 1ec6ff2513d2ea06450b617fdb45f84078253d5e2967312b64dad6da47b33e63 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])

2016-05-30 15:38:37,794 Docker[71437]: Created instance 1ec6ff2513d2ea06450b617fdb45f84078253d5e2967312b64dad6da47b33e63

2016-05-30 15:38:46,385 Docker[71437]: Volume.stop 1ec6ff2513d2ea06450b617fdb45f84078253d5e2967312b64dad6da47b33e63 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict, /Users/romanvg/tmp/test_bcbio_cwl/testdata/automated/variant_regions-bam.bed, /Users/romanvg/tmp/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai])

2016-05-30 15:38:46,958 Docker[71437]: Creating resource Entry(6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151)

2016-05-30 15:38:46,960 Docker[71437]: Write offset=0 data=[6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai:/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam] to file

2016-05-30 15:38:46,960 Docker[71437]: Volume.start 6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam])

2016-05-30 15:38:46,961 Docker[71437]: Created instance 6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151

2016-05-30 15:38:54,987 Docker[71437]: Volume.stop 6f5b41850f0abc1ce14598249bfc956a8a5f22503838d85fd46ddd1dcb701151 (paths = [/Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam.bai, /Users/romanvg/tmp/test_bcbio_cwl/testdata/100326_FC6107FAAXX/7_100326_FC6107FAAXX.bam])

brainstorm · May 30, 2016, 2:11pm

Dave, please, have a look at the (new) steps, I forgot to include the “sed” step. Now the Docker team should be able to reproduce it correctly as it happens (fails) in my machine.

Sorry for the inconvenience.

dsheets · August 16, 2016, 3:04pm

Hi Roman,

I cannot run reproduction step 4:

sed -ie 's/docker_vm_uid() or os.geteuid()/0/g' $HOME/.anaconda/envs/cwltool/lib/python2.7/site-packages/cwltool/job.py

as I have no ~/.anaconda directory.

Could you provide a simpler reproduction example?

It’s also possible that since you reported this the underlying issue has been fixed in a subsequent beta release. If you could check again and let us know if the issue appears to be fixed, that would be great.

Thanks,

David

brainstorm · August 16, 2016, 3:07pm

Yes indeed it was resolved somewhere between beta 9 and the current one and I managed to run “the payload application”, see here:

Topic		Replies	Views
Docker Desktop Containers Stop Immediately After Starting Docker Desktop docker-desktop-for-w	2	811	July 10, 2024
End of Stream encountered before parsing was completed Docker Desktop docker , beta , windows	2	2821	May 26, 2016
Docker containers not running? How to fix it? Docker Desktop docker , linux	3	29432	December 10, 2023
Docker ps timeout: beta for mac Docker Desktop	2	1444	July 26, 2016
Pull works, run does not Docker Desktop docker , beta , windows	3	2380	May 11, 2016

Docker Volumes flapping between `start` and `stop` states while the container is running

Expected behavior

Actual behavior

Information

Steps to reproduce the behavior

Related topics