Bug in Cloudstor hangs any container that tries to mount a cloudstor volume

Expected behavior

When spinning up a brand-new stack, I should be able to create a cloudstor volume and then run a container that mounts that volume.

Actual behavior

Any container that mounts a cloudstor volume hangs indefinitely.

Additional Information

The container is completely unresponsive: I can’t kill or remove it, and I can’t kill the PID of the process that is trying to access the volume. From another node I can mount the volume and see that files were created, but they are all empty. I’ve tried brand-new stacks with both encrypted and unencrypted EFS.
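
For reference, this is roughly how the share can be inspected from another node. The EFS DNS name and mount point below are placeholders, and I’m assuming the standard NFSv4.1 options from the AWS docs:

sudo mkdir -p /mnt/efs-check
# fs-XXXXXXXX / us-east-1 are placeholders for your stack’s filesystem ID and region
sudo mount -t nfs4 -o nfsvers=4.1 fs-XXXXXXXX.efs.us-east-1.amazonaws.com:/ /mnt/efs-check
ls -lR /mnt/efs-check   # in my case the container’s files are present but zero-length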

Steps to reproduce the behavior

  • Start a new stack using the latest template (“Docker CE for AWS 18.09.2-ce (18.09.2-ce-aws1)”)
  • Create a cloudstor EFS volume:
docker volume create -d "cloudstor:aws" --opt backing=shared portainer
  • Start a service that mounts that volume:
docker service create \
  --name portainer \
  --replicas 1 \
  --mount type=volume,volume-driver=cloudstor:aws,source=portainer,destination=/data \
  portainer/portainer
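
Once the service task starts, this is roughly how the hang shows up (the container ID is a placeholder; find it with docker ps on the node running the task):

docker service ps portainer            # the task shows Running even though the container is wedged
docker exec <container-id> ls /data    # blocks forever; any I/O against /data hangs
docker rm -f <container-id>            # also hangs; the container can’t be killed or removed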

Debug information

  • I’ve been trying to document this on this GitHub issue, but am replicating it here for completeness (and because that issue looks dead).
  • As the host continues to try to access the EFS share, dmesg shows the following stack trace:
Mar 31 01:47:13 moby kernel: INFO: task portainer:4823 blocked for more than 120 seconds.
Mar 31 01:47:13 moby kernel:       Not tainted 4.9.114-moby #1
Mar 31 01:47:13 moby kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 31 01:47:13 moby kernel: portainer       D    0  4823   4786 0x00000100
Mar 31 01:47:13 moby kernel:  00000000000190c0 0000000000000000 ffffa02c63a637c0 ffffa02c734821c0
Mar 31 01:47:13 moby kernel:  ffffa02c63a80d00 ffffa02c762190c0 ffffffff8a83caf6 0000000000000002
Mar 31 01:47:13 moby kernel:  ffffa02c63a80d00 ffffc1f4412dfce0 7fffffffffffffff 0000000000000002
Mar 31 01:47:13 moby kernel: Call Trace:
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83caf6>] ? __schedule+0x35f/0x43d
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83cf26>] ? bit_wait+0x2a/0x2a
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83cc52>] ? schedule+0x7e/0x87
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83e8de>] ? schedule_timeout+0x43/0x101
Mar 31 01:47:13 moby kernel:  [<ffffffff8a019808>] ? xen_clocksource_read+0x11/0x12
Mar 31 01:47:13 moby kernel:  [<ffffffff8a12e281>] ? timekeeping_get_ns+0x19/0x2c
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83c739>] ? io_schedule_timeout+0x99/0xf7
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83c739>] ? io_schedule_timeout+0x99/0xf7
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83cf3d>] ? bit_wait_io+0x17/0x34
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83d009>] ? __wait_on_bit+0x48/0x76
Mar 31 01:47:13 moby kernel:  [<ffffffff8a19e758>] ? wait_on_page_bit+0x7c/0x96
Mar 31 01:47:13 moby kernel:  [<ffffffff8a10f99e>] ? autoremove_wake_function+0x35/0x35
Mar 31 01:47:13 moby kernel:  [<ffffffff8a19e842>] ? __filemap_fdatawait_range+0xd0/0x12b
Mar 31 01:47:13 moby kernel:  [<ffffffff8a1a058d>] ? __filemap_fdatawrite_range+0x9d/0xbb
Mar 31 01:47:13 moby kernel:  [<ffffffff8a19e8ac>] ? filemap_fdatawait_range+0xf/0x23
Mar 31 01:47:13 moby kernel:  [<ffffffff8a1a060c>] ? filemap_write_and_wait_range+0x3a/0x4f
Mar 31 01:47:13 moby kernel:  [<ffffffff8a2bcf98>] ? nfs_file_fsync+0x54/0x187
Mar 31 01:47:13 moby kernel:  [<ffffffff8a2221fd>] ? do_fsync+0x2e/0x47
Mar 31 01:47:13 moby kernel:  [<ffffffff8a222425>] ? SyS_fdatasync+0xf/0x12
Mar 31 01:47:13 moby kernel:  [<ffffffff8a0033b7>] ? do_syscall_64+0x69/0x79
Mar 31 01:47:13 moby kernel:  [<ffffffff8a83f64e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
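
Reading the trace bottom-up: the container’s fdatasync() call enters nfs_file_fsync and blocks in wait_on_page_bit, waiting for dirty pages to be written back to the NFS (EFS) mount, so it looks like writes to the share never complete. To enumerate every task stuck like this (a generic sketch, nothing stack-specific):

ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'   # list uninterruptible (D-state) tasks
echo w | sudo tee /proc/sysrq-trigger             # dump kernel stacks of all blocked tasks (needs sysrq enabled)
dmesg | tail -n 200                               # the stack dumps land in the kernel log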

Just checking in… two and a half months later, this is still a problem. I can’t deploy a new stack that uses EFS volumes.