Cloudstor plugin not enabled

Expected behavior

I'm expecting the cloudstor plugin to be enabled after deploying the Docker for Azure template (via https://docs.docker.com/docker-for-azure).

Actual behavior

The cloudstor plugin shows as disabled:
swarm-manager000000:~$ docker plugin ls
ID NAME DESCRIPTION ENABLED
8daecf04281b cloudstor:azure cloud storage plugin for Docker false

This is the same across all nodes.

swarm-manager000000:~$ docker plugin enable 8daecf04281b
Error response from daemon: dial unix /run/docker/plugins/8daecf04281b580493e076b105b6ac020c9bd13c837692ed467997c4dd62bde5/cloudstor.sock: connect: no such file or directory
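
To repeat this check on each node, the `docker plugin ls` output can be filtered for plugins whose ENABLED column reads `false`. This is a minimal sketch: `disabled_plugins` is a hypothetical helper name, and it assumes the column layout shown above, with ENABLED as the last field and NAME as the second.

```shell
# Hypothetical helper: print the NAME of every plugin that is not enabled.
# Reads `docker plugin ls` output on stdin and skips the header row.
# Assumes ENABLED is the last whitespace-separated field and NAME the second.
disabled_plugins() {
    awk 'NR > 1 && $NF == "false" { print $2 }'
}

# Usage on a node:
#   docker plugin ls | disabled_plugins
```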

Additional Information

swarm-manager000000:~$ docker info
Containers: 7
Running: 5
Paused: 0
Stopped: 2
Images: 7
Server Version: 17.06.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: uw45ozquy6tmdk3ikf0321rff
Is Manager: true
ClusterID: vtu4g6gcbn99y1rb5v0hqd1wr
Managers: 3
Nodes: 6
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 10.20.1.10
Manager Addresses:
10.20.1.10:2377
10.20.1.11:2377
10.20.1.12:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.31-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.794GiB
Name: swarm-manager000000
ID: LMFO:YI4Y:T6SR:FFIL:V4S3:HW7Q:FCSG:3SHK:OWZX:I5QX:YJTB:NOJP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 94
Goroutines: 166
System Time: 2017-07-16T23:20:23.6432704Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

swarm-manager000000:~$ docker plugin inspect 8daecf04281b
[
    {
        "Config": {
            "Args": {
                "Description": "",
                "Name": "",
                "Settable": null,
                "Value": null
            },
            "Description": "cloud storage plugin for Docker",
            "Documentation": "https://docs.docker.com/engine/extend/plugins/",
            "Entrypoint": [
                "/cloudstor"
            ],
            "Env": [
                {
                    "Description": "",
                    "Name": "CLOUD_PLATFORM",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "AZURE_STORAGE_ACCOUNT",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "AZURE_STORAGE_ACCOUNT_KEY",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "AZURE_STORAGE_ENDPOINT",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "EFS_ID_REGULAR",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "EFS_ID_MAXIO",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "EFS_SUPPORTED",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "AWS_REGION",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "AWS_STACK_ID",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                },
                {
                    "Description": "",
                    "Name": "DEBUG",
                    "Settable": [
                        "value"
                    ],
                    "Value": ""
                }
            ],
            "Interface": {
                "Socket": "cloudstor.sock",
                "Types": [
                    "docker.volumedriver/1.0"
                ]
            },
            "IpcHost": false,
            "Linux": {
                "AllowAllDevices": true,
                "Capabilities": [
                    "CAP_DAC_OVERRIDE",
                    "CAP_DAC_READ_SEARCH",
                    "CAP_SYS_ADMIN"
                ],
                "Devices": null
            },
            "Mounts": [
                {
                    "Description": "",
                    "Destination": "/dev",
                    "Name": "",
                    "Options": [
                        "rbind"
                    ],
                    "Settable": null,
                    "Source": "/dev",
                    "Type": "bind"
                }
            ],
            "Network": {
                "Type": "host"
            },
            "PidHost": false,
            "PropagatedMount": "/mnt",
            "User": {},
            "WorkDir": "",
            "rootfs": {
                "diff_ids": [
                    "sha256:bb3450087c1e1913bc9f020ff76c56a66d39f2e4d812759e344f452d26a0b6e8"
                ],
                "type": "layers"
            }
        },
        "Enabled": false,
        "Id": "8daecf04281b580493e076b105b6ac020c9bd13c837692ed467997c4dd62bde5",
        "Name": "cloudstor:azure",
        "PluginReference": "docker.io/docker4x/cloudstor:17.06.0-ce-azure1",
        "Settings": {
            "Args": [],
            "Devices": [],
            "Env": [
                "CLOUD_PLATFORM=AZURE",
                "AZURE_STORAGE_ACCOUNT=XXXXXX-EDITED-XXXXX",
                "AZURE_STORAGE_ACCOUNT_KEY=XXXXXX-EDITED-XXXXX",
                "AZURE_STORAGE_ENDPOINT=",
                "EFS_ID_REGULAR=",
                "EFS_ID_MAXIO=",
                "EFS_SUPPORTED=",
                "AWS_REGION=",
                "AWS_STACK_ID=",
                "DEBUG=1"
            ],
            "Mounts": [
                {
                    "Description": "",
                    "Destination": "/dev",
                    "Name": "",
                    "Options": [
                        "rbind"
                    ],
                    "Settable": null,
                    "Source": "/dev",
                    "Type": "bind"
                }
            ]
        }
    }
]

Steps to reproduce the behavior

  1. Install Docker from the template, following the instructions at https://docs.docker.com/docker-for-azure

@twginfrastructure Is there any chance you can execute docker-diagnose and post the docker-diagnose ID from your repro environment please?

Further, are you seeing this consistently if you deploy multiple times, or is this a one-off?

Can you also report the region you are trying?

Hi

It's a repeatable issue, i.e. it happens every time, and we are in Australia Southeast.

Here is the docker-diagnose output:
swarm-manager000000:~$ docker-diagnose
OK hostname=swarm-manager000000 session=1500419283-YxmYUjVuYDCR04tZMQgjTIe6N8LPx6HY
OK hostname=swarm-manager000001 session=1500419283-YxmYUjVuYDCR04tZMQgjTIe6N8LPx6HY
OK hostname=swarm-manager000002 session=1500419283-YxmYUjVuYDCR04tZMQgjTIe6N8LPx6HY
OK hostname=swarm-worker000000 session=1500419283-YxmYUjVuYDCR04tZMQgjTIe6N8LPx6HY
OK hostname=swarm-worker000001 session=1500419283-YxmYUjVuYDCR04tZMQgjTIe6N8LPx6HY
OK hostname=swarm-worker000002 session=1500419283-YxmYUjVuYDCR04tZMQgjTIe6N8LPx6HY
Done requesting diagnostics.
Your diagnostics session ID is 1500419283-YxmYUjVuYDCR04tZMQgjTIe6N8LPx6HY
Please provide this session ID to the maintainer debugging your issue.

@twginfrastructure I went through your logs, and at least some of the nodes and the storage seem to be in Australia East. You mentioned earlier that the Azure region you are using is Australia Southeast. Can you please confirm that the resource group you deployed into is indeed in Australia East, and that you don't have the VMs spread out between the regions in some fashion? Specifically, can you confirm that swarm-manager000000 is in Australia East?

One of the reasons cloudstor may fail is if the storage account being used for the backing File Storage happens to be in a different region than the nodes. If that is the case, the SMB mount command will fail.

Thanks for the info.

We reviewed the configuration, and it led us to discover that the DNS servers were changed on the VNET, which resulted in cloudstor not working. Everything else in the swarm seemed OK, apart from the storage driver.
Thanks for helping us track it down.

P.S. Is there detailed documentation about how all the pieces of the solution fit together? I notice that there are 4 containers that appear to run helper services, for what looks like logging and service discovery. How does that work?

Regards

@twginfrastructure Thanks for pointing out that the DNS servers were changed. Cloudstor tries to mount the FileStorage-backed volumes over SMB using their names (like pw6lsvmu62dr4docker.file.core.windows.net), and that was failing since DNS was not working correctly. Since the name was resolving fine from my test environments, I did not dig any further, but it is great that you discovered it.
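
A quick way to check for this condition on a node is to test whether the storage account's file endpoint resolves with the node's configured DNS servers. This is a minimal sketch, not part of Docker for Azure: `storage_fqdn` and `resolves` are hypothetical helper names, the account name in the usage comment is a placeholder, and it assumes `getent` is available on the node (`nslookup` could be substituted if it is not).

```shell
# Hypothetical helper: build the Azure File Storage hostname for a storage
# account, following the <account>.file.core.windows.net pattern quoted above.
storage_fqdn() {
    printf '%s.file.core.windows.net\n' "$1"
}

# Hypothetical helper: return 0 if the name resolves through the node's
# configured resolver, non-zero otherwise. Assumes getent is installed.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Usage on a swarm node (replace the placeholder with your storage account):
#   host=$(storage_fqdn "mystorageaccount")
#   if resolves "$host"; then
#       echo "DNS OK: $host"
#   else
#       echo "DNS lookup failed: $host (check the DNS servers set on the VNET)"
#   fi
```

If the lookup fails, the SMB mount by name cannot succeed either, which matches the failure described in this thread.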

We do not have public facing documentation about the internal architecture of the service containers. At a high level:

  1. l4controller keeps the ALB up to date with ports exposed by docker services.
  2. guide runs services to make sure the docker daemons on the nodes are responsive, and that a node is replaced by a fresh one if its docker daemon becomes unresponsive. It also helps with upgrades.
  3. meta serves out swarm tokens in a secure way to the other nodes.
  4. logger sets up a syslog listener that receives logs from all containers (using default docker engine logging) and forwards them to the storage account whose name ends with logs. The logs are written to files in FileStorage as well, so the logger is sensitive to DNS in the same way Cloudstor is.

How did you manage to enable cloudstor again? I get this:

$ docker plugin  enable 12e884f385c5
Error response from daemon: dial unix /run/docker/plugins/12e884f385c5f788a72847c6b822bc582ad91b5fd4e73634d5b90150e6c303f3/cloudstor.sock: connect: no such file or directory