WORKER_TOKEN=Access Denied

Expected behavior

Worker node connects to the manager and joins the swarm.

Actual behavior

Worker fails to join, with the error message WORKER_TOKEN=Access Denied.

Log messages are:

Get Leader IP from Azure Table
 It's a worker Node, run setup
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    14  100    14    0     0     61      0 --:--:-- --:--:-- --:--:--    61
WORKER_TOKEN=Access Denied
 Setup Worker
   LEADER_IP=10.1.0.4
"docker swarm join" requires exactly 1 argument(s).
See 'docker swarm join --help'.

Usage:  docker swarm join [OPTIONS] HOST:PORT

Join a swarm as a node and/or manager
SWARM_ID: n/a
NODE: 
Can't connect to leader, sleep and try again

Additional Information

xyz-develop-manager-vmss000000:

~$ docker info
Containers: 6
 Running: 5
 Paused: 0
 Stopped: 1
Images: 6
Server Version: 17.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: szwmper9tfhonkqcw1nrjf15i
 Is Manager: true
 ClusterID: wy1u8l0orergabpxzpjpqxx7x
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 10.1.0.4
 Manager Addresses:
  10.1.0.4:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.36-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.794GiB
Name: xyz-develop-manager-vmss000000
ID: VOMC:VSLE:MU6H:7KIQ:44K5:3EJG:LFY2:OFAU:UYMS:DE37:N7ED:MLTQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 145
 Goroutines: 239
 System Time: 2017-08-23T23:13:24.762344033Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

xyz-develop-worker-vmss000000:

~$ docker info
Containers: 4
 Running: 3
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 17.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.36-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.794GiB
Name: xyz-develop-worker-vmss000000
ID: MDV7:MHTT:ZXCJ:R3HG:PNOW:UXJL:LMEZ:6OH5:AWQ4:EENB:FD32:U6N3
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 33
 Goroutines: 37
 System Time: 2017-08-23T23:16:04.355416807Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Steps to reproduce the behavior

N/A

Forgot to mention:

  • I modified the template to support Azure Managed Disks
  • I modified the template to use a different subnet

@ryandanthony … On the leader node, can you post the output of docker node ls, please? Did you perhaps enable over-provisioning of nodes as part of your modifications to the template?

xyz-develop-manager-vmss000000:~$ docker node ls
ID                            HOSTNAME                          STATUS              AVAILABILITY        MANAGER STATUS
szwmper9tfhonkqcw1nrjf15i *   xyz-develop-manager-vmss000000   Ready               Active              Leader

On both worker and manager vmss:

"overprovision": false,

Another note: on a prior build of this (i.e. I wiped it a few hours ago) I was able to manually execute:

docker swarm join --token XXXXX 10.1.0.4:2377

on the worker instance successfully.
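For anyone wanting to reproduce that manual join, a rough sketch (token value redacted; docker swarm join-token is run on the manager, the join on the worker):

# On the manager: print only the worker join token
docker swarm join-token -q worker

# On the worker: join using that token and the leader IP from the logs above
docker swarm join --token <worker-token> 10.1.0.4:2377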

Updating with what I have found in my research.
On the worker node, the setup script is doing:


get_worker_token()
{
    if [ -n "$LEADER_IP" ]; then
        export WORKER_TOKEN=$(curl http://$LEADER_IP:9024/token/worker/)
        echo "WORKER_TOKEN=$WORKER_TOKEN"
    else
        echo "WORKER_TOKEN can't be found yet. LEADER_IP isn't set yet."
    fi
}

What is returned from the curl call is "Access Denied". That also explains the "requires exactly 1 argument(s)" error above: WORKER_TOKEN expands to two words, so docker swarm join is handed two positional arguments instead of one.
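A quick way to inspect the raw response outside the script, run from the worker node (leader IP hard-coded here from the logs above):

# -v shows the request/response headers along with the "Access Denied" body
curl -v http://10.1.0.4:9024/token/worker/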

It looks like, on the manager, port 9024 is being served by the container running the image:

 docker4x/meta-azure:17.06.0-ce-azure2 

Any ideas why it would be returning access denied?
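For reference, one way to confirm which container publishes port 9024 on the manager (just filtering standard docker ps output):

docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' | grep 9024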

OK, looks like it might be a problem in meta-azure.
It expects "swarm-worker-vmss" to be the name of the VMSS, but I renamed it to something else in order to follow our naming conventions.

On the manager instance:

docker logs meta-azure

Shows a bunch of these:

Path:[GET] /token/worker/
User: curl/7.52.1 [10.1.0.7:57030]
userIP: 10.1.0.7 on port 57030

Error:  network.InterfacesClient#ListVirtualMachineScaleSetNetworkInterfaces: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ParentResourceNotFound" Message="Can not perform requested operation on nested resource. Parent resource 'swarm-worker-vmss' not found."
Couldn't get Worker Nic for VMSS: network.InterfacesClient#ListVirtualMachineScaleSetNetworkInterfaces: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ParentResourceNotFound" Message="Can not perform requested operation on nested resource. Parent resource 'swarm-worker-vmss' not found.

@ryandanthony meta-azure checks that the VM/IP requesting the swarm join tokens is part of the VMSS provisioned by the template before handing out the tokens. Renaming the resources, as you have done, is not supported by meta-azure, so what you are seeing is expected.
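Assuming the Azure CLI is installed and logged in, the mismatch can be confirmed by listing the scale sets that actually exist in the resource group (resource group name is a placeholder):

# meta-azure expects the worker scale set to be named exactly "swarm-worker-vmss"
az vmss list --resource-group <resource-group> --query "[].name" --output tsv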

OK, that makes sense, but I'd like to see some documentation that states "changing names will break things".