Docker Community Forums

No healthy node available in the cluster - Swarm - Docker-machine


(Dresantos) #1

Hi Everybody,

I am having a problem while testing Swarm. Before creating this post I searched and found a similar issue, but its solution did not fix mine.
Here is a description of what I have done.

I have Docker Engine on my laptop, and I have also created two other nodes via docker-machine: one in VirtualBox and one in VMware vSphere.

NAME                ACTIVE   DRIVER          STATE     URL                         SWARM   DOCKER    ERRORS
testhost            -        virtualbox      Running   tcp://192.168.99.100:2376           v1.10.2
vm-docker-machine   -        vmwarevsphere   Running   tcp://192.168.2.178:2376            v1.10.2

Create Token

[asantos@fosters ~]$ docker run --rm swarm create
7b913280576ec17928b9b14f9657cef9

Create Swarm manager on laptop

[asantos@fosters ~]$ docker run -d -P swarm manage token://7b913280576ec17928b9b14f9657cef9
a8bc404fc4cb60547f42bf7698ce630289d0c9908e3ef4722938552ec6d6b614

[asantos@fosters ~]$ docker ps --no-trunc
CONTAINER ID                                                       IMAGE               COMMAND                                                    CREATED              STATUS              PORTS                     NAMES
a8bc404fc4cb60547f42bf7698ce630289d0c9908e3ef4722938552ec6d6b614   swarm               "/swarm manage token://7b913280576ec17928b9b14f9657cef9"   About a minute ago   Up About a minute   0.0.0.0:32770->2375/tcp   sleepy_torvalds

Create agent on testhost

[asantos@fosters ~ [testhost]]$ docker run -d swarm join --addr=$(docker-machine ip testhost):2376 token://7b913280576ec17928b9b14f9657cef9
1c90bdc51755c312655572acba162858697cd57c35628817112ee1510c03e80c
[asantos@fosters ~ [testhost]]$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
1c90bdc51755        swarm               "/swarm join --addr=1"   12 seconds ago      Up 10 seconds       2375/tcp            elated_williams

Create agent on vm-docker-machine

[asantos@fosters ~ [vm-docker-machine]]$ docker run -d swarm join --addr=$(docker-machine ip vm-docker-machine):2376 token://7b913280576ec17928b9b14f9657cef9
0c9f31f7d1393842432adcfa9c48202bf3eae8e2a2752706be7a59e2b4310c1b
[asantos@fosters ~ [vm-docker-machine]]$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
0c9f31f7d139        swarm               "/swarm join --addr=1"   5 seconds ago       Up 4 seconds        2375/tcp            nauseous_payne

Connecting to the Docker Swarm manager

[asantos@fosters ~]$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                     NAMES
a8bc404fc4cb        swarm               "/swarm manage token:"   About an hour ago   Up About an hour    0.0.0.0:32770->2375/tcp   sleepy_torvalds
[asantos@fosters ~]$ docker -H tcp://127.0.0.1:32770 info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: swarm/1.1.3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
 (unknown): 192.168.99.100:2376
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: (none)
  └ UpdatedAt: 2016-03-09T19:52:52Z
 (unknown): 192.168.2.178:2376
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: (none)
  └ UpdatedAt: 2016-03-09T19:56:52Z
Plugins:
 Volume:
 Network:
Kernel Version: 4.3.6-201.fc22.x86_64
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: a8bc404fc4cb

Deploying NGINX and the resulting error

[asantos@fosters ~]$ docker -H tcp://127.0.0.1:32770 run -d -P nginx
docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.

Can anyone please help me debug what I am doing wrong?
I have tried it several ways, but I always end up with the same error.

BR,
André


(Kostasmistos) #2

Well, my suggestion is to use a service discovery tool such as Consul, etcd, or ZooKeeper. They are easy to set up for testing: just pull and run the respective Docker image. Then start your nodes, advertise their ip:port, and specify the IP of the Consul node. Follow the tutorial here (no need to create replicas).

On the other hand, if you do not want to use a service discovery tool, please provide the logs of the Swarm agents and manager, along with the Docker daemon arguments. I see you are using port 2376 to connect to the Docker daemon. According to the documentation, that port is used for secure connections over TLS, which in turn implies that you need to use certificates when joining the cluster.
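
As a rough illustration of the port point above, here is a minimal sketch that classifies an endpoint by its conventional port. The helper name `endpoint_mode` is made up for this example, and the mapping is only a convention (2375 for unencrypted, 2376 for TLS); a daemon can be configured to listen differently, so treat the result as a hint.

```shell
# Hypothetical helper: guess how a Docker endpoint is exposed from its port.
# 2375 is the conventional unencrypted port, 2376 the conventional TLS port.
endpoint_mode() {
  port="${1##*:}"          # keep everything after the last colon
  case "$port" in
    2375) echo "plain" ;;
    2376) echo "tls" ;;
    *)    echo "unknown" ;;
  esac
}

endpoint_mode 192.168.99.100:2376   # prints: tls
```

If the answer is "tls", any client talking to that address, including a Swarm manager, would need matching certificates.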


(Dresantos) #3

Hi kostasmistos,

Thanks for your reply.
Before switching to another discovery service, I would like to debug and understand what I am doing wrong.

Hope I am providing the info you’ve asked for:

Local docker daemon arguments

[asantos@fosters ~]$ ps -ef | grep docker
root      5488 31776  0 14:21 ?        00:00:00 docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 32769 -container-ip  172.17.0.2 -container-port 2375
root     31776     1  0 14:09 ?        00:00:04 /usr/bin/docker daemon --log-level="debug" -D -H fd://

[root@fosters ~]# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/docker daemon --log-level="debug" -D -H fd://
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target

Swarm Manager Logs

[asantos@fosters ~]$ date;docker -H tcp://127.0.0.1:32769 info;docker -H tcp://127.0.0.1:32769 run -P nginx;docker logs a8bc404fc4cb

Fri Mar 11 15:07:06 WET 2016


Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: swarm/1.1.3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 2
 (unknown): 192.168.2.178:2376
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels: 
  └ Error: (none)
  └ UpdatedAt: 2016-03-11T14:21:22Z
 (unknown): 192.168.99.100:2376
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels: 
  └ Error: (none)
  └ UpdatedAt: 2016-03-11T14:26:22Z
Plugins: 
 Volume: 
 Network: 
Kernel Version: 4.4.3-201.fc22.x86_64
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: a8bc404fc4cb
docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.

time="2016-03-11T15:07:06Z" level=error msg="HTTP error: No healthy node available in the cluster" status=500 

Swarm Agent Logs

time="2016-03-11T15:05:35Z" level=info msg="Registering on the discovery service every 1m0s..." addr="192.168.99.100:2376" discovery="token://7b913280576ec17928b9b14f9657cef9" 
time="2016-03-11T15:06:36Z" level=info msg="Registering on the discovery service every 1m0s..." addr="192.168.99.100:2376" discovery="token://7b913280576ec17928b9b14f9657cef9" 
time="2016-03-11T15:07:37Z" level=info msg="Registering on the discovery service every 1m0s..." addr="192.168.99.100:2376" discovery="token://7b913280576ec17928b9b14f9657cef9" 
time="2016-03-11T15:08:38Z" level=info msg="Registering on the discovery service every 1m0s..." addr="192.168.99.100:2376" discovery="token://7b913280576ec17928b9b14f9657cef9"
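
Both nodes in the `docker info` output above are registered but stuck in `Status: Pending`. A small filter like the following can make that easier to spot when there are many nodes. This is only a sketch: `node_status` is a hypothetical name, and it assumes the `docker info` layout shown in this thread (a node line ending in `ip:port`, followed by a `Status:` line).

```shell
# Hypothetical helper: read Swarm `docker info` output on stdin and
# print "address status" for each node.
node_status() {
  awk '
    # Node header lines look like " name: ip:port"; remember the address.
    /^ [^ ].*: [0-9.]+:[0-9]+ *$/ { addr = $NF }
    # The first Status line after a header belongs to that node.
    /Status:/ && addr != ""        { print addr, $NF; addr = "" }
  '
}

# Example (needs a reachable Swarm manager):
# docker -H tcp://127.0.0.1:32769 info | node_status
```

Any node that does not report `Healthy` will not be scheduled on, which matches the "No healthy node available" error.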

(Phosphre) #4

You have my entire sympathy. I have exactly the same problem. The docs for Docker on CentOS appear to be lacking. Why I should have to spend an entire evening hacking this supposedly simple platform is a mystery.


(Dresantos) #5

Hi,

I was able to “solve” the problem by using docker-machine to create the swarm instead of doing it by hand.
These are the steps I took:

Create the Swarm key

docker run --rm swarm create 
0150bf0acaf94ea1f245d1ddxxxxxxxx 

Create the swarm master
In my case I used the virtualbox driver

docker-machine create --driver virtualbox --swarm --swarm-master --swarm-discovery token://0150bf0acaf94ea1f245d1ddxxxxxxxx master-node

Create a swarm node
Again using the virtualbox driver

docker-machine create --driver virtualbox --swarm --swarm-discovery token://0150bf0acaf94ea1f245d1ddxxxxxxxx node-01

Create another swarm node
This time using the vmwarevsphere driver

docker-machine create --driver vmwarevsphere --vmwarevsphere-username=user --vmwarevsphere-password=supersegurinho --vmwarevsphere-vcenter=vcenter.pt --vmwarevsphere-datastore=Test_iSCSI_Datastore --vmwarevsphere-network="VM Network" --vmwarevsphere-datacenter=DATACENTER --vmwarevsphere-hostsystem=CLUSTER/esx4.pt --swarm --swarm-discovery token://0150bf0acaf94ea1f245d1ddxxxxxxxx node-02

Load the environment variables automatically and connect to the swarm

eval $(docker-machine env --swarm master-node)
[user@fosters ~ [master-node]]$ docker info 
Containers: 7 
Running: 4 
Paused: 0 
Stopped: 3 
Images: 7 
Server Version: swarm/1.1.3 
Role: primary 
<...OutPut Omitted...>

Pass the certificates manually and connect to the swarm
Instead of loading the variables automatically, you can pass the certs by hand

docker -H tcp://192.168.99.100:3376 --tlsverify=true \
--tlscacert=/home/user/.docker/machine/machines/master-node/ca.pem \
--tlscert=/home/user/.docker/machine/machines/master-node/server.pem \
--tlskey=/home/user/.docker/machine/machines/master-node/server-key.pem info

Containers: 7  
Running: 4 
Paused: 0 
Stopped: 3 
Images: 7 
Server Version: swarm/1.1.3 
Role: primary 
Strategy: spread 
Filters: health, port, dependency, affinity, constraint 
Nodes: 3 
 master-node: 192.168.99.100:2376 
└ Status: Healthy
<...OutPut Omitted...>

Hope this can help you.
In the meantime, though, I still have not found out how to do it without docker-machine's help.

Best Regards
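
For anyone repeating the manual TLS connection above, here is a minimal sketch that builds the client flags from a machine directory. The name `machine_tls_flags` is hypothetical, and it assumes the directory holds `ca.pem`, `cert.pem` and `key.pem`, which are the client files docker-machine points the client at through `DOCKER_CERT_PATH`. The post above passes `server.pem` and `server-key.pem` instead, so adjust the file names if that is what works in your setup.

```shell
# Hypothetical helper: print docker client TLS flags for a docker-machine directory.
# Assumes the directory contains ca.pem, cert.pem and key.pem.
machine_tls_flags() {
  dir="$1"
  printf -- '--tlsverify --tlscacert=%s/ca.pem --tlscert=%s/cert.pem --tlskey=%s/key.pem\n' \
    "$dir" "$dir" "$dir"
}

# Example (needs a reachable Swarm manager on port 3376):
# docker -H tcp://192.168.99.100:3376 \
#   $(machine_tls_flags "$HOME/.docker/machine/machines/master-node") info
```

`docker-machine config --swarm master-node` prints a similar set of flags, which may be simpler if docker-machine is available.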


(5003152) #6

I found that the problem was caused by not binding the right Docker port:

netstat -tulpn
tcp6 0 0 :::2375 :::* LISTEN 5979/dockerd

Then run this on the worker machine:
docker run -d swarm join --advertise=workeripadder:2375 token://xxxxxxx

Run this on the swarm manager:
docker run -d --name manage -p 4243:2375 swarm manage token://xxxxxxx
docker -H tcp://0.0.0.0:4243 info

and you will see all the info.
Hope it helps.
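
The port check above can be scripted. The following is a minimal sketch, assuming `netstat -tulpn` output in the format shown in this post and a daemon process named `dockerd` (older releases run as `docker daemon`, so the pattern would need adjusting there). The function name `dockerd_ports` is made up for this example.

```shell
# Hypothetical helper: read `netstat -tulpn` output on stdin and print
# the TCP ports that dockerd is listening on.
dockerd_ports() {
  awk '/LISTEN/ && $NF ~ /\/dockerd$/ {
    n = split($4, parts, ":")   # local address, e.g. ":::2375" or "0.0.0.0:2376"
    print parts[n]              # the port is the last colon-separated piece
  }'
}

# Example (run as root on the worker so the process column is populated):
# netstat -tulpn | dockerd_ports
```

The port printed here is the one to use in the `swarm join` address, so the manager can actually reach the daemon.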