Hello everybody,
I’ve set up a Docker UCP test environment, which I deployed from a deployment machine with docker-machine.
UCP is working fine, and docker-machine ls on the deployment machine was also working, but I have now run into the following problem:
NAME                    ACTIVE   DRIVER          STATE     URL                        SWARM   DOCKER    ERRORS
devstackdockerengine1   -        generic         Running   tcp://192.168.123.1:2376           v1.11.1
esxdockerengine1        -        vmwarevsphere   Error                                        Unknown
esxdockerengine2        -        vmwarevsphere   Error                                        Unknown
esxdockerengine3        -        vmwarevsphere   Error                                        Unknown
As you can see, the esxdockerengine machines are in state Error. This is quite strange, and I want to know how I can fix it. Note: UCP and my swarm are still working just fine (the swarm manager is located on the esxdockerengine1 node).
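To pull just the affected machine names out of that listing (for example, to loop over them later), a small filter like the one below should do. It is only a sketch: errored_machines is a hypothetical helper name, and it assumes the default column layout of docker-machine ls, where STATE is the fourth whitespace-separated field.

```shell
# errored_machines: read `docker-machine ls` output on stdin and print the
# NAME of every row whose STATE column (4th field) is "Error".
# Assumes the default column order: NAME ACTIVE DRIVER STATE ...
errored_machines() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $4 == "Error" { print $1 }'
}
```

It would be used as docker-machine ls | errored_machines. The ACTIVE column always holds either "-" or "*", so the field positions up to STATE stay stable even when URL and SWARM are empty.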
Since they are in state Error, I can no longer SSH into my nodes.
I have no idea why it broke. Here is the output of the docker info command:
[root@localhost ~]# docker info
Containers: 24
 Running: 23
 Paused: 0
 Stopped: 1
Images: 53
Server Version: swarm/1.1.3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 4
 devstackdockerengine1: 192.168.123.1:12376
  └ Status: Healthy
  └ Containers: 3
  └ Reserved CPUs: 0 / 4
  └ Reserved Memory: 0 B / 8.11 GiB
  └ Labels: executiondriver=, kernelversion=3.16.0-4-amd64, location=on_premise_BE, operatingsystem=Debian GNU/Linux 8 (jessie), provider=generic, storagedriver=aufs, target=apps, type=devstacker
  └ Error: (none)
  └ UpdatedAt: 2016-05-23T10:19:53Z
 esxdockerengine1: 192.168.123.14:12376
  └ Status: Healthy
  └ Containers: 10
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 64.42 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.1.19-boot2docker, location=on_premise_BE, operatingsystem=Boot2Docker 1.10.3 (TCL 6.4.1); master : 625117e - Thu Mar 10 22:09:02 UTC 2016, provider=vmwarevsphere, storagedriver=aufs, target=apps, type=controllers
  └ Error: (none)
  └ UpdatedAt: 2016-05-23T10:19:36Z
 esxdockerengine2: 192.168.123.15:12376
  └ Status: Healthy
  └ Containers: 6
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 64.42 GiB
  └ Labels: executiondriver=native-0.2, kernelversion=4.1.19-boot2docker, location=on_premise_BE, operatingsystem=Boot2Docker 1.10.3 (TCL 6.4.1); master : 625117e - Thu Mar 10 22:09:02 UTC 2016, provider=vmwarevsphere, storagedriver=aufs, target=apps, type=secondary
  └ Error: (none)
  └ UpdatedAt: 2016-05-23T10:19:54Z
 esxdockerengine3: 192.168.123.39:12376
  └ Status: Healthy
  └ Containers: 5
  └ Reserved CPUs: 0 / 8
  └ Reserved Memory: 0 B / 64.42 GiB
  └ Labels: executiondriver=, kernelversion=4.1.19-boot2docker, location=on_premise_BE, operatingsystem=Boot2Docker 1.11.0 (TCL 7.0); HEAD : 32ee7e9 - Wed Apr 13 20:06:49 UTC 2016, provider=vmwarevsphere, storagedriver=aufs, target=loadbalancer, type=loadbalancing
  └ Error: (none)
  └ UpdatedAt: 2016-05-23T10:19:45Z
Cluster Managers: 1
 192.168.123.14: Healthy
  └ Orca Controller: https://192.168.123.14:443
  └ Swarm Manager: tcp://192.168.123.14:3376
  └ KV: etcd://192.168.123.14:12379
Plugins:
 Volume:
 Network:
Kernel Version: 4.1.19-boot2docker
Operating System: linux
Architecture: amd64
CPUs: 28
Total Memory: 201.4 GiB
Name: ucp-controller-esxdockerengine1
ID: 4DAZ:FR3E:32PA:N2IG:MHHC:AXO3:L4MH:C2WQ:NL7S:IFRK:JLLA:WONP
PS: I asked the same question on GitHub: “Docker machine shows suddenly error”, issue #3449 in docker/machine.
EDIT:
When I try to set the environment for a machine manually, I get the following error:
eval "$(docker-machine env esxdockerengine2)"
Error checking TLS connection: Host is not running
But it is running, and I can even ping the address.
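Ping only shows that the host answers ICMP; it says nothing about whether the SSH or Docker ports accept connections. A minimal sketch for checking that from the deployment machine follows. It assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available; port_open and check_ports are hypothetical helper names, and which ports matter depends on how the machines were provisioned.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Relies on bash's /dev/tcp redirection and coreutils `timeout` (assumptions).
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# check_ports HOST PORT...: print one status line per port.
check_ports() {
  local host="$1" port
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port closed or unreachable"
    fi
  done
}
```

For example, check_ports 192.168.123.15 22 2376 would test SSH and the Docker TLS port on esxdockerengine2, using the address that docker info reports for it. Port 2376 is the one shown in the URL column for devstackdockerengine1; that the vSphere machines use the same port is an assumption.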
Whenever I execute the restart command, it gets stuck at “Waiting for SSH to be available”:
docker-machine -D restart esxdockerengine2
Docker Machine Version: 0.6.0, build e27fb87
Found binary path at /usr/local/bin/docker-machine
Launching plugin server for driver vmwarevsphere
Plugin server listening at address 127.0.0.1:43493
() Calling .GetVersion
Using API Version 1
() Calling .SetConfigRaw
() Calling .GetMachineName
command=restart machine=esxdockerengine2
Restarting "esxdockerengine2"...
(esxdockerengine2) Calling .GetState
Error getting machine state: vm 'esxdockerengine2' not found
(esxdockerengine2) Calling .GetState
Error getting machine state: vm 'esxdockerengine2' not found
Waiting for SSH to be available...
Getting to WaitForSSH function...
(esxdockerengine2) Calling .GetSSHHostname
Error getting ssh command 'exit 0' : Host is not running
I can confirm that SSH is running, because when I run ssh -lroot 192.168.123.14, I am prompted for a password.
Since I was using docker-machine version 0.6.0, I updated to 0.7.0 with the following command:
curl -L https://github.com/docker/machine/releases/download/v0.7.0/docker-machine-`uname -s`-`uname -m` > /usr/local/bin/docker-machine && chmod +x /usr/local/bin/docker-machine
But I still get the same error…