Docker for Azure no longer accessible

Expected behavior

I can normally access the manager nodes in my swarm environment via SSH, and I can run any valid docker commands.

Actual behavior

$ ssh -i .ssh/cert.pem -p 50000 docker@ucplb-xxx.westeurope.cloudapp.azure.com
ssh: connect to host ucplb-xxx.westeurope.cloudapp.azure.com port 50000: Connection refused

$ docker info
Cannot connect to the Docker daemon at tcp://ucplb-xxx.westeurope.cloudapp.azure.com:443. Is the docker daemon running?
$ docker version
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 03:35:14 2017
 OS/Arch:      linux/amd64
Cannot connect to the Docker daemon at tcp://ucplb-xxx.westeurope.cloudapp.azure.com:443. Is the docker daemon running?

Additional Information

The Azure load balancer shows the ports open and the VMs up and running.
I have tried restarting the VM scale sets backing the swarm.

Steps to reproduce the behavior

Always.

Any help would be greatly appreciated.
Thanks in advance.

Hi Jojuva,

Are you trying to connect to UCP and run services on the cluster from your laptop, or trying to SSH into a node to check container logs?
If you'd like to run services, you need to download the UCP client bundle (admin -> Profile -> Client Bundle) and source the env.sh file.
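For reference, the env.sh in the bundle exports variables along these lines (the hostname and path below are placeholders for illustration, not your actual values):

```shell
# Illustrative only -- the real env.sh ships inside your downloaded client bundle.
# Sourcing it points the local docker CLI at UCP over TLS instead of a local daemon.
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH="$HOME/ucp-bundle-admin"   # unzipped bundle dir (placeholder)
export DOCKER_HOST="tcp://ucplb-xxx.westeurope.cloudapp.azure.com:443"
```

After sourcing it, commands like `docker version` and `docker node ls` run against the UCP cluster rather than your local daemon.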

Hope this helps,

thanks,
Uday

Hi ushetty,
I used to connect both ways, but now I cannot access either of them, not even the UCP management console! On the other hand, the Azure resources look fine.
Thanks
Jonás

Jonas,

Can you join a Zoom call? Please send your email address and I'll send you an invite.

–Uday

Thanks Uday for your help.
I will wait for your feedback.

Hi @jojuva,

I had the same issue in the past and got around it by deploying Docker through Docker Cloud, using Azure as the cloud provider. That way I can manage the containers, images, etc., and the environment is always available. I know you may have wanted to use it just as a remote docker-machine instance, but putting Docker Cloud in the middle saved me a lot of headaches. Hope that helps.
Cheers,
Marco

Hi jojuva,
I am facing the same issue when I try to SSH into the swarm-manager-vmss scale set using the public IP:
ssh: connect to host XX.XXX.XX.XX port 50000: Connection refused

Could you please let me know if you were able to fix this.

Thanks

@awsazuser Are you using the EE template or the CE one?

I am using CE.
It was working fine for me until last Tuesday.

I provisioned the swarm cluster using the Stable channel template provided at the following link:
https://store.docker.com/editions/community/docker-ce-azure

Azure provides an option to redeploy resources; this option migrates all the resources deployed via an ARM template to a new Azure host. Do you think performing a redeploy would help in this case?

I provisioned a new CE swarm cluster.
I was able to log in using my SSH keys, but I was logged out within a few minutes.

swarm-manager000000:~$ docker node ls
ID                          HOSTNAME              STATUS   AVAILABILITY   MANAGER STATUS
gl10nhrov0jluhll4qmsai0n4 * swarm-manager000000   Ready    Active         Leader
rugg68ye14enag36fk87tozdr   swarm-worker000001    Down     Active
snbnij14zuvs7eabulln297md   swarm-worker000000    Down     Active
swarm-manager000000:~$
swarm-manager000000:~$
swarm-manager000000:~$ Connection to XX.XX.XX.XX closed by remote host.
Connection to XX.XX.XX.XX closed.

The fact that all your workers are down is suspicious. What is the state of the VMs in the swarm worker and manager VMSS in the Azure console? Are they up/running?
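If SSH keeps getting refused, it can also help to check whether port 50000 is reachable at all, independent of your keys. A rough sketch of a bare TCP probe (the hostname below is a placeholder; assumes bash for its /dev/tcp redirection):

```shell
# Probe a TCP port without needing nc/nmap installed.
# Prints "open" if a connection succeeds, "closed/filtered" otherwise.
check_port() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null \
        && echo "open" \
        || echo "closed/filtered"
}

check_port ucplb-xxx.westeurope.cloudapp.azure.com 50000
```

If this says closed/filtered while the Azure console shows the VMs running, the problem is more likely the load balancer rules or NSG than the VMs themselves.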

All the instances in the manager and worker scale sets are up and running fine.

Getting the same error on a swarm when I restarted the scale sets.

ssh: connect to host *****-docker.westeurope.cloudapp.azure.com port 50000: Connection refused

The swarm had been created with the latest 17.09 CE stable template.

I don't think I want to start resetting the SSH creds yet:
docs.microsoft.com/en-us/azure/virtual-machines/linux/troubleshoot-ssh-connection

No issues if I create a brand-new swarm.

$ ssh -i /d/myPrivateKey_rsa -A -p 50000 docker@52.***.***.***
Welcome to Docker!
swarm-manager000000:~$ docker --version
Docker version 17.09.0-ce, build afdb6d4
swarm-manager000000:~$ docker ps
CONTAINER ID   IMAGE                                           COMMAND                CREATED          STATUS          PORTS   NAMES
********       docker4x/l4controller-azure:17.09.0-ce-azure1   "loadbalancer run …"   14 minutes ago   Up 14 minutes           editions_controller

swarm-manager000000:~$

Got this working by scaling the number of manager nodes to 0, then scaling it back up.

It looks like all the running containers are gone; that's fine.
The volumes are still there.
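For anyone else trying this, here is a rough sketch of the same fix with the Azure CLI. The resource group and scale-set names are placeholders, and the sketch only prints the `az vmss scale` commands so you can review them before running anything:

```shell
# Placeholders -- substitute your own resource group and manager scale set name.
RG="docker4azure-rg"
VMSS="swarm-manager-vmss"

# Build and print the az command that sets the scale set to a given capacity.
scale_cmd() {
    printf 'az vmss scale --resource-group %s --name %s --new-capacity %s\n' \
        "$RG" "$VMSS" "$1"
}

scale_cmd 0   # scale managers down to 0 (running containers will be lost)
scale_cmd 1   # scale back up to recreate the manager VMs
```

As noted above, this wipes the running containers but leaves the volumes intact, so only use it if you can afford to restart your services.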

Not sure if this is a Docker or Azure issue, but it's very annoying. Any other solutions that don't require losing all my containers?

I was able to 'fix' this by deploying my swarm via the Docker Cloud swarm beta using the latest edge release. SSH is available after a restart; sometimes it will say connection refused a few times, but it does eventually come back.