I have a Host cluster of 6-7 hosts, on Azure, provisioned by Docker Cloud.
I have an autoscaling service that checks whether to scale and scales other services based on some metrics, every 5 minutes.
Every day or so, when attempting to scale a service by placing a new container in some node, it fails to scale and the autoscaling scheme goes into an endless error state. I get:
container-3: Starting with docker id some-id in some-host-id.node.dockerapp.io ERROR: container-3: Unexpected error executing docker command start: HTTPSConnectionPool(host='22.214.171.124', port=2375): Read timed out. (read timeout=10) ERROR: Service Scale action on 'container' (using 'my/image:latest') has failed
The only workaround I’ve found is to manually kill the host and wait till have docker cloud re-schedules my container to healthy hosts, then add a new host.
This is unacceptable for HA needs. Can someone please help?