We are in the process of adding a diagnostic tool to Docker for AWS that will collect all the data required to help troubleshoot issues like this.
In order to help debug this issue, can you please share the following information?
Are the ELB listeners properly programmed when the services come up and go down?
Can you confirm whether the services can talk to each other within the swarm cluster? Is this an ingress-network-only issue?
Are the ELB listeners properly programmed when the services come up and go down?
Yes.
Can you confirm whether the services can talk to each other within the swarm cluster? Is this an ingress-network-only issue?
Creating and deleting services works. It is probably an ingress-network-only issue.
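For anyone trying to separate the two cases, a quick sketch of how to test service-to-service traffic versus the ingress path (service names `web`/`api` and port 8080 are hypothetical placeholders; adjust to your own stack):

```shell
# Hypothetical service names and port -- substitute your own.
# 1) Service-to-service over the overlay network (bypasses ingress):
docker exec "$(docker ps -q -f name=web | head -n1)" \
  curl -sf http://api:8080/

# 2) Published port via the ingress routing mesh, from any swarm node:
curl -sf http://localhost:8080/
```

If (1) succeeds everywhere but (2) fails on some nodes, that points at the ingress network rather than the services themselves.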
I also noticed something when I added a custom Ubuntu-based worker node to the swarm cluster to check the behavior: the Ubuntu workers responded fine, but the Moby Linux workers failed to respond.
Exact commands and steps to reproduce the issue
I created a service with my own custom image. It takes 60-100 seconds until it listens on its port, due to a startup script.
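Since the app only starts listening after 60-100 seconds, one thing worth trying is a health check with a start period, so swarm only routes to the task once the port is actually open. A sketch (image name and port are placeholders, and `--health-start-period` requires Docker 17.05+):

```shell
# Placeholder image/port; the start period covers the slow startup script,
# so failed probes during startup don't count against the retry limit.
docker service create \
  --name myapp \
  --publish 8080:8080 \
  --health-cmd 'curl -sf http://localhost:8080/ || exit 1' \
  --health-interval 10s \
  --health-start-period 120s \
  myorg/myapp:latest
```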
We're seeing the same. The ELB suddenly drops listeners/ports, even though the services are up (and we can curl them internally). Any updates? Short of killing the Swarm, I see no way to work around this, especially with the cumbersome SSL termination where the cert needs to be passed in as a label. I'm sure the CLI could help, alas!
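To confirm from the AWS side whether the listeners were actually dropped, something like this should show the ELB's current listener set (the load balancer name is a placeholder; this assumes a classic ELB, which is what Docker for AWS provisions):

```shell
# Placeholder LB name -- use the one Docker for AWS created for your stack.
aws elb describe-load-balancers \
  --load-balancer-names my-stack-ELB \
  --query 'LoadBalancerDescriptions[0].ListenerDescriptions[].Listener'
```

Comparing this output while the service is reachable versus after it stops responding could pin down exactly when the listeners disappear.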