How do I repair or refresh ingress network?

sawanoboly · August 20, 2016, 9:18am

The ingress network is unstable.

If the load balancer on the ingress network stops responding, I do not know of a way to recover it.
How can I debug and recover it?

Expected behavior

Service creation with the publish option succeeds.
The ELB returns a response.

Actual behavior

Service creation with the publish option succeeds.
The ELB listener starts TCP balancing.
The published ingress port stops returning responses.

Additional Information

Template: d4x 1.12.1-beta5
3 managers (t2.small), 3 workers (t2.medium)

Steps to reproduce the behavior

Create and delete a service a number of times.

mavenugo · August 28, 2016, 4:26pm

We are in the process of adding a diagnostic tool to Docker for AWS that will collect the data needed to troubleshoot issues like this.

To help debug this issue, can you please share the following information?

Are the ELB listeners properly programmed when the services come up and go down?
Can you confirm whether the services can talk to each other within the swarm cluster? Is this an ingress-network-only issue?
The exact commands and steps to reproduce the issue
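
The information requested above could be gathered in one pass from a manager node. This is only a minimal sketch, not the diagnostic tool mentioned above: it assumes the AWS CLI is installed and configured, and the function name `collect_ingress_info` and its load balancer name argument are hypothetical.

```shell
#!/bin/sh
# Sketch: gather ELB listener and swarm state in one place.
# Assumption: $1 is the name of the ELB created by the Docker for AWS template.

collect_ingress_info() {
    elb_name="$1"

    echo "== ELB listener ports =="
    aws elb describe-load-balancers \
        --load-balancer-names "$elb_name" \
        --query 'LoadBalancerDescriptions[0].ListenerDescriptions[].Listener.LoadBalancerPort' \
        --output text

    echo "== Services and their published ports =="
    docker service ls

    echo "== Ingress network as seen by this node =="
    docker network inspect ingress

    echo "== Node status =="
    docker node ls
}
```

Calling `collect_ingress_info <elb-name>` before and after a service is created or removed gives two snapshots that can be compared.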

sawanoboly · August 29, 2016, 2:39am

Are the ELB listeners properly programmed when the services come up and go down?

Yes.

Can you confirm whether the services can talk to each other within the swarm cluster? Is this an ingress-network-only issue?

Creating and deleting services works. It is probably only an ingress network issue.
I also noticed something when I added a custom Ubuntu worker node to the swarm cluster to check the behavior: the Ubuntu workers did not fail to respond, but the Moby Linux workers did.

Exact commands and steps to reproduce the issue

I created a service with my own custom image. Because of its startup script, it takes 60-100 seconds before it starts listening on the port.

docker service create --name myservice -p 10000:8080 --constraint 'node.role == worker' myservice
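
Since the container needs 60-100 seconds before it listens, a request sent right after `docker service create` is expected to fail regardless of the ingress problem. A polling check helps tell the two cases apart. The following is a sketch only; the function name `wait_for_http` and its default timings are assumptions, not part of the original report.

```shell
#!/bin/sh
# Sketch: poll a URL until it answers or a timeout expires.
# $1: URL, $2: total seconds to wait (default 120), $3: seconds between tries (default 5).

wait_for_http() {
    url="$1"
    timeout="${2:-120}"
    interval="${3:-5}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -s: quiet, -o /dev/null: discard the body, --max-time: bound each attempt
        if curl -s -o /dev/null --max-time 5 "$url"; then
            echo "responding after ~${elapsed}s"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "no response within ${timeout}s" >&2
    return 1
}
```

For example, `wait_for_http "http://$ELB_ENDPOINT:10000" 180` would keep trying for three minutes before reporting a failure.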

Thanks,

friism · August 29, 2016, 2:29pm

This is not really a supported use case - you should use the AWS scaling group to add or remove workers.

Can you provide more detailed steps to reproduce? E.g.

1. deployed Docker for AWS with X managers and Y workers of instance type Z
2. deployed service foo
3. checked hostname bar, and got no response
4. …

Michael

sawanoboly · August 30, 2016, 4:20am

This is not really a supported use case - you should use the AWS scaling group to add or remove workers.

I see. I only added it to check the swarm network. I will remove it from the cluster once I find out what the problem is.

deployed Docker for AWS with X managers and Y workers of instance type Z

3 managers (t2.small)
3 workers (t2.medium)

deployed service foo

I could not reproduce it 100% of the time…

docker service create --name 10001 -p 10001:80 nginx
- curl http://{ELB_ENDPOINT}:10001
- docker service rm 10001
docker service create --name 10002 -p 10002:80 nginx
- curl http://{ELB_ENDPOINT}:10002
- docker service rm 10002
docker service create --name 10003 -p 10003:80 nginx
- curl http://{ELB_ENDPOINT}:10003
- docker service rm 10003
docker service create --name 100xx -p 100xx:80 nginx
- curl http://{ELB_ENDPOINT}:100xx
- docker service rm 100xx

Sometimes curl fails from outside the cluster, but docker -H {Worker} exec {CT_ID} curl localhost succeeded.
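
Because the failure is intermittent, the create / curl / remove cycle above could be scripted so it can run many times unattended. This is a sketch under assumptions: the function names, the `svc-<port>` service names, and the `WAIT_SECS` delay are hypothetical, and `ELB_ENDPOINT` must be set by the caller.

```shell
#!/bin/sh
# Sketch: repeat the create / curl / remove cycle over a range of ports.
# Requires ELB_ENDPOINT in the environment; run against a manager node.

cycle_once() {
    c_port="$1"
    c_name="svc-$c_port"   # hypothetical naming scheme
    docker service create --name "$c_name" -p "$c_port:80" nginx >/dev/null || return 2
    # Allow the task and the ELB listener to come up; 60s is a guess.
    sleep "${WAIT_SECS:-60}"
    if curl -s -o /dev/null --max-time 10 "http://$ELB_ENDPOINT:$c_port"; then
        c_result=0
    else
        c_result=1
    fi
    docker service rm "$c_name" >/dev/null
    return "$c_result"
}

run_cycles() {
    r_port="$1"
    r_last="$2"
    r_failed=""
    while [ "$r_port" -le "$r_last" ]; do
        # Record every port whose external curl (or service creation) failed.
        cycle_once "$r_port" || r_failed="$r_failed $r_port"
        r_port=$((r_port + 1))
    done
    echo "failed ports:$r_failed"
}
```

Running `run_cycles 10001 10050` would then list the ports that did not answer through the ELB, which narrows down when the cluster enters the broken state.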

(edit)

Once the cluster has fallen into this state, it can be recovered by replacing the workers.
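
The thread does not say how the workers were replaced. One possible approach, in line with the advice to let the AWS scaling group manage workers, is to drain a node and then terminate its instance so the group launches a replacement. The sketch below is an assumption, not a confirmed procedure; `replace_worker` and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: replace one worker via its Auto Scaling group.
# $1: node name as shown by `docker node ls`; $2: EC2 instance ID behind that node.

replace_worker() {
    node_name="$1"
    instance_id="$2"

    # Stop scheduling tasks on the node and move existing ones elsewhere.
    docker node update --availability drain "$node_name" || return 1

    # Terminate the instance but keep the desired capacity,
    # so the scaling group starts a fresh worker in its place.
    aws autoscaling terminate-instance-in-auto-scaling-group \
        --instance-id "$instance_id" \
        --no-should-decrement-desired-capacity || return 1

    echo "Once $node_name shows as Down in 'docker node ls', run: docker node rm $node_name"
}
```

Replacing workers one at a time keeps the remaining nodes available to serve traffic while the new instance joins.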

cirocosta · June 5, 2017, 9:16pm

Hey, are there any updates to this? I noticed the same in a 6-worker cluster running edge (Server Version: 17.05.0-ce).

Thx!

jjmata · July 17, 2017, 9:27pm

We’re seeing the same. The ELB all of a sudden drops listeners/ports, even though the services are up (and we can curl them internally). Any updates? Short of killing the Swarm, I can see no way to work around this, especially with these cumbersome SSL terminations where the cert needs to be passed in as a label. I am sure the CLI could help, alas!
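
To see which listeners have been dropped, the ports published by the swarm can be compared with the ports the ELB currently listens on. The following sketch only reports the difference and does not repair anything; all three function names are hypothetical, and it assumes a classic ELB reachable through the AWS CLI.

```shell
#!/bin/sh
# Sketch: report ports that services publish but the ELB no longer listens on.

# Ports published by all services, one per line (run on a manager).
published_ports() {
    for id in $(docker service ls -q); do
        docker service inspect \
            --format '{{range .Endpoint.Ports}}{{println .PublishedPort}}{{end}}' "$id"
    done | grep -E '^[0-9]+$' | sort -n -u
}

# Listener ports of the classic ELB named in $1, one per line.
elb_listener_ports() {
    aws elb describe-load-balancers --load-balancer-names "$1" \
        --query 'LoadBalancerDescriptions[0].ListenerDescriptions[].Listener.LoadBalancerPort' \
        --output text | tr '\t' '\n' | grep -E '^[0-9]+$' | sort -n -u
}

# $1: published ports, $2: ELB ports (whitespace-separated lists).
# Prints each published port that is absent from the ELB list.
missing_on_elb() {
    for p in $1; do
        case " $(echo $2) " in
            *" $p "*) ;;
            *) echo "$p" ;;
        esac
    done
}
```

A call such as `missing_on_elb "$(published_ports)" "$(elb_listener_ports <elb-name>)"` prints nothing when the ELB and the swarm agree, and the dropped ports otherwise.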
