One container cannot see another

Summary

This is my first post to the community; I hope I’ve managed to abide by the guidelines and provide sufficient information. The summary is that after upgrading Docker, our containers can no longer “see” the oauth container, which, for some reason, they attempt to reach via its ingress IP address. More details…

Current Platform

Our Spring Cloud-based platform has several containers that are launched into a swarm via “docker service” and run in a Docker Engine 1.12 environment. The containers are configured to talk to an oauth container for authentication and authorization purposes:

oauth2:
   server:
     url: https://auth:9999

Upgraded Platform

We are in the process of converting the platform to launch into a swarm via “docker stack deploy”, using docker-compose files rather than laboriously specifying the options on the command line. The target environment is Docker Engine 18.09.2.
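For reference, here is a minimal sketch of the kind of compose file we are moving to. The image names and the second service are illustrative, not our real configuration; the oauth service is the “auth” referenced in the oauth2 config above:

```yaml
version: "3.7"

services:
  auth:
    image: example/oauth-server:latest   # illustrative image name
    ports:
      - "9999:9999"      # published port, so the service is also attached to ingress
    networks:
      - default          # the stack's default overlay network (app_default)

  gateway:
    image: example/zuul-gateway:latest   # illustrative image name
    environment:
      - OAUTH2_SERVER_URL=https://auth:9999
    networks:
      - default

networks:
  default:
    driver: overlay
```

This is deployed with `docker stack deploy -c docker-compose.yml app`, which is what produces the `app_default` overlay network seen in the inspect output further down.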

However, the new configuration suffers from a strange failure: the containers are no longer able to access the oauth container, and we get the stack trace below from Spring Cloud. The strange thing is that they have somehow been given the ingress IP address of the container, which is 10.255.196.98:

 at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.forward(RibbonRoutingFilter.java:158)
 ... 110 common frames omitted
Caused by: java.lang.RuntimeException: org.apache.http.conn.ConnectTimeoutException: Connect to 10.255.196.98:9999 [/10.255.196.98] failed: connect timed out
 at rx.exceptions.Exceptions.propagate(Exceptions.java:58)
 at rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:464)
 at rx.observables.BlockingObservable.single(BlockingObservable.java:341)
 at com.netflix.client.AbstractLoadBalancerAwareClient.executeWithLoadBalancer(AbstractLoadBalancerAwareClient.java:112)
 ... 170 common frames omitted

More Information

From docker inspect on any of our containers two overlay networks can be seen (many details omitted):

"Networks": {
	"app_default": {
		"IPAMConfig": {
			"IPv4Address": "192.168.98.37"
		},
	},
	"ingress": {
		"IPAMConfig": {
			"IPv4Address": "10.255.196.98"
		},
	}
}

If only Docker had dispensed the app_default IP address of 192.168.98.37 to Spring Cloud, all would have been well: I opened a shell into one of the containers and could connect successfully with telnet 192.168.98.37 9999. The same command failed for telnet 10.255.196.98 9999, although ping 10.255.196.98 did work. So it would appear that accessing the port is the problem. Our Operations department are adamant that there are no firewall rules working against us.
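For anyone wanting to reproduce the diagnosis, this is roughly the session I ran (the container ID is a placeholder; which DNS names resolve to what may differ depending on endpoint mode):

```shell
# Open a shell in one of the application containers
docker exec -it <container-id> sh

# Inside the container: see what the service name resolves to.
# With the default VIP endpoint mode this should be a single virtual IP
# on the shared overlay network.
nslookup auth

# tasks.<service> bypasses the VIP and lists the individual task IPs
nslookup tasks.auth

# The app_default address accepts connections; the ingress one does not
telnet 192.168.98.37 9999
telnet 10.255.196.98 9999
```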

One more important point: as another experiment we kept our platform the same and upgraded only Docker, and noticed that the failure started just a few versions beyond Docker Engine 1.12 - I’m afraid I don’t have the exact version information.

Questions

We don’t have much experience of overlay networks so any pointers would be gratefully received.

For example, why are there two overlay networks attached to each of our containers? I thought the ingress network was the only one needed, since its job is to facilitate docker daemon communication.

Also, how come the ingress IP address was selected and passed through to our containers rather than the app_default one?

Many thanks