Still having communication problems in 1.12.1

Hi,
Really need some help with this; I'm pounding my head against the wall.
I have an overlay network created and spread across every node that has an active swarm container on it, but the containers don't seem to be able to communicate with each other.
For example, a container called admin_server on node 1 returns connection refused against a container called config_server running on node 2. The create strings are:

Admin Server

docker service create \
  --replicas 1 \
  --restart-condition on-failure \
  --publish 8081:8081 \
  --network sigmanet \
  --env SPRING_CLOUD_CONFIG_URI="http://config-server:8888" \
  --env SPRING_BOOT_ADMIN_URI="http://admin-server:8081" \
  --name admin-server \
  teamsigmacloud/admin-server:2.0.0-SNAPSHOT

Config Server

docker service create \
  --replicas 1 \
  --restart-condition on-failure \
  --publish 8888:8888 \
  --network sigmanet \
  --env SPRING_CLOUD_CONFIG_URI="http://config-server:8888" \
  --env SPRING_BOOT_ADMIN_URI="http://admin-server:8081" \
  --name config-server \
  teamsigmacloud/config-server:2.0.0-SNAPSHOT

When config-server tries to communicate with admin-server (on a different node), it gets connection refused. The same applies to any of the other services trying to connect to each other (all use dedicated ports and HTTP).
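One way to narrow this down is to check, from inside one of the containers, whether the overlay network's built-in DNS even resolves the other service name, and whether the port is reachable. A rough sketch (the container lookup and the use of nslookup/wget assume the image ships those tools, which Alpine-based images typically do):

```shell
# Find the admin-server task's container ID on the node it's running on
docker ps --filter name=admin-server --format '{{.ID}}'

# From inside that container, check that overlay DNS resolves the service name
docker exec <container-id> nslookup config-server

# Then test TCP reachability on the published service port
docker exec <container-id> wget -qO- http://config-server:8888
```

If the name resolves but the connection is refused, the problem is in the data plane (VXLAN traffic between nodes) rather than service discovery.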

My developers keep telling me this is "supposed to work", but I keep getting connection refused errors. These all work fine on a single node.

Desperately in need of guidance here

I’m in the same boat. I’ve asked on IRC, but didn’t get anywhere. I didn’t want to waste time posting on GitHub though, as that’s mainly for bugs.

My own details of the problem are here: https://gist.github.com/daviddyball/85cf3428f93ea706003b6a874b60f9cb

I really hope someone can point us in the right direction for fixing this. As you said according to the docs this should “just work™”… but it doesn’t.

UPDATE

Just to update on my previous comment. According to the docs I need 2377 (TCP), 7946 (TCP/UDP) and 4789 (TCP/UDP), and nothing else. I'm running my setup inside an AWS VPC, and I've found through much trial and error that everything works fine if I allow "All Traffic" between hosts rather than the explicit TCP+UDP port definitions given in the documentation. Not sure why, though.
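For anyone else hitting this in a VPC, the documented ports can be opened between swarm hosts with security group rules that reference the group itself as the source. A sketch using the AWS CLI (the security group ID is a placeholder; substitute your own):

```shell
# Hypothetical security group shared by all swarm nodes; replace with yours.
SG=sg-0123456789abcdef0

# Cluster management traffic (swarm join, raft)
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol tcp --port 2377 --source-group "$SG"

# Node discovery / gossip
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol tcp --port 7946 --source-group "$SG"
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol udp --port 7946 --source-group "$SG"

# Overlay network (VXLAN) data plane
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol udp --port 4789 --source-group "$SG"
```

If "All Traffic" works but these explicit rules don't, comparing the two rule sets in the console is a good way to spot what's still being dropped.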

Well, I have a solution of sorts to this.

It seems that if you first run a global container connected to your overlay network, THEN run all your other containers, they can talk to each other. I know it makes no sense, but I tested it twice and it wasn't a fluke. I used the Sematext agent. It's possible that something inside that agent is making everything work, but I would imagine it should work with anything.
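The workaround above can be sketched without the Sematext agent, using any throwaway image as the global "primer" service (the service name net-primer and the alpine image here are arbitrary choices, not anything special):

```shell
# First, run a do-nothing global service attached to the overlay network,
# which forces the network to be instantiated on every node.
docker service create \
  --mode global \
  --network sigmanet \
  --name net-primer \
  alpine sleep 1d

# Then create admin-server, config-server, etc. as before.
```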

Worth a try, anyway, as 1.12.1 is obviously still a bit crumbly around the edges.