Swarm 1.12 Multi-host networking help

Hello, Docker People!

Can anyone help with why multi-host networking is not working on Swarm v1.12?

1. I created 4 nodes with docker-machine

docker-machine create -d virtualbox swarm1
docker-machine create -d virtualbox swarm2
docker-machine create -d virtualbox swarm3
docker-machine create -d virtualbox swarm4

(I then ran swarm init and swarm join on the nodes, and they are all active in the cluster)
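
For reference, the init and join step mentioned above could look roughly like the sketch below. It assumes the Docker 1.12 GA command-line syntax (the release candidates used a somewhat different syntax) and that each VM's `docker-machine ip` address is the one the swarm should advertise. The helper names are made up for illustration; they only print the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: assumes the Docker 1.12 GA CLI. The helper names are hypothetical.

# Print the init command for a manager, pinning the advertised address
# instead of relying on interface autodetection.
swarm_init_cmd() {
    manager_ip="$1"
    echo "docker swarm init --advertise-addr $manager_ip --listen-addr $manager_ip:2377"
}

# Print the join command for a worker. The token comes from
# "docker swarm join-token -q worker" run on the manager.
swarm_join_cmd() {
    token="$1"
    worker_ip="$2"
    manager_ip="$3"
    echo "docker swarm join --token $token --advertise-addr $worker_ip $manager_ip:2377"
}
```

For example, `docker-machine ssh swarm1 "$(swarm_init_cmd "$(docker-machine ip swarm1)")"` would run the init on swarm1. Once the workers have joined, `docker node ls` on the manager should list every node.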

2. Logged in to swarm1 to create the services

docker network create chat -d overlay
docker service create --name mysql --network chat rusher81572/mysql
docker service create --name chat --network chat -p 8080 rusher81572/nodechat
docker service ls

Chat app keeps dying because it can not resolve “mysql” by hostname.

3. Further testing with an empty container:

docker service create --name empty --network chat rusher81572/empty
docker exec -it f343da bash
ping mysql

(Destination not reachable)

Hi,

This sounds like the same issue I had recently https://forums.docker.com/t/internal-dns-issue/17398.

For me this worked fine, when using boot2docker for the virtual machines (as you are doing above), but exhibited exactly the same behaviour when using vagrant “ubuntu/trusty64”.

Have you tried checking the /etc/hosts files in the containers themselves? As I understand it, these should be rewritten automatically by Docker, but when I examined them, they only had entries for services on the same host.

chris

I do not think it is a name resolution issue, because I cannot even ping the IP from each container. I thought Docker no longer uses /etc/hosts because it has its own DNS?

I had this identical problem, but I think I solved it by assigning services to the ingress network upon creation (ingress is created automatically when initializing a new swarm), rather than spinning up a new overlay network. Might be worth a try.

Thanks for the suggestion. I tried my instructions with Docker version 1.12.0-rc4, build e4a0dbc, and had the same problem. I then tried your suggestion of using the ingress network, with the same result.

Hi.

I had the same issue. The problem is that you are using “ping mysql” to test whether name resolution works. Try another program, for example “curl”.

The virtual IP used for load balancing across all the nodes is resolved correctly, but it does not respond to ping, and this is by design.
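
A ping-free check along those lines might look like the sketch below, run inside a container attached to the overlay network. It assumes the image provides `getent` and `awk`, and that the swarm-mode DNS name `tasks.<service>` is available on the build in use. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: run inside a container attached to the overlay network.
# Assumes the image ships getent (glibc-based images usually do).

# Print the addresses a name resolves to (prints nothing if it does not resolve).
resolve_name() {
    getent hosts "$1" | awk '{ print $1 }'
}

# Report both the service name (the virtual IP) and the individual task addresses.
# "tasks.<service>" is the swarm-mode DNS name for the tasks behind a service.
check_service_dns() {
    service="$1"
    for name in "$service" "tasks.$service"; do
        addrs=$(resolve_name "$name" | tr '\n' ' ')
        if [ -n "$addrs" ]; then
            echo "$name -> $addrs"
        else
            echo "$name -> NOT RESOLVED"
        fi
    done
}
```

Calling `check_service_dns mysql` separates "the name does not resolve" from "the name resolves but traffic does not get through". If the name resolves, the next step would be a TCP-level test with whatever client the image has.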

Thanks for the response. That does not make any sense since ping works with the previous version of Docker and Swarm and the app could not resolve mysql anyway on the network. I tried your suggestion and it did not work.

I am experiencing the same thing. I have several containers on the same network, running as separate services in swarm (1.12 rc4), and none of them can talk to one another.

I got mine working by following this tutorial to the letter, then modifying it incrementally for my use case: http://lucjuggery.com/blog/?p=604. The useful part of the tutorial was where he starts the Mongo client in another container to connect to the Mongo server.

The other thing I had to do was destroy and re-create the VirtualBox machines that I had created for this. It seems something gets confused if you have multiple host-only adapters. In my case I use Vagrant for other tasks, as well as a separate user account which wanted to create a duplicate host adapter. I had to go into VirtualBox and delete the duplicate entries. Then I created my machines on a specific subnet:
docker-machine create -d virtualbox --virtualbox-hostonly-cidr "192.168.90.1/24" --virtualbox-boot2docker-url "file://$HOME/Downloads/boot2docker-experimental-rc4.iso" node-1

Things to keep in mind:

  • 1.12.0-rc5 just came out, maybe re-try with that?
  • ping not working from container to container on a Swarm overlay network is normal and expected behavior (ICMP will not be forwarded by the IPVS-based load balancing), but higher-level protocols such as HTTP should work. Swarm mode networking will not necessarily work exactly the same way as legacy “Swarm proxy” networking did.
  • Docker discovery does not work by rewriting /etc/hosts; each container uses the daemon as a DNS resolver to resolve services by name.
  • Have you ensured that --listen-addr and --advertise-addr are set properly on the init and join nodes? (This could potentially be a biggie – autodetection doesn’t necessarily work if you have a lot of interfaces in ifconfig)
  • Are the ports for VXLAN and Serf (gossip) available between the machines? (This should be fine on VBox, but worth checking.)
  • What do the Docker logs (/var/log/docker.log) say (esp. “ERROR” level severity)? Any clues?
  • IIRC, ingress network is just for --publish, communication between containers on the same overlay / network should work fine without needing to be on that network.
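
The port question in the list above can be checked with something like the following sketch. The port numbers are the ones swarm mode uses (2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node gossip, 4789/udp for VXLAN). The function names are hypothetical, and the probe assumes an `nc` that supports `-z` and `-w`.

```shell
#!/bin/sh
# Sketch only: ports used by swarm mode and overlay networking.

# One "port/protocol purpose" entry per line.
swarm_ports() {
    cat <<'EOF'
2377/tcp cluster management
7946/tcp node gossip
7946/udp node gossip
4789/udp VXLAN overlay traffic
EOF
}

# Try each TCP port on a peer node. UDP ports are only listed, because a
# connectionless probe cannot reliably tell "open" from "filtered".
# Assumes an nc that supports -z (scan only) and -w (timeout in seconds).
check_peer() {
    peer="$1"
    swarm_ports | while read -r spec purpose; do
        port=${spec%/*}
        proto=${spec#*/}
        if [ "$proto" = "tcp" ]; then
            if nc -z -w 2 "$peer" "$port" </dev/null 2>/dev/null; then
                echo "$spec open ($purpose)"
            else
                echo "$spec unreachable ($purpose)"
            fi
        else
            echo "$spec not probed ($purpose)"
        fi
    done
}
```

Running `check_peer <ip-of-another-node>` from each machine gives a quick picture. Note that 2377/tcp is only expected to answer on manager nodes, so "unreachable" for that port on a worker is not a problem by itself.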

I tested out the GA release and these steps work fine. Pretty cool

Nice to hear @rusher81572!

Summary of my findings of the GA release:

  1. Constraints are not honored and containers are running anywhere they want.

  2. Mesh networking is flaky:

    • It takes a long time to access a container on any host.
    • Most of the time you cannot access each container at all.
    • Containers still have problems resolving other containers on the same network on other machines.
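
For the constraint finding, one way to get a reproducible check is sketched below. It assumes that `docker service ps` prints NODE as the fourth whitespace-separated column, as the 1.12 GA client does, and that the service was created with a hostname constraint. The function name is made up, and history lines for older tasks, if the client prints any, may not fit this layout.

```shell
#!/bin/sh
# Sketch only: assumes "docker service ps" prints NODE as the fourth
# whitespace-separated column, as the 1.12 GA client does.

# Read "docker service ps <service>" output on stdin, skip the header line,
# and print every task whose node differs from the expected hostname.
misplaced_tasks() {
    expected="$1"
    awk -v want="$expected" 'NR > 1 && $4 != want { print $2 " is on " $4 }'
}

# Intended use (not run here):
#   docker service create --name mysql --network chat \
#       --constraint 'node.hostname == swarm2' rusher81572/mysql
#   docker service ps mysql | misplaced_tasks swarm2
```

No output means every listed task is on the expected node. Any output, together with the exact `docker service create` command, would make a compact reproduction for a bug report.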

Sorry to hear that, @rusher81572. You might want to file issues for the stated problems at https://github.com/docker/docker/issues/new, with minimal reproducible examples if possible. As with anything new, ironing out the details in swarm mode will take a while.

Tested out the GA release and it still doesn’t work :frowning:
Two linux real nodes, running 1.12.0 build 8eab29e
network defined to the swarm as docker network create --driver overlay --subnet 192.168.99.0/24 SIGMAnet
all service builds including the line --network SIGMAnet

any service can resolve any other service started on the same node, but can't resolve any service on the opposite node

from the notes above, sounds like a fairly endemic problem.

I haven’t tested it myself yet, but check the release notes on 1.12.1-rc1.
It lists numerous network bugs like the ones described here as fixed.

thanks colin - a bit scary that 1.12.0 would go GA with such fundamental issues (if that is the case)

Thanks for the clarification. I am trying 1.12.1 on my Rpi cluster now to see if there are any improvements

Nope, did not work on 1.12.1.

Relief - not just me being stupid then (hopefully :slight_smile: )