Inter-node service communication in Docker swarm mode not working on Azure Stack

  • Issue type:

    • Inter-node service communication in Docker swarm mode does not work on Azure Stack
    • One instance of a service cannot ping another instance of the same service on a different node
    • The Docker load balancer (VIP) also does not work, because it cannot reach the other node
  • OS version/build
    I am using the standard Ubuntu 16.04 cloud image:
    http://cloud-images.ubuntu.com/releases/xenial/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vhd.zip
    Kernel version of this image: 4.15.0-1025-azure #26~16.04.1-Ubuntu SMP Tue Sep 25 11:09:50 UTC 2018

  • App version
    Docker CE version: Docker version 18.06.1-ce, build e68fc7a

  • Steps to reproduce

  1. Have two Ubuntu nodes (image as described above) connected to the same subnet. All firewall ports are open.

  2. Each node has only one NIC.

  3. Node Description:

    • Master : Private IP: 10.0.0.13 , Public IP: 192.168.102.39
    • Worker : Private IP: 10.0.0.14 , Public IP: 192.168.102.38
  4. On the master, run docker swarm init --advertise-addr 10.0.0.13

  5. The worker joins with:
    docker swarm join --token SWMTKN-1-33d34nxoq8cvf4xl39sbv0gr8963t0i1df9zqjzl4i4g8i47n7-ee3p96zm4gzddbiuen4xh4u9u --advertise-addr 10.0.0.14 10.0.0.13:2377

  6. docker node ls on the master shows both nodes up and running, and the IPs are correct.

  7. Deploy a service with 2 replicas (a simple Ubuntu nginx image).

  8. Get into the instance running in Master and ping the other one - it doesn’t work
    The VIP is pingable and DNS resolution works fine.

  9. I ran a tcpdump capture on the worker node while doing step 8 above, and I see all the UDP VXLAN packets on port 4789 being received.

  10. There are no special iptables entries blocking this.

  11. netstat -nua shows a UDP socket open at 0.0.0.0:4789, but tcpdump on the vxlan0 interface in the Docker network namespace does not show these packets.

  12. It seems that VXLAN termination is not working.
    Also note: when I run the same experiment in the Azure public cloud with an almost identical Marketplace image, things work fine.
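
For anyone trying to reproduce steps 7, 9 and 11, here is a rough sketch of the commands involved, wrapped in shell functions. The network name "testnet", the service name "web", the stock nginx image and the interface name are placeholders I made up for illustration; they are not the exact values from my setup. The namespace name has to be looked up with ls /var/run/docker/netns on the node.

```shell
#!/bin/sh
# Sketch only: "testnet", "web" and the nginx image are placeholder names.

# Step 7: create an overlay network and a 2-replica service on it
# (run on a manager node).
deploy_test_service() {
    docker network create --driver overlay --attachable testnet &&
    docker service create --name web --replicas 2 --network testnet nginx
}

# Step 9: watch VXLAN traffic arriving on the host NIC
# (run as root on the worker). $1 = host interface, e.g. eth0
capture_underlay() {
    tcpdump -n -i "$1" udp port 4789
}

# Step 11: look inside the overlay network namespace created by Docker
# (run as root). $1 = a file name taken from: ls /var/run/docker/netns
inspect_overlay_ns() {
    ns="/var/run/docker/netns/$1"
    # List interfaces with details; a vxlan device should be present.
    nsenter --net="$ns" ip -d link show
    # Capture up to 20 ICMP packets on any interface in that namespace.
    nsenter --net="$ns" tcpdump -n -i any -c 20 icmp
}
```

If capture_underlay shows the encapsulated packets but inspect_overlay_ns shows no ICMP while the ping from step 8 is running, that matches what I describe in steps 9 to 12.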

Any pointers/help appreciated.

thanks

I don’t use Azure. I have a swarm running on bare-metal RHEL 7.5 hosts. I want to help, but can only ask questions.

when you say:

Get into the instance running in Master and ping the other one - it doesn’t work

I think you mean that you ‘docker exec’ into the replica container that happens to run on the master, and then ping the IP address of the replica running on the worker, which you found with ‘docker inspect <container>’ on the worker.
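
To make sure we are talking about the same test, this is roughly what I have in mind. It is only a sketch: the function names are mine, and it assumes the image inside the container actually ships a ping binary, which the stock nginx image may not.

```shell
#!/bin/sh
# Print the IP address(es) a container has on its attached networks.
# $1 = container name or ID (find it with: docker ps)
container_ips() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$1"
}

# Ping a peer replica from inside a local replica.
# $1 = local container name or ID, $2 = IP address of the peer replica
ping_from_container() {
    docker exec "$1" ping -c 3 "$2"
}
```

You would run container_ips on the worker against the replica there, then run ping_from_container on the master with the address it printed.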

Is there any information in the system logs on either host? If VXLAN is not working, or there is some other issue, dockerd would write logs to the host’s syslog daemon, which writes to /var/log/messages on RHEL (on Ubuntu I think the equivalent is /var/log/syslog).
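
On a systemd host such as Ubuntu 16.04, the daemon log should also be reachable through the journal. A small sketch, assuming the Docker unit is named docker.service (which is what the docker-ce packages install); the function names are just for illustration.

```shell
#!/bin/sh
# Show recent Docker daemon log entries from the systemd journal.
# $1 = optional time window understood by journalctl (default: 1 hour ago)
dockerd_logs() {
    journalctl -u docker.service --since "${1:-1 hour ago}" --no-pager
}

# Show kernel messages that mention vxlan (may need root to read dmesg).
kernel_vxlan_msgs() {
    dmesg | grep -i vxlan
}
```

Running dockerd_logs "10 minutes ago" on both nodes right after a failed ping keeps the output short enough to paste here.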

How about the network path from the published port through the routing mesh to each replica in turn? Does the service log the access? Can you send requests to the service from an external client and see whether each replica receives them?
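
Something like the following would exercise that path. It is a sketch under assumptions: the service is called "web" and has port 8080 published (for example with docker service update --publish-add 8080:80 web); neither of those comes from your post.

```shell
#!/bin/sh
# Send several HTTP requests to a published port on one node and print the
# status code of each; curl prints 000 when a request fails or times out.
# $1 = node IP, $2 = published port, $3 = number of requests (default 4)
probe_mesh() {
    i=0
    while [ "$i" -lt "${3:-4}" ]; do
        curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "http://$1:$2/"
        i=$((i + 1))
    done
}
```

For example, probe_mesh 10.0.0.13 8080 and probe_mesh 10.0.0.14 8080, followed by docker service logs web on the master to see which replicas logged the requests. If roughly every other request times out, the mesh is probably failing on the same cross-node hop as your ping.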