A strange network problem in Swarm with VIP mode for a ZooKeeper cluster

Description

I have a docker-compose file for a ZooKeeper cluster. It runs fine on machines s17, s18, and s19,

but it doesn’t work on machines s27, s52, and s53 because of a network problem.

Here are the IPs of the services:

bash-4.4# nslookup zoo1
nslookup: can't resolve '(null)': Name does not resolve

Name:      zoo1
Address 1: 10.0.0.5
bash-4.4# nslookup tasks.zoo1
nslookup: can't resolve '(null)': Name does not resolve

Name:      tasks.zoo1
Address 1: 10.0.0.6 0971a1b2357a
bash-4.4#
bash-4.4# nslookup zoo2
nslookup: can't resolve '(null)': Name does not resolve

Name:      zoo2
Address 1: 10.0.0.7
bash-4.4# nslookup tasks.zoo2
nslookup: can't resolve '(null)': Name does not resolve

Name:      tasks.zoo2
Address 1: 10.0.0.8 zookeeper_zoo2.1.ai49thvfj522s7h9057u2umq5.zookeeper_default
bash-4.4#
bash-4.4# nslookup zoo3
nslookup: can't resolve '(null)': Name does not resolve

Name:      zoo3
Address 1: 10.0.0.9
bash-4.4# nslookup tasks.zoo3
nslookup: can't resolve '(null)': Name does not resolve

Name:      tasks.zoo3
Address 1: 10.0.0.10 zookeeper_zoo3.1.2ub2pz1daxbkr56jmmj2ju7ts.zookeeper_default

  1. 10.0.0.5 is the VIP of service zoo1; 10.0.0.6 is the real IP of the container behind service zoo1
  2. 10.0.0.7 is the VIP of service zoo2; 10.0.0.8 is the real IP of the container behind service zoo2
  3. 10.0.0.9 is the VIP of service zoo3; 10.0.0.10 is the real IP of the container behind service zoo3
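
The VIP / task IP pairs listed above can also be collected programmatically from inside a container. This is only a minimal sketch: it assumes Python is available in the container (the stock zookeeper image may not ship it), and the function name vip_and_task_ips and its resolve parameter are made up for illustration. It relies on the same convention the nslookup output shows: the service name resolves to the VIP, and tasks.<service> resolves to the container IPs.

```python
import socket


def vip_and_task_ips(service, resolve=socket.gethostbyname_ex):
    """Return (vip_addresses, task_addresses) for a Swarm service name.

    `service` is looked up as-is (the VIP) and as `tasks.<service>`
    (the container IPs). Names that do not resolve yield an empty list.
    `resolve` is injectable so the function can be tried without Swarm DNS.
    """
    def lookup(name):
        try:
            # gethostbyname_ex returns (hostname, aliases, ip_list)
            return sorted(resolve(name)[2])
        except OSError:  # socket.gaierror and socket.herror are subclasses
            return []

    return lookup(service), lookup("tasks." + service)
```

Called as vip_and_task_ips("zoo1") from a container attached to the stack's network, it should return the same two addresses that nslookup zoo1 and nslookup tasks.zoo1 print.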

Here are the logs of the container behind service zoo3:

[zhujipeng@s53 ~]$ sudo docker logs ddbcd737511f | tail -n 30
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2018-03-13 02:56:18,186 [myid:3] - WARN  [/0.0.0.0:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.io.EOFException
2018-03-13 02:56:21,640 [myid:3] - INFO  [/0.0.0.0:3888:QuorumCnxManager$Listener@743] - Received connection request /10.0.0.10:37186
2018-03-13 02:56:21,641 [myid:3] - WARN  [/0.0.0.0:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.io.EOFException
2018-03-13 02:56:43,852 [myid:3] - WARN  [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.5:3888
java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:845)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2018-03-13 02:56:43,853 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: zoo1 to address: zoo1/10.0.0.5
2018-03-13 02:56:48,858 [myid:3] - WARN  [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@584] - Cannot open channel to 2 at election address zoo2/10.0.0.7:3888
java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:845)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2018-03-13 02:56:48,859 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: zoo2 to address: zoo2/10.0.0.7
2018-03-13 02:56:48,860 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@854] - Notification time out: 60000

The container of service zoo3 can’t reach port 3888 of services zoo1 and zoo2.
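
To pull the unreachable peers out of a longer log, a small parser like the one below can help. It is a sketch that assumes the WARN line format shown in the log above ("Cannot open channel to N at election address host/ip:port"); the names PATTERN and unreachable_peers are hypothetical.

```python
import re

# Matches the WARN line printed when a peer's election port cannot be reached,
# e.g. "... Cannot open channel to 1 at election address zoo1/10.0.0.5:3888"
PATTERN = re.compile(
    r"Cannot open channel to (\d+) at election address ([^/\s]+)/([\d.]+):(\d+)"
)


def unreachable_peers(log_text):
    """Return sorted, de-duplicated (peer_id, hostname, ip, port) tuples."""
    found = {
        (int(m.group(1)), m.group(2), m.group(3), int(m.group(4)))
        for m in PATTERN.finditer(log_text)
    }
    return sorted(found)
```

Feeding it the text captured from docker logs for the zoo3 container should list zoo1 and zoo2 with their VIPs, which matches what the log excerpt shows.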

Here is the network status of the containers:

the container of service zoo1

bash-4.4# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0A:00:00:06
          inet addr:10.0.0.6  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:420 (420.0 B)  TX bytes:3380 (3.3 KiB)

eth1      Link encap:Ethernet  HWaddr 02:42:AC:12:00:03
          inet addr:172.18.0.3  Bcast:172.18.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:794 (794.0 B)  TX bytes:497 (497.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:97 errors:0 dropped:0 overruns:0 frame:0
          TX packets:97 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:6248 (6.1 KiB)  TX bytes:6248 (6.1 KiB)

bash-4.4# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:3888            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:40944           0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.11:36659        0.0.0.0:*               LISTEN      -

bash-4.4# nc -z -v -w 3 10.0.0.5:3888
10.0.0.5:3888 (10.0.0.5:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.6:3888
10.0.0.6:3888 (10.0.0.6:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.7:3888
nc: 10.0.0.7:3888 (10.0.0.7:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.8:3888
nc: 10.0.0.8:3888 (10.0.0.8:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.9:3888
nc: 10.0.0.9:3888 (10.0.0.9:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.10:3888
nc: 10.0.0.10:3888 (10.0.0.10:3888): Operation timed out

the container of service zoo2

bash-4.4# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0A:00:00:08
          inet addr:10.0.0.8  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:16 errors:0 dropped:0 overruns:0 frame:0
          TX packets:88 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:672 (672.0 B)  TX bytes:6000 (5.8 KiB)

eth1      Link encap:Ethernet  HWaddr 02:42:AC:12:00:03
          inet addr:172.18.0.3  Bcast:172.18.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:223 (223.0 B)  TX bytes:148 (148.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:2095 (2.0 KiB)  TX bytes:2095 (2.0 KiB)

bash-4.4# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:34171        0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:33384           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:3888            0.0.0.0:*               LISTEN      -
bash-4.4# nc -z -v -w 3 10.0.0.5:3888
nc: 10.0.0.5:3888 (10.0.0.5:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.6:3888
nc: 10.0.0.6:3888 (10.0.0.6:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.7:3888
10.0.0.7:3888 (10.0.0.7:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.8:3888
10.0.0.8:3888 (10.0.0.8:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.9:3888
nc: 10.0.0.9:3888 (10.0.0.9:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.10:3888
nc: 10.0.0.10:3888 (10.0.0.10:3888): Operation timed out

the container of service zoo3

bash-4.4# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:0A:00:00:0A
          inet addr:10.0.0.10  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:19 errors:0 dropped:0 overruns:0 frame:0
          TX packets:110 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:798 (798.0 B)  TX bytes:7532 (7.3 KiB)

eth1      Link encap:Ethernet  HWaddr 02:42:AC:12:00:03
          inet addr:172.18.0.3  Bcast:172.18.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:2520 (2.4 KiB)  TX bytes:2520 (2.4 KiB)

bash-4.4# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:43588           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:3888            0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.11:44435        0.0.0.0:*               LISTEN      -
bash-4.4# nc -z -v -w 3 10.0.0.5:3888
nc: 10.0.0.5:3888 (10.0.0.5:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.6:3888
nc: 10.0.0.6:3888 (10.0.0.6:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.7:3888
nc: 10.0.0.7:3888 (10.0.0.7:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.8:3888
nc: 10.0.0.8:3888 (10.0.0.8:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.9:3888
10.0.0.9:3888 (10.0.0.9:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.10:3888
10.0.0.10:3888 (10.0.0.10:3888) open

Port 3888 is listening in every container, but each container can reach it only on its own VIP and its own IP; it can’t be reached from the other containers.
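
The nc checks above can be repeated in one go with a short script. This is a sketch, not a replacement for nc: it does a plain TCP connect with a timeout, roughly what nc -z -w 3 does, and the function names port_open and probe_all are made up for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts and refused connections
        return False


def probe_all(hosts, port=3888, timeout=3.0):
    """Map each host to whether `port` accepted a TCP connection."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

Running probe_all(["10.0.0.5", "10.0.0.6", "10.0.0.7", "10.0.0.8", "10.0.0.9", "10.0.0.10"]) inside each container gives one row of the reachability matrix per container, which makes the "only reachable from itself" pattern easy to compare across the two groups of machines.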

here is the compose file

version: '3.3'

services:
  zoo1:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_MASTER_HOSTNAME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog

  zoo2:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE1_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog

  zoo3:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE2_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog

volumes:
  zoo_data:
    driver: local
  zoo_logs:
    driver: local

This compose file runs fine on machines s17, s18, and s19,

but it doesn’t work on machines s27, s52, and s53.

  1. I see a warning on s27 in the docker info output:
WARNING: bridge-nf-call-ip6tables is disabled

but s52 and s53 don’t show the warning.

  2. I see the same warning on s19 in the docker info output:
WARNING: bridge-nf-call-ip6tables is disabled

but s17 and s18 don’t show the warning.
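
Since the warning shows up on one machine in each group (s27 and s19), it may be worth comparing the underlying kernel switches on all six hosts directly. The sketch below assumes the warning reflects the files under /proc/sys/net/bridge on the host (that is where the net.bridge.bridge-nf-call-* sysctls live); the function name bridge_nf_settings and the base parameter are hypothetical.

```python
from pathlib import Path

# Assumed location of the net.bridge.bridge-nf-call-* sysctls on the host
BRIDGE_DIR = Path("/proc/sys/net/bridge")


def bridge_nf_settings(base=BRIDGE_DIR):
    """Return {setting_name: True, False, or None} for the two bridge-nf-call switches.

    True means the file contains "1", False means any other value,
    and None means the file could not be read at all.
    """
    result = {}
    for name in ("bridge-nf-call-iptables", "bridge-nf-call-ip6tables"):
        try:
            result[name] = (Path(base) / name).read_text().strip() == "1"
        except OSError:
            result[name] = None
    return result
```

Running bridge_nf_settings() on every host and comparing the results side by side would show whether these settings actually differ between the working and the failing group.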

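However, when I publish ports 2888 and 3888 in host mode and point ZOO_SERVERS at the machine hostnames instead of the service names, it runs fine on machines s27, s52, and s53: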

version: '3.3'

services:
  zoo1:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_MASTER_HOSTNAME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=${MY_SLAVE1_HOSTANME}:2888:3888 server.3=${MY_SLAVE2_HOSTANME}:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog

  zoo2:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE1_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=${MY_MASTER_HOSTNAME}:2888:3888 server.2=0.0.0.0:2888:3888 server.3=${MY_SLAVE2_HOSTANME}:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog

  zoo3:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE2_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=${MY_MASTER_HOSTNAME}:2888:3888 server.2=${MY_SLAVE1_HOSTANME}:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog

volumes:
  zoo_data:
    driver: local
  zoo_logs:
    driver: local
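
Both compose files follow the same ZOO_SERVERS pattern: each server lists itself as 0.0.0.0 and the other two by name (service names in the first file, machine hostnames in the second). A small helper can generate the three values so they stay consistent. This is an illustrative sketch; the function name zoo_servers and its parameters are not part of the zookeeper image.

```python
def zoo_servers(my_id, hosts, peer_port=2888, election_port=3888):
    """Build the ZOO_SERVERS value for the server with 1-based id `my_id`.

    The server's own entry uses 0.0.0.0, as in the compose files above;
    every other entry uses the corresponding name from `hosts`.
    """
    entries = []
    for server_id, host in enumerate(hosts, start=1):
        address = "0.0.0.0" if server_id == my_id else host
        entries.append(f"server.{server_id}={address}:{peer_port}:{election_port}")
    return " ".join(entries)
```

For example, zoo_servers(1, ["zoo1", "zoo2", "zoo3"]) reproduces the value used for zoo1 in the first compose file; passing the three machine hostnames instead gives the value used in the second one.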

Output of docker version:

[zhujipeng@s27 volume]$ sudo docker version
Client:
 Version:	17.12.1-ce
 API version:	1.35
 Go version:	go1.9.4
 Git commit:	7390fc6
 Built:	Tue Feb 27 22:15:20 2018
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.1-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.4
  Git commit:	7390fc6
  Built:	Tue Feb 27 22:17:54 2018
  OS/Arch:	linux/amd64
  Experimental:	false

Output of docker info:

[zhujipeng@s27 volume]$ sudo docker info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 3
Server Version: 17.12.1-ce
Storage Driver: devicemapper
 Pool Name: docker-8:1-135990890-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 1.028GB
 Data Space Total: 107.4GB
 Data Space Available: 14.28GB
 Metadata Space Used: 1.438MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.146GB
 Thin Pool Minimum Free Space: 10.74GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: rgqy6joqk9a0gpof105c9v59l
 Is Manager: true
 ClusterID: 5u5jbd9r1cyn94jf0ci7nq9vc
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.26.2.el7.v7.4.qihoo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 29.45GiB
ID: RYT6:PETV:OY2B:QVG7:DPEE:LJUU:NNVX:7PD6:WDGN:LBYF:CXOS:4G3Q
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 registry.my.com:5000
 127.0.0.0/8
Registry Mirrors:
 https://registry.docker-cn.com/
Live Restore Enabled: false

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.):

Here is the network status:

On machines s17, s18, and s19 the containers can ping each other,

but on machines s27, s52, and s53 the containers can’t ping each other.

On s27, s52, and s53 the containers can still ping the VIPs,

but the ports can’t be reached through the VIPs.

My Kafka cluster encounters the same problem on machines s27, s52, and s53.

@thaJeztah replied that this may be because I use a non-standard kernel.

The issue on GitHub is here