I have been trying to get a zookeeper ensemble (cluster) running, to support a kafka cluster, in a docker swarm created using the swarm mode of the docker daemon (not the legacy open source swarm). The problem is that although the zookeeper instances can reach one another on the client port 2181, they cannot reach one another on the election port 3888, so they cannot form a quorum. Every instance knows about the others but none can elect a leader, so none of them will accept requests. This appears to be purely a network routing issue, as I will show below. I am hoping that someone knows of a way to get the routing working correctly, or will open an issue for the docker swarm mode developers to look into.
OS: Centos 7
Docker Version: 1.13.0-rc6, build 2f2d055
Zookeeper Image: https://hub.docker.com/_/zookeeper/
Configuration
All configuration is sanitized
Three Nodes in docker swarm mode, all managers and all accepting tasks:
docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
lw2lxeluwnpxcj2bzx7nrgtkq 0114.company.com Ready Active Reachable
p2bcmz5hssym7t58lbs4f59h0 * 1046.company.com Ready Active Leader
xypuins1laz5y7ejgmn5duj08 0161.company.com Ready Active Reachable
An overlay network for the instances to use:
docker network create --driver overlay my-net
Three Zookeeper services with independent names. This is required so that each of the zookeeper instances can know about the others at creation time:
docker service create \
--network my-net \
--name zookeeper-1046_company_com \
--mount type=bind,source=/home/docker/data/zookeeper,target=/data \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
--constraint "node.hostname == 1046.company.com" \
zookeeper
docker service create \
--network my-net \
--name zookeeper-0161_company_com \
--mount type=bind,source=/home/docker/data/zookeeper,target=/data \
--env ZOO_MY_ID=2 \
--env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
--constraint "node.hostname == 0161.company.com" \
zookeeper
docker service create \
--network my-net \
--name zookeeper-0114_company_com \
--mount type=bind,source=/home/docker/data/zookeeper,target=/data \
--env ZOO_MY_ID=3 \
--env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
--constraint "node.hostname == 0114.company.com" \
zookeeper
Which results in the three services running:
docker service ls
ID NAME MODE REPLICAS IMAGE
7xedpx6pb1mr zookeeper-1046_company_com replicated 1/1 zookeeper:latest
k3z2oykarz63 zookeeper-0114_company_com replicated 1/1 zookeeper:latest
pgchzhdax5zt zookeeper-0161_company_com replicated 1/1 zookeeper:latest
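The three service definitions differ only in the ID, the service name, and the placement constraint, so they could be generated from a host list. A minimal sketch, assuming POSIX sh; the helper names svc_name, zoo_servers, and print_commands are hypothetical, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Hosts in ZOO_MY_ID order (1, 2, 3), matching the commands above.
HOSTS="1046.company.com 0161.company.com 0114.company.com"

# Service name for a host: "zookeeper-" plus the hostname with dots
# replaced by underscores.
svc_name() {
  printf 'zookeeper-%s' "$(printf '%s' "$1" | tr . _)"
}

# Build the ZOO_SERVERS value shared by all three services.
zoo_servers() {
  id=1
  out=""
  for h in $HOSTS; do
    out="${out:+$out }server.$id=$(svc_name "$h"):2888:3888"
    id=$((id + 1))
  done
  printf '%s' "$out"
}

# Print (do not run) one "docker service create" command per host.
print_commands() {
  servers=$(zoo_servers)
  id=1
  for h in $HOSTS; do
    cat <<EOF
docker service create \\
  --network my-net \\
  --name $(svc_name "$h") \\
  --mount type=bind,source=/home/docker/data/zookeeper,target=/data \\
  --env ZOO_MY_ID=$id \\
  --env ZOO_SERVERS="$servers" \\
  --constraint "node.hostname == $h" \\
  zookeeper
EOF
    id=$((id + 1))
  done
}

print_commands
```

Piping the output to sh on a manager node would create the services; printing first makes it easy to check the generated names against the ones listed by docker service ls.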
Now, if I look at the log files once the services are running, I see the following errors repeated over and over for the different nodes (logs truncated and reformatted for clarity - note the IP address of zookeeper-0114_company_com):
2017-01-12 23:59:31,921 [myid:1] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@149] - Resolved hostname: zookeeper-1046_company_com to address: zookeeper-1046_company_com/10.0.0.8
2017-01-12 23:59:31,923 [myid:1] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@400] - Cannot open channel to 3 at election address zookeeper-0114_company_com/10.0.0.12:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:381)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:426)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:822)
Looking at the service for zookeeper-0114_company_com, the zookeeper software has resolved its address to the Virtual IP of the service:
docker service inspect --format '{{range .Endpoint.VirtualIPs}}{{.Addr}}{{end}}' zookeeper-0114_company_com | cut -d/ -f1
10.0.0.12
However, if we get a shell into the container that backs that service, we see the following network configuration:
docker exec -it zookeeper-0114_company_com.1.tqos8do9fwhttcn97uor4zu7b sh
ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0A:00:00:0D
inet addr:10.0.0.13 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::42:aff:fe00:d%32718/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:6627 errors:0 dropped:0 overruns:0 frame:0
TX packets:6627 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:376144 (367.3 KiB) TX bytes:376040 (367.2 KiB)
and
netstat -plutn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.11:34001 0.0.0.0:* LISTEN -
tcp 0 0 :::2181 :::* LISTEN -
tcp 0 0 ::ffff:10.0.0.12:3888 :::* LISTEN -
tcp 0 0 :::35805 :::* LISTEN -
udp 0 0 127.0.0.11:58991 0.0.0.0:* -
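The mismatch between the interface address and the listen address can also be checked mechanically. A minimal sketch, assuming POSIX sh and awk and the netstat column layout shown here; unreachable_listeners is a hypothetical helper name, and the sample input is copied from the output above:

```shell
#!/bin/sh
# unreachable_listeners IFACE_IP
# Reads "netstat -plutn" output on stdin and prints every listening TCP
# socket that is bound to one specific address other than IFACE_IP.
unreachable_listeners() {
  awk -v ip="$1" '
    $1 == "tcp" && $6 == "LISTEN" {
      addr = $4
      port = addr; sub(/.*:/, "", port)        # text after the last colon
      host = addr; sub(/:[0-9]+$/, "", host)   # text before the port
      sub(/^::ffff:/, "", host)                # drop the IPv4-mapped prefix
      # Wildcard and loopback binds are not tied to the overlay address.
      if (host == "::" || host == "0.0.0.0" || host ~ /^127\./) next
      if (host != ip) print host ":" port
    }'
}

# Sample: the TCP and UDP rows from the netstat output, with eth0 at 10.0.0.13.
unreachable_listeners 10.0.0.13 <<'EOF'
tcp 0 0 127.0.0.11:34001 0.0.0.0:* LISTEN -
tcp 0 0 :::2181 :::* LISTEN -
tcp 0 0 ::ffff:10.0.0.12:3888 :::* LISTEN -
tcp 0 0 :::35805 :::* LISTEN -
udp 0 0 127.0.0.11:58991 0.0.0.0:* -
EOF
# prints: 10.0.0.12:3888
```

Against a live container the same function could be fed from docker exec, for example docker exec CONTAINER netstat -plutn piped into unreachable_listeners with the eth0 address as the argument.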
We can see that the zookeeper software is listening on port 3888 only for messages bound for the Virtual IP 10.0.0.12, unlike the client port 2181, which is listening for messages bound for any IP on the machine. Since 10.0.0.12 is only a Virtual IP routed by the swarm, it would seem that no local IP on the machine will ever match the packets routed to it (remember eth0 is 10.0.0.13). The instance therefore never responds on 3888; it probably just drops the inbound packets and acts as if the port were not open. Can anyone think of a way around this issue? I will also contact the Zookeeper docker image maintainers about this, but I don't know whether anything can be done at the docker routing layer to help resolve it, or whether someone has a creative idea for routing the packets once they are received by the container.
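One possible direction, offered only as an untested sketch: have each instance list itself as 0.0.0.0 in ZOO_SERVERS, so that it binds the quorum and election ports on all interfaces instead of on the resolved Virtual IP, while still naming its peers by service name. This assumes the image writes ZOO_SERVERS into the server list unchanged; zoo_servers_for is a hypothetical helper name.

```shell
#!/bin/sh
# Service names in ZOO_MY_ID order (1, 2, 3), as created above.
SERVICES="zookeeper-1046_company_com zookeeper-0161_company_com zookeeper-0114_company_com"

# zoo_servers_for MY_ID
# Prints a ZOO_SERVERS value in which the entry for MY_ID uses 0.0.0.0
# (assumption: this makes the instance bind on all interfaces) and the
# other entries keep their service names.
zoo_servers_for() {
  my_id=$1
  id=1
  out=""
  for svc in $SERVICES; do
    if [ "$id" -eq "$my_id" ]; then
      host=0.0.0.0
    else
      host=$svc
    fi
    out="${out:+$out }server.$id=$host:2888:3888"
    id=$((id + 1))
  done
  printf '%s\n' "$out"
}

# Value for the instance with ZOO_MY_ID=3 (zookeeper-0114_company_com).
zoo_servers_for 3
# prints: server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=0.0.0.0:2888:3888
```

Each service would then be created with its own value, for example --env ZOO_SERVERS="$(zoo_servers_for 3)" alongside --env ZOO_MY_ID=3. Two other options that may be worth checking are the ZooKeeper setting quorumListenOnAllIPs and creating the services with --endpoint-mode dnsrr, so that the service name resolves to the container address rather than a Virtual IP; neither has been verified against this setup.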