Cannot get zookeeper to work in docker swarm mode

I have been trying to get a zookeeper ensemble (cluster) running to support a kafka cluster in a docker swarm created with the docker daemon's swarm mode (not the legacy open source swarm). The problem I am running into is that although the zookeeper instances can communicate with one another via the client port 2181, they cannot reach one another via the election port 3888 and so cannot form a quorum. This puts them all in a state where they will not accept requests: they know about each other, but they cannot elect a leader. This appears to be solely a network routing issue, as I will show below. I am hoping that someone knows a way to get the routing working correctly, or will open an issue for the docker swarm mode developers to look into.

OS: Centos 7
Docker Version: 1.13.0-rc6, build 2f2d055
Zookeeper Image: https://hub.docker.com/_/zookeeper/

Configuration
All configuration is sanitized

Three Nodes in docker swarm mode, all managers and all accepting tasks:

docker node ls

ID                           HOSTNAME          STATUS  AVAILABILITY  MANAGER STATUS
lw2lxeluwnpxcj2bzx7nrgtkq    0114.company.com  Ready   Active        Reachable
p2bcmz5hssym7t58lbs4f59h0 *  1046.company.com  Ready   Active        Leader
xypuins1laz5y7ejgmn5duj08    0161.company.com  Ready   Active        Reachable

An overlay network for the instances to use:

docker network create --driver overlay my-net

Three Zookeeper services with independent names. This is required so that each of the zookeeper instances can know about the others at creation time:

  docker service create \
  --network my-net \
  --name zookeeper-1046_company_com \
  --mount type=bind,source=/home/docker/data/zookeeper,target=/data \
  --env ZOO_MY_ID=1 \
  --env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
  --constraint "node.hostname == 1046.company.com" \
  zookeeper

  docker service create \
  --network my-net \
  --name zookeeper-0161_company_com \
  --mount type=bind,source=/home/docker/data/zookeeper,target=/data \
  --env ZOO_MY_ID=2 \
  --env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
  --constraint "node.hostname == 0161.company.com" \
  zookeeper

  docker service create \
  --network my-net \
  --name zookeeper-0114_company_com \
  --mount type=bind,source=/home/docker/data/zookeeper,target=/data \
  --env ZOO_MY_ID=3 \
  --env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
  --constraint "node.hostname == 0114.company.com" \
  zookeeper

Which results in the three services running:

docker service ls

ID            NAME                        MODE        REPLICAS  IMAGE
7xedpx6pb1mr  zookeeper-1046_company_com  replicated  1/1       zookeeper:latest
k3z2oykarz63  zookeeper-0114_company_com  replicated  1/1       zookeeper:latest
pgchzhdax5zt  zookeeper-0161_company_com  replicated  1/1       zookeeper:latest

Now, if I look at the log files to see what is going on once they are running, I see the following errors repeated over and over on the different nodes (logs truncated and reformatted for clarity; note the IP address of zookeeper-0114_company_com):

2017-01-12 23:59:31,921 [myid:1] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@149] - Resolved hostname: zookeeper-1046_company_com to address: zookeeper-1046_company_com/10.0.0.8
2017-01-12 23:59:31,923 [myid:1] - WARN  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@400] - Cannot open channel to 3 at election address zookeeper-0114_company_com/10.0.0.12:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:381)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:426)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:822)

Inspecting the service for zookeeper-0114_company_com shows that the zookeeper software has resolved its address to the Virtual IP of the service:

 docker service inspect --format '{{range .Endpoint.VirtualIPs}}{{.Addr}}{{end}}' zookeeper-0114_company_com | cut -d/ -f1
10.0.0.12

However, if we get a shell into the container that backs that service, we see the following network configuration:

 docker exec -it zookeeper-0114_company_com.1.tqos8do9fwhttcn97uor4zu7b sh

ifconfig

eth0      Link encap:Ethernet  HWaddr 02:42:0A:00:00:0D
      inet addr:10.0.0.13  Bcast:0.0.0.0  Mask:255.255.255.0
      inet6 addr: fe80::42:aff:fe00:d%32718/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
      RX packets:6627 errors:0 dropped:0 overruns:0 frame:0
      TX packets:6627 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:0
      RX bytes:376144 (367.3 KiB)  TX bytes:376040 (367.2 KiB)

and

netstat -plutn

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:34001        0.0.0.0:*               LISTEN      -
tcp        0      0 :::2181                 :::*                    LISTEN      -
tcp        0      0 ::ffff:10.0.0.12:3888   :::*                    LISTEN      -
tcp        0      0 :::35805                :::*                    LISTEN      -
udp        0      0 127.0.0.11:58991        0.0.0.0:*                           -

We can see that on port 3888 the zookeeper software is only listening for traffic addressed to the Virtual IP 10.0.0.12, unlike the client port 2181, which is listening on all addresses. Since 10.0.0.12 is only a Virtual IP routed by the swarm, there will never be a local IP on the container that matches the packets routed to it (remember, eth0 is 10.0.0.13), so it never responds on 3888; it probably just drops the inbound packets and acts as if the port is not open. Can anyone think of a way around this issue? I will also contact the maintainers of the Zookeeper docker image about this, but I didn't know if there was anything that could be done at the docker routing layer to help, or if someone had a creative idea for routing the packets once they are received by the container.
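For anyone following along, this is the kind of check that shows the asymmetry (a sketch only, not literal output from my hosts): from a shell inside one of the other zookeeper containers on my-net, the client port answers via the service VIP while the election port is refused, matching the log above.

telnet zookeeper-0114_company_com 2181   # connects - the client port is bound to all addresses
telnet zookeeper-0114_company_com 3888   # refused - same ConnectException as in the log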

I think the issue is with the port mappings of the container; using -p 2888:2888 -p 3888:3888 should resolve your issue.

Thank you, arunkollipara, for the suggestion. While I believe it might work for zookeeper instances run on separate computers, each in its own container, it ended up not being what was needed to resolve this issue. I finally figured out the problem based on a couple of stack overflow questions. Apparently, even though a zookeeper node wants to know about all of the nodes that should be in its ensemble, it has to be able to resolve and contact ALL of them, including itself, to work. To that end, it seems it will only try to look up the hostname for its own ID through local resolution, without going out to a DNS server. Why? I don’t know; it may just have something to do with how linux networking operates. To fix it, one of two things has to happen:

  1. The hostname needs to be in the hosts file as an alias to localhost (see the sketch after this list).
  2. You have to use 0.0.0.0 as the host for the server. More concretely, if the ID of the zookeeper that is starting is 1, then the ZOO_SERVERS environment variable has to be “server.1=0.0.0.0:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888”
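For option 1, the idea (an illustration only, not something I actually ran) is that inside each container its own service name resolves locally instead of to the swarm Virtual IP. For server 1 that would mean an /etc/hosts line along the lines of:

127.0.0.1   localhost zookeeper-1046_company_com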

So I changed the configuration to be:

docker service create \
--network my-net \
--name zookeeper-1046_company_com \
--mount type=bind,source=/home/docker/data/zookeeper,target=/data \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
--constraint "node.hostname == 1046.company.com" \
zookeeper

docker service create \
--network my-net \
--name zookeeper-0161_company_com \
--mount type=bind,source=/home/docker/data/zookeeper,target=/data \
--env ZOO_MY_ID=2 \
--env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
--constraint "node.hostname == 0161.company.com" \
zookeeper

docker service create \
--network my-net \
--name zookeeper-0114_company_com \
--mount type=bind,source=/home/docker/data/zookeeper,target=/data \
--env ZOO_MY_ID=3 \
--env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=0.0.0.0:2888:3888" \
--constraint "node.hostname == 0114.company.com" \
zookeeper

Which gives me:

docker service ls    

ID            NAME                        MODE        REPLICAS  IMAGE
evtli9w4cuh3  zookeeper-1046_company_com  replicated  1/1       zookeeper:latest
mgm3qxxoida8  zookeeper-0114_company_com  replicated  1/1       zookeeper:latest
uxrns860wd8j  zookeeper-0161_company_com  replicated  1/1       zookeeper:latest

And in the logs:

2017-01-18 17:14:20,029 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Leader@952] - Have quorum of supporters, sids: [ 1,3 ]; starting up and setting last processed zxid: 0x700000000
2017-01-18 17:14:39,245 [myid:1] - INFO  [LearnerHandler-/10.0.0.5:42734:LearnerHandler@384] - Synchronizing with Follower sid: 2 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0
2017-01-18 17:14:39,302 [myid:1] - INFO  [LearnerHandler-/10.0.0.5:42734:LearnerHandler@518] - Received NEWLEADER-ACK message from 2

And getting a shell into one of the nodes:

netstat -tlupn

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:44701        0.0.0.0:*               LISTEN      -
tcp        0      0 :::42429                :::*                    LISTEN      -
tcp        0      0 :::2181                 :::*                    LISTEN      -
tcp        0      0 :::2888                 :::*                    LISTEN      -
tcp        0      0 :::3888                 :::*                    LISTEN      -
udp        0      0 127.0.0.11:52263        0.0.0.0:*

Which shows all of our expected ports listening.

Running a check against zookeeper in the same shell gives us:

telnet localhost 2181
stat

Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /127.0.0.1:59878[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x700000000
Mode: leader
Node count: 4
Connection closed by foreign host

And against another node:

 telnet zookeeper-0114_company_com 2181
stat

Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /10.0.0.3:59118[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
Connection closed by foreign host

The curious thing here is that this shows 4 nodes, which is one more than there actually are in the ensemble. Perhaps we don’t need the loopback server entry in the configuration at all, even though all of the zookeeper documentation shows it?

I hope that this helps someone else!


It did help, thanks for the explanation on how you solved this problem.

Thanks, it helped me out as well

Thanks, this worked out for me as well :slight_smile:

Thanks for this! It’s the only example I could find on the internet. I am new to swarm, so I am looking for a simple plug-and-play basic example. I had to remove the --constraint after I figured out what it was for. :slight_smile:

So am I understanding correctly that you are not exposing any ports in this example? What I would like is a complete 3-node swarm that is actually functional. Could someone suggest what else is needed to expose the service? I’m still too new to swarm to know how to set up an entry point that load balances across all nodes. If I figure it out I’ll update here.

 docker service create \
--network my-net \
--name zookeeper-1046_company_com \
--mount type=bind,source=/Users/myname/zoo1,target=/data \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=zookeeper-0161_company_com:2888:3888
server.3=zookeeper-0114_company_com:2888:3888" \
zookeeper

docker service create \
--network my-net \
--name zookeeper-0161_company_com \
--mount type=bind,source=/Users/myname/zoo2,target=/data \
--env ZOO_MY_ID=2 \
--env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
zookeeper

docker service create \
--network my-net \
--name zookeeper-0114_company_com \
--mount type=bind,source=/Users/myname/zoo3,target=/data \
--env ZOO_MY_ID=3 \
--env ZOO_SERVERS="server.1=zookeeper-1046_company_com:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=0.0.0.0:2888:3888" \
zookeeper

stonefury,

So I am using my zookeeper ensemble internally to my swarm for a kafka cluster, which is then used by other internal applications fronted by a web app. If you need to expose the zookeeper client port, you can add the option --publish 2181:2181 to the first zookeeper, --publish 2182:2181 to the second, and --publish 2183:2181 to the third. You have to do this because, to my knowledge, the swarm will only allow one service to publish a given port at a time. In any case, a client should have options to configure a list of zookeepers to hit.
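For example, the first service would look something like this (a sketch based on my command above, with only the --publish option added; the second and third services just change the host side of the mapping):

docker service create \
--network my-net \
--name zookeeper-1046_company_com \
--publish 2181:2181 \
--mount type=bind,source=/home/docker/data/zookeeper,target=/data \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
--constraint "node.hostname == 1046.company.com" \
zookeeper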

If you want to front all three with one port, I think you’ll have to set up a reverse proxy by creating and configuring an nginx or apache service instance in your swarm that will round robin to the different zookeepers in the ensemble.
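I haven’t built that proxy myself, but a minimal sketch of the idea, assuming the stock nginx image (which ships with the TCP stream module) and a hypothetical config path that exists on whichever node the task lands on, would be something like:

cat > /home/docker/nginx/zookeeper-proxy.conf <<'EOF'
# Minimal nginx.conf replacement: round-robin TCP proxy to the ensemble on my-net
events {}
stream {
  upstream zookeeper_ensemble {
    server zookeeper-1046_company_com:2181;
    server zookeeper-0161_company_com:2181;
    server zookeeper-0114_company_com:2181;
  }
  server {
    listen 2181;
    proxy_pass zookeeper_ensemble;
  }
}
EOF

docker service create \
--network my-net \
--name zookeeper-proxy \
--publish 2181:2181 \
--mount type=bind,source=/home/docker/nginx/zookeeper-proxy.conf,target=/etc/nginx/nginx.conf \
nginx

Clients would then just point at any swarm node on port 2181.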

Hope that helps!

Thanks @cmooney! Truly, thank you for your contribution on this.

No worries! Glad to help.

@cmooney This was super helpful. Would it be possible for you to share your docker service create commands for the Kafka brokers? I’m not clear on what the network settings would be for the brokers. Would the brokers be in the same overlay network, my-net? Could I get away with not creating an overlay network?

If my Kafka brokers are in my-net how do I expose these brokers to services outside of Docker Swarm? Would it be something similar to what you mentioned with zookeeper?

--publish 2181:2181 to the first zookeeper, --publish 2182:2181 to the second and 2183:2181 to the third

but instead using the kafka broker port: 9092?

@zzztimbo

Sure! here are the cleaned up command lines from our scripting:

  docker service create \
    --network my-net \
    --name kafka-1046_company_com \
    --env KAFKA_LOG_DIRS=/data/LogDir1,/data/LogDir2,/data/LogDir3 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir1,target=/data/LogDir1 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir2,target=/data/LogDir2 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir3,target=/data/LogDir3 \
    --env KAFKA_BROKER_ID=0 \
    --env ZOOKEEPER_CONNECTION_STRING=zookeeper-1046_company_com:2181,zookeeper-0161_company_com:2181,zookeeper-0114_company_com:2181 \
    --env KAFKA_DELETE_TOPIC_ENABLE=true \
    --env KAFKA_MESSAGE_MAX_BYTES=1024000000 \
    --constraint "node.hostname == 1046.company.com" \
    ches/kafka:0.10.2.0

  docker service create \
    --network my-net \
    --name kafka-0161_company_com \
    --env KAFKA_LOG_DIRS=/data/LogDir1,/data/LogDir2,/data/LogDir3 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir1,target=/data/LogDir1 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir2,target=/data/LogDir2 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir3,target=/data/LogDir3 \
    --env KAFKA_BROKER_ID=1 \
    --env ZOOKEEPER_CONNECTION_STRING=zookeeper-1046_company_com:2181,zookeeper-0161_company_com:2181,zookeeper-0114_company_com:2181 \
    --env KAFKA_DELETE_TOPIC_ENABLE=true \
    --env KAFKA_MESSAGE_MAX_BYTES=1024000000 \
    --constraint "node.hostname == 0161.company.com" \
    ches/kafka:0.10.2.0

  docker service create \
    --network my-net \
    --name kafka-0114_company_com \
    --env KAFKA_LOG_DIRS=/data/LogDir1,/data/LogDir2,/data/LogDir3 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir1,target=/data/LogDir1 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir2,target=/data/LogDir2 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir3,target=/data/LogDir3 \
    --env KAFKA_BROKER_ID=2 \
    --env ZOOKEEPER_CONNECTION_STRING=zookeeper-1046_company_com:2181,zookeeper-0161_company_com:2181,zookeeper-0114_company_com:2181 \
    --env KAFKA_DELETE_TOPIC_ENABLE=true \
    --env KAFKA_MESSAGE_MAX_BYTES=1024000000 \
    --constraint "node.hostname == 0114.company.com" \
    ches/kafka:0.10.2.0

As you can see, we use our my-net overlay network again, and the zookeeper connection string uses the docker service names as hostnames. This keeps all of the communication among your nodes inside the overlay network, lets you secure that traffic with the overlay network's encryption, and lets you use your firewalls to lock things down further so that only the docker swarm ports are accepted from your swarm nodes.

If you want to expose your kafka instances to the outside, you are absolutely correct: you could use the --publish flag with the same type of port mapping as zookeeper, but with the Kafka broker port. A little more specifically: Kafka node one --publish 9092:9092, node two --publish 9093:9092, and finally node three --publish 9094:9092. Again, your clients will need to deal with this set of ports and will be hitting the host domain names or IP addresses to connect.
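For example, broker one would look something like this (a sketch of my command above with only the --publish option added; brokers two and three would use --publish 9093:9092 and --publish 9094:9092):

  docker service create \
    --network my-net \
    --name kafka-1046_company_com \
    --publish 9092:9092 \
    --env KAFKA_LOG_DIRS=/data/LogDir1,/data/LogDir2,/data/LogDir3 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir1,target=/data/LogDir1 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir2,target=/data/LogDir2 \
    --mount type=bind,source=/home/myname/data/kafka/LogDir3,target=/data/LogDir3 \
    --env KAFKA_BROKER_ID=0 \
    --env ZOOKEEPER_CONNECTION_STRING=zookeeper-1046_company_com:2181,zookeeper-0161_company_com:2181,zookeeper-0114_company_com:2181 \
    --env KAFKA_DELETE_TOPIC_ENABLE=true \
    --env KAFKA_MESSAGE_MAX_BYTES=1024000000 \
    --constraint "node.hostname == 1046.company.com" \
    ches/kafka:0.10.2.0

Depending on the image, you may also need to configure the broker's advertised address so that external clients are handed a hostname they can actually reach; check the image's documentation for that.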

Hopefully that makes sense. Have fun!

@cmooney When we spin up the services, the first service has connectivity issues, i.e., it tries to connect to the other zookeeper services, which are not up yet. How are you running your services so that all of the services in a multi-container setup come up in sync without any issues?

In my case I am spinning up service1, service2 and service3. The service1 logs show errors connecting to service2 and service3. I had to restart service1 or the other services to make it stable.

@sureshskit

Ahhh, that is a little secret that I haven’t revealed yet :slight_smile:

In our scripting, we have a check to make sure the first node has started before starting the others (and then the same check as each secondary comes up). It will get an error that it cannot find the other two, but will hang out looking for them. This is really messy, but I haven’t had a whole lot of reasons to clean it up as it is working…

is_node_up() {
  node=$1
  #Example output for docker service ps | grep
  #4bdxikbn90w3g23xb4wjvuzjr  cassandra-tdkc-0161_tdkc_com.1  $dockerRegistryHost/icee_test/cassandra  tdkc-0161.tdkc.com  Running        Running 6 seconds ago
  #awk '{ print $6 }' gives 'Running' from the phrase:   Running 6 seconds ago
  filter_value=$(docker service ps --filter desired-state=running $node | grep $node | awk '{ print $6 }')
  if [[ ! "$filter_value" =~ Running ]]; then
    echo Node $node is still trying to start
    return 1
  else
    echo Node $node has started!
  fi
  return 0
}

docker service create \
--network my-net \
--name zookeeper-1046_company_com \
--mount type=bind,source=/Users/myname/zoo1,target=/data \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=zookeeper-0161_company_com:2888:3888
server.3=zookeeper-0114_company_com:2888:3888" \
zookeeper

sleep_time=0
until is_node_up zookeeper-1046_company_com; do
  echo ...Waiting...
  sleep 10s
  sleep_time=$((sleep_time+10))
  #Give the container up to one minute to finish
  if [[ sleep_time -gt 60 ]]; then
      echo Head zookeeper failed to finish within one minute, stopping process
      exit 1
  fi
done

#Start the next zookeeper

#Use the same wait code as above for it to start

#Repeat for each ensemble node

I hope this helps you get it working. I implemented this when I noticed that the nodes had to calm down to form a quorum; otherwise they never seemed to want to sync up.

Thanks for the detailed steps on how to set up ZooKeeper with the local volume and constraint!

Regarding the 0.0.0.0 local zoo server address: it is needed because the container itself is not aware of the node’s hostname, so setting its own entry to 0.0.0.0 works. If you can customize docker-entrypoint.sh, you could append the hostname to the end of /etc/hosts, and then you would not need to change it to 0.0.0.0.
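For example, a small wrapper entrypoint in a derived image could do it (just a sketch, not tested; ZOO_SELF_HOSTNAME is a made-up variable naming this node's own service name, and /docker-entrypoint.sh is assumed to be the stock image's entrypoint):

#!/bin/sh
# Map this node's own hostname to loopback so ZooKeeper's self-lookup resolves locally,
# then hand off to the stock zookeeper entrypoint.
echo "127.0.0.1 ${ZOO_SELF_HOSTNAME}" >> /etc/hosts
exec /docker-entrypoint.sh "$@"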

You could also consider using a volume plugin such as REX-Ray. That would avoid the constraint that binds the service to one node only: when the node fails, the container could move to another node and still attach the original volume.
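For example, with a REX-Ray driver installed on the nodes, the bind mount and node constraint could be replaced by a named volume (a sketch only; the driver and volume names here are illustrative):

docker service create \
--network my-net \
--name zookeeper-1046_company_com \
--mount type=volume,source=zookeeper-1-data,target=/data,volume-driver=rexray \
--env ZOO_MY_ID=1 \
--env ZOO_SERVERS="server.1=0.0.0.0:2888:3888 server.2=zookeeper-0161_company_com:2888:3888 server.3=zookeeper-0114_company_com:2888:3888" \
zookeeper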

Another open source project is FireCamp. It automates the setup and management for ZooKeeper, Kafka, and other data services.


Excellent information @junius! I see some excellent opportunities to upgrade what we are doing and take some of the scripted management I have been doing and offload it to these technologies. Thank you very much!

My pleasure. Glad it helps.

I am having the exact same problem yet I can’t seem to get the zookeeper cluster to work even with the 0.0.0.0 fix.

I tried launching it with docker-compose and docker stack deploy, as well as using the CLI approach, to no avail; I still have the connection refused error popping up in the zookeeper logs.

Meanwhile I tried running a zookeeper cluster without Docker and it worked flawlessly.

One odd thing I have noticed is that there is no listener on the 2888 port in my zookeeper containers, so I tried to publish the 2888 and 3888 ports and I am now faced with this after the election takes place:

2017-10-10 15:31:36,335 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:QuorumPeer@953] - Unexpected exception
test_zoo2.1.racljq8vt2we@G6781    | java.lang.InterruptedException: Timeout while waiting for epoch from quorum
test_zoo2.1.racljq8vt2we@G6781    | 	at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:896)
test_zoo2.1.racljq8vt2we@G6781    | 	at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:389)
test_zoo2.1.racljq8vt2we@G6781    | 	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:950)

but I shouldn’t have had to publish these ports in the first place, so I don’t really know if this is related or not.

Any chance you can provide a cleaned up version of your docker-compose file? I haven’t had a chance to play with stack deployment yet, but there could be issues with it trying to bring them all up at once. We use shell scripting to bring them up one container at a time, so that one has a head start on the other two and acts as the lead.

Sure, here it is:

version: '3.2'

services:
  zoo1:
    image: 31z4/zookeeper
    hostname: zoo1
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=<serv2IP>:2888:3888
    deploy:
      placement:
        constraints:
          - node.hostname == <hostnameofserv1>
  zoo2:
    image: 31z4/zookeeper
    hostname: zoo2
    ports:
      - 2182:2181
      - 2889:2888
      - 3889:3888
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=<serv1IP>:2888:3888 server.2=0.0.0.0:2888:3888
    deploy:
      placement:
        constraints:
          - node.hostname == <hostnameofserv2>

After setting up your swarm you can run it with:

docker stack deploy -c docker-compose.yml <nameOfStack>

I will actually be using another docker-compose file based on a docker image that I built with a Dockerfile and some scripts (since my network will be air-gapped), but that takes too much time to test, so I am currently trying to get this basic docker-compose file to work first, as they share the same idea.

Here is the netstat output from both of my zookeeper containers:
Leader

root@e6fcb0cf72d7:/opt/zookeeper-3.4.9# netstat -plutna
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      8/sshd          
tcp        0      0 0.0.0.0:46207           0.0.0.0:*               LISTEN      20/java         
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      20/java         
tcp        0      0 127.0.0.11:42473        0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:3888            0.0.0.0:*               LISTEN      20/java         
tcp        0      0 10.255.0.7:3888         10.255.0.2:32840        ESTABLISHED 20/java         
tcp6       0      0 :::22                   :::*                    LISTEN      8/sshd          
udp        0      0 127.0.0.11:34792        0.0.0.0:*  

Follower

root@c9f89679eada:/opt/zookeeper-3.4.9# netstat -plutna
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      20/java         
tcp        0      0 0.0.0.0:2888            0.0.0.0:*               LISTEN      20/java         
tcp        0      0 0.0.0.0:44364           0.0.0.0:*               LISTEN      20/java         
tcp        0      0 0.0.0.0:3888            0.0.0.0:*               LISTEN      20/java         
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      8/sshd          
tcp        0      0 127.0.0.11:36758        0.0.0.0:*               LISTEN      -               
tcp        0      0 172.18.0.3:32840        10.92.68.45:3888        ESTABLISHED 20/java         
tcp6       0      0 :::22                   :::*                    LISTEN      8/sshd  

10.92.68.45 is the IP of the leader, so it looks like there is some sort of established connection between the two zookeeper instances.