Cannot get zookeeper to work running in docker using swarm mode

The zoo2 ports don't look correct? zoo2 has ports 2182, 2889, 3889?

ports:
      - 2182:2181
      - 2889:2888
      - 3889:3888

That is true. I originally did that as a workaround to be able to launch the docker-compose file with a stack deploy, but it turns out that what I really had to do was use the long syntax for the ports (Overview | Docker Docs) in my docker-compose file and specify mode: host (I think it defaulted to ingress). So now, with a docker-compose file looking like this

version: '3.2'
services:
  zoo1:
    image: 31z4/zookeeper
    hostname: zoo1
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=<serv2>:2888:3888
    deploy:
      placement:
        constraints:
          - node.hostname == <hostname>
  zoo2:
    image: 31z4/zookeeper
    hostname: zoo2
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=<serv1>:2888:3888 server.2=0.0.0.0:2888:3888
    deploy:
      placement:
        constraints:
          - node.hostname == <hostname>

it is finally working :slight_smile: !
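
For reference, this is roughly how I deploy and check it (a minimal sketch; the zookeeper.yml file name, the zk stack name and nc being available on the host are my assumptions):

# deploy from a swarm manager node
docker stack deploy -c zookeeper.yml zk

# confirm both services are running
docker service ls

# quick health check against one of the nodes (ZooKeeper "stat" four-letter word)
echo stat | nc <serv1> 2181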

I was going to suggest that, instead of using 0.0.0.0 in your ZOO_SERVERS, you add --network-alias zookeeper-1046_company_com (etc…) to your docker service create commands, but it seems that option is missing from the service command.

Yep, docker swarm has quite a few little idiosyncrasies like that! :slight_smile:

Where are we mentioning the overlay network that is being created above as my-net?

I believe you would have to do it like this

version: '3.2'
services:
  zoo1:
    image: 31z4/zookeeper
    hostname: zoo1
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=<serv2>:2888:3888
    deploy:
      placement:
        constraints:
          - node.hostname == <hostname>
    networks:
      - <network>
  zoo2:
    image: 31z4/zookeeper
    hostname: zoo2
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=<serv1>:2888:3888 server.2=0.0.0.0:2888:3888
    deploy:
      placement:
        constraints:
          - node.hostname == <hostname>
    networks:
      - <network>
networks:
  <network>:
    external: true
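
Since the network is declared external here, it has to already exist before you deploy the stack; a rough sketch of creating it on a manager node (using the my-net name mentioned above in place of <network>):

docker network create --driver overlay my-net
# add --attachable only if you also want to connect standalone containers to it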

In this way, will the Docker containers be created on the respective hostnames mentioned, or just the services? My requirement is to deploy three ZooKeeper containers on 3 different hosts with a single yml.

There will be 2 services running in my example: zoo1 and zoo2.
Each service will start a Docker container with ZooKeeper on the node specified in the docker-compose file (in deploy: placement: constraints: - node.hostname == <hostname>).

You can check that the services are running with a docker service ls on the node that manages the swarm, and on each node you can see that the containers are running with a docker ps.
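
In commands, that is roughly (zk standing in for whatever stack name you used with docker stack deploy):

# on the swarm manager
docker service ls

# on each node
docker ps --filter name=zoo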

The yml throws an error saying:

ERROR: In file './zookeper-swarm.yml', network 'external' must be a mapping not a boolean.

Did you replace the <network> tags with the name you want the network to have?
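
You can also let Compose validate the file before deploying; a quick sanity check (using the file name from your error message):

docker-compose -f zookeper-swarm.yml config
# parses the file and prints the resolved configuration, or stops at the first error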

This was my YML, and while running docker-compose -f zookeeper.yml up & I am getting the following errors:

version: '3.1'

services:
  zoo1:
    image: manojgeocloud/testzookeeper:latest
    hostname: zoo1
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - zookeeper-net
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=qa-node7:2888:3888
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          - node.hostname == qa-node16
        # - node.role == manager
  zoo2:
    image: manojgeocloud/testzookeeper:latest
    hostname: zoo2
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - zookeper-net
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=qa-node16:2888:3888 server.2=0.0.0.0:2888:3888
    deploy:
      placement:
        constraints:
          - node.hostname == qa-node7
        # - node.role == worker
networks:
  zookeeper-net:

services.zoo1.deploy value Additional properties are not allowed ('constraints' was unexpected)
services.zoo1.deploy.placement contains an invalid type, it should be an object
services.zoo1.ports is invalid: Invalid port "{'target': 2181, 'protocol': 'tcp', 'mode': 'host', 'published': 2181}", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.zoo2.ports is invalid: Invalid port "{'target': 2181, 'protocol': 'tcp', 'mode': 'host', 'published': 2181}", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.zoo1.ports contains an invalid type, it should be a string, or a number
services.zoo2.ports contains an invalid type, it should be a string, or a number
services.zoo1.ports is invalid: Invalid port "{'target': 2888, 'protocol': 'tcp', 'mode': 'host', 'published': 2888}", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.zoo2.ports is invalid: Invalid port "{'target': 2888, 'protocol': 'tcp', 'mode': 'host', 'published': 2888}", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.zoo1.ports contains an invalid type, it should be a string, or a number
services.zoo2.ports contains an invalid type, it should be a string, or a number
services.zoo1.ports is invalid: Invalid port "{'target': 3888, 'protocol': 'tcp', 'mode': 'host', 'published': 3888}", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.zoo2.ports is invalid: Invalid port "{'target': 3888, 'protocol': 'tcp', 'mode': 'host', 'published': 3888}", should be [[remote_ip:]remote_port[-remote_port]:]port[/protocol]
services.zoo1.ports contains an invalid type, it should be a string, or a number
services.zoo2.ports contains an invalid type, it should be a string, or a number

For the overlay network you created: you cannot mix the overlay network with the node's hostname. The overlay network is not compatible with host mode and is not aware of the node's hostname. One option is to follow the compose file that hwki used; see the ref below.

If you are going to use your own overlay network, you should set the zk servers to something like ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888. In this mode, you cannot access ZooKeeper via the host IP and port; the application needs to attach to the same overlay network and access ZooKeeper by the service name zoo1.
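
To illustrate that last point, a rough sketch of a throwaway client on the same overlay network (it assumes the network is named my-net and was created with --attachable, and uses the official zookeeper image for its bundled CLI):

docker run --rm -it --network my-net zookeeper zkCli.sh -server zoo1:2181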

Thanks for that, junius and hwki. I have a doubt: while trying to telnet the 2 deployed machines in the swarm, I was able to telnet only the manager on port 2181 (ZooKeeper port), but I was not able to telnet the worker on port 2182 (ZooKeeper port).

Hi All,
I have been trying out deploying a ZooKeeper cluster in Docker swarm mode.

I have deployed 3 machines connected to a Docker swarm network. My requirement is to run a 3-instance ZooKeeper ensemble, with one instance on each of those nodes.
I have gone through this thread and got a few insights on how to deploy ZooKeeper in Docker swarm.

As @junius suggested, I have created the docker-compose file.
I have removed the constraints, as Docker swarm ignores them. Refer to Docker swarm constraints being ignored.

My Zookeeper docker compose file looks like this

version: '3.3'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
    zoo3:
        image: zookeeper:3.4.12
        hostname: zoo3
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 3
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
networks:
    net:

Deployed using docker stack command.

docker stack deploy -c zoo3.yml zk
Creating network zk_net
Creating service zk_zoo3
Creating service zk_zoo1
Creating service zk_zoo2

The ZooKeeper services come up fine, one on each node, without any issues.

docker stack services zk
ID NAME MODE REPLICAS IMAGE PORTS
rn7t5f3tu0r4 zk_zoo1 replicated 1/1 zookeeper:3.4.12 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
u51r7bjwwm03 zk_zoo2 replicated 1/1 zookeeper:3.4.12 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
zlbcocid57xz zk_zoo3 replicated 1/1 zookeeper:3.4.12 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
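
docker stack ps also shows which node each task was scheduled on, which is handy for confirming the placement:

docker stack ps zk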

I have reproduced the issue discussed here when I stopped and started the ZooKeeper stack again.

docker stack rm zk
docker stack deploy -c zoo3.yml zk

This time the ZooKeeper cluster doesn't form. The Docker instance logged the following:

ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2018-11-02 15:24:41,531 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.4:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
        at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:24:41,538 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 3 at election address zoo3/10.0.0.2:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
        at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:38:19,146 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=1, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2018-11-02 15:38:20,147 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=2, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)

On closer observation I found that the first time I deployed this stack, the ZooKeeper instance with id 2 was running on node 1. This created a myid file with the value 2.

cat /home/zk/data/myid
2

When I stopped and started the stack again, I found that this time the ZooKeeper instance with id 3 was running on node 1.

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
566b68c11c8b zookeeper:3.4.12 "/docker-entrypoin…" 6 minutes ago Up 6 minutes 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp zk_zoo3.1.7m0hq684pkmyrm09zmictc5bm

But the myid file still has the value 2, which was set by the earlier instance.

Because of this, the log shows [myid:2], and the instance tries to connect to the instances with ids 1 and 3 and fails.
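
A quick way to see the mismatch on the affected node (a sketch; the name filter matches the container shown in docker ps above):

# id left on the host by the previous instance
cat /home/zk/data/myid

# id the currently scheduled container was started with
docker exec $(docker ps -q -f name=zk_zoo) env | grep ZOO_MY_ID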

On further debugging, I found that the docker-entrypoint.sh file contains the following code:

# Write myid only if it doesn't exist
if [[ ! -f "$ZOO_DATA_DIR/myid" ]]; then
    echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
fi

This is what causes the issue for me. I have edited docker-entrypoint.sh with the following:

if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi

echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"

And I mounted the edited docker-entrypoint.sh in my compose file.

With this fix, I am able to stop and start my stack multiple times, and every time my ZooKeeper cluster forms an ensemble without hitting the connection issue.
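
To confirm the ensemble after a restart, I check the role of the local instance on every node; roughly (a sketch; 2181 is published with mode: host, and it assumes nc is available on the node):

# run on each swarm node after redeploying the stack
echo srvr | nc localhost 2181 | grep Mode
# expect "Mode: leader" on one node and "Mode: follower" on the other two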

My docker-entrypoint.sh file is as follows:

#!/bin/bash

set -e

# Allow the container to be started with `--user`
if [[ "$1" = 'zkServer.sh' && "$(id -u)" = '0' ]]; then
    chown -R "$ZOO_USER" "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR"
    exec su-exec "$ZOO_USER" "$0" "$@"
fi

# Generate the config only if it doesn't exist
if [[ ! -f "$ZOO_CONF_DIR/zoo.cfg" ]]; then
    CONFIG="$ZOO_CONF_DIR/zoo.cfg"

    echo "clientPort=$ZOO_PORT" >> "$CONFIG"
    echo "dataDir=$ZOO_DATA_DIR" >> "$CONFIG"
    echo "dataLogDir=$ZOO_DATA_LOG_DIR" >> "$CONFIG"

    echo "tickTime=$ZOO_TICK_TIME" >> "$CONFIG"
    echo "initLimit=$ZOO_INIT_LIMIT" >> "$CONFIG"
    echo "syncLimit=$ZOO_SYNC_LIMIT" >> "$CONFIG"

    echo "maxClientCnxns=$ZOO_MAX_CLIENT_CNXNS" >> "$CONFIG"

    for server in $ZOO_SERVERS; do
        echo "$server" >> "$CONFIG"
    done
fi

if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi

echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"

exec "$@"

My docker-compose file is as follows:

version: '3.3'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
    zoo3:
        image: zookeeper:3.4.12
        hostname: zoo3
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 3
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
networks:
    net:

With this, I am able to get the ZooKeeper instances up and running in Docker using swarm mode, without hard-coding any hostname in the compose file. If one of my nodes goes down, the services are started on any available node in the swarm without any issues.
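
One practical note on this setup: since the entrypoint script and the data directories are bind mounts from the host, they have to exist on every node the services can be scheduled on, and the script has to be executable; something like this on each node (a sketch, using the paths from my compose file):

mkdir -p /home/zk/data /home/zk/datalog
chmod +x /home/zk/docker-entrypoint.sh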

Thanks