Easy Elasticsearch cluster with docker 1.12 swarm?

I am looking for a stable clustering solution for ElasticSearch on multiple servers. I seem to be running around in circles to find the best solution:

Option 1: Create a docker service with v1.12 swarm

I can run multiple instances of Elastic, but I can’t automatically set the Elastic master IP. Furthermore I can’t separate the data from the service, so when containers are upgraded, all Elastic data is lost. And syncing of GBs of data takes a long time.

Option 2: Create regular docker containers with overlay network

I can manually start an instance on every server, manually determine the master IP. But I need to deal with the overlay network, which seems to need a separate key-value store to run. So I must deal with docker swarm and --cluster-store and --cluster-advertise parameters. Why can’t this just work like “service create” and the overlay network is automatically available on every node?

Option 3: Create regular docker containers with VPN network

I can create a basic VPN network and I manually start an instance on every server, listening on the VPN IP. Also messy for the manual configuration, but I don’t need extra services like consul and special parameters.

So what’s my best option from your experience?

3 Likes

I’m currently in the process of figuring out ES on top of docker swarm. I’ve been thinking about this for a while and thought I might as well run my current thinking by the community. So, apologies for what will be a longish post. The context is that we are using elasticsearch in a way that requires it to stay available 24x7. We want to be able to regularly deploy new versions.

All the examples I’ve seen on this topic are pretty much unsuitable for production usage and leave a lot of non trivial stuff as an exercise to the reader. One thing I’m struggling with particularly, which I think is actually a deal breaker for running ES well on any kind of autoscaling cloud environment, is doing rolling updates in a responsible way (without the es cluster going red).

The problem here is that (re)-starting an elasticsearch node and an elasticsearch cluster being fully available are two things. During a rolling restart, simply restarting nodes one by one will kill your cluster and make it unavailable and potentially may cause data loss as well. The last thing you want here is something like docker swarm blindly restarting nodes without checking anything in between. This is basically true for most clustered software btw.

The pattern with a rolling restart in ES is simple: 1) bring an instance down, wait for the cluster to stabilize (can take a long time), 2) restart the node 3) wait for the cluster to stabilize again 4) proceed to the next node. The problem is that waiting can take long, hours if you have lots of data. Also, any of these steps can fail in which case you may need to take manual action. If you get it wrong your es cluster goes red, which means data is unavailable, indexing is impossible, and you may lose data when you try to bring the same nodes back up or it may not recover at all without manual intervention. Red is a really bad state to be in for an ES cluster.

Ideally you go from a green state to a yellow state and then back to a green state as you shut down nodes. Yellow means all data is available but not fully replicated because some nodes are down. Shutting down more nodes when the cluster is yellow is a really bad idea. Es clusters try to rebalance when they go yellow and go green once that process completes. This takes time. You can turn rebalancing off and probably should during a rolling restart and turn it back on afterwards. Rebalancing also happens when new nodes get added. Finally when nodes restart they need to check the data they have before they can rejoin the cluster properly. All this takes a lot of time. So, orchestrating this requires polling the cluster for its current state.

So, in short, relying on docker swarm rolling restarts is basically a really bad idea for elasticsearch until there is a way to have some sanity checks in between node restarts. Simply waiting X minutes/hours/days is not good enough.

Luckily, not all elasticsearch nodes are equal you have master nodes, data nodes and query nodes (and with the upcoming version ingest nodes). Out of the box each node can fulfill all roles but you can choose to specialize elasticsearch nodes to be just e.g. a master or just a query node. The master nodes are similar to swarm manager nodes in the sense that they manage es cluster state. Just like with swarm you want to isolate these nodes on docker hosts that are isolated from the other nodes that do all the heavy lifting. The worst thing that can happen in ES is master nodes being unable to talk to each other due to heavy garbage collecting resulting from e.g. data or querying spikes.

So, my plan: use docker labels to mark three different instance types: esmaster 3x, esdata >=2x, esquery >=2x. I’m actually considering to put the es masters on the swarm masters since both are sort of light on memory/cpu requirements and can probably be co-hosted safely to save some cost. Esdata nodes will need a docker volume to store stuff and a bit of CPU but should be pretty OK on memory. Esquery is doing all of the query heavy lifting so it will need loads memory and cpu.

Then I can manage the master and query nodes as two docker services and do rolling restarts pretty easily for those since both are stateless. I’d probably space their restarts a minute or two apart for safety but restart action on those should not trigger rebalancing. We can use labels to ensure the containers end up on the right docker hosts and any non es containers can simply talk to the esquery service and rely on swarm loadbalancing.

However, for internal cluster communication this wont work. The problem is that each es container needs to know the host or ip of at least one other node in the cluster to discover each other. Typically what you do there is list the three (or five) master node ips explicitly on each node in the cluster as the cluster addresses; they sort of discover any other nodes once they connect to any one of the masters.

In swarm, we need the master nodes to know about each other explicitly. However, the other es nodes could just talk to the esmaster service and rely on swarm load balancing to find a esmaster to talk to. However, the master containers need a way of finding out what the other master nodes are in the swarm. Somebody will probably write a elasticsearch plugin for this at some point but for now I don’t see any other way than somehow hardcoding this when firing up the master docker service. This should be fairly easy to script using the docker API or a docker node command with a filter on the esmaster label.

I don’t want to use docker services for managing the esdata nodes though since every rolling restart of those will need to be coordinated and monitored (this is where all the cluster rebalancing will happen). So the easiest is to simply start elasticsearch containers with docker run one by one explicitly tell each container what node to run on. That way I can restart them one by one in a controlled way.

Finally I’d like to use docker networks to ensure I can control access. Elasticsearch uses two ports: one for internal cluster traffic and one for its REST API. The latter is what needs to be exposed to other containers in the swarm but only on esquery nodes. Additionally, there is no need for anything to talk to Elasticsearch from outside the swarm. So, I’m thinking that only certain nodes in the swarm would have outside connectivity and none of those should be running elasticsearch. Furthermore, I want to use to docker networks (escluster, esquery) where anything that needs access to the esquery service in the swarm needs to be in esquery and all es containers need to be in the escluster network.

I’d love some feedback on any aspects on this. I’m relatively new to swarm but we have been using docker for our stateless services for a while.

I’ve spent the past day trying to make swarm run elasticsearch and my conclusion is unfortunately that currently is impossible because of several issues that I’ll outline below.

First, this is a way to do it with docker run (without swarm)

# given a few virtualbox machines with docker 1.12 and a nw interface called enp0s8 that is exposed with --network host
# on each node run:
docker run -d --network host -p 9200:9200 -p 9300:9300 elasticsearch -Des.network.host=_enp0s8_ -Des.discovery.zen.ping.unicast.hosts=192.168.77.21,192.168.77.22,192.168.77.23

# check if there is a healthy cluster with the expected number of nodes
curl 192.168.77.21:9200/_clust/health

This docker run does several things. It tells es to use the enp0s8 network interface to bind to. This is crucial because that way the containers can actually talk to each other. It tells each node that it should try to connect to the host ips to find the es master nodes.

With swarm you run into several obstacles:

  1. any attempt to set es.network.host to a network interface that is available in the container results in elasticsearch trying to bind itself to x.x.x.2, which is the ip swarm uses to loadbalance the swarm. This basically does not work and fails with a network unreachable exception. Also --network host is not supported in swarm so you can’t actually bind to the node ip like I did above with docker run.
  2. elasticsearch provides an alternative where you specify the to a public ip address where others can talk to it (the es.network.publish_host property). Typically you want this to be the docker host ip. However, this is of course different on each node in the swarm. There is no documented way of finding out this ip address from inside the running container or passing this via the docker service create command. There are open issues related to this for either adding dockerhost to the /etc/hosts with the correct ip or supporting some kind of variable substitution. Both have lots of upvotes and as of yet no agreed solution.
  3. elasticsearch also needs to know the ip addresses of the elasticsearch cluster masters (typically 3 or 5 in a production setup). Likewise, you want to use the public ip addresses of the corresponding docker hosts. In a swarm you have little control over this.
  4. Overlay networking could solves this theoretically in the sense that you can assign a subnet. However, then you next problem becomes figuring out ip address assigned to each container so that you can set es.network.publish_host to something sensible. Hint, you can’t. Also, you have no way of figuring out the correct ip addresses assigned to the other containers.
  5. For the latter it would be nice if the built in dns was able to resolve not only myservice (if you set the --name) but also myservice.1, myservice.2, etc. This doesn’t work unfortunately. it does allocate myservice.1.containerid but of course you have no way of knowing the container id upfront so these domain names are useless at the time you are doing the docker service create.
  6. There seems no way to e.g. expose /etc/hosts on the host machine inside the container. In short, you have no way of somehow hard coding sensible names for ip addresses on the host and then have those names actually resolve to something inside the container.

Update. I’ve found a really ugly way that sort of works. This works around several issues and relies on launching three services with one node each. This works around several issues:

  • es transport protocol really doesn’t like load balancing, so each service has to be 1 node and 1 dedicated port
  • unicast list of nodes to connect to have to resolve at start time. So the first node can’t refer the other two nodes because they don’t exist yet and therefore won’t resolve. It will exit with an error for any entry that does not resolve at launch time.
  • internal transport port has to match the swarm one. So, -p 9301:9300 is not good enough: it has to be the port that is advertised by es. Hence the -Des.transport.tcp.port=9301
docker network create es -d overlay

docker service create --name esm1 --network es \
  -p 9201:9200 -p 9301:9301 \
   elasticsearch -Des.network.publish_host=_eth0_ \
  -Des.discovery.zen.ping.unicast.hosts=esm1:9301 \
  -Des.node.master=true -Des.node.data=false \
  -Des.discovery.zen.minimum_master_nodes=2 \
  -Des.transport.tcp.port=9301

docker service create --name esm2 --network es \
  -p 9202:9200 -p 9302:9302 \
  elasticsearch -Des.network.publish_host=_eth0_ \
  -Des.discovery.zen.ping.unicast.hosts=esm1:9301 \
  -Des.node.master=true -Des.node.data=false \
  -Des.discovery.zen.minimum_master_nodes=2 \
  -Des.transport.tcp.port=9302


docker service create --name esm3 --network es \
  -p 9203:9200 -p 9303:9303 \
  elasticsearch -Des.network.publish_host=_eth0_ \
  -Des.discovery.zen.ping.unicast.hosts=esm1:9301,esm2:9302 \
  -Des.node.master=true -Des.node.data=false \
  -Des.discovery.zen.minimum_master_nodes=2 \
  -Des.transport.tcp.port=9303

With the stuff above you get a three node cluster in swarm with swarm usable dns names. This is good enough to be able to use swarm service discovery and has the added advantage that you can do a rolling restart simply by updating the 3 services one by one.

Also check out Christian Kniep’s way of running consul and es in swarm. This sort of bypasses swarm by not using it for unicast and using consul there instead. I’ve seen it demoed and it sort of works.

5 Likes

Good work jillesvangurp. Thank you very much.
Please, keep us updated if you find a better solution in the future.
Cheers

jillesvangurp, I think it is possible to simplify your solution. I did build a swarm cluster of elastic search, and it seems to function fine. Here is the outline of the solution:

  1. Suppose the name for the intended ES cluster is called ELASTIC. Assuming the size of the cluster is N.
  2. I first create a node (docker service create) with a name called SEED-ELASTIC. This node (and its corresponding VIP) will be configured as the unicast seed for (other) nodes discoveries.
  3. Next create services of ELASTIC with replicas option of N-1, with the unicast discovery set to forementioned SEED-ELASTIC. Then all these N nodes will form the intended cluster.
  4. Within each container, I had to add an extra twist: to use shell script to discover its publishing address (not the VIP), and configured the item: network.publish_host. This step is crucial for cluster nodes discovery.

The next question is how to do rolling-restart. My answer is via node labeling and filtering. Within a swarm, I generally confine service deployments to a predefined set of nodes by using constraints (e.g. constraint:node.labels.elasticsearch==1). So if I need to demote a node from the cluster, I simply docker node update --label-rm the label.

Hope it helps.

Wow, great discussion.

There is a much easier way to do all this.
I run docker 1.13.0.rc3

I am just in the process of switching to swarm and I haven’t gotten as far as data persistence yet, but if you set your docker compose file like so (i am actually passing in a elasticsearch.yml file via a volume, so there may be a typo in here, but hopefully, you get the idea)

docker-compose.yml

services:
  elasticsearch:
    command: elasticsearch -Enetwork.host=0.0.0.0 -Ediscovery.zen.ping.unicast.hosts=elasticsearch
    environment:
      ES_JAVA_OPTS: -Xms16g -Xmx16g
    image: docker.elastic.co/elasticsearch/elasticsearch:5.1.1
    ulimits:
      memlock: -1
      nofile:
        hard: 65536
        soft: 65536
      nproc: 65538
    volumes:
      - esdata:/usr/share/elasticsearch/data
volumes:
  esdata:
    driver: local

and then run

docker stack deploy --compose-file docker-compose.yml elasticstack
docker service update --endpoint-mode=dnsrr elasticstack_elasticsearch 
docker service scale elasticstack_elasticsearch=5

For an easy 5 node cluster in swarm, it just works.
Obviously, it is still a good idea to split master nodes from data nodes, etc. for production use.

2 Likes

@treksler Would be cool to see a small GitHub repo with the full contents (including elasticsearch.yml) of what you describe! I do find the built-in Compose in 1.13.0-rc* quite nice

I’d be curious about how exactly to separate masters and workers too, perhaps there could be a service for each.

@treksler, what do you mean with “it just works”? I can follow your instructions and end up with a bunch of elasticsearch instances which I cannot access because no ports are published. When I try to add a published port I get an error message:

docker service update --publish-add 9200:9200 elasticstack_elasticsearch
Error response from daemon: rpc error: code = 3 desc = EndpointSpec: port published with ingress mode can't be used with dnsrr mode

How am I supposed to access the cluster?

I think one of the main problems with elasticsearch in a swarm is that we need to publish 9200 via the load balancer and 9300 directly on each host. That is not supported with a compose file, I think. So I tried manually:

docker network create es -d overlay

docker service create \
  --name es \
  --network es \
  --publish 9200:9200 \
  --publish mode=host,target=9300,published=9300 \
  elasticsearch \
  -Enetwork.host=_eth0_ \
  -Ediscovery.zen.ping.unicast.hosts=<node1_ipaddr>,<node2_ipaddr>,<node3_ipaddr> \
  -Enode.master=true

The Problem with this is that all tasks get the same IP address 10.0.0.2. So when scaling up I get the understandable error

failed to send join request to master [{...}{...}{10.0.0.2}{10.0.0.2:9300}],
reason ... nested: IllegalArgumentException[can't add node {...}{...}{10.0.0.2}{10.0.0.2:9300},
found existing node {...}{...}{...}{10.0.0.2}{10.0.0.2:9300} with same address]; ]

Hence, the only way I see is to create different services like @jillesvangurp did and relinquish the swarm node balancer. So either I’d have to run my own or I won’t have high availability.

Am I missing something here? Is there a special trick to create an ES cluster on swarm?

There is simple solution to run elasticsearch cluster in swarm: https://github.com/a-goryachev/docker-swarm-elasticsearch as one scalable service instead of several services. Hope it will give you an idea.

1 Like

Interesting approach. Several people have also pointed out to me that docker 1.13 adds a few dns and networking features that could be of help here. And I believe that on the ES side there also have been some improvements to the way the clustering deals with dns lookups. So altogether, it should be possible to use swarm now with ES now.

In general my concern with running ES on swarm still is dealing with day to day cluster management (node replacement, rolling upgrades, etc) where there is quite a bit of potential for ending up with corrupted nodes, network splits, etc. But things are moving the right direction.

I am still not able to understand why Swarm is required when Elasticsearch has inbuilt scaling and high availability support and ES cluster with docker containers can easily be formed using option (3) in the question by @bluepuma77 ? Can anyone help me understand this.

I had to modify your example a bit to get it to work. I don’t need to publish any of the 9300 ports because when services are on the same network in swarm mode they have access to all the those services ports. Only in the first node do I publish 9200.

This needs some work as it periodically hangs

docker service create \
	--name esm1 \
	--network elasticnet \
	--detach=true \
	--publish=9200:9200 \
	--env ES_JAVA_OPTS='-Xms512m -Xmx512m' \
	elasticsearch \
		-Enetwork.host=_eth0_ \
		-Ecluster.name=docker-cluster \
		-Ediscovery.zen.ping.unicast.hosts=esm1 \
		-Enode.master=true \
		-Enode.data=false \
		-Ediscovery.zen.minimum_master_nodes=2 
		
docker service create \
	--name esm2 \
	--network elasticnet \
	--detach=true \
	--env ES_JAVA_OPTS='-Xms512m -Xmx512m' \
	elasticsearch \
		-Ecluster.name=docker-cluster \
		-Ediscovery.zen.ping.unicast.hosts=esm1 \
		-Enetwork.host=_eth0_ \
		-Enode.master=true \
		-Enode.data=false \
		-Ediscovery.zen.minimum_master_nodes=2 
		
docker service create \
	--name esm3 \
	--network elasticnet \
	--detach=true \
	--env ES_JAVA_OPTS='-Xms512m -Xmx512m' \
	elasticsearch \
		-Ecluster.name=docker-cluster \
		-Enetwork.host=_eth0_ \
		-Ediscovery.zen.ping.unicast.hosts=esm1,esm2 \
		-Enode.master=true \
		-Enode.data=false \
		-Ediscovery.zen.minimum_master_nodes=2 

I am interested to know how other people are progressing with this. Also, I have yet to set up data volumes but that will follow.

What about putting them behind a nginx load balancer (perhaps with consul + registrator?) so no one will publish a public IP + ports but the nginx
I have two doubts tought:
1.- Is there any more swarmish than consul + registrator
2.- How to achieve persistence? Giving the fact that one copy only will do

checking how this works

Hello everyone,

A lot of good elements in this discussion.
Since the topic is more than 6 month old, and both ES and Docker have been evolved, did anyone finally find an “easy” and efficient solution to run ES cluster in swarm node cluster ?
Thank you

The config below seems to work for me on v18.01.0-ce. Even without dnsrr, tasks.serviceName is giving me back all the IPs. So, now I also have a VIP to connect to without the need for another proxy instance to manage that.

I’m only using global mode because right now I have it setup to use local volumes for persistence.

version: "3.4"
services:
 elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.0.0
    deploy:
      mode: global
    ports:
      - "9200:9200"
    environment:
      - network.publish_host=_eth0_
      - discovery.zen.ping.unicast.hosts=tasks.elastic
      - discovery.zen.minimum_master_nodes=2
      - ES_JAVA_OPTS=-Xmx512m -Xms512m
    volumes:
      - elastic:/usr/share/elasticsearch/data
volumes:
  elastic:
1 Like

when i launch this stack docker-swarm it throughs me below errors. What’s this error and how to resolve it.

[2020-03-29T05:54:43,848][WARN ][o.e.d.z.ZenDiscovery ] [Ho5ynZW] failed to connect to master [{B4lzW_3}{B4lzW_3QR5GNXiLBMsccHg}{oQlj1pIMRfmiyTREI7q-8g}{10.0.0.18}{10.0.0.18:9300}{ml.max_open_jobs=10, ml.enabled=true}], retrying…
org.elasticsearch.transport.ConnectTransportException: [B4lzW_3][10.0.0.18:9300] connect_timeout[30s]
at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:284) ~[?:?]
at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:591) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:495) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:334) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:321) ~[elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:516) [elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:484) [elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.discovery.zen.ZenDiscovery.access$2500(ZenDiscovery.java:90) [elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1286) [elasticsearch-6.0.0.jar:6.0.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-6.0.0.jar:6.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: 10.0.0.18/10.0.0.18:9300
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[?:?]
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[?:?]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[?:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]