I’ve always tried to work to the idea that a lack of automation is a single point of failure, so when I started looking at Swarm in Docker 1.12 I was puzzled by how to automate the bootstrap process of a swarm manager cluster. In particular, all the documentation on Swarm in 1.12 suggests that you have to use a discovery token for the managers to join the cluster in the first place.
In my scenario I’m planning a 3-node RAFT cluster for the managers, with N workers added at a later date once the managers are up and running.
The issue I’m finding is that, when writing cloud-config scripts for my manager nodes to build themselves, there is no clear way to guarantee that one manager comes online first. Even if I did split them into separate manager autoscaling groups and staggered the first boot of each, I can’t think of a non-hacky way to get the two remaining manager nodes a copy of the join token they need to join the RAFT consensus.
My other concern is that the need for static IP addresses for managers kind of goes against the whole idea of ephemeral servers in the cloud. My thinking was to put an internal load balancer in front of the managers and have the join command talk to the load balancer on port 2377, which would simply pass the request to any responding manager behind it.
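To make the problem concrete, here is a rough Python sketch of the decision each manager's cloud-config would have to make on first boot. It is only an illustration under assumptions: the "shared store" that holds the token is hypothetical (I don't have one, which is the whole problem), and the function names, `managers.internal` and the IP address are made up. The docker commands themselves are the standard `docker swarm init`, `docker swarm join-token` and `docker swarm join`. It also does nothing to stop two managers both seeing an empty store and both running `init`.

```python
import subprocess

SWARM_PORT = 2377  # swarm management port the load balancer would forward


def init_command(advertise_addr):
    """Command the first manager runs to create the swarm."""
    return ["docker", "swarm", "init", "--advertise-addr", advertise_addr]


def token_command():
    """Command that prints only the manager join token (run on an existing manager)."""
    return ["docker", "swarm", "join-token", "-q", "manager"]


def join_command(token, manager_host):
    """Command a later manager runs; manager_host could be the load balancer name."""
    return ["docker", "swarm", "join", "--token", token,
            "{}:{}".format(manager_host, SWARM_PORT)]


def bootstrap_plan(stored_token, advertise_addr, manager_host):
    """Return the list of commands this node should run.

    stored_token is whatever was read from the hypothetical shared store;
    None means no swarm exists yet, so this node has to create it.
    """
    if stored_token is None:
        # First node: create the swarm, then print the token so it can be published.
        return [init_command(advertise_addr), token_command()]
    # Later node: join through whatever address fronts the managers.
    return [join_command(stored_token, manager_host)]


def run(plan):
    """Execute each command in order, stopping on the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Calling `bootstrap_plan(None, "10.0.0.5", "managers.internal")` gives the init path, and passing a token instead of `None` gives the join path. Everything hard (who writes the token, where, and how the race is avoided) sits outside this sketch.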
So, my queries are:
Has anyone figured out how to go from zero to swarm in 1.12 using autoscaling services? (I’m primarily using Terraform and would rather avoid an A-B/step-based deployment with something like Salt, Ansible or Puppet just to make sure manager1 is online before the others. Perhaps I’m asking too much, though.)
Will load-balancing 2377 and connecting to that work with the RAFT protocol, or do I need specific IP addresses to make the initial connection?