Best way to update a Docker Swarm Stack?

I was running the previous version of the Docker for AWS stack (17.03) and in the past was able to do an in-place upgrade of the stack from the prior version. Both of the templates for the Edge Channel (17.04) version of Docker for AWS error out when I choose ‘update stack’ from the CloudFormation screen, which is how I have upgraded my stack in the past.

The first template (which apparently tries to create a new VPC) errored out with the message ‘CIDR Block must change if Availability Zone is changed and VPC ID is not changed’ and rolled back. I figured I should instead use the template for those with a pre-existing VPC. Again I chose ‘update stack’, and it was rolling along fine, spinning up new manager and worker instances, but then it suddenly started attempting to delete VPC resources - subnets, route tables, etc. That is failing, of course, because my existing swarm cluster is still using all of those resources.

Can someone direct me as to how exactly I am supposed to upgrade my current docker swarm to the latest on AWS?

The process eventually completed, but I then had to go back in and make some changes to my VPC to get everything working (e.g. reattach my IGW). I suppose from this point forward, if I want to upgrade my Docker Engine, I should let the CloudFormation template create a brand-new VPC and then cut DNS over to it once everything has been tested (basically a blue/green deployment). But the question still remains: why was the CloudFormation template that was designed to work with an existing VPC attempting to delete all the major components of my current VPC? The last time I did an in-place update of my stack, only the manager and worker nodes were replaced (which makes sense, as all I wanted to do was upgrade the Docker Engine version).
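For next time, I'll probably preview the update as a CloudFormation change set first, so I can see exactly which resources it plans to replace or delete before anything runs. Something like this rough boto3 sketch (the stack name and template URL are placeholders for my own):

```python
# Rough sketch: create a change set for the proposed update and print the
# planned resource actions, instead of running "update stack" blind.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

STACK_NAME = "docker-for-aws"                      # placeholder stack name
TEMPLATE_URL = "https://example.com/docker.tmpl"   # placeholder URL of the 17.04 template
CHANGE_SET_NAME = "preview-17-04-upgrade"

cfn.create_change_set(
    StackName=STACK_NAME,
    ChangeSetName=CHANGE_SET_NAME,
    TemplateURL=TEMPLATE_URL,
    Capabilities=["CAPABILITY_IAM"],  # Docker for AWS templates create IAM resources
    # (existing stack parameters may also need to be passed through here)
)

# Wait for CloudFormation to finish calculating the change set.
cfn.get_waiter("change_set_create_complete").wait(
    StackName=STACK_NAME, ChangeSetName=CHANGE_SET_NAME
)

# Print every planned action; "Remove" actions or Replacement == "True" on
# subnets, route tables, etc. would have been the red flag here.
resp = cfn.describe_change_set(StackName=STACK_NAME, ChangeSetName=CHANGE_SET_NAME)
for change in resp["Changes"]:
    rc = change["ResourceChange"]
    print(rc["Action"], rc["ResourceType"], rc["LogicalResourceId"],
          rc.get("Replacement", ""))
```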

Which region are you trying this in? We have seen this error in us-east-1, because AWS recently added a new Availability Zone that wasn’t there before. The new AZ changed the order of the AZs we had before, which caused the error you saw.

For example, if you previously had the following AZs available in us-east-1:
us-east-1a
us-east-1c
us-east-1d

You might now have the following available:
us-east-1a
us-east-1b
us-east-1c
us-east-1d

Notice the new one, us-east-1b. Since it is new, it changes the order of the AZs returned by the CloudFormation GetAZs function, so the subnets you currently have set up might not match the new stack that would get created. This confuses CloudFormation, which thinks you want to change the Availability Zone for a subnet, when that isn’t what we really want to do.
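To make that concrete: the template picks each subnet’s Availability Zone by index into the list that GetAZs returns (roughly Fn::Select over Fn::GetAZs), so a new zone shifts every index after it. Here is a rough sketch of the effect (the subnet names are just for illustration):

```python
# Rough illustration of why a new AZ breaks the existing subnet/AZ mapping.
import boto3

def subnet_to_az(az_list, subnet_count=3):
    """Mimic Fn::Select(i, Fn::GetAZs): subnet i gets the i-th AZ in the list."""
    return {f"Subnet{i + 1}": az_list[i % len(az_list)] for i in range(subnet_count)}

old_azs = ["us-east-1a", "us-east-1c", "us-east-1d"]                 # before us-east-1b existed
new_azs = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]   # after it was added

print("old mapping:", subnet_to_az(old_azs))
print("new mapping:", subnet_to_az(new_azs))
# Subnet2 shifts from us-east-1c to us-east-1b, and Subnet3 from us-east-1d
# to us-east-1c, so CloudFormation believes you asked to move those subnets.

# To see what GetAZs would return in your region today:
ec2 = boto3.client("ec2", region_name="us-east-1")
print([z["ZoneName"] for z in ec2.describe_availability_zones()["AvailabilityZones"]])
```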

We are trying to find a good way to resolve this, but we are currently limited in what we can do with CloudFormation.

Did you see this in us-east-1, or another region?

Ken

Hi Ken,
Yes, I was deploying into the us-east-1 region, so that must have been the issue!
For future Docker Engine upgrades, I’ll do a new VPC deployment and cut DNS over. A side benefit of doing this is that it will guarantee I can rebuild my full swarm cluster (including all associated AWS infrastructure) at the push of a button.
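The cutover itself should just be a Route 53 record change once the new stack checks out; roughly something like this (the hosted zone ID, record name, and ELB DNS name are placeholders for my own):

```python
# Rough sketch of the blue/green cutover: repoint the app's record at the
# new stack's ELB once it has been tested.
import boto3

r53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z1234567890ABC"      # placeholder hosted zone ID
RECORD_NAME = "app.example.com."       # placeholder record my clients resolve
NEW_STACK_ELB = "docker-elb-abc123.us-east-1.elb.amazonaws.com"  # new stack's ELB DNS name

r53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Cut over to the new Docker for AWS stack",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD_NAME,
                "Type": "CNAME",
                "TTL": 60,   # short TTL keeps a rollback to the old stack quick
                "ResourceRecords": [{"Value": NEW_STACK_ELB}],
            },
        }],
    },
)
```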

thanks for the quick response!
