First of all, excuse my English; it's not my native language.
We're seeing strange errors and behaviour in some apps we've dockerized. We think we've pinpointed the origin, but we don't know how, or whether, we can change this behaviour.
The problem happens when a worker node loses connectivity with the managers.
We have some services that are constrained to run only on one particular node. They're also created with --network host.
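For reference, the services are created with something along these lines (the service name, image and constraint here are placeholders, not our exact values):

docker service create \
  --name adg-dev \
  --constraint 'node.hostname == worker-1' \
  --network host \
  my-registry/adg-dev:latest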
When this node loses connectivity with the managers (basically, the Internet sometimes goes down for a short period), the containers keep running on the node, but the node appears as down to the managers. Then, when connectivity is restored, Docker "kills" the running containers and starts them again.
The problems we're having seem to stem from the fact that Docker appears to deploy the new tasks before killing the old ones. This leads to sockets still being in use, the newly deployed services connecting to the old ones shortly before they die, and so on.
To test this, I've used iptables to block communication with the managers so the node is marked as down, and then allowed it again.
You can see how the "old" containers are still running when the new ones are deployed (the lines were too long, so I've trimmed most of the output):
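Roughly like this (MANAGER_IP is a placeholder for the manager's address):

# Make the node appear down by dropping traffic to/from the manager
iptables -I OUTPUT -d MANAGER_IP -j DROP
iptables -I INPUT -s MANAGER_IP -j DROP

# Restore connectivity by removing the rules again
iptables -D OUTPUT -d MANAGER_IP -j DROP
iptables -D INPUT -s MANAGER_IP -j DROP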
jue ago 22 08:56:04 CEST 2019
CREATED STATUS PORTS NAMES
About a minute ago Up About a minute 8080/tcp adg-dev.1. ...
2 minutes ago Up 2 minutes lb. ...
2 minutes ago Up 2 minutes net-doc. ...
2 minutes ago Up 2 minutes 8080/tcp adg-dev.2. ...
2 minutes ago Up 2 minutes websock-dev.1. ...
2 minutes ago Up 2 minutes cp-dev.1. ...
jue ago 22 08:56:05 CEST 2019
CREATED STATUS PORTS NAMES
3 seconds ago Up Less than a second net-doc. ...
3 seconds ago Up Less than a second lb. ...
3 seconds ago Up Less than a second 8080/tcp adg-dev.2. ...
3 seconds ago Up Less than a second cp-dev.1. ...
3 seconds ago Up Less than a second 8080/tcp adg-dev.1. ...
3 seconds ago Up Less than a second websock-dev.1. ...
About a minute ago Up About a minute 8080/tcp adg-dev.1. ...
2 minutes ago Up 2 minutes lb. ...
2 minutes ago Up 2 minutes net-doc. ...
2 minutes ago Up 2 minutes 8080/tcp adg-dev.2. ...
2 minutes ago Up 2 minutes websock-dev.1. ...
2 minutes ago Up 2 minutes cp-dev.1. ...
The old containers are gradually killed until, a few seconds later, they're all gone:
jue ago 22 08:56:16 CEST 2019
CREATED STATUS PORTS NAMES
13 seconds ago Up 11 seconds net-doc. ...
13 seconds ago Up 11 seconds lb. ...
13 seconds ago Up 10 seconds 8080/tcp adg-dev.2. ...
13 seconds ago Up 11 seconds cp-dev.1. ...
13 seconds ago Up 10 seconds 8080/tcp adg-dev.1. ...
13 seconds ago Up 11 seconds websock-dev.1. ...
2 minutes ago Up 2 minutes lb. ...
2 minutes ago Up 2 minutes net-doc. ...
jue ago 22 08:56:17 CEST 2019
CREATED STATUS PORTS NAMES
14 seconds ago Up 12 seconds net-doc. ...
14 seconds ago Up 12 seconds lb. ...
14 seconds ago Up 11 seconds 8080/tcp adg-dev.2. ...
14 seconds ago Up 12 seconds cp-dev.1. ...
14 seconds ago Up 12 seconds 8080/tcp adg-dev.1. ...
14 seconds ago Up 12 seconds websock-dev.1. ...
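In case it's useful, the output above was captured with a simple loop, something like this:

while true; do
  date
  docker ps
  sleep 1
done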
We’d like to know if it’s possible to either:
· When a node that was down becomes ready again and is already running the tasks that are about to be deployed to it, prevent Docker from killing and redeploying them, and just keep them running.
· If that's not possible, have Docker stop the old containers before starting the new ones.
As I've said, we've been unable to find how to do either. The services are already created with --update-order=stop-first and --rollback-order=stop-first, and with an update and rollback parallelism of 1. We've also seen that there's a --stop-grace-period option that defaults to 10s, but we don't want to set it to zero because some tasks require cleanup before they're shut down and need some time for it.
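For reference, this is roughly how those options are set on our services (the service name is just an example):

docker service update \
  --update-order stop-first \
  --rollback-order stop-first \
  --update-parallelism 1 \
  --rollback-parallelism 1 \
  --stop-grace-period 10s \
  adg-dev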
All nodes and managers are running docker 19.03.1.
Thank you!