Replicated container on swarm mode for zero downtime with stop_grace_period

pocketkjs · July 11, 2022, 5:27am

I need to know how the “stop_grace_period” property and the health check work.

version: '3.8'
services:
  web:
    image: web:latest
    stop_grace_period: 30s
    healthcheck:
      interval: 20s
      start_period: 20s
      test: ["CMD-SHELL", "curl -f http://127.0.0.1 || exit 1"]
      retries: 5
      timeout: 4s
    deploy:
      mode: replicated
      placement:
        constraints: [node.role != manager]
      replicas: 2
      rollback_config:
        parallelism: 1
        delay: 20s
        max_failure_ratio: 3
        order: start-first
      update_config:
        parallelism: 1
        order: stop-first
        delay: 10s
    networks:
      myntw:
        aliases:
          - default

This is how I’ve set up my .yml.

Given this configuration, I’d like to know how the instances get started and terminated.

Assume scenario 1)
step 1. The new container is created and waits for 30s (stop_grace_period).
step 2. The old container stops (after 30s) and the new container runs immediately.

scenario 2)
step 1. The new container is created and starts its healthcheck.
step 2. The new container runs regardless of whether stop_grace_period is set.

I want to run this system with zero downtime.
Any advice, please?

avbentem · July 11, 2022, 5:42am

Please don’t post text as images. So, please edit your post to include the actual text. Note the </> button to format the text to keep indentation visible. Thanks.

meyay · July 11, 2022, 7:00am

Since you have update_config.order: stop-first, one replica is stopped at a time, which takes up to the grace period. Once it has terminated, a new replica is created, which needs to become ready (= pass the healthcheck at least once) before the next replica is handled.

With update_config.order: start-first, a new replica is created, and once it becomes ready, the old replica is terminated. I feel this is more reliable, but it is only useful if your application is stateless or uses volumes backed by remote shares (= uses the same shared folder for all replicas).
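As a rough sketch of the start-first variant, assuming the same service as in the question: only update_config.order changes. The failure_action: rollback line is an addition that was not in the original file, and the healthcheck is written in the CMD-SHELL list form, which assumes curl is available inside the image.

```yaml
version: '3.8'
services:
  web:
    image: web:latest
    # Time a stopping container gets between SIGTERM and SIGKILL.
    stop_grace_period: 30s
    healthcheck:
      # Assumption: curl exists in the image; CMD-SHELL runs the string via the shell.
      test: ["CMD-SHELL", "curl -f http://127.0.0.1 || exit 1"]
      interval: 20s
      timeout: 4s
      retries: 5
      start_period: 20s
    deploy:
      mode: replicated
      replicas: 2
      update_config:
        parallelism: 1
        # Create the new task first; the old one is stopped once the new one is healthy.
        order: start-first
        delay: 10s
        # Assumption: roll back automatically if the new task never becomes healthy.
        failure_action: rollback
```

With this ordering the service temporarily runs one extra task per update step, so the nodes need spare capacity for it.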

pocketkjs · July 11, 2022, 7:42am

version: '3.8'

services:
  web:
    image: web:latest
    stop_grace_period: 30s
    healthcheck:
      interval: 20s
      start_period: 20s
      test: ["CMD-SHELL", "curl -f http://127.0.0.1 || exit 1"]
      retries: 5
      timeout: 4s
    deploy:
      mode: replicated
      placement:
        constraints: [node.role != manager]
      replicas: 2
      rollback_config:
        parallelism: 1
        delay: 20s
        max_failure_ratio: 3
        order: start-first
      update_config:
        parallelism: 1
        order: stop-first
        delay: 10s
    networks:
      myntw:
        aliases:
          - default

Here’s the actual text.
