Replicated container on swarmode for zero downtime with stop_grace_period

I need to know how the property “stop_grace_period” and health check.
image

version: '3.8'
services:
  web:
    image: web:latest
    stop_grace_period: 30s
    healthcheck:
      interval: 20s
      start_period: 20s
      test: "CMD curl -f http://127.0.0.1 || exit 1"
      retries: 5
      timeout: 4s
    deploy:
      mode: replicated
      placement:
        constraints: [node.role != manager]
      replicas: 2
      rollback_config:
        parallelism: 1
        delay: 20s
        max_failure_ratio: 3
        order: start-first
      update_config:
        parallelism: 1
        order: stop-first
        delay: 10s
    networks:
      myntw:
        aliases:
          - default

I’ve set .yml like this

On this condition i’d like to know how the instance get started and terminated.

Assume scenario 1)
step 1. The new container is created and waits for 30s (stop_grace_period)
step 2. Old container stops (after 30s) and new container run immediately.

scenario 2)
step 1. The new container is created and starts healthcheck
step 2. The new container is running no matter stop_grace_period exists or not

I want to run this system for zero-downtime.
Any advise please :slight_smile:

Please don’t post test as images. So, please edit your post to include the actual text. Note the </> button to format the text to keep indentation visible. Thanks.

Since you have update_config.order: stop-first, one replica is stopped at a time, which will take up to the grace period, once terminated a new replica is created, which needs to become ready (=pass the healthcheck at least once), before the next replica is handled.

With update_config.order: start-first a new replica is created and when it becomes ready, it will be terminated. I feel this is more reliable, but is only usefull if your application is stateless or uses volumes backed by remote shares (=uses the same shared folder for all replicasI

version: '3.8'

services:
  web:
    image: web:latest
    stop_grace_period: 30s
    healthcheck:
      interval: 20s
      start_period: 20s
      test: "CMD curl -f http://127.0.0.1 || exit 1"
      retries: 5
      timeout: 4s
    deploy:
      mode: replicated
      placement:
        constraints: [node.role != manager]
      replicas: 2
      rollback_config:
        parallelism: 1
        delay: 20s
        max_failure_ratio: 3
        order: start-first
      update_config:
        parallelism: 1
        order: stop-first
        delay: 10s
    networks:
      myntw:
        aliases:
          - default

Here’s actual text :slight_smile: