Startup order in swarm

romgo · June 18, 2025, 7:44am

Hello,

I have read in several places that Docker Swarm doesn’t support depends_on for services.
I am looking for some feedback from the community on how you are dealing with this.

My use case is a stack composed of a RabbitMQ message broker and a service (simplified for this post):

services:
  rabbitmq01:
    image: rabbitmq:3-management
    deploy:
      labels:
        - "traefik.http.routers.rabbitmq1.rule=Host(`rabbitmq.domain.com`)"
        - "traefik.http.services.rabbitmq1.loadbalancer.server.port=15672"
        - "traefik.swarm.network=traefik_default"
      placement:
        constraints: [node.role == worker]
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 60s
    environment:
      TZ: "Europe/Paris"
      RABBITMQ_DEFAULT_USER: "user"
      RABBITMQ_DEFAULT_PASS: "user"
    networks:
      - message_net
      - traefik_default
  app1:
    image: app1:2.8
    environment:
      TZ: "Europe/Paris"
    hostname: "app-{{.Node.Hostname}}-{{.Task.Slot}}"
    deploy:
      labels:
        - "traefik.http.routers.vad.rule=Host(`app.domain.com`)"
        - "traefik.http.services.vad.loadbalancer.server.port=8080"
        - "traefik.swarm.network=traefik_default"
      replicas: 2
      placement:
        constraints:
          - node.role == worker
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 60s
    networks:
      - message_net
      - traefik_default

So the situation is that some app1 replicas start before RabbitMQ is ready to accept connections.
To handle this, I see two options:

1. Use a dedicated stack file for RabbitMQ, so that I can make sure it is started first.
2. Check with the dev team that the code has a retry mechanism for when it fails to connect to RabbitMQ.

I think that currently the service starts, but if it fails to connect to RabbitMQ, the service (and hence the container) remains up and running.

Thank you for your input!

meyay · June 18, 2025, 7:49pm

Thoughts on option 1: the race condition still applies; it just becomes less likely. What happens if the external service becomes unavailable while the application is running?

Thoughts on option 2: this would be the right choice to strengthen resiliency.

Another idea could be to make the entrypoint script delay the start of the main application until the required external services are available. That would fix the race condition at startup, but the question from option 1 remains.
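As a rough illustration of the entrypoint idea, here is a minimal Python sketch that polls a TCP port and only then replaces itself with the main process. The function names `wait_for_port` and `run_after`, the timeout values, and the example command are all assumptions for illustration, not something from the stack above. Note that an open port only shows the broker is accepting TCP connections, which may be slightly earlier than it being fully ready.

```python
import os
import socket
import sys
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def run_after(host: str, port: int, command: list[str]) -> None:
    """Wait for host:port, then replace this process with the main command."""
    if not wait_for_port(host, port):
        # Non-zero exit, so a restart_policy of on-failure would reschedule the task.
        sys.exit(f"{host}:{port} not reachable, giving up")
    os.execvp(command[0], command)
```

With the stack above this might be called as `run_after("rabbitmq01", 5672, ["app1-binary"])`, where 5672 is RabbitMQ's default AMQP port and `app1-binary` is a placeholder for whatever the image really starts.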

romgo · June 19, 2025, 7:46am

Thank you for your feedback.
Yes, I don’t like option 1, because my understanding of Docker is that you create a single stack that describes the whole application and its requirements.

So I went with option 2 as a demonstration, and it works. RabbitMQ needs around 15 s to be ready.

Hope this helps.

bluepuma77 · June 19, 2025, 8:32am

It’s all about resilience when building Internet systems. Your application needs to reconnect when an internal service crashes or the network has an outage between services. Trying to enforce a fixed startup order might work during startup, but you need to implement retries and reconnects anyway.
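A retry loop with exponential backoff is one common way to do this. The sketch below is only an assumption about how it could look: `retry_with_backoff` and its parameters are made-up names, `connect` stands for whatever call opens the broker connection, and it catches `OSError` only, so the exception type would need adjusting to the client library actually in use.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    connect: Callable[[], T],
    attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return connect()
        except OSError:
            if attempt == attempts - 1:
                # Out of attempts: re-raise so the process can exit and be restarted.
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so replicas do not all retry at the same instant.
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("attempts must be at least 1")
```

The same helper can be reused for reconnecting after a dropped connection, not just at startup. Letting the final failure propagate means the container exits instead of staying up without a broker connection.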
