Need help on RabbitMQ Cluster Setup (DNS Auto Discovery) in Docker Swarm

Hi, I’m using the rabbitmq:3.12-management-alpine Docker image and trying to set up a 5-node RabbitMQ cluster in Docker Swarm with DNS peer discovery (rabbit peer discovery dns), but I’m getting: Could not auto-cluster with node {badrpc,nodedown}

Discovery of the peer nodes succeeds, but connecting to those nodes to form a RabbitMQ cluster fails. Am I missing anything in the Docker Swarm network configuration?

Below is the Docker Compose file I’m using:

docker-compose.yaml:

version: "3.9"
services:
  rabbitmq:
    image: rabbitmq:3.12-management-alpine
    hostname: "rabbitmq-{{.Task.Slot}}"
    environment:
      RABBITMQ_DEFAULT_USER: /run/secrets/rabbitmq_user
      RABBITMQ_DEFAULT_PASS: /run/secrets/rabbitmq_password
      RABBITMQ_USE_LONGNAME: "true"
    configs:
      - source: rabbitmq_config
        target: /etc/rabbitmq/rabbitmq.conf
    secrets:
      - rabbitmq_user
      - rabbitmq_password
      - source: erlang_cookie
        target: /var/lib/rabbitmq/.erlang.cookie
        mode: 0600
        gid: "101"
        uid: "100"
    networks:
      - rabbitmq-network
    deploy:
      endpoint_mode: dnsrr
      replicas: 3
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: "1"
          memory: "1g"
        reservations:
          cpus: "1"
          memory: "1g"

networks:
  rabbitmq-network:
    driver: overlay

secrets:
  erlang_cookie:
    external: true
  rabbitmq_user:
    external: true
  rabbitmq_password:
    external: true

configs:
  rabbitmq_config:
    external: true

rabbitmq.conf:

cluster_formation.peer_discovery_backend = dns
cluster_formation.dns.hostname = rabbitmq
cluster_formation.discovery_retry_limit = 10
cluster_formation.discovery_retry_interval = 1000

You say you want 5 nodes, but replicas is set to 3?

You could use mode: global to run a single instance on every node, optionally with placement constraints (such as node labels) to limit it further.
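A minimal sketch of what that deploy section could look like. The label name rabbitmq is a hypothetical example (it could be set with docker node update --label-add rabbitmq=true <node>), and replicas is left out because it does not apply to global mode:

```yaml
services:
  rabbitmq:
    image: rabbitmq:3.12-management-alpine
    deploy:
      # one task per eligible node instead of a fixed replica count
      mode: global
      placement:
        constraints:
          # hypothetical label; only nodes carrying it would run a RabbitMQ task
          - node.labels.rabbitmq == true
```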

Are you sure you need to enable endpoint_mode: dnsrr?

I changed it to 3; initially I was using 5. I had issues with auto discovery, and then on a Stack Overflow page I saw a similar issue where the solution was to use endpoint_mode: dnsrr. After adding that, RabbitMQ was able to discover the peer nodes, but connecting to them was still the issue. Using mode: global will run one container on each node. I will try mode: global and see if that solves the issue.

Same issue. It still doesn’t work after changing the mode to global.

Are you running your Docker overlay network over a VLAN? If so, make sure the MTU is set correctly.

It’s a tricky issue, because small packets like ping go through, but data larger than 1400 bytes might suddenly fail.
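If the MTU does turn out to be the problem, one way to lower it for the overlay network is through the driver options in the compose file. This is only a sketch: the value 1400 is an assumption and should be chosen to fit the MTU of the underlying network minus the overlay's encapsulation overhead. As far as I know, the option only takes effect when the network is created, so an existing network would need to be removed and recreated.

```yaml
networks:
  rabbitmq-network:
    driver: overlay
    driver_opts:
      # assumed value; adjust to the underlying network's MTU
      com.docker.network.driver.mtu: "1400"
```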

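Note that the RABBITMQ_DEFAULT_USER and RABBITMQ_DEFAULT_PASS values in your compose file look pretty wrong: they are set to the secret file paths rather than the secrets’ contents. Maybe appending _FILE to the variable names makes it right, but that needs to be supported by the image.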
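To illustrate the _FILE suggestion: as posted, the two variables contain the literal strings /run/secrets/rabbitmq_user and /run/secrets/rabbitmq_password, not the contents of those files. The variant below is a sketch under the assumption that the image reads the value from the file named by a _FILE variable. That is not confirmed for the 3.12 image, and some image versions may reject these names, so check the image documentation for your tag first.

```yaml
services:
  rabbitmq:
    environment:
      # assumption: the image loads the value from the file at this path;
      # not verified for rabbitmq:3.12-management-alpine
      RABBITMQ_DEFAULT_USER_FILE: /run/secrets/rabbitmq_user
      RABBITMQ_DEFAULT_PASS_FILE: /run/secrets/rabbitmq_password
    secrets:
      - rabbitmq_user
      - rabbitmq_password
```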