Hi, I'm using the rabbitmq:3.12-management-alpine Docker image and trying to set up a 5-node RabbitMQ cluster in Docker Swarm with DNS-based peer discovery (rabbit peer discovery dns). I keep getting: Could not auto-cluster with node {badrpc,nodedown}
Peer discovery itself succeeds, but connecting to the discovered nodes to form the cluster fails. Am I missing something in the Docker Swarm network configuration?
Below is the Docker Compose file I'm using.
docker-compose.yaml:
version: "3.9"
services:
  rabbitmq:
    image: rabbitmq:3.12-management-alpine
    hostname: "rabbitmq-{{.Task.Slot}}"
    environment:
      RABBITMQ_DEFAULT_USER: /run/secrets/rabbitmq_user
      RABBITMQ_DEFAULT_PASS: /run/secrets/rabbitmq_password
      RABBITMQ_USE_LONGNAME: "true"
    configs:
      - source: rabbitmq_config
        target: /etc/rabbitmq/rabbitmq.conf
    secrets:
      - rabbitmq_user
      - rabbitmq_password
      - source: erlang_cookie
        target: /var/lib/rabbitmq/.erlang.cookie
        mode: 0600
        gid: "101"
        uid: "100"
    networks:
      - rabbitmq-network
    deploy:
      endpoint_mode: dnsrr
      replicas: 3
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: "1"
          memory: "1g"
        reservations:
          cpus: "1"
          memory: "1g"
networks:
  rabbitmq-network:
    driver: overlay
secrets:
  erlang_cookie:
    external: true
  rabbitmq_user:
    external: true
  rabbitmq_password:
    external: true
configs:
  rabbitmq_config:
    external: true
rabbitmq.conf:
cluster_formation.peer_discovery_backend = dns
cluster_formation.dns.hostname = rabbitmq
cluster_formation.discovery_retry_limit = 10
cluster_formation.discovery_retry_interval = 1000
You state you want 5, but set replicas to 3?
You could use mode: global to have a single instance on every node, and optionally use placement constraints like labels to limit it further.
Are you sure you need to enable endpoint_mode: dnsrr?
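For reference, the mode: global suggestion could look roughly like the fragment below. This is only a sketch, not a tested setup: the node label rabbitmq=true is a hypothetical name, and switching the hostname template to {{.Node.Hostname}} is an assumption on my part, since I'm not sure {{.Task.Slot}} gives a useful per-task number for global services.

```yaml
services:
  rabbitmq:
    image: rabbitmq:3.12-management-alpine
    # Assumption: name each container after its Swarm node instead of a replica slot.
    hostname: "rabbitmq-{{.Node.Hostname}}"
    deploy:
      # One task per matching node; "replicas" must not be set together with global mode.
      mode: global
      placement:
        constraints:
          # Hypothetical label; add it with: docker node update --label-add rabbitmq=true <node>
          - node.labels.rabbitmq == true
```

Only nodes carrying the label get an instance, so the cluster size follows the number of labeled nodes rather than a replica count.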
I changed it to 3; initially I was using 5. I had issues with auto-discovery, and then on a Stack Overflow page I saw a similar issue where the solution was to use endpoint_mode: dnsrr. After adding that, RabbitMQ was able to discover the peer nodes, but connecting to those nodes was still the issue. Using mode: global will run one container on each node. I will try mode: global and see if that solves the issue.
Similar issue. It is still not working after changing the mode to global.
Are you using your Docker overlay network over VLAN? Then make sure the MTU is set correctly.
It's a tricky issue, because small packets like ping go through, but data larger than 1400 bytes might suddenly fail.
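If the MTU does turn out to be the problem, one way to lower it on the overlay network is the com.docker.network.driver.mtu driver option. A minimal sketch, assuming 1400 is a suitable value for your environment (that number is just a placeholder taken from the size mentioned above, not a recommendation):

```yaml
networks:
  rabbitmq-network:
    driver: overlay
    driver_opts:
      # Assumed value: choose an MTU at or below what the underlying network can actually carry.
      com.docker.network.driver.mtu: "1400"
```

As far as I know the option only applies when the network is created, so an existing network would have to be removed and recreated (for example by redeploying the stack) before the new MTU takes effect.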
Note that this looks pretty wrong. Maybe adding _FILE makes it right, but that needs to be supported by the image: