I am facing an issue where a service cannot be run on a swarm node that previously failed and lost all its data (simulating a critical hardware failure).
If I manually remove the old node and rejoin it, I still cannot scale the service back up. The docker service scale command just hangs indefinitely.
In docker node ls I can clearly see the node (sh-apisix-2) as Ready and Active:
ID                            HOSTNAME      STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
49skokrsbhvvky6bx1pvptema *   sh-apisix-1   Ready    Active         Leader           24.0.2
o7905ox7gf4htt8m2hko7frcr     sh-apisix-2   Ready    Active         Reachable        24.0.4
v04f8pet6sgtanavvr000vzag     sh-apisix-3   Ready    Active         Reachable        24.0.2
Yet running docker service scale apisix_etcd=3 afterwards only outputs:
apisix_etcd scaled to 3
overall progress: 2 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3:
And nothing more after that.
The nodes' dockerd logs (/var/log/syslog | grep dockerd) do not indicate anything wrong. Yet I cannot even see the new node in docker service ps apisix_etcd:
ID             NAME            IMAGE                NODE                        DESIRED STATE   CURRENT STATE          ERROR                              PORTS
1tgb40ux1g5i   apisix_etcd.1   bitnami/etcd:3.4.9   sh-apisix-1                 Running         Running 5 days ago
ms5dym9xo2zf   apisix_etcd.2   bitnami/etcd:3.4.9   24lkfyzhmroswozgxqx16mdqw   Shutdown        Rejected 2 hours ago   "cannot create a swarm scoped …"
xyhib7aqrqgy   apisix_etcd.3   bitnami/etcd:3.4.9   sh-apisix-3                 Running         Running 5 days ago
The node listed for the second task (shown only by its ID) is the old one I already had to remove. The node that rejoined does not appear to have joined the service or run the task.
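The ERROR column above is cut off, so the full rejection message is probably worth looking at. A minimal sketch of how it could be displayed, assuming the standard --no-trunc and --format options of docker service ps; the function name show_task_errors is hypothetical and only there for illustration:

```shell
#!/bin/sh
# Hypothetical helper: list every task of a service together with the node
# it was assigned to and its full, untruncated error message.
# Usage: show_task_errors SERVICE_NAME
show_task_errors() {
    # --no-trunc keeps the ERROR text (and IDs) from being shortened;
    # --format selects only the columns relevant to the rejection.
    docker service ps --no-trunc \
        --format '{{.Name}} on {{.Node}}: {{.Error}}' \
        "$1"
}
```

It would be run on a manager node as show_task_errors apisix_etcd.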
Yet the other service from the stack deployed without issues, as docker service ps apisix_apisix shows:
ID             NAME              IMAGE                        NODE          DESIRED STATE   CURRENT STATE            ERROR   PORTS
tdy4nfertnbn   apisix_apisix.1   apache/apisix:3.4.0-debian   sh-apisix-3   Running         Running 6 days ago
zfx773zp2jbf   apisix_apisix.2   apache/apisix:3.4.0-debian   sh-apisix-1   Running         Running 6 days ago
wu7aq6l4tq3m   apisix_apisix.3   apache/apisix:3.4.0-debian   sh-apisix-2   Running         Running 13 minutes ago
The service definition file:
version: "3.8"
services:
  apisix:
    image: "apache/apisix:3.4.0-debian"
    volumes:
      - /var/apisix/config.yaml:/usr/local/apisix/conf/config.yaml:ro
    depends_on:
      - etcd
    ports:
      - "9180:9180/tcp"
      - "9080:9080/tcp"
      - "9091:9091/tcp"
      - "9443:9443/tcp"
    networks:
      - apisix
    deploy:
      mode: replicated
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 5
      update_config:
        parallelism: 1
        delay: 10s
      placement:
        max_replicas_per_node: 1
  etcd:
    image: bitnami/etcd:3.4.9
    user: root
    extra_hosts:
      - "sh-apisix-1.internal:172.16.0.1"
      - "sh-apisix-2.internal:172.16.0.2"
      - "sh-apisix-3.internal:172.16.0.3"
    volumes:
      - /var/lib/etcd:/etcd_data:rw
    environment:
      ETCD_DATA_DIR: /etcd_data
      ETCD_ENABLE_V2: "true"
      ALLOW_NONE_AUTHENTICATION: "yes"
      ETCD_NAME: "{{.Node.Hostname}}"
      ETCD_ADVERTISE_CLIENT_URLS: "http://{{.Node.Hostname}}.internal:2379"
      ETCD_LISTEN_CLIENT_URLS: "http://0.0.0.0:2379"
      ETCD_LISTEN_PEER_URLS: "http://0.0.0.0:2380"
      ETCD_INITIAL_CLUSTER: "sh-apisix-1=http://sh-apisix-1.internal:2380,sh-apisix-2=http://sh-apisix-2.internal:2380,sh-apisix-3=http://sh-apisix-3.internal:2380"
      ETCD_INITIAL_CLUSTER_STATE: "new"
      ETCD_INITIAL_CLUSTER_TOKEN: "token-00"
      ETCD_INITIAL_ADVERTISE_PEER_URLS: "http://{{ .Node.Hostname }}.internal:2380"
    ports:
      - "2379:2379/tcp"
      - "2380:2380/tcp"
    networks:
      - apisix
    deploy:
      mode: replicated
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 5
      update_config:
        parallelism: 1
        delay: 10s
      placement:
        max_replicas_per_node: 1
networks:
  apisix:
    driver: overlay
    attachable: true
Would anyone be able to point me in the right direction, please? I'm getting kind of desperate.