Hello community,
I’m struggling to join a specific worker to swarm. I followed the typical resolution advice to verify if something is blocking the related tcp/udp traffic on typical swarm ports, but the results indicate this may not be the problem:
- on manager node (ip 172.27.32.18):
j4care@pacslb2:~$ docker swarm init --advertise-addr 172.27.32.18:2377
- on worker node (ip 172.27.32.16)
j4care@pacsvwr2:~$ docker swarm join --token SWMTKN-1-5hb4nwggy2stt09me27d3jusxmwybjep7us1dbk7ljikqm7fm0-f3jzsk1bnltms7ch1foge7386 172.27.32.18:2377
Error response from daemon: Timeout was reached before node joined. The attempt to join the swarm will continue in the background.
- result - on manager:
j4care@pacslb2:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
3vvd52294czmefaredb1i8hjt pacsvwr2 Ready Active 26.0.0
however it is not possible to deploy services:
docker service create --constraint node.hostname==pacsvwr2 nginx
… never finishes. In docker logs of the worker:
Mar 22 19:43:37 pacsvwr2 dockerd[78392]: time="2024-03-22T19:43:37.517077033+05:00" level=info msg="manager selected by agent for new session: { 172.27.32.18:2377}" module=node/agent node.id=3vvd52294czmefaredb1i8hjt
Mar 22 19:43:37 pacsvwr2 dockerd[78392]: time="2024-03-22T19:43:37.517399652+05:00" level=info msg="waiting 0s before registering session" module=node/agent node.id=3vvd52294czmefaredb1i8hjt
Mar 22 19:43:37 pacsvwr2 dockerd[78392]: time="2024-03-22T19:43:37.517486544+05:00" level=debug msg="(*session).start" module=node/agent node.id=3vvd52294czmefaredb1i8hjt
Mar 22 19:43:42 pacsvwr2 dockerd[78392]: time="2024-03-22T19:43:42.518528714+05:00" level=error msg="agent: session failed" backoff=100ms error="session initiation timed out" module=node/agent node.id=3vvd52294czmefaredb1i8hjt
however, the connection seems ok:
worker$ telnet 172.27.32.18 2377
Trying 172.27.32.18...
Connected to 172.27.32.18.
also, it seems the request arrives to manager ok - as I see in manager logs:
Mar 22 19:55:04 pacslb2 dockerd[92544]: time="2024-03-22T19:55:04.298506119+05:00" level=info msg="worker 3vvd52294czmefaredb1i8hjt was successfully registered" method="(*Dispatcher).register"
Mar 22 19:55:09 pacslb2 dockerd[92544]: time="2024-03-22T19:55:09.736424292+05:00" level=debug msg="node status updated" method="(*Dispatcher).processUpdates" module=dispatcher node.id=3vvd52294czmefaredb1i8hjt
Mar 22 19:55:09 pacslb2 dockerd[92544]: time="2024-03-22T19:55:09.738386505+05:00" level=info msg="worker 3vvd52294czmefaredb1i8hjt was successfully registered" method="(*Dispatcher).register"
Mar 22 19:55:18 pacslb2 dockerd[92544]: time="2024-03-22T19:55:18.801240679+05:00" level=debug msg="node status updated" method="(*Dispatcher).processUpdates" module=dispatcher node.id=3vvd52294czmefaredb1i8hjt
Mar 22 19:55:18 pacslb2 dockerd[92544]: time="2024-03-22T19:55:18.804435260+05:00" level=info msg="worker 3vvd52294czmefaredb1i8hjt was successfully registered" method="(*Dispatcher).register"
Mar 22 19:51:58 pacslb2 dockerd[92544]: time="2024-03-22T19:51:58.646783000+05:00" level=debug msg="error handling streaming rpc" error="rpc error: code = Unavailable desc = node 3vvd52294czmefaredb1i8hjt exceeded rate limit count of registrations" rpc=/docker.swarmkit.v1.Dispatcher/Session
what I see, however, in the manger logs, is this:
Mar 22 19:21:04 pacslb2 dockerd[92544]: time="2024-03-22T19:21:04.992107150+05:00" level=debug msg="error handling streaming rpc" error="context canceled" rpc=/docker.swarmkit.v1.Dispatcher/Session
which is likely the session that times out. However, as I’m not an expert, I’m not sure what may be the underlying problem. I followed the advice to check the ports typically used by docker swarm - but it seems the traffic is not blocked anywhere on networking level:
- started tcp/udp listeners on both nodes i.e.:
docker run --rm -d -p 0.0.0.0:2377:2377 -e TCP_PORT=2377 --name tcp-listener-2377 cjimti/go-echo
docker run --rm -d -p 0.0.0.0:7946:7946 -e TCP_PORT=7946 --name tcp-listener-7946 cjimti/go-echo
docker run --rm -d -p 0.0.0.0:7946:7946/udp -e UDPPORT=7946 --name udp-listener-7946 mendhak/udp-listener
docker run --rm -d -p 0.0.0.0:4789:4789/udp -e UDPPORT=4789 --name udp-listener-4789 mendhak/udp-listener
- tested access mutually by
… from manager to worker
telnet 172.27.32.16 2377
telnet 172.27.32.16 7946
nc -u 172.27.32.16 7946
nc -u 172.27.32.16 4789
… from worker to manager
telnet 172.27.32.18 2377
telnet 172.27.32.18 7946
nc -u 172.27.32.18 7946
nc -u 172.27.32.18 4789
and all works. docker info - worker:
j4care@pacsvwr2:~$ docker info
Client: Docker Engine - Community
Version: 26.0.0
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.21.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.23.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 7
Server Version: 26.0.0
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: pending
NodeID: 3vvd52294czmefaredb1i8hjt
Is Manager: false
Node Address: 172.27.32.16
Manager Addresses:
172.27.32.18:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3dd1e886e55dd695541fdcd67420c2888645a495
runc version: v1.1.10-0-g18a0cb0
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.15.0-97-generic
Operating System: Ubuntu 22.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.34GiB
Name: pacsvwr2
ID: R26Z:QLMJ:AOAY:ZICF:Z7N4:7IAW:BEBK:V76A:DGIU:GEFQ:T2MK:JFGR
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 27
Goroutines: 80
System Time: 2024-03-22T20:01:37.481520282+05:00
EventsListeners: 0
Experimental: false
Insecure Registries:
docker.j4care.com
127.0.0.0/8
Live Restore Enabled: false
… manager:
j4care@pacslb2:~$ docker info
Client: Docker Engine - Community
Version: 26.0.0
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.11.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.21.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.23.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
Containers: 5
Running: 5
Paused: 0
Stopped: 0
Images: 8
Server Version: 26.0.0
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: yo1n6sawn8kqpiorqfgz6fwv9
Is Manager: true
ClusterID: yrd0zw3zc1edpg7yxrargoaj1
Managers: 1
Nodes: 4
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.27.32.18
Manager Addresses:
172.27.32.18:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3dd1e886e55dd695541fdcd67420c2888645a495
runc version: v1.1.10-0-g18a0cb0
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.15.0-97-generic
Operating System: Ubuntu 22.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.61GiB
Name: pacslb2
ID: DMD2:TFQV:U2AI:P6KP:G5OW:PK2F:DNJW:DLYT:MFGD:XRMX:CP4B:MLWJ
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 94
Goroutines: 275
System Time: 2024-03-22T20:02:04.347856714+05:00
EventsListeners: 6
Experimental: false
Insecure Registries:
docker.j4care.com
127.0.0.0/8
Live Restore Enabled: false
Do you have any idea what may be the underlying problem? thanks a lot