gray380
(Gray380)
January 25, 2024, 11:17am
1
Hello,
When I try to remove a node from the cluster, the following error message appears:
```
docker node rm --force p5npyj7vms82jsiwmsywpietm
Error response from daemon: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent
```
I’ve tried to remove it by the hostname as well, the result is the same.
journal:
```
Jan 25 12:12:16 sbtv-dock044 dockerd[1508]: time="2024-01-25T12:12:16.160354363+02:00" level=error msg="Handler for DELETE /v1.44/nodes/p5npyj7vms82jsiwmsywpietm returned error: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent"
Jan 25 12:14:16 sbtv-dock044 dockerd[1508]: time="2024-01-25T12:14:16.284127977+02:00" level=error msg="Handler for DELETE /v1.44/nodes/sbtv-dock004 returned error: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent"
```
The node list:
```
docker node ls
ID                          HOSTNAME       STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
nieeiv4ja9dxp00zlef952pba   sbtv-dock003   Ready    Active         Leader           25.0.1
p5npyj7vms82jsiwmsywpietm   sbtv-dock004   Down     Active                          25.0.1
n0x6vx2bz9wwb9hwss1g7932z   sbtv-dock005   Ready    Active                          25.0.1
ds3dspc1kc5nvwawyhyjn7yy7   sbtv-dock006   Ready    Active                          25.0.1
p69gynx3p0pz52d982ju6sltb   sbtv-dock007   Ready    Active         Reachable        25.0.1
pjotlc68rgmsjre3jh3t62quj * sbtv-dock044   Ready    Active         Reachable        25.0.1
```
We ran into this issue while updating the cluster nodes from 20.x to 25.x: one node stopped being a manager for some reason, and when we tried to rejoin it we ended up with two nodes that had the same hostname but different IDs in the cluster configuration. We then tried to remove the “failed” node and hit this error.
So we can demote, promote, pause, and drain this node, but not remove it.
Could you help to get rid of this node from the cluster configuration?
Docker info output:
```
Client: Docker Engine - Community
 Version:    25.0.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan
```
Best regards,
Serhiy.
rimelek
(Ákos Takács)
January 27, 2024, 10:00am
2
You mean you had Docker 20.x and updated to Docker 25.x in one step? I guess that could even make the nodes incompatible with each other.
Have you searched for the error messages and read issues like this?
GitHub issue (opened 05 Apr 2018, closed 14 Apr 2018; area/swarm):
I have a swarm with ~1300 nodes, and some enter and leave all the time (about 10/minute).
Since about a week ago, I'm experiencing an error when trying to remove dead nodes with `docker node rm xxxx` from the swarm:
```
Error response from daemon: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent
```
All I see in the logs is the same:
```
Apr 5 12:35:31 ip-10-0-0-10 dockerd[1239]: time="2018-04-05T12:35:31.686809606Z" level=error msg="Error removing node x2nhsvmnzoaq5hp3xqfl2a7dp: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent"
Apr 5 12:35:31 ip-10-0-0-10 dockerd[1239]: time="2018-04-05T12:35:31.687236652Z" level=error msg="Handler for DELETE /v1.37/nodes/x2nhsvmnzoaq5hp3xqfl2a7dp returned error: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent"
Apr 5 12:35:36 ip-10-0-0-10 dockerd[1239]: time="2018-04-05T12:35:36.289574281Z" level=error msg="Error removing node kdmprxylwjmvutsfb9y1f2o17: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent"
Apr 5 12:35:36 ip-10-0-0-10 dockerd[1239]: time="2018-04-05T12:35:36.289644704Z" level=error msg="Handler for DELETE /v1.37/nodes/kdmprxylwjmvutsfb9y1f2o17 returned error: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent"
```
gray380
(Gray380)
January 28, 2024, 10:21am
3
Yes, thanks.
Certificate rotation does not help.
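For reference, here is the rotation that was attempted (assuming it was the standard swarm root CA rotation):

```bash
# Rotate the swarm root CA; node TLS certificates are reissued in the process
docker swarm ca --rotate
```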
coryaent
(Stephen)
July 23, 2024, 6:55pm
4
I have the same issue. I checked the GitHub issue related to too many node removals and ran docker node ls | wc -l. It returns 34, far fewer than the ~1300 nodes described there.
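Side note: `docker node ls` prints a header row, so `wc -l` reports the node count plus one. A quick mock (no live swarm needed, `mock_node_ls` is just a stand-in for the real command) to illustrate:

```bash
# `docker node ls | wc -l` counts the header line too, so N lines = N-1 nodes.
mock_node_ls() {
  printf 'ID        HOSTNAME   STATUS  AVAILABILITY  MANAGER STATUS\n'
  printf 'aaa111    node1      Ready   Active        Leader\n'
  printf 'bbb222    node2      Ready   Active\n'
}
lines=$(mock_node_ls | wc -l)
echo "$((lines - 1))"   # prints 2 (3 lines minus the header)
```

So 34 lines here means 33 node entries, still nowhere near ~1300.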
Update: Someone else is having the same issue, and apparently the solution could be to wait a few days and try again.
GitHub issue (opened 22 Feb 2024; status/0-triage, kind/bug, area/swarm, version/25.0):
### Description
I have a cluster of 3 nodes (see them below). I am unable to remove the node that shows as down, and it's breaking my cluster.
```
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
pcefq43rapf8mw8887inkztvi * <CENSORED>llbprmmid01.<CENSORED>.net Ready Active Reachable 25.0.3
xl8kfa7y56uu3vz1xsw0lxb61 <CENSORED>llbprmmid02.<CENSORED>.net Ready Active Reachable 25.0.3
d31t8u7reuky3kjyym84smsl4 <CENSORED>llbprmmid03.<CENSORED>.net Ready Active Leader 25.0.3
hdresbrj592dpjlh80gwsuy9h <CENSORED>llbprmmid03.<CENSORED>.net Down Drain 25.0.2
```
When I do a `docker node ls | wc -l` it returns
```
5
```
I found out that there was a similar issue reported before, https://forums.docker.com/t/removing-node-from-the-swarm-issue-raft-message-is-too-large-and-cant-be-sent/139518. I went through it and tried what they advised, but still no luck.
Any idea how I can fix this? This actually broke a production environment!
### Reproduce
1. docker node rm -f hdresbrj592dpjlh80gwsuy9h
### Expected behavior
docker node rm should remove the node that I want removed from the cluster without issues.
### docker version
```bash
Client: Docker Engine - Community
Version: 25.0.3
API version: 1.44
Go version: go1.21.6
Git commit: 4debf41
Built: Tue Feb 6 21:15:16 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 25.0.3
API version: 1.44 (minimum version 1.24)
Go version: go1.21.6
Git commit: f417435
Built: Tue Feb 6 21:14:12 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.28
GitCommit: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
```
### docker info
```bash
Client: Docker Engine - Community
Version: 25.0.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.12.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.24.5
Path: /usr/libexec/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.23.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 5
Server Version: 25.0.3
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: pcefq43rapf8mw8887inkztvi
Is Manager: true
ClusterID: nixy9iaupmii3yn6uidkw2l10
Managers: 3
Nodes: 4
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 3
Autolock Managers: false
Root Rotation In Progress: true
Node Address: 192.168.32.110
Manager Addresses:
192.168.32.110:2377
192.168.32.111:2377
192.168.32.112:2377
Runtimes: runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 4.18.0-477.21.1.el8_8.x86_64
Operating System: Red Hat Enterprise Linux 8.8 (Ootpa)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.39GiB
Name: <CENSORED>llbprmmid01.<CENSORED>.net
ID: a7918f44-224e-45dd-abf4-3c95d61e0f6f
Docker Root Dir: /u02/docker
Debug Mode: false
HTTP Proxy: <CENSORED>
HTTPS Proxy: <CENSORED>
No Proxy: <CENSORED>
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
```
### Additional Info
Below is a snippet of the logs, from `journalctl -u docker.service -f`, after enabling debug and trying to remove node3 from node1
```
Feb 22 18:10:53 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:53.346690739Z" level=debug msg="sending heartbeat to manager { } with timeout 5s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:10:53 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:53.347704874Z" level=debug msg="heartbeat successful to manager { }, next heartbeat period: 5.305433283s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:10:56 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:56.465861670Z" level=debug msg="Calling HEAD /_ping"
Feb 22 18:10:56 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:56.466267559Z" level=debug msg="Calling DELETE /v1.44/nodes/hdresbrj592dpjlh80gwsuy9h?force=1"
Feb 22 18:10:56 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:56.806858949Z" level=debug msg="error handling rpc" error="rpc error: code = Unknown desc = raft: raft message is too large and can't be sent" rpc=/docker.swarmkit.v1.Control/RemoveNode
Feb 22 18:10:56 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:56.807027288Z" level=debug msg="Error removing node" error="rpc error: code = Unknown desc = raft: raft message is too large and can't be sent" node-id=hdresbrj592dpjlh80gwsuy9h
Feb 22 18:10:56 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:56.807091867Z" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!" error="rpc error: code = Unknown desc = raft: raft message is too large and can't be sent" error_type="*status.Error" module=api
Feb 22 18:10:56 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:56.807104059Z" level=error msg="Handler for DELETE /v1.44/nodes/hdresbrj592dpjlh80gwsuy9h returned error: rpc error: code = Unknown desc = raft: raft message is too large and can't be sent"
Feb 22 18:10:56 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:56.807116950Z" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!" error="rpc error: code = Unknown desc = raft: raft message is too large and can't be sent" error_type="*status.Error" module=api
Feb 22 18:10:58 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:58.653777697Z" level=debug msg="sending heartbeat to manager { } with timeout 5s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:10:58 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:10:58.654635872Z" level=debug msg="heartbeat successful to manager { }, next heartbeat period: 5.339470303s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:11:03 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:03.994666787Z" level=debug msg="sending heartbeat to manager { } with timeout 5s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:11:03 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:03.995801446Z" level=debug msg="heartbeat successful to manager { }, next heartbeat period: 4.774354673s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:11:05 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:05.980417978Z" level=debug msg="memberlist: Stream connection from=192.168.32.112:57678"
Feb 22 18:11:05 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:05.980569597Z" level=debug msg="<CENSORED>llbprmmid01.<CENSORED>.net(54699a50f58e): Initiating bulk sync for networks [lp9jegtxlrr1ojaijz43kr233] with node 2d3fae1bf26e"
Feb 22 18:11:07 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:07.057352324Z" level=debug msg="memberlist: Stream connection from=192.168.32.111:51210"
Feb 22 18:11:07 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:07.057502628Z" level=debug msg="<CENSORED>llbprmmid01.<CENSORED>.net(54699a50f58e): Initiating bulk sync for networks [lp9jegtxlrr1ojaijz43kr233] with node 67d13d52bcbe"
Feb 22 18:11:08 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:08.770571281Z" level=debug msg="sending heartbeat to manager { } with timeout 5s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:11:08 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:08.771434267Z" level=debug msg="heartbeat successful to manager { }, next heartbeat period: 5.469749527s" method="(*session).heartbeat" module=node/agent node.id=pcefq43rapf8mw8887inkztvi session.id=72z12be65dn88tkuxtckrxxmj sessionID=72z12be65dn88tkuxtckrxxmj
Feb 22 18:11:09 <CENSORED>llbprmmid01.<CENSORED>.net dockerd[674413]: time="2024-02-22T18:11:09.545754661Z" level=debug msg="memberlist: Stream connection from=192.168.32.112:42024"
```
coryaent
(Stephen)
July 25, 2024, 5:03pm
5
I waited a couple of days, and now it works.
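A guess at why waiting helped: the error comes from a raft proposal exceeding the transport's message size limit, and the raft log shrinks when a snapshot compacts it (every 10000 entries by default, per the docker info output above). If that is the mechanism, forcing earlier compaction and then retrying the removal might avoid the wait. This is an untested sketch, not a confirmed fix:

```bash
# Untested sketch: compact the raft log sooner by lowering the snapshot
# interval (default 10000 log entries), then retry removing the dead node.
docker swarm update --snapshot-interval 1000
docker node rm --force p5npyj7vms82jsiwmsywpietm

# Optionally restore the default afterwards:
docker swarm update --snapshot-interval 10000
```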