Task History kills my Swarm

Hi

We have a Swarm cluster with one manager and 3 worker nodes.
We use Docker services with "--mode replicated-job" to run recurring jobs on a schedule.
These get created every time they are due to run and removed after they finish.

Now I have the problem that they stay in the task history: every single run since the last reboot of the host machines is still listed.
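
For context, this is roughly how each job gets created and cleaned up (the service name and image here are placeholders, not our real ones):

docker service create \
  --name nightly-report \
  --mode replicated-job \
  registry.example.com:5000/jobs/report:latest

# once the job has finished:
docker service rm nightly-report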

Output from "docker node ps":

ID             NAME                                                  IMAGE                      NODE                  DESIRED STATE   CURRENT STATE
j9b7zj6oe0iy   zsg0utokootnpin10u8yks6m6.q7apfswjq1cs2lv5bgbxale3f   registry.*:5000/*:latest   vsrv-swarmmanager01   Complete        Complete 5 hours ago
wqf2en8fm0s4   zsx2r6gdv3evvhoz6hetbqyop.q7apfswjq1cs2lv5bgbxale3f   registry.*:5000/*:latest   vsrv-swarmmanager01   Complete        Complete 25 hours ago
rrpyxm7f11wj   zt1g6m68u2wfwokbzcvvaqlh6.q7apfswjq1cs2lv5bgbxale3f   registry.*:5000/*:latest   vsrv-swarmmanager01   Complete        Complete 12 hours ago
sqdq0m9z0tzi   zt1q36319anelooa6v693brny.q7apfswjq1cs2lv5bgbxale3f   registry.*:5000/*:latest   vsrv-swarmmanager01   Complete        Complete 27 hours ago
jd5elxnjgca0   zul9qa08zf5wwaret9wpeqgh8.q7apfswjq1cs2lv5bgbxale3f   registry.*:5000/*:latest   vsrv-swarmmanager01   Complete        Complete 9 hours ago

I currently have 4000+ of those entries. At around 40,000 entries the cluster is no longer able to start new tasks and has to be rebooted to resume operation.
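
A rough way to watch the buildup from a manager node (the tail just strips the header row, and the count is only an approximation of the real task store):

docker node ps $(docker node ls -q) | tail -n +2 | wc -l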

Why do those not get cleaned up? Why is stuff that's done and dusted allowed to kill a whole cluster?

Regards
Sintbert

The task history has a default limit of 5 tasks. As the default setting doesn't seem to fit your use case, you will need to adjust the value to whatever supports it.

Note: Depending on whether you want to be able to perform rolling updates and have Swarm revert to the last running state, you can set --task-history-limit to whatever value you see fit. If you don't care about recovering from failed rolling updates, just set it to 0 and keep no task history at all.
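
For example, on a manager node:

docker swarm update --task-history-limit 0

# read the current value back (this format path is my reading of the
# docker info JSON; adjust if your version exposes it differently):
docker info --format '{{.Swarm.Cluster.Spec.Orchestration.TaskHistoryRetentionLimit}}'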


Hi @meyay
The task history limit doesn't apply here at all. It limits how many tasks stay in the history per service, and at least one is always kept there for every running service.
The problem is that those tasks usually get removed when their service gets removed. But not the ones started by a "--mode replicated-job" service: those stay behind when the service gets removed and are now orphaned.
I have not found any way except a complete cluster reboot to remove those orphaned tasks, since tasks can only be removed together with their corresponding service.
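
You can confirm an entry is orphaned by resolving the task back to its service; the task ID below is taken from my "docker node ps" output above, the service ID is a placeholder:

docker inspect --format '{{.ServiceID}}' jd5elxnjgca0
# prints the ID of the (already removed) service; then:
docker service inspect <service-id-from-above>
# fails with a 'no such service' error, because the service is gone
# while its task is still in the task store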

Created issue: `replicated-job` doesn't respect `task-history-limit` · Issue #45443 · moby/moby · GitHub