Hi everyone,
I run a single-node Docker Swarm in my homelab.
While monitoring it, I realized I was missing clear visibility into what the Swarm scheduler is actually doing, especially around task states and global services.
I have been working on a small open-source Prometheus exporter called Swarm Scheduler Exporter, and I would like some early feedback from other Swarm users.
What it focuses on:
- Task state visibility per service using a latest-per-slot model.
- Accurate desired replicas for global services based on eligible nodes only. Status, availability, constraints, and platform are considered.
- Simple service readiness signals for alerting.
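To make the first two points more concrete, here is a rough Python sketch of the idea, not the exporter's actual code. It assumes tasks and nodes are plain dicts shaped like Docker Engine API responses, that the newest task in a slot is the one with the highest Version.Index, and that only == and != constraints on node.role, node.hostname, and node.labels.* need handling. All function names are made up for illustration.

```python
from typing import Any

Task = dict[str, Any]
Node = dict[str, Any]


def latest_per_slot(tasks: list[Task]) -> dict[tuple[str, str], Task]:
    """Keep only the newest task for each (service, slot) pair."""
    latest: dict[tuple[str, str], Task] = {}
    for task in tasks:
        # Assumption: global-service tasks have no usable slot number,
        # so the node ID stands in for the slot.
        slot = str(task.get("Slot") or task.get("NodeID", ""))
        key = (task["ServiceID"], slot)
        current = latest.get(key)
        # Assumption: a higher Version.Index means a more recent task.
        if current is None or task["Version"]["Index"] > current["Version"]["Index"]:
            latest[key] = task
    return latest


def count_by_state(latest: dict[tuple[str, str], Task]) -> dict[tuple[str, str], int]:
    """Count the surviving tasks per (service, state)."""
    counts: dict[tuple[str, str], int] = {}
    for (service_id, _slot), task in latest.items():
        key = (service_id, task["Status"]["State"])
        counts[key] = counts.get(key, 0) + 1
    return counts


def _matches(node: Node, constraint: str) -> bool:
    """Evaluate one 'key==value' or 'key!=value' placement constraint."""
    op = "!=" if "!=" in constraint else "=="
    if op not in constraint:
        return False
    key, value = (part.strip() for part in constraint.split(op, 1))
    if key == "node.role":
        actual = node["Spec"]["Role"]
    elif key == "node.hostname":
        actual = node["Description"]["Hostname"]
    elif key.startswith("node.labels."):
        actual = node["Spec"].get("Labels", {}).get(key[len("node.labels."):])
    else:
        return False  # keys this sketch does not know are treated as non-matching
    return (actual != value) if op == "!=" else (actual == value)


def is_eligible(node: Node, constraints: list[str], platforms: list[tuple[str, str]]) -> bool:
    """Apply the four checks: status, availability, platform, constraints."""
    if node["Status"]["State"] != "ready":
        return False
    if node["Spec"]["Availability"] != "active":
        return False
    platform = node["Description"]["Platform"]
    # Platform strings are compared as-is here; any alias handling is ignored.
    if platforms and (platform["OS"], platform["Architecture"]) not in platforms:
        return False
    return all(_matches(node, c) for c in constraints)


def desired_global_replicas(nodes: list[Node], constraints: list[str],
                            platforms: list[tuple[str, str]]) -> int:
    """Desired replicas for a global service = number of eligible nodes."""
    return sum(1 for node in nodes if is_eligible(node, constraints, platforms))
```

With this model, a slot whose task failed and was then replaced by a running one counts once, as running, and a drained or down node does not inflate the desired count of a global service.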
Some technical details that may be relevant here:
- Watches service and node events and polls tasks periodically.
- Stable labels and controlled cardinality.
- Runs on a manager node and exposes only /metrics and /healthz.
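As an illustration of the last point, a server that answers only on those two paths can be very small. This is a Python standard-library sketch under my own assumptions, not the exporter's implementation: the handler class, the make_server helper, the placeholder metric, and the "ok" health body are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics() -> str:
    # Placeholder payload; a real exporter would render its collected gauges here.
    return (
        "# HELP example_up Placeholder gauge for this sketch.\n"
        "# TYPE example_up gauge\n"
        "example_up 1\n"
    )


class ExporterHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            self._reply(200, render_metrics(), "text/plain; version=0.0.4; charset=utf-8")
        elif self.path == "/healthz":
            self._reply(200, "ok\n", "text/plain; charset=utf-8")
        else:
            # Everything else is rejected, so nothing beyond the two paths is exposed.
            self._reply(404, "not found\n", "text/plain; charset=utf-8")

    def _reply(self, status: int, body: str, content_type: str) -> None:
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        # Silence the default per-request logging to stderr.
        pass


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    # Port 0 lets the OS pick a free port; pass a fixed port for real use.
    return HTTPServer((host, port), ExporterHandler)
```

Calling make_server(port=8888).serve_forever() would serve both endpoints on an arbitrary example port until interrupted.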
Example metrics:
- swarm_service_desired_replicas
- swarm_task_replicas_state
- swarm_service_at_desired
- swarm_cluster_nodes_by_state
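To show how these metrics could relate to each other, here is a small Python sketch that renders three of them in the Prometheus text format. The label names (service_name, state) and the readiness rule (1 when the running count equals the desired count) are assumptions made for illustration; the real labels and semantics are in the repository documentation.

```python
def escape_label(value: str) -> str:
    # Escape backslash, double quote, and newline as the text format requires.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def service_at_desired(desired: int, running: int) -> int:
    # Assumed readiness rule: exactly as many running tasks as desired.
    return 1 if running == desired else 0


def render_service_metrics(service: str, desired: int, by_state: dict[str, int]) -> str:
    """Render one service's samples; by_state maps task state to task count."""
    name = escape_label(service)
    lines = [f'swarm_service_desired_replicas{{service_name="{name}"}} {desired}']
    for state, count in sorted(by_state.items()):
        lines.append(
            f'swarm_task_replicas_state{{service_name="{name}",state="{escape_label(state)}"}} {count}'
        )
    ready = service_at_desired(desired, by_state.get("running", 0))
    lines.append(f'swarm_service_at_desired{{service_name="{name}"}} {ready}')
    return "\n".join(lines) + "\n"
```

A single 0/1 gauge like this keeps alert rules simple, since an alert can fire on the gauge staying at 0 for some time instead of comparing two series.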
The project started as a fork of akerouanton/swarm-tasks-exporter, but it has diverged quite a bit.
Repository and documentation:
https://github.com/leinardi/swarm-scheduler-exporter
I am mainly looking for feedback on:
- Desired replicas behavior for global services.
- Missing task or service states.
- Swarm edge cases I may have overlooked.
This is not an official Docker project, just a tool I built for my own setup and decided to share.