Swarm monitoring only shows one of two nodes

Hi, I recently set up two Raspberry Pis as the two nodes of a Docker swarm. I installed Swarm Visualiser and Portainer on the swarm, and then used Portainer’s app templates to set up monitoring with Prometheus and Grafana (after which I also set up Pi-hole). In the node metrics dashboard I could select which instance (node) to view.

Everything worked great until 5:40 one morning, when the metrics dropped and I could no longer see one of my nodes (probably the worker, i.e. the second node) in Grafana. I’m new to Docker, but as far as I can tell everything is working as it should: containers, services, etc. are all up and running, and their logs don’t show anything either. I even tried turning them off and on again! So here are my questions:
a) Is this a common issue, and if so, how can I fix it?
b) If not, how can I find out what’s wrong?

There is not enough information to diagnose anything. Keep in mind that Grafana is only the display layer; you also need to make sure that the exporters and the time-series database (Prometheus) are up and running.
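Since Grafana only shows what Prometheus has stored, a quick way to see whether Prometheus is actually scraping both nodes (rather than just whether the containers are running) is its `/api/v1/targets` endpoint. A minimal sketch in Python, assuming Prometheus is reachable at `http://localhost:9090` (its default port; the URL is an assumption, adjust it to your stack):

```python
import json
import urllib.request

# Assumption: adjust to wherever your Prometheus is reachable; 9090 is its default port.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url: str) -> dict:
    """GET /api/v1/targets and decode the JSON response."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def summarize_targets(payload: dict) -> list[tuple[str, str, str, str]]:
    """Return (job, instance, health, lastError) for each active scrape target."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        rows.append((
            labels.get("job", "?"),
            labels.get("instance", "?"),
            target.get("health", "unknown"),  # Prometheus reports "up", "down" or "unknown"
            target.get("lastError", ""),
        ))
    return rows


def print_target_report(base_url: str = PROMETHEUS_URL) -> None:
    """Print one line per scrape target with its health and last scrape error."""
    for job, instance, health, error in summarize_targets(fetch_targets(base_url)):
        print(f"{job:20} {instance:25} {health:8} {error}")
```

Calling `print_target_report()` prints one line per scrape target. If the stack scrapes node-exporter and cAdvisor on each node, you would expect a target for each Pi; a target marked `down` (with its `lastError`), or one that is missing altogether, tells you which side to investigate.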

I get that, but everywhere I checked everything seems to be running:

  • Monitoring stack is up
  • cAdvisor is running on both nodes (logs seem normal)
  • Grafana is running
  • Node exporter runs on both nodes (logs seem normal)
  • Prometheus is running

Perhaps it’s a network issue? I’m not sure exactly how the networking works.

I wouldn’t know where else to look; everything seems to indicate that it’s working fine. My second node is also running Pi-hole just fine. How can I debug this?
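One way to narrow down a possible network problem is to first confirm that the manager still sees the second Pi as a healthy swarm member. A rough sketch that wraps `docker node ls` using its documented `--format` placeholders; it must run on the manager node, and the function names are just for illustration:

```python
import subprocess

# Documented `docker node ls --format` placeholders, one node per line, tab-separated.
NODE_FORMAT = "{{.Hostname}}\t{{.Status}}\t{{.Availability}}"


def parse_nodes(text: str) -> list[tuple[str, str, str]]:
    """Parse the tab-separated output into (hostname, status, availability) tuples."""
    nodes = []
    for line in text.splitlines():
        if line.strip():
            hostname, status, availability = line.split("\t")
            nodes.append((hostname, status, availability))
    return nodes


def nodes_needing_attention(nodes: list[tuple[str, str, str]]) -> list[str]:
    """Flag nodes the manager does not report as Ready and Active."""
    return [host for host, status, avail in nodes if status != "Ready" or avail != "Active"]


def list_swarm_nodes() -> list[tuple[str, str, str]]:
    """Run `docker node ls` (manager only; it fails on worker nodes) and parse it."""
    out = subprocess.run(
        ["docker", "node", "ls", "--format", NODE_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_nodes(out)
```

If both nodes report `Ready` and `Active`, swarm membership looks fine, and the overlay network between the Pis would be the next suspect; Docker’s documentation lists 7946/tcp+udp and 4789/udp as ports that must be open between swarm nodes.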

What do you see in the system logs on the nodes (not the container logs)? Check especially the node you don’t see in Grafana, but both could be important, for example if there is a communication issue between them.
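To pull the Docker daemon’s own log out of the system journal and skip the routine lines, something like the following could help. It is a sketch that assumes the Pis run Docker under systemd as `docker.service` and that the daemon writes logrus-style `level=` fields; reading the journal may need `sudo`. The kernel log (`journalctl -k`) may also be worth a look if a network interface dropped.

```python
import subprocess


def docker_journal(since: str = "yesterday") -> str:
    """Return the Docker daemon's journal entries since `since`.

    Assumes Docker runs as the systemd unit docker.service; `since` accepts
    anything journalctl's --since understands (e.g. "yesterday", "05:30").
    """
    result = subprocess.run(
        ["journalctl", "-u", "docker.service", "--since", since, "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def interesting_lines(log_text: str, levels: tuple[str, ...] = ("error", "warning")) -> list[str]:
    """Keep lines whose logrus-style `level=` field matches one of `levels`."""
    tags = tuple(f"level={lvl}" for lvl in levels)
    return [line for line in log_text.splitlines() if any(tag in line for tag in tags)]
```

Calling `interesting_lines(docker_journal())` on each Pi and looking at the entries around 5:40 could show whether either daemon reported a problem at the moment the metrics stopped.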

You should also check the values directly in the Prometheus web UI. If you can see the metrics there, you can try to find out why Grafana cannot. If you don’t see anything in Prometheus either, that is a different issue. If the metrics are in Prometheus, check the query in Grafana: a wrong query can produce misleading and confusing results.
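The same check can be done outside the browser through Prometheus’s HTTP API. The sketch below runs an instant query and maps each `instance` label to its value; it assumes Prometheus at `http://localhost:9090` and uses `node_load1`, a standard node-exporter metric on Linux, as the example query:

```python
import json
import urllib.parse
import urllib.request


def instant_query(base_url: str, promql: str) -> list[dict]:
    """Run a PromQL instant query via /api/v1/query and return the result list."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    # Defensive: Prometheus normally signals errors with a non-2xx status,
    # which urlopen already raises as HTTPError.
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return body["data"]["result"]


def values_by_instance(result: list[dict]) -> dict[str, float]:
    """Map each series' `instance` label to its sample value (instant-vector results)."""
    # Each sample's "value" is [unix_timestamp, "value as string"].
    return {s["metric"].get("instance", "?"): float(s["value"][1]) for s in result}
```

`values_by_instance(instant_query("http://localhost:9090", "node_load1"))` should return one entry per node that Prometheus currently has data for. If both Pis appear, the data is there and the Grafana query (including the instance the dashboard has selected) is the place to look; if only one appears, the problem is upstream of Grafana.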
