Docker Community Forums

Metrics, benchmarks

Hi, I’m looking for ways to measure the timing of different things in Docker Swarm. The things I want to measure are startup time, the self-healing time of a given pod, and the average response time when I’m using a load balancer. Is there any built-in or extension tool for checking metrics like these?

Thanks for the help.

Does anybody have any idea about self-healing and startup time?


You are looking for observability, as in metrics and request traces, aren’t you?

You might want to take a look at swarmprom, which uses metric collectors at the node, service and container level, stores the metrics in Prometheus and visualizes them in Grafana. It will still not give you the exact ramp-up time between a container’s death, its rescheduling and the moment the new container is available, though, because the metrics are collected at intervals rather than based on events. Of course, this also depends on the deployment constraints of the container (specific node labels, high resource constraints?) and the bootstrapping time of the containerized application itself (nginx will be available in an instant, while a Java application might require 30 to 120 seconds to be “ready” to serve).
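To make the interval problem concrete, here is a small sketch of which scrapes would actually observe an outage. The 15-second scrape interval and the function name are assumptions chosen for illustration, not values taken from swarmprom.

```python
def scrapes_during_outage(death, recovery, interval):
    """Return the scrape timestamps that fall inside [death, recovery).

    Assumes scrapes happen at 0, interval, 2*interval, ... seconds and that
    all arguments are whole seconds.
    """
    # First scrape at or after the container died (ceiling division).
    first = -(-death // interval) * interval
    return list(range(first, recovery, interval))


# A 5 s outage (16 s to 21 s) falls between two scrapes and is never seen.
print(scrapes_during_outage(16, 21, 15))  # → []
# A 34 s outage (16 s to 50 s) shows up as two "down" samples.
print(scrapes_during_outage(16, 50, 15))  # → [30, 45]
```

In both cases the samples only bound the real duration to within roughly one scrape interval, which is why an event-based source is needed for exact timings.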

I can imagine that docker events can be leveraged to extract the information you need, but you will need to correlate them yourself. Swarm health checks do not distinguish between readiness and liveness the way k8s does. The events alone will not answer the question of the average request/response cycle.
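A minimal sketch of that correlation, assuming the events come from `docker events --format '{{json .}}'` (one JSON object per line) and that Swarm task containers carry the `com.docker.swarm.service.name` label in the event attributes. The function names and the sample events are made up for illustration.

```python
import json

SERVICE_LABEL = "com.docker.swarm.service.name"


def recovery_times(events):
    """Yield (service, seconds) for each "die" followed by the next "start"."""
    last_death = {}  # service name -> timeNano of the latest unmatched "die"
    for event in events:
        if event.get("Type") != "container":
            continue
        service = event.get("Actor", {}).get("Attributes", {}).get(SERVICE_LABEL)
        if service is None:
            continue  # not a Swarm task container
        action = event.get("Action")
        if action == "die":
            last_death[service] = event["timeNano"]
        elif action == "start" and service in last_death:
            yield service, (event["timeNano"] - last_death.pop(service)) / 1e9


def events_from_json_lines(lines):
    """Decode the line-per-event output of `docker events --format '{{json .}}'`."""
    return (json.loads(line) for line in lines if line.strip())


# Hypothetical sample: "web" dies and its replacement starts 5.5 s later.
SAMPLE_EVENTS = [
    {"Type": "container", "Action": "die", "timeNano": 1_000_000_000,
     "Actor": {"ID": "aaa", "Attributes": {SERVICE_LABEL: "web"}}},
    {"Type": "network", "Action": "disconnect", "timeNano": 1_100_000_000,
     "Actor": {"ID": "net", "Attributes": {}}},
    {"Type": "container", "Action": "start", "timeNano": 6_500_000_000,
     "Actor": {"ID": "bbb", "Attributes": {SERVICE_LABEL: "web"}}},
]

for service, seconds in recovery_times(SAMPLE_EVENTS):
    print(f"{service}: replacement started {seconds:.3f}s after death")
```

For live data, feed `events_from_json_lines(sys.stdin)` into `recovery_times`. Two caveats: a `start` event only means the container process was started, not that the application is ready to serve, and pairing per service is only approximate when several replicas fail at once. As far as I know, container events are also local to each node's daemon, so if the replacement is scheduled on another node you would have to merge the streams of both nodes.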

True. That’s what I need. So to check the interval between container death and container startup I have to do it manually, but I’ll try to correlate the events using some scripts and greps (I hope so). It’ll probably be a small Java web app (bootstrapping should take about 5 seconds).
A simple node startup time is easy to check with swarmprom, yes?
As for the request/response cycle, I’ll measure it at a higher level; I’m going to write some Gatling tests.

Thanks for the answer.

Are you sure about that?

In the good old days when people still used WildFly, bootstrapping would sometimes take more than two minutes, and even with Spring Boot it can easily be 30 seconds. If your application does not benefit from the JIT compiler, Quarkus might help to build native binaries of the code and reduce startup time, BUT depending on the use case, the lack of JIT compiler optimization might incur a performance penalty.
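Rather than guessing, the bootstrapping time can be measured by polling the application until it answers. The following is only a sketch: the function names are hypothetical, and the health URL in the usage note is an assumption about your app.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=1.0):
    """Return True if a GET on `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status: not ready yet.
        return False


def seconds_until_ready(probe, deadline=120.0, poll=0.1):
    """Poll `probe()` until it returns True.

    Returns the elapsed seconds, or None if `deadline` seconds pass first.
    """
    started = time.monotonic()
    while time.monotonic() - started < deadline:
        if probe():
            return time.monotonic() - started
        time.sleep(poll)
    return None
```

Called right after the container is started, for example `seconds_until_ready(lambda: http_ok("http://localhost:8080/health"))`, it returns the time until the first successful response, with a resolution of roughly one poll interval plus the request time.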

You can easily see how many nodes are “seen” in the cluster at a point in time. Though, as written before, this will be interval-based information, not event-based.

You might be better off using the Docker SDK for the language of your choice and implementing it in a real programming language. Otherwise the correlation part might become a very ugly glue script.
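With the Docker SDK for Python (the `docker` package), subscribing to the event stream could look roughly like this. It is a sketch only: the function names are hypothetical, and it assumes Swarm task containers expose the `com.docker.swarm.service.name` label in the event attributes.

```python
from datetime import datetime, timezone


def watch_container_events(client, actions=("die", "start")):
    """Yield (utc_time, action, service, short_container_id) per matching event."""
    filters = {"type": "container", "event": list(actions)}
    for event in client.events(decode=True, filters=filters):
        attributes = event.get("Actor", {}).get("Attributes", {})
        yield (
            datetime.fromtimestamp(event["timeNano"] / 1e9, tz=timezone.utc),
            event["Action"],
            attributes.get("com.docker.swarm.service.name"),  # None outside Swarm
            event["Actor"]["ID"][:12],
        )


def main():
    import docker  # third-party package: pip install docker

    client = docker.from_env()  # talks to the local Docker daemon
    for when, action, service, container_id in watch_container_events(client):
        print(when.isoformat(), action, service, container_id)
```

Calling `main()` on a node prints one line per container death or start seen by that node's daemon. Keeping the stream handling in a function that takes the client as a parameter makes it easy to test the processing logic without a running daemon.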

It’ll be a really simple app (without any additional libraries packed in, probably with one simple REST endpoint; it’ll be Spring Boot or something lightweight like Ratpack).

Maybe it’s better to do it with the Docker SDK as well, like with the self-healing question. It should be much easier, yup?

It will definitely be lightweight (swarmprom puts some extra load on each node), and you will be flexible to process whatever events you like.