As far as I understand it, the interval directive for health checks is the time that Docker will wait before checking a newly created container to see if it's healthy, and it's also the recurring interval at which it will continue to check the container's health status.
It would be nice to be able to specify an initial interval that is used just for the very first time when a container starts, and then an interval used for the recurring health checks.
An example: I have a container that takes roughly 1 second to start up and run. I would like to run health checks on 5 minute intervals. The problem is that right now, with my interval set to 5 minutes, when I start this container, it waits a full 5 minutes after creating the container before checking that the container is healthy!
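To make the example concrete, here is a minimal sketch of the kind of configuration I mean. The curl probe, the URL, and the 3 second timeout are placeholders I chose for illustration, not details of my actual service.

```dockerfile
# Recurring check every 5 minutes. The same value also delays the
# very first check, so the container sits in the "starting" health
# state for about 5 minutes even though it is ready after ~1 second.
# The probe command and URL below are assumed placeholders.
HEALTHCHECK --interval=5m --timeout=3s \
  CMD curl -f http://localhost/ || exit 1
```

With this in place, the first probe runs one full interval after the container starts, which is the delay described above.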
Ideally, I would tell Docker to wait 5 seconds after starting the container to run the first health check, then if that passes, move to a 5 minute interval for continued monitoring. Some extended cases might even warrant an entirely new directive similar to HEALTHCHECK, something like STARTUPCHECK, with its own interval, timeout, and retries.
Conversely, someone might have a container that takes several minutes to run some kind of build process, but they'd like to continually monitor it on a shorter interval. Right now the only way to do that is to specify a shorter interval with a higher number of retries. This is bad because, while it accommodates the initial start of the container, you might not want that many retries for the ongoing health checks.
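A rough sketch of that workaround, assuming a startup of about 5 minutes and a desired check interval of 10 seconds (both numbers, and the probe command, are made up for illustration):

```dockerfile
# Workaround: short interval, inflated retries.
# 36 retries x 10s is roughly 6 minutes of tolerated consecutive
# failures, enough to cover an assumed ~5 minute startup.
# Side effect: once the container is running, a real failure also
# needs about 6 minutes of failed checks before it is marked unhealthy.
# The probe command and URL below are assumed placeholders.
HEALTHCHECK --interval=10s --timeout=3s --retries=36 \
  CMD curl -f http://localhost/ || exit 1
```

The retry count here is sized for startup, not for ongoing monitoring, which is exactly the trade-off I would like to avoid.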
All of this gets worse when your container is part of a stack and other containers depend on it through the depends_on directive. The delays compound: you have to set really high health check intervals or retries, and your full stack takes far too long to deploy.
This also affects service updates. For example, I might want to regularly monitor a container using HEALTHCHECK on 5 minute intervals. Even if this container's startup is near instant, when I update the service to use a new image it still takes a full 5 minutes before the health check passes and the container is put back into the rotation. That is 5 minutes of unnecessary downtime.