Make container wait for nvidia CDI to start

Hey Guys,

This is probably a simple thing, but I’m pretty new to Docker and Linux, and I couldn’t find a good explanation of how to do this.
I have an RTX 3090 in my home server, and it works with Docker without any problems. However, if I restart the server, the containers using the GPU fail to start. Based on the logs, it takes roughly 45 seconds for the NVIDIA CDI to become available. It’s a bit annoying to always have to go in and start them manually, especially when I’m not at home… (there are some power fluctuations in my area, so the server unfortunately restarts a few times a week)

I want to either add a delay to the Docker service or add some kind of health check as a start condition for my containers.
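For the delay option, one common approach is a systemd drop-in for docker.service. This is a minimal sketch, assuming Docker runs as the standard docker.service unit; the 45-second value is just the startup time observed in the logs and may need tuning. Running sudo systemctl edit docker.service opens an override file (usually /etc/systemd/system/docker.service.d/override.conf) where this can go:

```ini
# Drop-in for docker.service, e.g. created with: sudo systemctl edit docker.service
[Service]
# Pause before dockerd starts so the NVIDIA CDI setup has time to finish.
# 45 s is an assumption based on the observed startup time; adjust as needed.
ExecStartPre=/bin/sleep 45
```

A fixed sleep is crude: it delays every Docker start even when the GPU is already ready, and it still fails if the CDI setup ever takes longer than the delay. The sleep also counts toward the unit's start timeout (TimeoutStartSec), so a much longer delay may need that raised too.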

My problem is that I can see how to ping an endpoint with a healthcheck in Compose, or how to run scripts, but I don’t see any tutorial on how to check whether a service is running…

I found this for waiting until a mount is available:
[Unit]
RequiresMountsFor=/media/localadmin/FILES /media/localadmin/PHOTOS

[Service]
#ExecStartPre=/bin/sleep 30

I would need something like this, but for nvidia CDI.

It has been some time since I configured Docker containers for NVIDIA GPUs, so I don’t remember whether docker run fails immediately or the process inside the container fails. If the process fails in the container and the container stops, you could simply use

restart: always

in Compose, or --restart always on the command line: https://docs.docker.com/reference/cli/docker/container/run/#restart

But checking the nvidia docs:

I see there is a systemd service, nvidia-cdi-refresh.service, so you could try something like this:

[Unit]
After=nvidia-cdi-refresh.service
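As a slightly fuller sketch, that ordering can live in a drop-in for docker.service, so the Docker daemon (and with it any containers that have a restart policy) only starts after the refresh service. The Wants= line goes beyond the original suggestion: After= only orders units that are already being started, while Wants= also pulls the refresh service into the same boot transaction. Whether After= really waits for the CDI spec to be written depends on nvidia-cdi-refresh.service being a Type=oneshot unit; that is an assumption worth checking with systemctl cat nvidia-cdi-refresh.service.

```ini
# Drop-in for docker.service, e.g. created with: sudo systemctl edit docker.service
[Unit]
# Pull the CDI refresh service into the same start-up transaction...
Wants=nvidia-cdi-refresh.service
# ...and do not start dockerd until that service has finished starting
# (for a Type=oneshot unit, that means until its command has exited).
After=nvidia-cdi-refresh.service
```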

I also found an open issue on GitHub mentioning that this service can also fail:

So that dependency definition alone might not solve everything, but you can try.
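If ordering alone turns out not to be enough, a more defensive option is a small helper unit that polls until the CDI devices are actually listed, with docker.service ordered after it. This is a minimal sketch under several assumptions: the unit name wait-for-nvidia-cdi.service is hypothetical, the check assumes nvidia-ctk from the NVIDIA Container Toolkit is installed and that its cdi list output only contains nvidia.com/gpu entries once the spec exists, and the 5-second poll interval and 180-second timeout are arbitrary choices.

```ini
# /etc/systemd/system/wait-for-nvidia-cdi.service (hypothetical name)
[Unit]
Description=Wait until NVIDIA CDI devices are available
After=nvidia-cdi-refresh.service
# Make docker.service wait for this unit when both are started together
Before=docker.service

[Service]
Type=oneshot
# Poll every 5 s until "nvidia-ctk cdi list" reports an nvidia.com/gpu device
ExecStart=/bin/sh -c 'until nvidia-ctk cdi list 2>/dev/null | grep -q nvidia.com/gpu; do sleep 5; done'
# Give up (and mark this unit failed) after 3 minutes instead of blocking forever
TimeoutStartSec=180
# Stay "active" after the check succeeds so it is not re-run on every Docker restart
RemainAfterExit=yes

[Install]
# Enabling the unit makes docker.service "want" it at every start
WantedBy=docker.service
```

After saving it, run sudo systemctl daemon-reload and sudo systemctl enable wait-for-nvidia-cdi.service. Because the unit is only wanted by docker.service (not required), Docker still starts if the wait times out; it just no longer races the GPU setup in the normal case.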

Hey!

restart: always was the first thing I tried (changed from unless-stopped), but it doesn’t help.
I’m not sure why it’s not trying to restart.

The 2nd solution you suggested is a really good one, I will try it as soon as I get home.

I don’t think the CDI refresh is failing, since after ~45 seconds I can start the containers manually without restarting the refresh service.