What is the latest proper way to use the Nvidia Container Toolkit with docker compose?

What is the equivalent of this docker command in Docker Compose?

docker run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi

That command works for me:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:01:00.0 Off |                  N/A |
| 30%   28C    P0             26W /  165W |       1MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

However, with the following docker-compose.yml:

services:
  testing:
    image: ubuntu:latest
    command: nvidia-smi
    environment:
      NVIDIA_VISIBLE_DEVICES: all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

… and running docker-compose up I get the following output:

[+] Running 2/2
 ✔ Network testing_default      Created                                                                             0.1s
 ✔ Container testing-testing-1  Created                                                                             0.1s
Attaching to testing-1
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

It seems like things have been in flux. The above YAML worked for me with nvidia-docker, but not with the nvidia-container-toolkit. I did have to specify runtime: nvidia before, but now when I specify that I get Error response from daemon: unknown or invalid runtime name: nvidia.
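
For reference, this is roughly what the legacy setup looked like (a sketch from memory; it assumes the nvidia runtime is still registered with the Docker daemon, which it apparently no longer is on my machine):

services:
  testing:
    image: ubuntu:latest
    command: nvidia-smi
    runtime: nvidia
    environment:
      NVIDIA_VISIBLE_DEVICES: all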

Of course, the only machine on which I had an NVIDIA GPU broke yesterday… so I can't test, but I can say that nvidia-docker included nvidia-container-runtime too, and both of those projects are now archived.

The difference I see between your docker run command and your compose file is that the docker run command refers to all GPUs, while the compose file only asks for one. If you have another integrated GPU on the motherboard, count: 1 could actually mean that one. Try count: all or use device_ids, as sketched below.
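
For example (untested), the device reservation could look like:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

or, to pin a specific card, replace count: all with something like device_ids: ['0'].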

By the way, for all GPUs with the docker run command, you can also use --gpus all instead of the --device option.
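
That is, something like:

docker run --rm -it --gpus all ubuntu:latest nvidia-smi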

Thanks to ereslibre on GitHub, I have a solution!

services:
  testing:
    image: ubuntu:latest
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids:
                - nvidia.com/gpu=all
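
Note that nvidia.com/gpu=all is a CDI device name, so this assumes a CDI specification for the GPUs already exists on the host. If it doesn't, it can typically be generated with the toolkit (paths may vary by install):

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# list the generated CDI device names
nvidia-ctk cdi list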

So Docker was installed using Nix?