Hi team,
I am trying to run Docker with nvidia-container-toolkit to enable GPU access inside a container:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
However, I got the following error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI
runtime create failed: runc create failed: unable to start container process: error during container init: error
running prestart hook #0: signal: killed, stdout: , stderr: Auto-detected mode as 'legacy': unknown
Run 'docker run --help' for more information
As you can see, there is no meaningful debug information in stderr or stdout. I went through a lot of forums and GitHub issues and didn't find any similar cases. I also tried Podman in rootless mode:
podman run --rm --security-opt=label=disable \
--device=nvidia.com/gpu=all \
ubuntu nvidia-smi
And it worked:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L40S-24Q                On  | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8              N/A /  N/A |     51MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
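Since the prestart hook under Docker was terminated by a signal ("signal: killed") without a useful message, one possible cause worth ruling out is the kernel OOM killer. Below is a minimal sketch of a hypothetical helper, find_oom_kills (the name and the search patterns are my own assumptions, not part of any NVIDIA or Docker tool), that filters kernel log text for typical OOM-killer messages:

```shell
# Hypothetical helper (not part of Docker or the NVIDIA toolkit):
# read kernel log text on stdin and print lines that look like
# OOM-killer activity. The patterns are assumptions based on common
# kernel messages and may need adjusting for a given kernel.
find_oom_kills() {
  # -i: case-insensitive, -E: extended regex.
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -iE 'out of memory|oom-kill|killed process' || true
}

# Intended usage right after reproducing the failure
# (commented out because it needs root and a real kernel log):
#   sudo dmesg | find_oom_kills
#   sudo journalctl -k --since "10 minutes ago" | find_oom_kills
```

If this prints nothing after a failed docker run, the SIGKILL probably came from somewhere other than the OOM killer, and this check alone does not say where.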
Here is my docker info:
docker info
Client: Docker Engine - Community
 Version: 28.1.1
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: v0.23.0
    Path: /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version: v2.35.1
    Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 4
 Server Version: 28.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.5-0-g59923ef
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-138-generic
 Operating System: Ubuntu 22.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 116.2GiB
 Name: rs-l-r9yh49
 ID: cbd02c7b-b0d7-41ad-9593-a2079a60350b
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
Do you think