Afternoon,
This appears similar to the issue opened here: No adapters found running docker with -gpus all
However, I’ve removed and reinstalled both WSL2 and Docker Desktop multiple times. I’ve followed every step-by-step guide in the Microsoft, Nvidia, and Docker docs, and nothing is helping.
Here’s the skinny:
I’m using the nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 image. This is a base image for specific nvidia/cuda distributed computing applications.
Up until yesterday, my images were spinning up and down like a charm. No issues, no nada. Then, all of a sudden, they ceased recognizing my GPUs.
I have uninstalled everything, down to the OS and WSL2.
I cannot spin up a container with the --gpus all flag enabled. The closest I get to an error is this, when I spin it up without --gpus all:
2023-03-23 23:35:23 WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
2023-03-23 23:35:23 Use the NVIDIA Container Toolkit to start this container with GPU support; see
2023-03-23 23:35:23 https://docs.nvidia.com/datacenter/cloud-native/ .
and this, when I try to run nvidia-smi:
2023-03-24 08:31:19 Failed to initialize NVML: Unknown Error
When I issue the run command "docker run --gpus all -it --rm ubuntuslim", it just hangs at the terminal and returns no output at all. If I try to start the container from within the Docker Desktop application, it starts and stops within a second and doesn’t return any logs.
I’ve even included a section in the Dockerfile to manually fetch nvidia-drivers-520 and nvidia-container-toolkit, to no avail. I have the CUDA toolkit installed on my local host. I have every single Nvidia-related package under the sun installed in my Ubuntu WSL2 instance. I have tried manually adding repositories and installing packages at docker build time.
I just did a full remove and reinstall (one more time) of WSL2 and Docker, and got the following error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: WSL environment detected but no adapters were found: unknown.
Running nvidia-smi from the Windows shell and from the WSL2 Ubuntu 22.04 shell works normally, but nothing I’ve done has gotten my containers to recognize my devices again.
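For anyone trying to reproduce this, the checks described above can be condensed into a rough sketch like the one below, meant to be run from the WSL2 Ubuntu shell. The function names check and run_gpu_checks are made up for illustration, and the 60-second timeout is an arbitrary assumption to keep the hanging docker run from blocking forever. /dev/dxg is the device node WSL2 uses to expose the GPU to the distro, so it is checked first.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command quietly and report whether it succeeded.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok:   $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# Hypothetical wrapper: walk the GPU path from the WSL2 distro into a container.
# $1 is the image to test (assumed to contain nvidia-smi once GPUs are attached).
run_gpu_checks() {
  local image="$1"
  # GPU paravirtualization device exposed by WSL2.
  check "/dev/dxg present in WSL2" test -e /dev/dxg
  # Driver visible from the distro itself.
  check "nvidia-smi works in WSL2" nvidia-smi
  # Same query from inside a container; the timeout (assumed value) guards
  # against the hang described above.
  check "nvidia-smi works in container" \
    timeout 60 docker run --rm --gpus all "$image" nvidia-smi
}
```

Calling run_gpu_checks with an image name (for example the ubuntuslim image from my run command) prints one ok/FAIL line per step, which at least shows where the chain breaks. In my case I would expect the first two steps to pass and the container step to fail.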
I’m at my absolute wit’s end. I’ve been trying to fix this for 16 hours and everything was working fine yesterday.