I am trying to run a Docker container using `nvidia/cuda:11.8.0-base-ubuntu22.04` as the base image, with PyTorch and CUDA-enabled dependencies, to execute a FastAPI application. The application works perfectly on my local machine and correctly detects CUDA. However, inside the container, `torch.cuda.is_available()` consistently returns `False`, and the message "CUDA is not available" is logged. The container otherwise runs correctly.
Environment Setup
Local Environment
- OS: Windows 11 with WSL2 enabled.
- CUDA Toolkit: `11.8.0`.
- GPU: NVIDIA RTX 3070.
- NVIDIA Driver Version: `560.94`.
- PyTorch Version: `2.0.1+cu118`.
Docker Environment
- Base Image: `nvidia/cuda:11.8.0-base-ubuntu22.04`.
- Docker Desktop: WSL2 backend with Ubuntu as the WSL integration.
- NVIDIA Container Toolkit installed: verified using `docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi`.
What Has Been Tried So Far
1. Verified GPU Access in Docker
- Ran the following command to confirm that Docker can detect the GPU:
```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
- Output: the GPU is correctly detected, and the NVIDIA driver and CUDA versions are displayed (how the application container itself is launched is sketched after this step).
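The `nvidia-smi` test above only proves that an ad-hoc container can see the GPU; the FastAPI container needs the same `--gpus all` flag when it is started. A minimal sketch of building and launching the application container (the `stylecanvasai` image tag is just a placeholder, not the real name):

```bash
# Build the image and run it with the GPU explicitly requested; without
# --gpus all the container gets no GPU devices even if the image is correct.
docker build -t stylecanvasai .
docker run --rm --gpus all -p 8000:8000 stylecanvasai
```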
2. Ensured Compatibility Between CUDA and PyTorch
- Used PyTorch with CUDA version 11.8 (`2.0.1+cu118`) in both local and containerized environments.
- Updated the `DiffI2I_Environment.yml` file to specify (a comparison install line is sketched after the excerpt):
```yaml
dependencies:
  - python=3.9
  - cudatoolkit=11.8.0
  - pytorch=2.0.1
  - torchvision=0.15.2
  - torchaudio=2.0.2
  # Other dependencies
```
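The excerpt does not show which channels the environment file uses. If conda resolves `pytorch=2.0.1` to a CPU-only build (which can happen when the CUDA variant is not pinned), the symptoms would match exactly. For comparison, the install line documented for this PyTorch release pins the CUDA variant through the `pytorch-cuda` metapackage rather than `cudatoolkit`:

```bash
# Pinning PyTorch 2.0.1 to the CUDA 11.8 build via the pytorch and nvidia
# channels; pytorch-cuda prevents the solver from choosing a CPU-only build.
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
    pytorch-cuda=11.8 -c pytorch -c nvidia
```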
3. Updated Dockerfile
Revised the Dockerfile to ensure compatibility and include the necessary steps:
```dockerfile
FROM nvidia/cuda:11.8.0-base-ubuntu22.04

# Install Miniconda
WORKDIR /app
RUN apt-get update && apt-get install -y wget bzip2 build-essential libgl1 libglib2.0-0 && \
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /miniconda.sh && \
    bash /miniconda.sh -b -p /opt/conda && \
    rm /miniconda.sh && \
    rm -rf /var/lib/apt/lists/*
ENV PATH="/opt/conda/bin:$PATH"

# Copy the Conda environment file and install
COPY DiffI2I_Environment.yml .
RUN conda install -n base -c conda-forge mamba && \
    mamba env update -f DiffI2I_Environment.yml && \
    conda clean --all --yes

# Set Conda environment
ENV PATH="/opt/conda/envs/StyleCanvasAI/bin:$PATH"
SHELL ["conda", "run", "-n", "StyleCanvasAI", "/bin/bash", "-c"]

# Copy application files
COPY . /app/

# Expose port for FastAPI application
EXPOSE 8000

# Entrypoint to launch the server
CMD ["uvicorn", "Diffi2i_Inference_Server:app", "--host", "0.0.0.0", "--port", "8000", "--log-level", "debug"]
```
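For hands-on debugging, a shell can be opened in the image built from this Dockerfile and the checks from step 4 repeated interactively (again, the image tag is a placeholder):

```bash
# Override the CMD with an interactive shell, keeping the GPU attached
docker run --rm -it --gpus all stylecanvasai /bin/bash
# Inside the container:
#   nvidia-smi
#   python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```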
4. Verified CUDA Setup Inside the Container
- Ran the following inside the container:
```bash
python -c "import torch; print(torch.cuda.is_available())"
```
- Output: `False`.
- Checked the CUDA version PyTorch was built against:
```bash
python -c "import torch; print(torch.version.cuda)"
```
- Output: `11.8` (further checks worth running in the same container are sketched below).
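These diagnostics would separate a packaging problem from a device-visibility problem (generic checks, not output from my logs):

```bash
# Is the CUDA driver library injected into the container at all?
ldconfig -p | grep -i libcuda || echo "libcuda not found"
# Does nvidia-smi work inside this image (not only in the clean base image)?
nvidia-smi || echo "nvidia-smi failed"
# How many devices does the installed torch build actually see?
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.device_count())"
```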
5. Confirmed NVIDIA Toolkit Installation
- Ensured the NVIDIA Container Toolkit is installed and functional.
- Installed/updated the toolkit and restarted the Docker daemon (a quick runtime check is sketched below):
```bash
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
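To double-check that the daemon in use has actually registered the NVIDIA runtime (I'm not certain how Docker Desktop's WSL2 backend surfaces this, since it manages its own daemon, so treat it as a rough check):

```bash
# "nvidia" should appear among the registered runtimes if the container
# toolkit is hooked into the daemon the docker CLI is talking to.
docker info | grep -i runtimes
```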
6. Tested with a Clean CUDA Container
- Ran a clean test using:
```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
- Output: the GPU is detected, and `nvidia-smi` works.
7. Verified Dependencies
- Ensured PyTorch, torchvision, and torchaudio are installed correctly in the container.
- Verified that `torch.cuda.is_available()` works on the same configuration locally (a way to inspect the exact build installed in the image is sketched below).
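The build string of the `pytorch` package installed inside the image would also distinguish a packaging problem from a runtime problem: a CUDA build carries something like `py3.9_cuda11.8_cudnn8.7.0_0`, while a CPU-only build contains `cpu`. A sketch (the environment name comes from the Dockerfile above; the image tag is a placeholder):

```bash
# Print the build strings of the torch packages installed in the image
docker run --rm stylecanvasai conda list -n StyleCanvasAI | grep -iE '^(pytorch|torchvision|torchaudio)'
```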
Remaining Issue
Even after verifying GPU access and ensuring compatibility between CUDA, the NVIDIA driver, and PyTorch, the application inside the container consistently logs "CUDA is not available". It's unclear why the containerized PyTorch cannot detect CUDA.
Additional Context
- Running the application directly on the host system (outside Docker) works flawlessly, and `torch.cuda.is_available()` returns `True`.
- The same Conda environment and dependencies are used both locally and inside the container.
Help Needed
- Are there additional steps needed to enable GPU access for PyTorch in Docker?
- Is there a known issue with CUDA compatibility in containers when Docker runs with the WSL2 backend?
- Are there debugging steps or environment configurations I might have missed?