I am trying to run a Docker container using nvidia/cuda:11.8.0-base-ubuntu22.04 as the base image, with PyTorch and CUDA-enabled dependencies to execute a FastAPI application. The application works perfectly on my local machine and correctly detects CUDA. However, inside the container, torch.cuda.is_available() consistently returns False, and the message “CUDA is not available” is logged. The container otherwise runs correctly.
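
For context, the message comes from a device-selection check along these lines (a minimal sketch; the actual code in Diffi2i_Inference_Server.py may differ):

import torch

# Pick the device at startup; this branch produces the logged message.
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
    print("CUDA is not available")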

Environment Setup

Local Environment

  • OS: Windows 11 with WSL2 enabled.
  • CUDA Toolkit: 11.8.0.
  • GPU: NVIDIA RTX 3070.
  • NVIDIA Driver Version: 560.94.
  • PyTorch Version: 2.0.1+cu118.

Docker Environment

  • Base Image: nvidia/cuda:11.8.0-base-ubuntu22.04.
  • Docker Desktop: WSL2 backend with Ubuntu as the WSL integration.
  • NVIDIA Container Toolkit Installed: Verified using docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi.

What Has Been Tried So Far

1. Verified GPU Access in Docker

  • Ran the following command to confirm that Docker can detect the GPU:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Output:

  • GPU is correctly detected.
  • NVIDIA driver and CUDA versions are displayed.
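
The same check can also be run against the application image itself rather than the clean base image (assuming the image tag diffi2i_oilpainting3 used later in this thread):

docker run --rm --gpus all diffi2i_oilpainting3 nvidia-smi

If this fails while the base-image test passes, the problem is in the image build rather than in the GPU plumbing.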

2. Ensured Compatibility Between CUDA and PyTorch

  • Used PyTorch with CUDA version 11.8 (2.0.1+cu118) in both local and containerized environments.
  • Updated the DiffI2I_Environment.yml file to specify:
dependencies:
  - python=3.9
  - cudatoolkit=11.8.0
  - pytorch=2.0.1
  - torchvision=0.15.2
  - torchaudio=2.0.2
  # Other dependencies
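
One way to confirm that conda actually resolved a CUDA build of PyTorch, rather than a CPU-only build, is to inspect the build string (a general conda check; for pytorch-channel packages the CUDA builds typically look like py3.9_cuda11.8_cudnn8.7.0_0, while CPU builds contain cpu):

conda list -n StyleCanvasAI pytorch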

3. Updated Dockerfile

Revised the Dockerfile to ensure compatibility and included necessary steps:

FROM nvidia/cuda:11.8.0-base-ubuntu22.04

# Install Miniconda
WORKDIR /app
RUN apt-get update && apt-get install -y wget bzip2 build-essential libgl1 libglib2.0-0 && \
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /miniconda.sh && \
    bash /miniconda.sh -b -p /opt/conda && \
    rm /miniconda.sh && \
    rm -rf /var/lib/apt/lists/*
ENV PATH="/opt/conda/bin:$PATH"

# Copy the Conda environment file and install
COPY DiffI2I_Environment.yml .
RUN conda install -n base -c conda-forge mamba && \
    mamba env update -f DiffI2I_Environment.yml && \
    conda clean --all --yes

# Set Conda environment
ENV PATH="/opt/conda/envs/StyleCanvasAI/bin:$PATH"
SHELL ["conda", "run", "-n", "StyleCanvasAI", "/bin/bash", "-c"]

# Copy application files
COPY . /app/

# Expose port for FastAPI application
EXPOSE 8000

# Entrypoint to launch the server
CMD ["uvicorn", "Diffi2i_Inference_Server:app", "--host", "0.0.0.0", "--port", "8000", "--log-level", "debug"]

4. Verified CUDA Setup Inside the Container

  • Ran the following inside the container:
python -c "import torch; print(torch.cuda.is_available())"
  • Output: False.
  • Checked the CUDA version PyTorch was built against:
python -c "import torch; print(torch.version.cuda)"
  • Output: 11.8.
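
Both checks can be combined into a single diagnostic line that also reports the device count, which helps distinguish a CPU-only build (torch.version.cuda is None) from a CUDA build that simply cannot see the GPU (version present, device count 0):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"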

5. Confirmed NVIDIA Toolkit Installation

  • Ensured the NVIDIA Container Toolkit is installed and functional.
  • Reinstalled the toolkit and restarted the Docker daemon to be sure:
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
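
Whether the daemon actually registered an NVIDIA runtime can be checked from its info output (a generic check; note that with Docker Desktop the runtime lives in the Desktop VM, not in the distribution where the apt command above runs):

docker info | grep -i runtime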

6. Tested with a Clean CUDA Container

  • Ran a clean test using:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

  • Output:
    • The GPU is detected, and nvidia-smi works.

7. Verified Dependencies

  • Ensured PyTorch, torchvision, and torchaudio are installed correctly in the container.
  • Verified that torch.cuda.is_available() returns True with the same configuration locally (see also the build check below).
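
To separate “PyTorch was built without CUDA” from “CUDA build, but the driver is not visible”, PyTorch also exposes whether the binary itself was compiled with CUDA support:

python -c "import torch; print(torch.backends.cuda.is_built())"

If this prints False inside the container, the conda solver picked a CPU-only build and no --gpus flag will help; if it prints True, the driver library is what the container cannot see.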

Remaining Issue

Even after verifying GPU access and ensuring compatibility between CUDA, the NVIDIA driver, and PyTorch, the application inside the container consistently logs “CUDA is not available”. It is unclear why the containerized PyTorch cannot detect CUDA.


Additional Context

  • Running the application directly on the host system (outside Docker) works flawlessly, and torch.cuda.is_available() returns True.
  • The same Conda environment and dependencies are used both locally and inside the container.

Help Needed

  • Are there additional steps needed to enable GPU access for PyTorch in Docker?
  • Is there a known issue with CUDA compatibility in containers using WSL2 backend?
  • Are there debugging steps or environment configurations I might have missed? (One candidate check is sketched below.)
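
One such check: torch.cuda.is_available() returns False whenever the driver library libcuda.so cannot be loaded, so it is worth confirming that the container can see it at all (these commands are a suggestion, assuming the image tag used below; /dev/dxg is the GPU paravirtualization device on the WSL2 backend):

docker run --rm --gpus all diffi2i_oilpainting3 sh -c "ldconfig -p | grep libcuda"
docker run --rm --gpus all diffi2i_oilpainting3 ls -l /dev/dxg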

You never shared the docker run command or compose file for the container that is not working.

Here is my docker file:

FROM nvidia/cuda:11.8.0-base-ubuntu22.04

# Install Miniconda
WORKDIR /app
RUN apt-get update && apt-get install -y wget bzip2 build-essential libgl1 libglib2.0-0 && \
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /miniconda.sh && \
    bash /miniconda.sh -b -p /opt/conda && \
    rm /miniconda.sh && \
    rm -rf /var/lib/apt/lists/*
ENV PATH="/opt/conda/bin:$PATH"

# Copy the Conda environment file and install
COPY DiffI2I_Environment.yml .
RUN conda install -n base -c conda-forge mamba && \
    mamba env update -f DiffI2I_Environment.yml && \
    conda clean --all --yes

# Activate the Conda environment by default
ENV PATH="/opt/conda/envs/StyleCanvasAI/bin:$PATH"
SHELL ["conda", "run", "-n", "StyleCanvasAI", "/bin/bash", "-c"]

# Copy application files
COPY Diffi2i_Inference_Server.py DiffI2IModelEnum.py BetaSchedule.py S2_Parameters.py DiffI2I_S2.py common.py ddpm.py TensorMathTools.py style_canvas_utils.py InferenceImageProcessor.py DiffI2I_Inference.py S2ModelConfigurations.py FaceImageProcessor.py Face_Parsing_Model.py Diff_I2I_lib.py ./
COPY options/ ./options/
COPY ldm/ ./ldm/
COPY Resize_Model_Weights/yolov8l-face.pt Resize_Model_Weights/
COPY Resize_Model_Weights/RealESRGAN_x4plus.pth Resize_Model_Weights/
COPY checkpoints/OilPainting_SC3/DiffI2I_S2/diffi2i_s2_Model_1699.pth.tar checkpoints/OilPainting_SC3/DiffI2I_S2/
COPY checkpoints/OilPainting_SC3/settings.txt checkpoints/OilPainting_SC3/
COPY Test_Images/ ./Test_Images/

# Expose port 8000 for the FastAPI application
EXPOSE 8000

# Set entrypoint for running the server
CMD ["uvicorn", "Diffi2i_Inference_Server:app", "--host", "0.0.0.0", "--port", "8000", "--log-level", "debug"]

and here is my DiffI2I_Environment.yml file:

name: StyleCanvasAI

channels:
  - pytorch
  - conda-forge
  - defaults

dependencies:
  - python=3.9
  - matplotlib
  - numpy
  - opencv
  - pillow
  - scikit-image
  - tqdm
  - natsort
  - ultralytics
  - einops
  - flask
  - fastapi
  - dlib
  - lmdb
  - mamba
  - cudatoolkit=11.7
  - cudnn=9.2.1.18
  - pip
  - pytorch=2.0.1
  - torchvision=0.15.2
  - torchaudio=2.0.2
  - pip:
      - basicsr
      - realesrgan
      - uvicorn
      - dill

and here is the command I have been using:

docker run --gpus all -it -p 8000:8000 --name OilPaintingSC3_Docker diffi2i_oilpainting3
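
Since the container keeps running, the same checks can be executed in place with docker exec (using the container name from the command above):

docker exec -it OilPaintingSC3_Docker nvidia-smi
docker exec -it OilPaintingSC3_Docker python -c "import torch; print(torch.cuda.is_available())"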

Since you use Docker Desktop on Windows (assuming, because you mentioned the “WSL backend”): where did you try to install the NVIDIA toolkit?

I can’t find right now where the documentation says you don’t have to install anything in WSL, but in any case you could only install it in your own WSL2 distribution, not in the one where the Docker daemon is running.

I read that this issue could be caused by incompatible drivers, but I’m not sure which driver version is supposed to support which CUDA / PyTorch version.
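
For what it’s worth, the installed driver version is easy to query, and per NVIDIA’s compatibility tables CUDA 11.8 needs roughly driver >= 450.80.02 on Linux (>= 452.39 on Windows), so 560.94 should be far more than recent enough (worth double-checking against NVIDIA’s own tables):

nvidia-smi --query-gpu=driver_version,name --format=csv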

I have it installed both on my computer and in WSL.

Also, I can access CUDA just fine when I run my program directly on my computer; the problem only appears when I try to run it in a Docker container.

Docker containers run in a virtual machine in Docker Desktop’s own distribution, where you couldn’t install anything even if you wanted to. The client side (where your docker command runs) doesn’t matter. So, since you can’t install anything in Docker Desktop’s distribution, all that can matter is what you have on Windows and what is in the Docker image. Maybe the Docker Desktop version too, but I’m not sure about that, as the GPU support is basically provided by WSL2. Sorry, but for now that is all I can share. Hopefully someone who has actually used CUDA on Windows in Docker Desktop will come along.
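
A quick way to see that the daemon really runs in Docker Desktop’s own VM (and not in your Ubuntu distribution) is to ask it where it thinks it is running; on Docker Desktop this reports the Desktop VM:

docker info --format '{{.OperatingSystem}}'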

Does anyone know how to fix this???

Has anyone gotten this to work before??
