Will try to be brief here.
I have a codebase that uses ROS 2 and OpenPCDet models to run detection on data from a 128-channel lidar.
Outside of Docker, I replay ros2 bag data at around 10 Hz and my code keeps up at the same rate, providing detections etc. Everything works fine.
I set up a Docker image based on nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
and set the following environment variables:
ENV TORCH_CUDA_ARCH_LIST="6.1;7.5;8.6;8.9"
ENV LANG=en_US
ENV FORCE_CUDA="1"
I use the exact same PyTorch version that we use outside Docker (though I’m using CUDA 12.1 in the container and 12.4 outside):
RUN pip3 install torch==2.4.0+cu121 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
All of the pip dependencies are exactly the same as outside the container.
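One way to confirm that the two environments really match is to run the same small script on the host and inside the container and diff the output. The sketch below is only an illustration: the function name `collect_env` is made up here, and it assumes PyTorch may or may not be importable where it runs.

```python
import json
import platform
import sys


def collect_env():
    """Return a small dict describing the Python/PyTorch/CUDA stack.

    `collect_env` is a hypothetical helper name, not part of any library.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch  # optional: only reported if it is installed
    except ImportError:
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda          # CUDA version the wheel was built against
    info["cudnn"] = torch.backends.cudnn.version()   # None if cuDNN is not available
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    info["torch_threads"] = torch.get_num_threads()  # intra-op CPU threads
    return info


if __name__ == "__main__":
    # Run once on the host and once in the container, then diff the two outputs.
    print(json.dumps(collect_env(), indent=2, sort_keys=True))
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report and may be a better starting point than this sketch.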
I initially saw extremely low performance and HUGE CPU usage (400-800% in htop), compared to around 120% outside Docker.
After a great deal of debugging I found that setting the following variables fixed the CPU usage issue:
ENV OMP_NUM_THREADS=1
ENV MKL_NUM_THREADS=1
In terms of model performance I see no difference between having these variables set to 1 or not, but the CPU usage is completely normal now.
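To compare what the process actually sees on the host and in the container, a small report like the one below can help. It is a sketch using only the standard library; `thread_report` and the choice of variables to list are assumptions made for illustration.

```python
import os

# Thread-related variables commonly read by OpenMP/MKL/OpenBLAS builds.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def thread_report(env=None):
    """Summarize thread env vars and the CPUs visible to this process.

    `thread_report` is a hypothetical helper name used for this example.
    """
    env = os.environ if env is None else env
    report = {name: env.get(name) for name in THREAD_VARS}  # None if unset
    report["cpu_count"] = os.cpu_count()
    # sched_getaffinity is Linux-only; it reflects cpuset/affinity limits.
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))
    else:
        report["usable_cpus"] = None
    return report


if __name__ == "__main__":
    for key, value in thread_report().items():
        print(f"{key}: {value}")
```

If the variables are unset on the host but CPU usage is still low there, something else (for example a different BLAS/OpenMP build) is probably limiting the thread count outside the container.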
The remaining issue is that inside Docker the ros2 bag data plays back at a lower rate of around 8 Hz, and my detections come out at 4-6 Hz. (Note that this was the same before I added the variables above. Performance in Docker has always been low; the only change is that CPU usage is now normal.)
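To put numbers on the playback and detection rates from inside the node itself (in addition to `ros2 topic hz`), the receive or publish timestamps can be logged and reduced as sketched below. The function names are made up for this example, and the timestamps are assumed to be in seconds.

```python
def mean_rate_hz(stamps):
    """Average rate in Hz over a sequence of timestamps given in seconds."""
    if len(stamps) < 2:
        return 0.0
    span = stamps[-1] - stamps[0]
    return (len(stamps) - 1) / span if span > 0 else 0.0


def worst_gap_s(stamps):
    """Largest gap between consecutive timestamps, in seconds."""
    return max((b - a for a, b in zip(stamps, stamps[1:])), default=0.0)


if __name__ == "__main__":
    # Hypothetical timestamps: five messages spaced 0.125 s apart.
    stamps = [0.0, 0.125, 0.25, 0.375, 0.5]
    print(f"rate: {mean_rate_hz(stamps):.1f} Hz, worst gap: {worst_gap_s(stamps):.3f} s")
```

Comparing the mean rate with the worst gap shows whether the slowdown is uniform or caused by occasional long stalls, which point to different causes.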
I use docker compose with /dev:/dev mounted to access the actual lidar unit, network_mode set to host, ipc set to host, and GPU access enabled:
environment:
  - DISPLAY=${DISPLAY}
  - "QT_X11_NO_MITSHM=1"
volumes:
  - /tmp/.X11-unix:/tmp/.X11-unix
  - /dev:/dev:rw
  - ./configs:/configs:rw
network_mode: host
ipc: host
privileged: true
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
I am at a loss as to what could possibly be wrong. It could be a ROS thing, a Docker thing, or CUDA.
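One way to narrow this down is to time each stage of the pipeline separately on the host and in the container and see which one diverges. The following is a minimal sketch: `StageTimer` and the stage names are hypothetical, not part of ROS 2 or OpenPCDet.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collect wall-clock durations per named stage (hypothetical helper)."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append(time.perf_counter() - start)

    def summary_ms(self):
        """Mean duration per stage in milliseconds."""
        return {name: 1000.0 * sum(v) / len(v) for name, v in self.samples.items()}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        with timer.stage("preprocess"):   # e.g. converting the point cloud message
            time.sleep(0.002)
        with timer.stage("inference"):    # e.g. the model forward pass
            # CUDA kernels run asynchronously, so when timing GPU work call
            # torch.cuda.synchronize() before leaving this block.
            time.sleep(0.005)
    print(timer.summary_ms())
```

If only the inference stage is slower in the container, the GPU or CUDA side is the likely suspect; if message conversion or the gap between callbacks grows, ROS 2 transport or CPU limits are more likely.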
I am hoping that someone here might have some experience with issues like this. If so I would greatly appreciate the help.