I am not sure whether this is relevant, but last night I ran
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
while npm was raising some issues on a React application server.
Since this morning, docker-compose has been unusually slow and raises errors, nvidia-smi breaks,
and the system takes longer to reboot. I reverted the change made by the above command, but the issues persist.
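For reference, here is a minimal sketch of how the revert can be double-checked, assuming the line was appended to /etc/sysctl.conf exactly as in the command above. The function name remove_inotify_override is hypothetical and only for illustration. Note that deleting the line from the file does not by itself change the value the running kernel uses; that only changes when the value is set again or after a reboot.

```shell
# Hypothetical helper: delete any fs.inotify.max_user_watches line from a
# sysctl config file. The path defaults to /etc/sysctl.conf (an assumption
# based on the tee command above); pass another path to try it on a copy.
remove_inotify_override() {
    conf="${1:-/etc/sysctl.conf}"
    # sed keeps a backup of the original as "$conf.bak" before editing in place.
    sed -i.bak '/^[[:space:]]*fs\.inotify\.max_user_watches[[:space:]]*=/d' "$conf"
}

# Usage (needs root for the real file):
#   sudo sh -c '. ./this_script.sh; remove_inotify_override'
#   cat /proc/sys/fs/inotify/max_user_watches   # value currently in effect
```

The value printed from /proc is what the kernel is actually using, so it shows whether the override is still active regardless of what the file says.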
System Details
- Nvidia DGX Station
- Docker version: 18.09.2
- Docker-Compose version: 1.24.1
- Nvidia Driver version: 418.87
- CUDA version: 10.1
Issue
When I spin up a container with docker-compose, it works at first, though slowly.
Then, after a while:
1- nvidia-smi stops working with the following error
$ nvidia-smi
Unable to determine the device handle for GPU 0000:0E:00.0: GPU is lost: Reboot the system to recover this GPU
2- docker-compose is too slow to respond and sometimes raises this
Comments
I am not sure how Docker is breaking nvidia-smi or why it is getting slower.
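One thing that may help narrow this down is the kernel log, since the NVIDIA driver reports GPU errors there on lines containing "NVRM: Xid". Below is a minimal sketch of a filter for those lines; the function name filter_nvrm_xid is hypothetical, and whether any such lines appear on this machine is not something I have confirmed.

```shell
# Hypothetical helper: read a kernel log on stdin and keep only the lines
# that the NVIDIA driver writes for GPU errors (they contain "NVRM: Xid").
filter_nvrm_xid() {
    # "|| true" keeps the exit status at 0 when no matching line is found.
    grep -F 'NVRM: Xid' || true
}

# Usage (dmesg may need root on some systems):
#   sudo dmesg | filter_nvrm_xid
```

If this prints nothing right after nvidia-smi reports the GPU as lost, the driver did not log an Xid event, which is useful information in itself.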
Any thoughts are appreciated.