Docker-compose breaking nvidia-smi

Not sure if this is relevant:

Last night, NPM was raising some issues on a React application server, so I ran:

$ echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p

Since this morning, docker-compose has been unusually slow and raising errors, nvidia-smi is broken, and system reboot time has increased. I reverted the change from the command above, but the issues persist.
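The revert was along these lines (a sketch, assuming the line had been appended verbatim and only once by the tee command):

```shell
# Delete the appended line from /etc/sysctl.conf
# (assumes it was appended verbatim and only once)
sudo sed -i '/^fs\.inotify\.max_user_watches=524288$/d' /etc/sysctl.conf
# Reload settings and confirm the value now in effect
sudo sysctl -p
sysctl fs.inotify.max_user_watches
```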

System Details

  • Nvidia DGX Station
  • Docker version: 18.09.2
  • Docker-Compose version: 1.24.1
  • Nvidia Driver version: 418.87
  • CUDA version: 10.1

Issue

On spinning up a container with docker-compose, it works at first, though slowly.
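For context, the container is started with a compose file roughly like the following. This is a hypothetical minimal example, not my actual file: the service name and image are placeholders, and `runtime: nvidia` is the NVIDIA container runtime shipped with nvidia-docker2 on the DGX:

```yaml
version: "2.3"                      # compose file version that supports the runtime key
services:
  app:                              # placeholder service name
    image: nvidia/cuda:10.1-base    # placeholder image matching the installed CUDA 10.1
    runtime: nvidia                 # use the NVIDIA container runtime
    command: nvidia-smi
```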

Then, after a while:

1- nvidia-smi stops working with the following error:

$ nvidia-smi
Unable to determine the device handle for GPU 0000:0E:00.0: GPU is lost: Reboot the system to recover this GPU

2- Docker-Compose is very slow to respond and sometimes raises errors as well.
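In case it helps, "GPU is lost" errors like the one in point 1 are usually accompanied by an NVIDIA Xid message in the kernel log, which I can check around the time of the failure (generic diagnostic, not specific to this setup):

```shell
# Search the kernel ring buffer for NVIDIA Xid errors logged when the GPU dropped
dmesg | grep -i xid
# Full per-GPU status report (run while nvidia-smi still responds)
nvidia-smi -q
```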

Comments

I am not sure how Docker could be breaking nvidia-smi, or why it's getting slower…

Any thoughts are appreciated.