Ok, so in our SaaS we use GPU instances from community clouds like vast.ai or runpod, where we supply the Docker image we want to run and the host itself is controlled by the provider.
We have noticed that the system clock on some host machines is off by a few seconds, which causes weird bugs in our monitoring and in the metrics we use to orchestrate multiple GPU instances.
Imagine having a bunch of worker machines that don't agree on what time it is.
So I was hoping I could run a tool like chrony inside the Docker container. Unfortunately, that doesn't seem to work, since Docker containers share the host's system clock.
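What does work without any special privileges is measuring how far off the host is, since reading the clock and talking to an NTP server are both fine inside a container. Below is a minimal sketch of a one-shot SNTP query that estimates the local offset using the standard NTP formula ((t2 - t1) + (t3 - t4)) / 2. It assumes the provider allows outbound UDP on port 123; pool.ntp.org is just an example server, and the function names are hypothetical. It takes a single sample with none of chrony's filtering, so treat it as a diagnostic, not a replacement.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix seconds."""
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """NTP offset estimate: ((t2 - t1) + (t3 - t4)) / 2.

    t1: client send time, t2: server receive time,
    t3: server transmit time, t4: client receive time (all Unix seconds).
    A positive result means the local clock is behind the server.
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server: str = "pool.ntp.org", timeout: float = 2.0) -> float:
    """Send one SNTP (v4, client mode) request and return the estimated offset."""
    # First byte: LI=0, VN=4, Mode=3 (client) -> 0b00_100_011 = 0x23.
    packet = b"\x23" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()  # reads the host clock shared with the container
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t4 = time.time()
    if len(data) < 48:
        raise ValueError("short NTP response")
    # Receive timestamp is at bytes 32..39, transmit timestamp at bytes 40..47.
    rx_sec, rx_frac, tx_sec, tx_frac = struct.unpack("!4I", data[32:48])
    t2 = ntp_to_unix(rx_sec, rx_frac)
    t3 = ntp_to_unix(tx_sec, tx_frac)
    return clock_offset(t1, t2, t3, t4)


if __name__ == "__main__":
    print(f"local clock offset vs NTP: {query_offset():+.3f} s")
```

Since the container can't correct the clock, one option is to run this periodically and report the offset alongside the metrics, so timestamps can be adjusted or flagged on the monitoring side.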
Any ideas for this problem?