We are setting up a Docker Swarm cluster for our Build/QA environment. Part of the environment consists of GPU servers used to test neural-net code, and we need to serialize access to the GPU cards on those servers. In an ideal world, I would be able to create a label like TitanX01 and then use a filter so that only one container at a time can use that label.
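In this world, I’m going to assign an unused port number and a label to each card, then use the port filter and the label to access the card in Swarm. The label steers the container to the node that holds the card, and because a host port can be published by only one container at a time, the port acts as a lock on that card.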
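To make the idea concrete, here is a rough Python sketch of how the `docker run` command line could be assembled for one card. It assumes standalone Swarm's `-e constraint:key==value` syntax for the label filter; the label key `gpu`, the port number 40001, the image name, and the helper name `build_gpu_run_command` are all made up for illustration.

```python
def build_gpu_run_command(card_label, lock_port, image, command=(), label_key="gpu"):
    """Return a `docker run` argv that targets one GPU card via Swarm filters.

    card_label, lock_port and label_key are hypothetical: they must match
    whatever labels and reserved ports were assigned when the nodes were set up.
    """
    return [
        "docker", "run", "--rm",
        # Constraint filter: schedule only on a node labelled <label_key>=<card_label>.
        "-e", "constraint:{}=={}".format(label_key, card_label),
        # Port filter: publish the host port reserved for this card. A second
        # container asking for the same host port cannot be placed on that node.
        "-p", "{0}:{0}".format(lock_port),
        image,
    ] + list(command)


if __name__ == "__main__":
    argv = build_gpu_run_command(
        "TitanX01", 40001, "nn-tests:latest", ["python", "run_tests.py"]
    )
    print(" ".join(argv))
```

One caveat with this sketch: a node with several cards would need a distinct label key per card, since a daemon label key can hold only one value.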
Does anyone know of a better solution? Bonus points if there is a way to have the docker run command wait until the resource is available instead of just exiting.
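The fallback I can think of for the waiting part is a client-side retry loop around `docker run`. Below is a minimal Python sketch of that, not a real solution. It assumes that a scheduling failure makes the `docker` CLI exit with status 125 (the code the CLI uses when the run itself fails, as opposed to the container's own exit code); that should be checked against the Docker and Swarm versions in use. The function name `run_when_available` and its parameters are hypothetical.

```python
import subprocess
import time


def run_when_available(argv, retry_delay=5.0, max_attempts=None,
                       busy_exit_code=125, runner=subprocess.call,
                       sleep=time.sleep):
    """Run argv repeatedly until it stops failing with busy_exit_code.

    Returns the first exit code that is not busy_exit_code, which is normally
    the container's own exit status. busy_exit_code=125 is an assumption about
    how a "no node available" error surfaces through the docker CLI.
    """
    attempts = 0
    while True:
        attempts += 1
        exit_code = runner(argv)
        if exit_code != busy_exit_code:
            # The container actually ran (or failed for another reason).
            return exit_code
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(
                "resource still busy after {} attempts".format(attempts)
            )
        # Resource is presumably in use: wait before asking Swarm again.
        sleep(retry_delay)


if __name__ == "__main__":
    # Hypothetical command line; substitute the real docker run arguments.
    code = run_when_available(
        ["docker", "run", "--rm", "-p", "40001:40001", "nn-tests:latest"],
        retry_delay=10.0,
    )
    raise SystemExit(code)
```

This polls rather than queues, so there is no fairness guarantee between waiting jobs. It also cannot tell a busy card apart from other daemon errors that return the same code, which is why `max_attempts` is there.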