Service update does not resolve conflicts on generic resources

Hello!

I have a swarm cluster that uses generic resources (GPUs). All nodes have several slots; to simplify, let it be two nodes with two slots each. One of the services consumes a lot of resources, so I use --replicas-max-per-node=1 for it (to be more precise, I use the serviceSpec.TaskTemplate.Placement.MaxReplicas field of the golang API, but I don't think that really matters). I also use the start-first update order to minimise downtime (it sometimes takes a long time to pull images). Today I faced the following situation while updating that service (i.e. service_a):

  1. node_1
    a. Slot 1 - service_a:previous_version
    b. Slot 2 - empty
  2. node_2
    a. Slot 3 - service_b
    b. Slot 4 - service_c

The update stalled with the reason “no suitable node”. I solved it with docker service update --force service_b, which rescheduled service_b and left Slot 3 on node_2 empty. Is there a way to resolve such conflicts automatically? Or is writing some automation myself the only way?
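
For context, this is roughly how such a service could be created via the CLI (a sketch only; the service name, image, and the generic resource key gpu are placeholders for whatever the nodes actually advertise):

```
docker service create \
  --name service_a \
  --replicas 1 \
  --replicas-max-per-node 1 \
  --generic-resource "gpu=1" \
  --update-order start-first \
  registry.example.com/service_a:latest
```

The same settings map to ServiceSpec.TaskTemplate.Placement.MaxReplicas, TaskTemplate.Resources.Reservations.GenericResources and UpdateConfig.Order in the golang API.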

Interesting, I thought devices like GPUs are not supported in Docker Swarm (issue).

When we update Traefik, which uses the “scarce” resources ports 80 and 443, we usually use stop-first so that we don't run into a resource conflict.
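
For comparison, this is what that looks like in a stack file (a sketch, trimmed to the relevant parts; host-mode publishing is what makes the ports a per-node scarce resource):

```yaml
services:
  traefik:
    image: traefik:v2.11
    ports:
      - target: 80
        published: 80
        mode: host
      - target: 443
        published: 443
        mode: host
    deploy:
      update_config:
        order: stop-first   # free the ports before the new task starts
```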

You are right - there is no official support for GPUs. However, using the NVIDIA Container Toolkit + generic resources solves that problem. When I researched this, I didn't find a complete, up-to-date guide, but I can provide you with mine if you need it.
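
The core of the setup, in case it is useful (a sketch from my notes; the resource name NVIDIA-GPU is a convention rather than a requirement, and the GPU UUIDs come from nvidia-smi -L):

```
# /etc/docker/daemon.json on each GPU node
{
  "default-runtime": "nvidia",
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-<uuid-of-gpu-0>",
    "NVIDIA-GPU=GPU-<uuid-of-gpu-1>"
  ]
}

# /etc/nvidia-container-runtime/config.toml: uncomment this line so the
# runtime maps the reservation onto an actual device inside the container
swarm-resource = "DOCKER_RESOURCE_NVIDIA-GPU"
```

After restarting the daemon, a service can reserve a GPU with --generic-resource "NVIDIA-GPU=1".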

Using stop-first could solve the conflict here, but in my case it creates windows when there is no running instance of the service at all (when we update model weights, that changes large docker layers, up to 1.5-2 GiB, so pulls take a while). And the situation in the original post does not actually have a conflict - there are enough resources for all running instances plus one deploying instance; the conflict is caused by the greedy resource reservation algorithm. In this situation the conflict can be easily discovered and resolved by hand, but I am afraid that if a conflict appears after an automatic redeployment (e.g. after a node outage), it would be quite challenging even to discover it.
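
For the record, the kind of automation I have in mind would look roughly like this (a sketch against the github.com/docker/docker Go client; error handling is trimmed, and the victim-selection policy, which is the hard part, is left out - "service_b" below is just a placeholder):

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/swarm"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}

	// Find tasks the scheduler cannot place anywhere.
	tasks, err := cli.TaskList(ctx, types.TaskListOptions{})
	if err != nil {
		panic(err)
	}
	for _, t := range tasks {
		if t.Status.State == swarm.TaskStatePending &&
			strings.Contains(t.Status.Err, "no suitable node") {
			fmt.Printf("stuck task %s of service %s: %s\n", t.ID, t.ServiceID, t.Status.Err)
			// Pick a service holding a slot the stuck task could use and
			// force it to reschedule ("service_b" is a placeholder here).
			forceUpdate(ctx, cli, "service_b")
		}
	}
}

// forceUpdate is the API equivalent of `docker service update --force`.
func forceUpdate(ctx context.Context, cli *client.Client, service string) {
	svc, _, err := cli.ServiceInspectWithRaw(ctx, service, types.ServiceInspectOptions{})
	if err != nil {
		panic(err)
	}
	spec := svc.Spec
	spec.TaskTemplate.ForceUpdate++ // bumping the counter triggers a reschedule
	if _, err := cli.ServiceUpdate(ctx, service, svc.Version, spec, types.ServiceUpdateOptions{}); err != nil {
		panic(err)
	}
}
```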