Docker Swarm: limit resources

Hello,

I have several containers running in a Docker stack. Recently I could no longer access the system properly because one container caused a permanent 100% CPU load.

Restricting a container is possible with the following settings:

version: "3.9"
services:
  redis:
    image: redis:alpine
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 50M
        reservations:
          cpus: '0.25'
          memory: 20M

To always stay under, say, 90% CPU load, each container would have to get only a fraction of those 90% as its CPU limit. That makes little sense, because a container should be able to use more CPU whenever the other containers are not using it.

How can I make the Docker stack (or alternatively Docker as a whole) use a maximum of 90% CPU without setting limits for each container individually? I would be interested in the same for memory.

Either you set limits, or you don’t.

Linux processes (and a container is nothing else) will always try to consume the resources they are allowed to consume. If you don’t restrict the resources, the processes will fight for whatever is there.

With Docker (at least this is true for AWS ECS, which uses Docker under the hood), the part of a node’s capacity that is above the sum of all reservations is shared amongst the containers in proportion to their reservations. If container A has 100M of memory reserved and container B has 1000M reserved, container B will get 10x the spare resource shares of container A. The same should be true for the CPU.
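
In compose terms, that idea would look roughly like this (a sketch only; service names and images are placeholders, and whether spare capacity is actually shared in proportion to the reservations depends on the platform, as described above):

version: "3.9"
services:
  service-a:
    image: redis:alpine        # placeholder image
    deploy:
      resources:
        reservations:
          memory: 100M         # 1 share of the spare capacity in the model above
  service-b:
    image: redis:alpine        # placeholder image
    deploy:
      resources:
        reservations:
          memory: 1000M        # 10x the shares of service-a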

Note, though, that a memory limit (= hard limit) will result in an out-of-memory (OOM) kill if the consumed memory exceeds the limit. In a corporate context, you would monitor your resource usage with something like Prometheus and Grafana to identify how many resources a container actually needs, and cap it there. In a homelab, people typically head for overcommitting the resources and fight the side effects of it :slight_smile:
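
For a quick look at actual usage without a full Prometheus/Grafana setup, docker stats prints live per-container CPU and memory figures:

# one snapshot of CPU and memory usage per container
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"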

That said: you might want to try what happens if you just set the CPU limit to the 90% value for all your containers. With no CPU reserved, they should equally share the free CPU resources.
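
In compose terms the suggestion would be something like this (a sketch; '3.6' assumes a 4-core node, i.e. 90% of 4 CPUs, and it deliberately sets no reservation — but see the update below):

version: "3.9"
services:
  redis:
    image: redis:alpine
    deploy:
      resources:
        limits:
          cpus: '3.6'          # 90% of an assumed 4-core node
          # no cpu reservation, so free CPU should be shared equally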

Update: (doh) I hadn’t thought this through. The suggestion to only set a CPU limit does not restrict all containers combined to 90% of the node’s resources. It just prevents a single container from consuming more than 90%, but generally does not prevent overcommitment of the CPU.

Thanks for your quick reply. I just tried it again with the following example. The limits add up: with two services each limited to 0.50 CPUs, exactly one CPU is used in total (instead of only half a CPU).

version: '3.7'

services:
  stress:
    image: progrium/stress
    command: --vm 2 --vm-bytes 512M
    deploy:
      replicas: 1
      resources:
        limits:
          cpus: '0.50'
          memory: 750M
  stress2:
    image: progrium/stress
    command: --vm 2 --vm-bytes 512M
    deploy:
      replicas: 1
      resources:
        limits:
          cpus: '0.50'
          memory: 750M

Is there any other way to limit the resources? I read something on the internet about cgroups, but I’m not very familiar with them.
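
For reference, the cgroup approach that usually comes up for this: on a systemd host you can create a slice with a CPU/memory cap and make it the default cgroup parent for all containers via the daemon configuration. A sketch, assuming a 4-core host with cgroup v2; the slice name docker_limit.slice and the numbers are arbitrary examples, and I have not verified this with Swarm services specifically:

# /etc/systemd/system/docker_limit.slice
# (systemd counts CPUQuota per core, so 360% = 90% of 4 cores;
#  MemoryMax requires cgroup v2)
[Unit]
Description=Slice that caps all Docker containers

[Slice]
CPUQuota=360%
MemoryMax=14G

# /etc/docker/daemon.json – make the slice the default cgroup parent
{
  "cgroup-parent": "docker_limit.slice"
}

# apply the changes
sudo systemctl daemon-reload
sudo systemctl restart docker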