I have several containers running in a Docker stack. Recently I could no longer access the system properly because one container was causing a constant 100% CPU load.
Restricting a container is possible with the following settings:
To always stay below, say, 90% CPU load, each container would have to get only a fraction of that 90% as its CPU limit. That makes little sense, because a container should be able to use more CPU when the other containers are not using it at the moment.
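To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. The function name, the 4-CPU node and the six containers are made up for illustration and are not taken from my setup:

```python
def even_split_limit(node_cpus: float, target_fraction: float, n_containers: int) -> float:
    """Return the per-container CPU limit when a CPU budget is divided evenly.

    node_cpus, target_fraction and n_containers are illustrative inputs,
    not values read from Docker.
    """
    if n_containers <= 0:
        raise ValueError("n_containers must be positive")
    budget = node_cpus * target_fraction  # e.g. 90% of the node's CPUs
    return round(budget / n_containers, 3)


# Hypothetical node: 4 CPUs, a 90% budget, 6 containers.
limit = even_split_limit(4, 0.9, 6)
print(f"per-container limit: {limit} CPUs")
```

With a static split like this, a single busy container stays capped at its small slice even while the other containers are idle, which is exactly the drawback described above.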
How can I make the Docker stack, or alternatively Docker itself, use at most 90% of the CPU without setting limits for each container individually? I would be interested in the same for memory.
Linux processes (and a container is nothing else) will always try to consume the resources they are allowed to consume. If you don't restrict the resources, the processes will fight for whatever is there.
With Docker (this is true at least for AWS ECS, which uses Docker under the hood), node capacity beyond the sum of all reservations is shared among the containers in proportion to their reservations. If Container A has 100M memory reserved and Container B has 1000M memory reserved, Container B will get 10x the share of spare resources that Container A gets. The same should be true for the CPU.
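The proportional split described above can be sketched as plain arithmetic. This only models the ratio, not what Docker or ECS actually do internally; the function name and the spare capacity of 1100 units are assumptions chosen for illustration:

```python
def split_spare(spare: float, reservations: dict[str, float]) -> dict[str, float]:
    """Divide spare capacity among containers in proportion to their reservations.

    This is a simplified model of the behavior described in the text,
    not a reimplementation of any scheduler.
    """
    total = sum(reservations.values())
    if total <= 0:
        raise ValueError("at least one reservation must be positive")
    return {name: spare * reserved / total for name, reserved in reservations.items()}


# Reservations from the example: A reserves 100M, B reserves 1000M.
reservations = {"A": 100, "B": 1000}
# Hypothetical spare capacity of 1100 units on the node.
shares = split_spare(1100, reservations)
for name, share in shares.items():
    print(f"container {name}: {share} units of spare capacity")
```

Whatever the spare capacity is, B ends up with ten times the share of A, because only the ratio of the reservations matters.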
However, a memory limit (= hard limit) will result in an out-of-memory kill if the consumed memory exceeds the limit. In a corporate context, you would monitor your resource usage with something like Prometheus and Grafana to identify how many resources a container actually needs and cap it there. In a homelab, people typically overcommit resources and then deal with the side effects.
That said, you might want to try how it behaves if you just set the CPU limit to the 90% value for all your containers. With no CPU reservation set, they should share the free CPU resources equally.
Update: (doh) I hadn't thought this through. The suggestion to only set a CPU limit does not restrict all containers together to 90% of the node's resources. It just prevents a single container from consuming more than 90%, but in general it does not prevent overcommitment of the CPU.
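A quick way to see why per-container limits do not cap the total is to add them up. The sketch below uses hypothetical helper names and made-up numbers (a 4-CPU node, three containers) and only checks the worst case in which every container runs at its limit:

```python
def worst_case_demand(limits: list[float]) -> float:
    """Sum of all per-container CPU limits, i.e. the demand if every container hits its cap."""
    return sum(limits)


def is_overcommitted(limits: list[float], node_cpus: float) -> bool:
    """True if the limits together allow more CPU than the node has."""
    return worst_case_demand(limits) > node_cpus


# Hypothetical node with 4 CPUs and three containers, each capped at 90% of the node.
node_cpus = 4
limits = [0.9 * node_cpus] * 3
print("worst-case demand:", worst_case_demand(limits), "CPUs")
print("overcommitted:", is_overcommitted(limits, node_cpus))
```

The same addition applies to small limits: two containers capped at 0.5 CPU each can together use a full CPU, not half of one.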
Thanks for your quick reply. I just tried it again with the following example. The limits add up, which means that exactly one CPU is used (instead of only half a CPU).