Cgroup memory.max is overridden

I’ve created cgroup mygroup.slice with the following script. The value of memory.max is 200M. I can see the content of memory.max is 209715200.

sudo mkdir -p /sys/fs/cgroup/mygroup.slice
echo "200M" | sudo tee "/sys/fs/cgroup/mygroup.slice/memory.max"
echo "+memory" | sudo tee "/sys/fs/cgroup/mygroup.slice/cgroup.subtree_control"
cat "/sys/fs/cgroup/mygroup.slice/memory.max"

Then I started a Docker container with --cgroup-parent.

docker run -d --cgroup-parent mygroup.slice --env CONTAINER_NAME=container1 --name container1 simple/container1

After that the content of memory.max is max instead of 209715200.

How can I stop Docker from overriding that? Thanks.

Here are details of my environment.

cgroup v2 is used.

stat -fc %T /sys/fs/cgroup/
cgroup2fs

Here is the output of docker version

docker version
Client: Docker Engine - Community
 Version:           27.2.0
 API version:       1.47
 Go version:        go1.21.13
 Git commit:        3ab4256
 Built:             Tue Aug 27 14:15:45 2024
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.2.0
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.21.13
  Git commit:       3ab5c7d
  Built:            Tue Aug 27 14:15:45 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.21
  GitCommit:        472731909fa34bd7bc9c087e4c27943f9835f111
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I guess it is because Docker's cgroups are managed by systemd. You can create the slice using systemd instead, and /sys/fs/cgroup will be populated by the systemd slice unit.
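To confirm which cgroup driver Docker uses before changing anything, a minimal sketch like the one below may help. The function name docker_cgroup_info is just an illustrative helper, not part of Docker; it relies on the CgroupDriver and CgroupVersion fields of docker info.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cgroup driver and cgroup version
# reported by the Docker daemon. With the "systemd" driver, the
# --cgroup-parent value is expected to be a systemd slice.
docker_cgroup_info() {
  docker info --format 'driver={{.CgroupDriver}} version={{.CgroupVersion}}'
}
```

Calling docker_cgroup_info prints one line; if the driver is systemd, creating the parent as a systemd slice (as described below) is probably the right approach.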

Create a test folder anywhere called cgroups.
Go to the folder:

cd cgroups

Create ./demo-dev.slice

[Unit]
Description=Demo Dev Slice

[Slice]
CPUQuota=20%
# MemoryLimit=60M
MemoryMax=60M
MemorySwapMax=0

Create the install script ./install.sh (optional; I did it to be able to add other demo slices easily):

#!/usr/bin/env bash

dir="$(cd "$(dirname "$0")" && pwd)"

sudo cp "$dir/demo-dev.slice" /etc/systemd/system/demo-dev.slice

sudo systemctl daemon-reload

Run the install script

chmod +x install.sh
./install.sh
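After the install script has run, you can check that systemd picked up the limit. This is only a sketch: slice_memory_max_mib is a made-up helper name, and it assumes systemctl show reports MemoryMax in bytes (or "infinity" when no limit is set).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MemoryMax of a slice unit in MiB,
# as systemd sees it. "infinity" means no limit is configured.
slice_memory_max_mib() {
  unit="$1"
  value="$(systemctl show --property=MemoryMax --value "$unit")"
  if [ "$value" = "infinity" ]; then
    echo "infinity"
  else
    # systemd reports the limit in bytes; convert to MiB
    echo "$(( value / 1024 / 1024 ))"
  fi
}
```

For example, slice_memory_max_mib demo-dev.slice should print 60 once the unit file above is installed and the daemon has been reloaded.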

Optionally you can also create an uninstaller

Create ./uninstall.sh

#!/usr/bin/env bash

deletable_files=(
  /etc/systemd/system/demo-dev.slice
)

echo "The following files will be deleted: "
echo

for i in "${deletable_files[@]}"; do
  echo "$i"
done
echo

echo -n "Are you sure? [y/N] "
read -r answer

if [[ "$answer" == "y" ]]; then
  echo "Deleting files: "
  echo
  for i in "${deletable_files[@]}"; do
    echo "Delete $i"
    sudo unlink "$i"
  done
  echo
  echo "Reload systemd daemon"
  sudo systemctl daemon-reload
fi

Make it executable:

chmod +x uninstall.sh

Create a script that will allocate memory in the container: ./allocate.sh

#!/usr/bin/env bash

set -eu -o pipefail

fallocate -l "$ALLOCATE" /app/test
echo $(( $(stat /app/test -c "%s") / 1024 / 1024))

exec sleep inf

Make it executable:

chmod +x allocate.sh

Create the following compose.yml

x-allocate-dev-1:  &allocate-dev-1  ${ALLOCATE_DEV_1:-30m}
x-allocate-dev-2:  &allocate-dev-2  ${ALLOCATE_DEV_2:-30m}

x-service-base: &service-base
  image: bash:5.2
  command:
    - -c
    - /allocate.sh
  volumes:
    - ./allocate.sh:/allocate.sh
  mem_swappiness: 0
  memswap_limit: 0
  tmpfs:
    - /app
  init: true

services:
  dev-1: &service-dev
    <<: *service-base
    cgroup_parent: demo-dev.slice
    environment:
      ALLOCATE: *allocate-dev-1
  dev-2:
    <<: *service-dev
    depends_on:
     - dev-1
    environment:
      ALLOCATE: *allocate-dev-2

Create a script that runs the demo: ./run.sh

#!/usr/bin/env bash

set -eu -o pipefail

dir="$(cd "$(dirname "$0")" && pwd)"

cd "$dir"

export ALLOCATE_DEV_1=${1:-30}m
export ALLOCATE_DEV_2=${2:-30}m


function status() {
  # https://serverfault.com/questions/907857/how-get-systemd-status-in-json-format
  systemctl show --no-pager "demo-$1.slice" \
    | jq --sort-keys \
      --slurp \
      --raw-input \
      'split("\n")
        | map(select(. != "")
        | split("=")
        | {"key": .[0], "value": (.[1:] | join("="))})
        | from_entries
      '
}

function status_short() {
  status $1 \
    | jq 'with_entries(
            select(.key
              | in({"MemoryMax": 1, "CPUQuotaPerSecUSec": 1})))
              | .MemoryMax = (((.MemoryMax | tonumber) / 1024 / 1024 | tostring) + "m"
            )
         '
}

echo "Demo dev: "
status_short dev

echo "compose up"
docker compose up

Make it executable:

chmod +x run.sh

Run the demo

./run.sh 30 30

It should fail

Demo:
{
  "CPUQuotaPerSecUSec": "200ms",
  "MemoryMax": "60m"
}
compose up
[+] Running 2/0
 ✔ Container cgroups-dev-1-1  Created                                                             0.0s
 ✔ Container cgroups-dev-2-1  Created                                                             0.0s
Attaching to dev-1-1, dev-2-1
dev-1-1  | 30
dev-2-1 exited with code 137

You can stop by pressing CTRL + C.
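Exit code 137 in the output above is 128 + 9, meaning the process was ended by signal 9 (SIGKILL), which is what the kernel OOM killer sends. A small sketch for decoding such codes (signal_from_exit_code is an illustrative name, not an existing tool):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an exit code above 128 to the name of the
# signal that ended the process; print "none" for ordinary exit codes.
signal_from_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    # the shell's built-in kill translates a signal number to its name
    kill -l "$(( code - 128 ))"
  else
    echo "none"
  fi
}
```

SIGKILL can have other causes too, but in this demo it is consistent with the slice's memory limit being enforced.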

Run the following:

./run.sh 30 25

It should succeed:

Demo:
{
  "CPUQuotaPerSecUSec": "200ms",
  "MemoryMax": "60m"
}
compose up
[+] Running 2/0
 ✔ Container cgroups-dev-1-1  Created                                                             0.0s
 ✔ Container cgroups-dev-2-1  Recreated                                                           0.1s
Attaching to dev-1-1, dev-2-1
dev-1-1  | 30
dev-2-1  | 25

Note that although I allowed 60 megabytes, there is some extra memory usage on top of what I allocated with fallocate, so 30 + 29 would probably still fail.
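To see how much room is actually left in the slice, you could read the cgroup v2 files directly. The following is a rough sketch; cgroup_headroom_mib is a hypothetical helper, and the cgroup directory is passed in as an argument because its location depends on how systemd nests the slice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the remaining memory (in MiB) of a
# cgroup v2 directory, computed as memory.max - memory.current.
# Prints "unlimited" when memory.max contains "max".
cgroup_headroom_mib() {
  dir="$1"
  max="$(cat "$dir/memory.max")"
  current="$(cat "$dir/memory.current")"
  if [ "$max" = "max" ]; then
    echo "unlimited"
  else
    echo "$(( (max - current) / 1024 / 1024 ))"
  fi
}
```

I assume the directory for demo-dev.slice is /sys/fs/cgroup/demo.slice/demo-dev.slice, since systemd treats dashes in slice names as hierarchy; systemctl show --property=ControlGroup --value demo-dev.slice should print the actual path relative to /sys/fs/cgroup.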

You don’t need to use compose to make it work of course. The important part is how you create and install the systemd slice.
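For reference, a roughly equivalent setup with plain docker run could look like the sketch below. The run_in_slice function and its arguments are made up for illustration; it leaves out the swap-related settings from the compose file and assumes allocate.sh is in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start one demo container under a systemd slice
# without Compose.
# Arguments: container name, slice unit, amount to allocate (e.g. 30m).
run_in_slice() {
  name="$1"
  slice="$2"
  allocate="$3"
  docker run -d --init \
    --name "$name" \
    --cgroup-parent "$slice" \
    --tmpfs /app \
    --env ALLOCATE="$allocate" \
    --volume "$PWD/allocate.sh:/allocate.sh" \
    bash:5.2 -c /allocate.sh
}
```

Running run_in_slice dev-1 demo-dev.slice 30m and then run_in_slice dev-2 demo-dev.slice 30m should behave much like the compose example, with both containers sharing the slice's limit.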


Thanks for your detailed reply. I’ve checked the cgroup driver with docker info and it is systemd. I followed your steps and I can now limit the memory.