Cgroup memory.max is overridden

I’ve created cgroup mygroup.slice with the following script. The value of memory.max is 200M. I can see the content of memory.max is 209715200.

sudo mkdir -p /sys/fs/cgroup/mygroup.slice
echo "200M" | sudo tee "/sys/fs/cgroup/mygroup.slice/memory.max"
echo "+memory" | sudo tee "/sys/fs/cgroup/mygroup.slice/cgroup.subtree_control"
cat "/sys/fs/cgroup/mygroup.slice/memory.max"

Then I started a Docker container with --cgroup-parent.

docker run -d --cgroup-parent mygroup.slice --env CONTAINER_NAME=container1 --name container1 simple/container1

After that the content of memory.max is max instead of 209715200.

How can I stop Docker from overriding that? Thanks.

Here are details of my environment.

cgroup v2 is used.

stat -fc %T /sys/fs/cgroup/
cgroup2fs

Here is the output of docker version

docker version
Client: Docker Engine - Community
 Version:           27.2.0
 API version:       1.47
 Go version:        go1.21.13
 Git commit:        3ab4256
 Built:             Tue Aug 27 14:15:45 2024
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.2.0
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.21.13
  Git commit:       3ab5c7d
  Built:            Tue Aug 27 14:15:45 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.21
  GitCommit:        472731909fa34bd7bc9c087e4c27943f9835f111
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I guess it is because Docker is managed by systemd. You can create the slice with systemd instead. /sys/fs/cgroup will then be populated by the systemd slice unit.
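To find the slice in /sys/fs/cgroup later, it helps to know that systemd encodes the position of a slice in its name: the dash-separated parts describe the path from the root slice, so demo-dev.slice lives inside demo.slice. Here is a small sketch of that mapping. The function name slice_cgroup_path is made up for this example, it assumes cgroup v2 mounted at /sys/fs/cgroup, and it does not handle escaped dashes in unit names.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of systemd or Docker): print the cgroup v2
# directory that corresponds to a systemd slice name.
# Assumes the unified hierarchy is mounted at /sys/fs/cgroup.
slice_cgroup_path() {
  local name="${1%.slice}" path="/sys/fs/cgroup" prefix="" part
  local -a parts
  # Split "demo-dev" into ("demo" "dev").
  IFS='-' read -ra parts <<< "$name"
  for part in "${parts[@]}"; do
    # Each level repeats the full prefix: demo.slice, then demo-dev.slice.
    prefix="${prefix:+$prefix-}$part"
    path="$path/$prefix.slice"
  done
  echo "$path"
}

slice_cgroup_path demo-dev.slice
```

The directory should only show up once the slice is active, for example after a container has been started in it.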

Create a test folder anywhere called cgroups.
Go to the folder:

cd cgroups

Create ./demo-dev.slice

[Unit]
Description=Demo Dev Slice

[Slice]
CPUQuota=20%
# MemoryLimit=60M
MemoryMax=60M
MemorySwapMax=0
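As a side note on the units: systemd parses the K, M, G and T suffixes of MemoryMax with base 1024, which matches the question, where writing 200M to memory.max read back as 209715200. A minimal sketch of that arithmetic follows. The helper name to_bytes is invented for this example, and accepting lowercase suffixes is just a convenience of the sketch.

```shell
#!/usr/bin/env bash

# Hypothetical helper: convert a size like "60M" to bytes using base 1024.
to_bytes() {
  local value="$1" number unit
  number="${value%[KMGTkmgt]}"   # strip one trailing suffix letter, if any
  unit="${value#"$number"}"      # whatever was stripped is the unit
  case "$unit" in
    K|k) echo $(( number * 1024 )) ;;
    M|m) echo $(( number * 1024 ** 2 )) ;;
    G|g) echo $(( number * 1024 ** 3 )) ;;
    T|t) echo $(( number * 1024 ** 4 )) ;;
    "")  echo "$number" ;;
  esac
}

to_bytes 60M    # the MemoryMax of demo-dev.slice
to_bytes 200M   # the value from the question
```

The two calls print 62914560 and 209715200.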

Create the install script (optional, I did it to be able to add other demo slices easily)

./install.sh

#!/usr/bin/env bash

dir="$(cd "$(dirname "$0")" && pwd)"

sudo cp "$dir/demo-dev.slice" /etc/systemd/system/demo-dev.slice

sudo systemctl daemon-reload

Run the install script

chmod +x install.sh
./install.sh

Optionally you can also create an uninstaller

Create ./uninstall.sh

#!/usr/bin/env bash

deletable_files=(
  /etc/systemd/system/demo-dev.slice
)

echo "The following files will be deleted: "
echo

for i in "${deletable_files[@]}"; do
  echo "$i"
done
echo

echo -n "Are you sure? [y/N] "
read -r answer

if [[ "$answer" == "y" ]]; then
  echo "Deleting files: "
  echo
  for i in "${deletable_files[@]}"; do
    echo "Delete $i"
    sudo unlink "$i"
  done
  echo
  echo "Reload systemd daemon"
  sudo systemctl daemon-reload
fi

Make it executable:

chmod +x uninstall.sh

Create a script that will allocate memory in the container: ./allocate.sh

#!/usr/bin/env bash

set -eu -o pipefail

fallocate -l "$ALLOCATE" /app/test
echo $(( $(stat /app/test -c "%s") / 1024 / 1024))

exec sleep inf

Make it executable:

chmod +x allocate.sh

Create the following compose.yml

x-allocate-dev-1:  &allocate-dev-1  ${ALLOCATE_DEV_1:-30m}
x-allocate-dev-2:  &allocate-dev-2  ${ALLOCATE_DEV_2:-30m}

x-service-base: &service-base
  image: bash:5.2
  command:
    - -c
    - /allocate.sh
  volumes:
    - ./allocate.sh:/allocate.sh
  mem_swappiness: 0
  memswap_limit: 0
  tmpfs:
    - /app
  init: true

services:
  dev-1: &service-dev
    <<: *service-base
    cgroup_parent: demo-dev.slice
    environment:
      ALLOCATE: *allocate-dev-1
  dev-2:
    <<: *service-dev
    depends_on:
     - dev-1
    environment:
      ALLOCATE: *allocate-dev-2

Create a script that runs the demo: ./run.sh

#!/usr/bin/env bash

set -eu -o pipefail

dir="$(cd "$(dirname "$0")" && pwd)"

cd "$dir"

export ALLOCATE_DEV_1=${1:-30}m
export ALLOCATE_DEV_2=${2:-30}m


function status() {
  # https://serverfault.com/questions/907857/how-get-systemd-status-in-json-format
  systemctl show --no-pager "demo-$1.slice" \
    | jq --sort-keys \
      --slurp \
      --raw-input \
      'split("\n")
        | map(select(. != "")
        | split("=")
        | {"key": .[0], "value": (.[1:] | join("="))})
        | from_entries
      '
}

function status_short() {
  status "$1" \
    | jq 'with_entries(
            select(.key
              | in({"MemoryMax": 1, "CPUQuotaPerSecUSec": 1})))
              | .MemoryMax = (((.MemoryMax | tonumber) / 1024 / 1024 | tostring) + "m"
            )
         '
}

echo "Demo dev: "
status_short dev

echo "compose up"
docker compose up

Make it executable:

chmod +x run.sh
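If jq is not installed, the same idea as the status function (split every line of systemctl show at the first "=" and keep only the wanted keys) can be sketched in plain bash. The function name show_props is made up for this example. It reads KEY=VALUE lines from standard input, so it can be tried without systemd.

```shell
#!/usr/bin/env bash

# Hypothetical jq-free helper: read KEY=VALUE lines from stdin and print
# only the properties whose keys are given as arguments.
show_props() {
  local wanted=" $* " line key
  while IFS= read -r line; do
    # Skip lines that are not KEY=VALUE pairs.
    [[ "$line" == *=* ]] || continue
    # The key is everything before the first "="; the value may contain more "=".
    key="${line%%=*}"
    if [[ "$wanted" == *" $key "* ]]; then
      printf '%s=%s\n' "$key" "${line#*=}"
    fi
  done
}

# Sample input in the format that systemctl show prints.
printf 'CPUQuotaPerSecUSec=200ms\nDescription=Demo Dev Slice\nMemoryMax=62914560\n' \
  | show_props MemoryMax CPUQuotaPerSecUSec
```

On a real system the input would come from systemctl show --no-pager demo-dev.slice. systemctl show also has a --property option that filters on its own, so this is mostly useful when you want to post-process the values.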

Run the demo

./run.sh 30 30

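It should fail, because 30 + 30 megabytes plus some overhead does not fit under the 60M limit of the slice. The second container is killed (exit code 137 is 128 + 9, SIGKILL):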

Demo:
{
  "CPUQuotaPerSecUSec": "200ms",
  "MemoryMax": "60m"
}
compose up
[+] Running 2/0
 ✔ Container cgroups-dev-1-1  Created                                                             0.0s
 ✔ Container cgroups-dev-2-1  Created                                                             0.0s
Attaching to dev-1-1, dev-2-1
dev-1-1  | 30
dev-2-1 exited with code 137

You can stop by pressing CTRL + C.

Run the following:

./run.sh 30 25

It should succeed:

Demo:
{
  "CPUQuotaPerSecUSec": "200ms",
  "MemoryMax": "60m"
}
compose up
[+] Running 2/0
 ✔ Container cgroups-dev-1-1  Created                                                             0.0s
 ✔ Container cgroups-dev-2-1  Recreated                                                           0.1s
Attaching to dev-1-1, dev-2-1
dev-1-1  | 30
dev-2-1  | 25

Note that although I allowed 60 megabytes, the containers use some memory on top of what I allocated with fallocate, so 30 + 29 would probably still fail.
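To see how close the slice is to its limit, you can read the cgroup v2 files of the slice directly: memory.current, memory.max, and the oom_kill counter in memory.events. Below is a rough sketch. The function name cgroup_mem_report and the default path are assumptions for this demo. The path is where I would expect demo-dev.slice to appear, so adjust it if your system differs.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print memory usage, limit and OOM kill count
# for a cgroup v2 directory. The default path is an assumption.
cgroup_mem_report() {
  local dir="${1:-/sys/fs/cgroup/demo.slice/demo-dev.slice}"
  local current max kills
  current="$(cat "$dir/memory.current")"   # bytes currently charged
  max="$(cat "$dir/memory.max")"           # limit in bytes, or "max"
  # memory.events contains lines like "oom_kill 1".
  kills="$(awk '$1 == "oom_kill" { print $2 }' "$dir/memory.events")"
  echo "current=$(( current / 1024 / 1024 ))MiB max=$max oom_kill=$kills"
}
```

Run cgroup_mem_report without arguments while the demo is up, or pass another cgroup directory as the first argument.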

Of course, you don’t need Compose to make this work. The important part is how you create and install the systemd slice.
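For example, with plain docker run it comes down to passing the systemd slice as --cgroup-parent, like in the original question. Here is a small sketch. The wrapper name run_in_slice and the DRY_RUN switch are made up for illustration, and the only Docker options used are -d and --cgroup-parent.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: start a detached container inside a systemd slice.
# Set DRY_RUN=1 to print the command instead of running it.
run_in_slice() {
  local slice="$1"
  shift
  local cmd=(docker run -d --cgroup-parent "$slice" "$@")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Print what would be executed for the container from the question.
DRY_RUN=1 run_in_slice demo-dev.slice --name container1 simple/container1
```

Without DRY_RUN the same call starts the container, and its memory usage then counts against the MemoryMax of the slice.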

Thanks for your detailed reply. I’ve checked the cgroup driver with docker info and it is systemd. I followed your steps and I can now limit the memory.