How do we use the cgroup freezer subsystem’s release_agent node in a Docker container?

Hi team,

I’m having problems developing inside a docker container.

We are using the standard ubuntu:20.04 image.

We use the cgroup freezer subsystem and rely on its release_agent node.

However, we cannot find the release_agent node inside the docker container.

We forced a freezer mount in the container to a directory such as /root/mnt. At this point we could see the release_agent node, but the script path written to the node was resolved against the HOST filesystem, not the CONTAINER filesystem.

Is there any limitation? Or are we missing something?

How can we get processes inside the container to use release_agent and look for scripts in the container environment?

Thanks

To be honest, I hadn’t tried release_agent until today, and I couldn’t access the cgroup filesystem even as root, so my answer is based on what I know about containers and what I have read today. Containers don’t have access to everything on the host, so some files will only be visible from the host. If you can see something in the container that was configured outside of the container, its content may refer to paths on the host, not in the container. The container is basically isolation around a process. It usually doesn’t matter what the references are, because the process in the container has nothing to do with them and the host handles them. The release_agent file is such a case: it has to be handled by the host.

It is possible that I misunderstood you. In that case, please share more information about what exactly you did and which commands you ran in what order, so we can try it ourselves. That will help us understand the issue better than a description alone, which I am sure is clear to you but might not be as obvious to us…

Thanks for your reply @rimelek

Here is more information so that we can understand each other.

Let’s do an experiment. First we pull an ubuntu 20.04 image and start a container from it.

More specifically, we need to add some capabilities, e.g.: --cap-add CAP_SYS_ADMIN --cap-add CAP_CHOWN
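
For reference, a full invocation with those two capabilities might look like the sketch below. Only the two --cap-add flags come from this thread; --rm, -it, the bash command, and the hypothetical start_cgroup_test_container wrapper with its DOCKER override are assumptions for illustration. On hosts with AppArmor or SELinux enabled, the mount in the next step may still be denied by the default profile, so additional --security-opt settings could be needed.

```shell
# Hypothetical helper: starts the test container used in this experiment.
# Only the two --cap-add flags are taken from the thread; the rest is assumed.
start_cgroup_test_container() {
    image=${1:-ubuntu:20.04}
    # Set DOCKER="echo docker" to print the command instead of running it.
    # ${DOCKER:-docker} is left unquoted on purpose so the override splits into words.
    ${DOCKER:-docker} run --rm -it \
        --cap-add CAP_SYS_ADMIN \
        --cap-add CAP_CHOWN \
        "$image" bash
}
```

Calling start_cgroup_test_container with no argument uses ubuntu:20.04; passing another image name as the first argument replaces it.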

[on container side]
Inside the container, we mount a new named cgroup hierarchy called ‘bar’:

mkdir bar && mount -t cgroup -o none,name=bar bar bar/

Then we create a new sub-group named ‘foo’:

mkdir bar/foo

Now, its directory structure will look like this:

bar
|-- cgroup.clone_children
|-- cgroup.procs
|-- cgroup.sane_behavior
|-- foo
|   |-- cgroup.clone_children
|   |-- cgroup.procs
|   |-- notify_on_release
|   `-- tasks
|-- notify_on_release
|-- release_agent
`-- tasks

Then we create an executable script inside the container, with the full path ‘/tmp/hello.sh’:

cat << EOF > /tmp/hello.sh
#!/bin/bash
touch /tmp/on___container
exit 0
EOF

We add permissions to it to make sure it can be executed:

chmod a+x /tmp/hello.sh

After that, we write the full path of the script to the ‘bar/release_agent’ node:

echo "/tmp/hello.sh" > bar/release_agent

To make the agent fire, we enable ‘bar/foo/notify_on_release’ by setting it to 1:

echo 1 > bar/foo/notify_on_release

Finally, we trigger it. The shell below writes its own PID into the cgroup and then exits, which leaves ‘foo’ empty again:

sh -c "echo \$\$ > bar/foo/tasks"

I found that the ‘/tmp/on___container’ file is NOT generated inside the container.

Let’s try another experiment on the host.

[on host side]

I create a new script with the same path, /tmp/hello.sh, on the host.

cat << EOF > /tmp/hello.sh
#!/bin/bash
touch /tmp/on___host
exit 0
EOF

Caution: when this script is called, it generates a file named on___host (NOT on___container).

Likewise, we add executable permissions to this script.

chmod a+x /tmp/hello.sh

We go back to the container and try to trigger release_agent again.

[on container side]

sh -c "echo \$\$ > bar/foo/tasks"

In the container environment, the ‘/tmp/on___host’ file is NOT generated, which is to be expected.

[on host side]

However, the file ‘/tmp/on___host’ has been generated in the host environment.

[QUESTION]

Is it really impossible to have release_agent run its script ONLY in the container environment, and NOT in the host environment?

Thanks.

Thank you for the additional details!

A container is not a virtual machine. As I wrote before, everything is running on the host. A process inside the container can see files or processes and communicate with other processes depending on the configuration, but that’s all. Even if you can create a file inside the container, you need to reference other files as if you were on the host, because you actually are. You can refer to a path inside the container only when that reference will be used by a process running inside the container, but there is nothing in the container to execute the release agent script. As far as I know, the kernel is responsible for that. I still can’t create any file under /sys/fs/cgroup, so I can’t test it.

I always say: think of a container as walls around a process. Even if you completely isolate yourself with walls, the world is still around you, and there will be no new universe between the walls :slight_smile:

I apologize for the late reply, we were on vacation.

Thank you for your reply. Following your suggestion, I looked at the kernel source code (version 4.14.xxx) and tried to find the root cause.

When release_agent is triggered, the kernel execution entry is here:

void cgroup1_release_agent(struct work_struct *work)
{
    struct cgroup *cgrp =
        container_of(work, struct cgroup, release_agent_work);
    char *pathbuf = NULL, *agentbuf = NULL;
    char *argv[3], *envp[3];
    int ret;
 
    mutex_lock(&cgroup_mutex);
 
    pathbuf = kmalloc(PATH_MAX, GFP_KERNEL);
    agentbuf = kstrdup(cgrp->root->release_agent_path, GFP_KERNEL);
    if (!pathbuf || !agentbuf || !strlen(agentbuf))
        goto out;
 
    spin_lock_irq(&css_set_lock);
    ret = cgroup_path_ns_locked(cgrp, pathbuf, PATH_MAX, &init_cgroup_ns);
    spin_unlock_irq(&css_set_lock);
    if (ret < 0 || ret >= PATH_MAX)
        goto out;
 
    argv[0] = agentbuf;
    argv[1] = pathbuf;
    argv[2] = NULL;
 
    /* minimal command environment */
    envp[0] = "HOME=/";
    envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
    envp[2] = NULL;
 
    mutex_unlock(&cgroup_mutex);
    call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC);
    goto out_free;
out:
    mutex_unlock(&cgroup_mutex);
out_free:
    kfree(agentbuf);
    kfree(pathbuf);
}

Looking at cgroup1_release_agent, we see that it is invoked by the kernel itself. The cgroup path is resolved against init_cgroup_ns, and the helper is launched through call_usermodehelper rather than from within the container’s isolated environment (its namespaces).
So the script specified by release_agent will run in the host environment.
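
One way to confirm this from user space is to point release_agent at a small diagnostic script that records what it was given. The sketch below relies only on what the kernel code above shows (argv[1] is the cgroup path, and the environment is limited to HOME and PATH). The function name log_release_event and the default log path /tmp/release_agent.log are made up for illustration.

```shell
#!/bin/sh
# Hypothetical diagnostic release agent (names and log path are illustrative).

log_release_event() {
    cgroup_path=$1
    log_file=${2:-/tmp/release_agent.log}
    {
        # argv[1] from cgroup1_release_agent: path of the cgroup that became empty
        printf 'released cgroup: %s\n' "$cgroup_path"
        # Identifiers of the namespaces this script runs in
        printf 'mnt namespace: %s\n' "$(readlink /proc/self/ns/mnt)"
        printf 'pid namespace: %s\n' "$(readlink /proc/self/ns/pid)"
        # The kernel passes only HOME and PATH
        printf 'HOME=%s PATH=%s\n' "$HOME" "$PATH"
    } >> "$log_file"
}

# Log only when an argument was passed, as the kernel does when it runs the agent.
if [ "$#" -gt 0 ]; then
    log_release_event "$1"
fi
```

Since the path in release_agent is looked up on the host, the script has to exist there and be executable. Running readlink /proc/self/ns/mnt in a shell inside the container and comparing it with the logged value should show two different identifiers if the agent really runs in the host's namespaces.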

A brief list of its execution flow is as follows:

cgroup1_release_agent
  -- call_usermodehelper
    -- call_usermodehelper_setup
      -- INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
    -- call_usermodehelper_exec
      -- queue_work(system_unbound_wq, &sub_info->work);
        -- call_usermodehelper_exec_work
          -- kernel_thread(call_usermodehelper_exec_async, sub_info, CLONE_PARENT | SIGCHLD);
            -- do_execve(getname_kernel(sub_info->path), (const char __user *const __user *)sub_info->argv, (const char __user *const __user *)sub_info->envp);

So, the release_agent script can ONLY be executed in the host environment, NOT inside the docker container.
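
If the goal is only to react inside the container when a cgroup becomes empty, one possible workaround is to skip release_agent and poll the cgroup from a process that already runs in the container. This is a sketch of that idea and was not tried in this thread; run_when_cgroup_empty and its arguments are hypothetical names.

```shell
# Poll a cgroup directory until it has no member processes, then run a handler.
# usage: run_when_cgroup_empty <cgroup_dir> <handler> [interval_seconds]
# Call it only after the first task has joined: an empty or unreadable
# cgroup.procs makes the handler run immediately.
run_when_cgroup_empty() {
    cgroup_dir=$1
    handler=$2
    interval=${3:-1}
    # cgroupfs files report a size of 0, so read the content instead of testing the size.
    while [ -n "$(cat "$cgroup_dir/cgroup.procs" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    # The handler is started by this process, so its path is resolved inside the container.
    "$handler" "$cgroup_dir"
}
```

For example, run_when_cgroup_empty bar/foo /tmp/hello.sh would run the container's own /tmp/hello.sh once ‘foo’ is empty. Unlike release_agent, this needs a watcher process that stays alive in the container, and it only notices the change at the next poll.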

Do you agree with me? :slight_smile:

I guess I wasn’t clear, but this is what I wanted to say. Thanks for the source code reference.

@rimelek Thanks for your support :slight_smile: