Build does not persist chmod (SUID/SGID) changes when using Docker SDK, unlike docker build CLI

I encountered an issue when building a Docker image using the Python Docker SDK. The behavior occurs only with a specific base image so far: gradle:7.6-jdk17-alpine.

In my Dockerfile, I include the following command in a RUN instruction to find and remove files with SUID/SGID permissions:

  • RUN find / -type d -name proc -prune -o -perm /u=s,g=s -exec chmod -s '{}' \;

This step executes successfully during the build (as confirmed in the build logs), but the permission changes do not persist in the resulting image when using the Docker SDK.

However, when I build the exact same Dockerfile using the Docker CLI (docker build), the permission changes are correctly applied and persist in the final image.

This discrepancy suggests that the Docker SDK’s APIClient().build() method may not properly track or commit permission-only changes to the layer (e.g., chmod -s). The issue might be related to Docker’s layer diffing or snapshot mechanism, especially for metadata-only changes (like permissions, ownership, or timestamps).
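For reference, the build is invoked through the low-level client along these lines (a simplified sketch, not the exact script; the tag, context path, and Dockerfile name are placeholders):

    import docker

    # Simplified sketch of the SDK build call; tag and paths are placeholders.
    client = docker.APIClient(base_url="unix://var/run/docker.sock")

    build_log = client.build(
        path=".",                 # build context
        dockerfile="Dockerfile",  # contains the chmod -s step shown above
        tag="hardened-gradle:latest",
        rm=True,
        decode=True,              # yield parsed JSON chunks instead of raw bytes
    )

    for chunk in build_log:
        # Chunks with a "stream" key are build output; this is where the
        # RUN step appears to succeed.
        if "stream" in chunk:
            print(chunk["stream"], end="")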

I’ve verified that:

  • The DOCKER_BUILDKIT=1 environment variable is set before using the SDK.
  • The issue is not present in other images I’ve tested — only with gradle:7.6-jdk17-alpine.

Please confirm if this is a known limitation of the Docker SDK build process, or if there’s a workaround to ensure such changes persist in the final image when built via the SDK.

Thanks in advance!

docker build now uses buildx, which is a CLI plugin, so it is not included in any Docker SDK. If you want the same behavior, you will need to build images with that same command rather than through the SDK.
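If you need the buildx behavior from a script, one workaround is to shell out to the CLI itself, for example via subprocess (a minimal sketch; the tag and build-context path are placeholders):

    import subprocess

    # Sketch: call the buildx-based CLI builder instead of the SDK.
    # Tag and context path are placeholders.
    subprocess.run(
        ["docker", "buildx", "build", "--tag", "hardened-gradle:latest", "--load", "."],
        check=True,
    )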

On the other hand, permissions should always have been persisted, so there could be a missing detail in this issue or a bug in the SDK. The point of building an image is that everything is persisted in that image, including permission changes, though I have never tried SUID and SGID specifically.

I’ve developed a script that uses the Docker SDK to build images, apply security hardening steps, and run a validation workflow to ensure the final image complies with security standards, even when a custom Dockerfile is provided.
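A simplified sketch of such a validation step (the image tag is a placeholder, and it assumes the image’s entrypoint passes the command through):

    import docker

    # Run the built image and list any files that still carry SUID/SGID bits.
    # The tag is a placeholder; "|| true" keeps find's exit code at 0.
    client = docker.from_env()

    output = client.containers.run(
        "hardened-gradle:latest",
        ["sh", "-c", "find / -path /proc -prune -o -perm /6000 -print || true"],
        remove=True,
    )
    print(output.decode() or "no SUID/SGID files found")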

One of the hardening steps involves modifying file permissions. This process works correctly with most base images, including those from Docker Hub and even from scratch. However, I encountered an issue specifically with the gradle image: file permissions are not modified when building via the script using the Docker SDK. Interestingly, when I run the exact same Dockerfile manually using the Docker CLI, the permissions are updated as expected.

This leads me to think there may be a subtle behavior difference or limitation in how the Docker SDK handles certain permission changes, especially since everything else in the pipeline works correctly for other images.

If it happens only with the gradle image, we should probably find out what the difference is between that and the other images. I’m not aware of any intentional difference between the classic image build and the new buildx that would break permissions, especially in only one image.

The only change I could imagine is when something in the image starts and changes the permissions back, or copies something from somewhere that still has the original permissions. If nothing runs before you check it, I don’t see what would change it.

Have you tried changing a file, for example in a directory you wouldn’t otherwise need, just to see whether it happens with every single file or only with the ones you work with, and whether it has anything to do with gradle?

Thanks!

I might have found the root cause. I asked ChatGPT for suggestions, and it pointed out that permission changes might not persist if the target directory is defined as a volume mount.

So I inspected the gradle image, and indeed, the directory I’m trying to modify (/home/gradle/.gradle) is declared as a volume:

        "Volumes": {
            "/home/gradle/.gradle": {}
        },
        "WorkingDir": "/home/gradle",
        "Entrypoint": [
            "/__cacert_entrypoint.sh"
        ],
        "OnBuild": null,
        "Labels": null
    },
    "Architecture": "amd64",

This would explain why permission changes made via the Docker SDK don’t persist. Maybe the Docker CLI handles this a little differently and can modify file permissions even if the directory is declared as a volume.
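The volume declaration can also be checked through the SDK before applying any hardening (a small sketch; the image has to be present locally):

    import docker

    # Sketch: check whether a base image declares any volumes before hardening it.
    client = docker.from_env()
    image = client.images.get("gradle:7.6-jdk17-alpine")

    volumes = image.attrs.get("Config", {}).get("Volumes") or {}
    if volumes:
        print("Declared volumes:", ", ".join(volumes))
        # prints "/home/gradle/.gradle" for this image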

I thought about volumes; that is why I asked my last question. But just because the files are on a volume, the permissions shouldn’t change. In fact, when you have a volume, the files are copied out from the container to the volume, keeping their permissions, if there are any files in the folder you set as the mount point and the volume folder on the host is empty. However, a volume is only used when you start a container, not while building the image, so how the image was built should not matter.

I do remember one thing, though, which I could not explain later. I once tried to create a Docker image that had a volume definition in the Dockerfile (now I think defining a volume in a Dockerfile is a bad idea and bad practice anyway), and depending on whether my commands were before or after it, it worked differently.

So I asked Gordon specifically about how the legacy builder handles volume definitions, and I got this:

When a VOLUME is declared in a Dockerfile, it creates a mount point and marks it as holding externally mounted volumes. However, any changes made to the data within the volume after it has been declared in the Dockerfile are discarded when using the legacy builder. This means that if you modify the contents of a volume during the build process, those changes will not be reflected in the final image when using the legacy builder.

I share it only because I experienced something like this once, but I can’t confirm that every statement in the quoted text is right. But let’s say it is true. Then if you define the volume before you run a command that changes the permissions, that change could be lost. It is not specifically about permissions then, but about any change in the folder you define as a volume. In this case, you can move the volume definition to the end of the Dockerfile, which is where I usually put mine, so I guess that’s why I ran into the mentioned issue probably only once.
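If you want to test the quoted claim, you could compare the two orderings with the classic builder through the SDK, something like this (hypothetical Dockerfiles and tags, purely for illustration):

    import io
    import docker

    # Hypothetical experiment: build the same content with the VOLUME line before
    # and after a permission change, then compare the two images.
    client = docker.APIClient(base_url="unix://var/run/docker.sock")

    # If the quoted legacy-builder behavior is accurate, the second chmod here runs
    # AFTER the volume is declared, so its effect on /data would be discarded.
    dockerfile_after_volume = b"""
    FROM alpine:3.19
    RUN mkdir -p /data && touch /data/file && chmod u+s /data/file
    VOLUME /data
    RUN chmod -s /data/file
    """

    # Here every change happens BEFORE the VOLUME line, so it should be kept.
    dockerfile_before_volume = b"""
    FROM alpine:3.19
    RUN mkdir -p /data && touch /data/file && chmod u+s /data/file
    RUN chmod -s /data/file
    VOLUME /data
    """

    for tag, content in [
        ("volume-order-after", dockerfile_after_volume),
        ("volume-order-before", dockerfile_before_volume),
    ]:
        logs = client.build(fileobj=io.BytesIO(content), tag=tag, rm=True, decode=True)
        for chunk in logs:
            if "stream" in chunk:
                print(chunk["stream"], end="")

Afterwards, checking /data/file in a container started from each tag would show whether the SUID bit survived or not.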

But the volume definition you refer to is not in your Dockerfile; it is in the Dockerfile of the base image. Since you cannot undo a volume declaration, if that is the problem, you won’t be able to solve it in the image itself.

You could create an entrypoint that does what I mentioned before as a possible cause of the issue: the entrypoint script could change the permissions when the container starts. Or you can create an init container and do it from another container, as I described here:

If the init container is a dependency of the main container, it could fix the permissions before the main container needs them.
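With the SDK, such an init step could be a short-lived container that strips the SUID/SGID bits from the shared volume before the main container uses it (a sketch; the volume name "gradle-cache" and the image are examples only):

    import docker

    # Sketch of the init-container idea: fix permissions on the named volume
    # before the main container mounts it. Volume and image names are examples.
    client = docker.from_env()

    client.containers.run(
        "alpine:3.19",
        ["find", "/home/gradle/.gradle", "-perm", "/6000",
         "-exec", "chmod", "-s", "{}", ";"],
        volumes={"gradle-cache": {"bind": "/home/gradle/.gradle", "mode": "rw"}},
        remove=True,
    )

    # The main (gradle) container is then started with the same named volume
    # mounted at /home/gradle/.gradle.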


Thanks a lot for the detailed explanation and for checking!

That clears things up for me. I understand now that, with the legacy builder, changes made during the build to a directory after it’s declared as a volume, especially when the declaration comes from a base image, won’t be preserved, which explains the behavior I’m seeing.

I’ll look into handling this differently if possible.

Thanks again for your help! That’s enough to close the topic on my side.