What is the purpose of adding user and group in these official Dockerfiles?

Official Docker Node Image Based on Debian

FROM buildpack-deps:bullseye

RUN groupadd --gid 1000 node \
  && useradd --uid 1000 --gid node --shell /bin/bash --create-home node

ENV NODE_VERSION 18.15.0

RUN ARCH= && dpkgArch="$(dpkg --print-architecture)" \
  && case "${dpkgArch##*-}" in \
    amd64) ARCH='x64';; \
    ppc64el) ARCH='ppc64le';; \
    s390x) ARCH='s390x';; \
    arm64) ARCH='arm64';; \
    armhf) ARCH='armv7l';; \
    i386) ARCH='x86';; \
    *) echo "unsupported architecture"; exit 1 ;; \
  esac \
  # gpg keys listed at https://github.com/nodejs/node#release-keys
  && set -ex \
  && for key in \
    4ED778F539E3634C779C87C6D7062848A1AB005C \
    141F07595B7B3FFE74309A937405533BE57C7D57 \
    74F12602B6F1C4E913FAA37AD3A89613643B6201 \
    DD792F5973C6DE52C432CBDAC77ABFA00DDBF2B7 \
    61FC681DFB92A079F1685E77973F295594EC4689 \
    8FCCA13FEF1D0C2E91008E09770F7A9A5AE15600 \
    C4F0DFFF4E8C1A8236409D08E73BC641CC11F4C8 \
    890C08DB8579162FEE0DF9DB8BEAB4DFCF555EF4 \
    C82FA3AE1CBEDC6BE46B9360C43CEC45C17AB93C \
    108F52B48DB57BB0CC439B2997B01419BD92F80A \
  ; do \
      gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$key" || \
      gpg --batch --keyserver keyserver.ubuntu.com --recv-keys "$key" ; \
  done \
  && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-$ARCH.tar.xz" \
  && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
  && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
  && grep " node-v$NODE_VERSION-linux-$ARCH.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
  && tar -xJf "node-v$NODE_VERSION-linux-$ARCH.tar.xz" -C /usr/local --strip-components=1 --no-same-owner \
  && rm "node-v$NODE_VERSION-linux-$ARCH.tar.xz" SHASUMS256.txt.asc SHASUMS256.txt \
  && ln -s /usr/local/bin/node /usr/local/bin/nodejs \
  # smoke tests
  && node --version \
  && npm --version

ENV YARN_VERSION 1.22.19

RUN set -ex \
  && for key in \
    6A010C5166006599AA17F08146C2130DFD2497F5 \
  ; do \
    gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$key" || \
    gpg --batch --keyserver keyserver.ubuntu.com --recv-keys "$key" ; \
  done \
  && curl -fsSLO --compressed "https://yarnpkg.com/downloads/$YARN_VERSION/yarn-v$YARN_VERSION.tar.gz" \
  && curl -fsSLO --compressed "https://yarnpkg.com/downloads/$YARN_VERSION/yarn-v$YARN_VERSION.tar.gz.asc" \
  && gpg --batch --verify yarn-v$YARN_VERSION.tar.gz.asc yarn-v$YARN_VERSION.tar.gz \
  && mkdir -p /opt \
  && tar -xzf yarn-v$YARN_VERSION.tar.gz -C /opt/ \
  && ln -s /opt/yarn-v$YARN_VERSION/bin/yarn /usr/local/bin/yarn \
  && ln -s /opt/yarn-v$YARN_VERSION/bin/yarnpkg /usr/local/bin/yarnpkg \
  && rm yarn-v$YARN_VERSION.tar.gz.asc yarn-v$YARN_VERSION.tar.gz \
  # smoke test
  && yarn --version

COPY docker-entrypoint.sh /usr/local/bin/
ENTRYPOINT ["docker-entrypoint.sh"]

CMD [ "node" ]

Official Docker Node Image Based on Alpine

FROM alpine:3.16

ENV NODE_VERSION 18.15.0

RUN addgroup -g 1000 node \
    && adduser -u 1000 -G node -s /bin/sh -D node \
    && apk add --no-cache \
        libstdc++ \
    && apk add --no-cache --virtual .build-deps \
        curl \
    && ARCH= && alpineArch="$(apk --print-arch)" \
      && case "${alpineArch##*-}" in \
        x86_64) \
          ARCH='x64' \
          CHECKSUM="6c53e3e6a592dc8b304632b63d616978a06d9daad3157f063deabee8245e1541" \
          ;; \
        *) ;; \
      esac \
  && if [ -n "${CHECKSUM}" ]; then \
    set -eu; \
    curl -fsSLO --compressed "https://unofficial-builds.nodejs.org/download/release/v$NODE_VERSION/node-v$NODE_VERSION-linux-$ARCH-musl.tar.xz"; \
    echo "$CHECKSUM  node-v$NODE_VERSION-linux-$ARCH-musl.tar.xz" | sha256sum -c - \
      && tar -xJf "node-v$NODE_VERSION-linux-$ARCH-musl.tar.xz" -C /usr/local --strip-components=1 --no-same-owner \
      && ln -s /usr/local/bin/node /usr/local/bin/nodejs; \
  else \
    echo "Building from source" \
    # backup build
    && apk add --no-cache --virtual .build-deps-full \
        binutils-gold \
        g++ \
        gcc \
        gnupg \
        libgcc \
        linux-headers \
        make \
        python3 \
    # gpg keys listed at https://github.com/nodejs/node#release-keys
    && for key in \
      4ED778F539E3634C779C87C6D7062848A1AB005C \
      141F07595B7B3FFE74309A937405533BE57C7D57 \
      74F12602B6F1C4E913FAA37AD3A89613643B6201 \
      DD792F5973C6DE52C432CBDAC77ABFA00DDBF2B7 \
      61FC681DFB92A079F1685E77973F295594EC4689 \
      8FCCA13FEF1D0C2E91008E09770F7A9A5AE15600 \
      C4F0DFFF4E8C1A8236409D08E73BC641CC11F4C8 \
      890C08DB8579162FEE0DF9DB8BEAB4DFCF555EF4 \
      C82FA3AE1CBEDC6BE46B9360C43CEC45C17AB93C \
      108F52B48DB57BB0CC439B2997B01419BD92F80A \
    ; do \
      gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$key" || \
      gpg --batch --keyserver keyserver.ubuntu.com --recv-keys "$key" ; \
    done \
    && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION.tar.xz" \
    && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
    && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
    && grep " node-v$NODE_VERSION.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
    && tar -xf "node-v$NODE_VERSION.tar.xz" \
    && cd "node-v$NODE_VERSION" \
    && ./configure \
    && make -j$(getconf _NPROCESSORS_ONLN) V= \
    && make install \
    && apk del .build-deps-full \
    && cd .. \
    && rm -Rf "node-v$NODE_VERSION" \
    && rm "node-v$NODE_VERSION.tar.xz" SHASUMS256.txt.asc SHASUMS256.txt; \
  fi \
  && rm -f "node-v$NODE_VERSION-linux-$ARCH-musl.tar.xz" \
  && apk del .build-deps \
  # smoke tests
  && node --version \
  && npm --version

ENV YARN_VERSION 1.22.19

RUN apk add --no-cache --virtual .build-deps-yarn curl gnupg tar \
  && for key in \
    6A010C5166006599AA17F08146C2130DFD2497F5 \
  ; do \
    gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$key" || \
    gpg --batch --keyserver keyserver.ubuntu.com --recv-keys "$key" ; \
  done \
  && curl -fsSLO --compressed "https://yarnpkg.com/downloads/$YARN_VERSION/yarn-v$YARN_VERSION.tar.gz" \
  && curl -fsSLO --compressed "https://yarnpkg.com/downloads/$YARN_VERSION/yarn-v$YARN_VERSION.tar.gz.asc" \
  && gpg --batch --verify yarn-v$YARN_VERSION.tar.gz.asc yarn-v$YARN_VERSION.tar.gz \
  && mkdir -p /opt \
  && tar -xzf yarn-v$YARN_VERSION.tar.gz -C /opt/ \
  && ln -s /opt/yarn-v$YARN_VERSION/bin/yarn /usr/local/bin/yarn \
  && ln -s /opt/yarn-v$YARN_VERSION/bin/yarnpkg /usr/local/bin/yarnpkg \
  && rm yarn-v$YARN_VERSION.tar.gz.asc yarn-v$YARN_VERSION.tar.gz \
  && apk del .build-deps-yarn \
  # smoke test
  && yarn --version

COPY docker-entrypoint.sh /usr/local/bin/
ENTRYPOINT ["docker-entrypoint.sh"]

CMD [ "node" ]

They are both Dockerfiles for the official Node image (the difference is that one is based on Debian and the other on Alpine). Is adding a user and group needed for Node to work properly, or is it for something else? Also, a home directory is being created in at least the Debian-based Dockerfile (the Alpine-based one may do so as well, but I’m unable to discern that).

In contrast, this article describing installation of Node via binary does not include creating a group or user (in fact, what is prescribed is very minimal and simple): IBM Documentation

And the most important (but certainly not least) question: if I write a Dockerfile to create a Node image based on Ubuntu (something not provided by the official Node image on Docker Hub), should I also include adding a group and user, or even a home directory? And, if so, why?

Official Docker Node Image Based on Debian
Official Docker Node Image Based on Alpine

Generally: if an image creates an unprivileged user (and a group), either the Dockerfile uses it with the USER instruction, or the entrypoint script exec’s the main process as that user. In the latter case, tools like gosu or su-exec are used to start the process as the unprivileged user.
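For illustration, a minimal entrypoint of the kind described here might look like the following. This is a sketch, not the official image’s actual entrypoint: it assumes gosu is installed in the image and that the unprivileged user is named node.

```shell
#!/bin/sh
set -e

# If the container was started as root and the command is "node",
# drop privileges and exec the process as the unprivileged user.
if [ "$(id -u)" = "0" ] && [ "$1" = "node" ]; then
    exec gosu node "$@"
fi

# Otherwise run the command as-is. exec replaces this shell, so the
# command becomes PID 1 and receives signals directly.
exec "$@"
```

The exec is important: without it the shell stays alive as PID 1 and the main process would not receive stop signals from Docker.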

Why would you want to start your main application as an unprivileged user? Because the process runs with the permissions of the unprivileged user, a hacker who exploits the application in the container is constrained by whatever that user is allowed to do. It is much easier to gather further information and attempt further exploits if the exploited application is running as root.

Though, what does that mean for the node image? I don’t see any evidence that the unprivileged user is actually used to execute node. I assume the unprivileged user is just prepared so that you can use it in your own Dockerfile, by adding the instruction USER node (after everything that requires root privileges is done).
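A downstream Dockerfile using that prepared user might look like this (a sketch; the app layout and server.js name are made up for illustration):

```dockerfile
FROM node:18-bullseye

WORKDIR /app

# Do everything that needs root first: install dependencies, copy sources.
# --chown makes the files readable/writable by the unprivileged user.
COPY --chown=node:node package*.json ./
RUN npm ci --omit=dev
COPY --chown=node:node . .

# Switch to the user the base image prepared; everything from here on,
# including the running process, uses UID/GID 1000.
USER node

CMD ["node", "server.js"]
```

Note that USER affects all subsequent RUN, CMD, and ENTRYPOINT instructions, which is why it comes after the root-requiring steps.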

@meyay

Ok… I think I get that. But it immediately begs the question of how to implement such a concept in a more complex use case. If all you are running is one container (Node, for example) then you just have one thing to consider, but if your ultimate goal is to run several containers together (Node, some database, nginx perhaps), then which of those users do you use? Then you also have the base image that they all share (Ubuntu perhaps). What I’m saying is, it seems like you would have to consider how to organize this. I hope I’m not sounding crazy, since this is my first encounter with this concept. I mean, it makes sense, just that I hadn’t thought about it before I read your response. I suppose my intuitive reaction would be to run everything under a user in the base operating system? Or to make the users of each container (node, let’s call it mariadb, nginx) part of a group on the base operating system? I’m not really sure how to talk about it, since system admin was never really my forte, but now I’m having to work in that area a bit in order to do Docker properly.

If security is what you are thinking about then would it be best to log in as the same user in all of the containers you’re using or as a different user for each one? And what about making it so the person accessing the container has to supply credentials to gain access? Even to the extent of some kind of token or even encryption key being required? What are the parameters of this concept? What does the landscape of this concept look like?

Obviously it’s better to use different UIDs for each image, and it’s even better if those UIDs and GIDs don’t even exist on the host. Ask yourself whether you want a hacker who finds a way to break out of a container (there is never absolute security that someone won’t find the right combination of exploits to make it happen) to be able to access all the data the other containers use, or even do whatever the user is allowed to do on the OS. In that scenario, would you prefer that only the exploited container’s data is at risk, or is it okay to risk the data of other containers and the OS user as well? The drawback of this approach is that it’s inconvenient compared to using a single UID. In home environments, most people are not willing to accept the inconvenience and use a single OS UID for all their containers. Though, in most corporate environments this would be a direct violation of governance and compliance policies.

Containers that only communicate with each other over the network don’t really need to use the same UID/GID - they are completely independent of each other. Though, if you have containers that access the same volume (e.g. one container downloads files, the other detects the files in the volume and performs some transformation), then you need to make sure the processes inside the containers use the same UID, or at least the same GID with proper group permissions, to support that scenario.
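The reason the numeric IDs (not the user names) are what matter: Linux stores only the UID/GID on each file, so two containers agree on access exactly when their numeric IDs line up, even if the user names inside each image differ. A small local sketch of that idea (no Docker involved; it just shows that ownership is compared as numbers):

```shell
# Create a file the way "container A" would, then check it the way
# "container B" would: by comparing numeric UIDs, not user names.
workdir=$(mktemp -d)
echo "payload" > "$workdir/data.txt"

# stat -c works on GNU coreutils; stat -f is the BSD/macOS fallback.
owner_uid=$(stat -c %u "$workdir/data.txt" 2>/dev/null || stat -f %u "$workdir/data.txt")
current_uid=$(id -u)

if [ "$owner_uid" = "$current_uid" ]; then
    echo "UIDs match: the second process can use the shared file"
fi
rm -rf "$workdir"
```

In a real two-container setup the same comparison happens inside the kernel when the second container’s process opens the file on the shared volume.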

There is no “login” into a container. Accessing a container from the command line means creating a new shell process by exec’ing into the container (which becomes a new sibling process to the main process). You cannot even prevent a user from starting a shell inside the container with whatever UID they want.
A container is an isolated process(!), which unlike a VM does not run system services in the background. That said, I am not really sure what to make of your last paragraph, as I am not sure how those ideas could apply to containers.
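To make that last point concrete: anyone with access to the Docker daemon can pick the UID freely when exec’ing in, regardless of any USER instruction in the image. Illustrative commands only (mycontainer is a placeholder name; these require a running container):

```shell
# Open a shell in a running container as root, regardless of the image's USER:
docker exec -it -u 0 mycontainer sh

# Or as an arbitrary numeric UID:GID that need not even exist in the container:
docker exec -it -u 1234:1234 mycontainer sh
```

This is why access to the Docker daemon itself is effectively root-equivalent on the host, and why container security focuses on what the *main process* runs as, not on preventing interactive access.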

@meyay

If security is what you are thinking about then would it be best to log in as the same user in all of the containers you’re using or as a different user for each one? And what about making it so the person accessing the container has to supply credentials to gain access? Even to the extent of some kind of token or even encryption key being required? What are the parameters of this concept? What does the landscape of this concept look like?

  1. I think I was concerned about how all this affects using containers. Since I’m just getting started learning about all these things, I wasn’t sure how to state my concern. I guess I’m thinking that every time you introduce the need to log in, it becomes a burden. I think you addressed this near the end of your reply, but I’m not sure how much of it I was able to absorb. What I think I heard you say is that when it’s just containers interacting with one another it doesn’t matter (they do so automatically). And I think you said that even when a person interacts with a container there is no requirement to log in - but I’m not sure if that’s what you meant.

fwiw I have used the -it flag with docker run - but the result was being logged in as root. I did watch a video on Docker container security that explained something about creating a non-privileged user combined with disabling root login, where the result was that you had to specify the username in order to get in. Am I remembering that right?

  2. When I mentioned …

And what about making it so the person accessing the container has to supply credentials to gain access? Even to the extent of some kind of token or even encryption key being required?

I was thinking about a use case where the person using the image (or several images in a connected system) has to be supplied with some token and/or encryption key in order to do anything with them. For example, a developer pulls an image from the organization’s repository (they need to supply login creds to do this in the first place); then, even though they have the image locally, they cannot do anything with it unless they have some token that they were supplied with by the organization as a dev. So every dev gets a project token based on the team they are on and/or their unique identity, and they need that to do anything with any image or images they get from the organization’s repo.

Put it another way…

You’re a dev and you get hired on at a new company. They assign you to work on project abc. As part of the resources you are supplied with, you are given an access token that you will then use in order to use the container(s) the organization supplies. No token, no access.

Basically I was just exploring the possible limits of container security thinking about things like that.

^ Just trying to give a sense of where I’m at and what’s on my mind, and to shed some light on that “last paragraph”.

What makes it so damn hard for me to learn things is that there isn’t a source (that you don’t have to pay for) that teaches by hands-on doing. It’s: read official docs, read some articles, ask questions, stumble around in the dark banging your head against walls trying to implement what little you got from it, binding up and healing from mortal wounds after all your gross failures. Weeks go by, and if you come up with something that works it’s more of a fluke than anything intentional. Finally you learn some tiny little bit of something (because finally it has come by means of hands-on application), but it’s not enough. Weeks have gone by.

Accomplishing anything feels like fighting through quicksand.

It’s like being in HELL!