Setting up a dockerized development environment

I would like to make it easier for contributors to my project to get started by dockerizing not only the deployment of the application, but also the development environment. The second is proving to be much harder than the first.

Goals of the setup:

1. Users can edit repo files on their native file system, then compile or otherwise operate on those files in a Docker container.

2. The build cache persists beyond individual containers.

3. Program outputs persist beyond individual containers.

4. No root-owned files are left on the host file system after working in the container.

5. (Preferred) The container doesn't run as root; it runs with the least possible privilege.

I think these are ordinary goals that anyone setting up a dockerized dev environment would have, but it's honestly not clear to me how to set this up. Docker is fighting me every step of the way. I don't see how any of my problems are unique to me; any dev environment would need these things, right?

Suppose the root of my project’s repo is $REPO.

(1) I think this means I want to bindmount $REPO:/repo.
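
Concretely, something like this is what I have in mind (rust:1.32 here is just a stand-in for whatever dev image I end up building):

# throwaway container with the repo bind-mounted at /repo
docker run --rm -it -v "$REPO":/repo -w /repo rust:1.32 bash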

(2) Because it’s hard to know where each dev’s system build cache is located, it’s hard to bindmount that. I would prefer the build cache to be a managed volume rather than a bindmount. So, I’ve been trying to define a managed volume, buildcache:/path/to/buildcache. I think any dev environment that has cacheable build objects needs to do something like this. We don’t want to download all of our project’s packages and rebuild the world every time we launch a container.

My build system happens to be Rust’s cargo, and cargo caches 3rd party packages (crates) at $CARGO_HOME, and other build objects at $REPO/target. I’m bindmounting $REPO, so that will persist. I just have to have a buildcache volume and make sure that $CARGO_HOME points to the mount point.
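
Roughly what I'm picturing, with /path/to/buildcache as a placeholder mount point:

# create the named volume once; it outlives individual containers
docker volume create buildcache

# mount the repo and the cache volume, and point cargo's package cache at the volume
docker run --rm -it \
  -v "$REPO":/repo \
  -v buildcache:/path/to/buildcache \
  -e CARGO_HOME=/path/to/buildcache/cargo \
  -w /repo rust:1.32 cargo build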

(Another problem, though I think this one is really Cargo's fault, is that $CARGO_HOME is not only a place where things get cached; it's also where Rust stores the installed toolchain. So if I change the toolchain in an image layer and then mount a clean $CARGO_HOME on top of it when starting a container, I blow away the installed toolchain. Cargo really ought to give me a single directory that serves as a build cache and nothing else, so I can mount a volume there. There's already a bug filed about this on Cargo's GitHub.)

(3) I can choose to have my program output files somewhere underneath /repo, say /repo/program_outputs, which is bindmounted to $REPO/program_outputs. That takes care of this requirement.

(4) This is where everything goes to hell. Docker wants to run everything as root. Argh. Argh! Why? Can’t there be a flag that says “I want to run this as myself”?

As it is, by default Docker leaves root-owned files in $REPO/target and $REPO/program_outputs. The simplest things, like analyzing my program_outputs with an analysis program on the host system, are now a colossal pain.

I want to run with the same UID and GID in the container as I have on the host. Isn't this a normal thing to want? But in order to do it I have to do some sort of janky ARG thing to pass the UID and GID into the image build:

ARG HOST_UID
ARG HOST_GID
RUN useradd myuser
RUN usermod -u ${HOST_UID} myuser
RUN usermod -g ${HOST_GID} myuser

USER myuser
 

Now, instead of letting all my contributors just use docker-compose run, I have to provide some sort of shell script that makes sure the image gets built with the right UID and GID passed in.
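
Something along these lines (just a sketch; the image tag is made up):

#!/bin/bash
# build the dev image with the invoking user's UID and GID baked in
docker build \
  --build-arg HOST_UID="$(id -u)" \
  --build-arg HOST_GID="$(id -g)" \
  -t myproject-dev .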

But even when I do that, it's not enough to get this container working right: the managed volume buildcache is always owned by root, so when I run as an unprivileged user with my host UID, the buildcache volume is unwritable.

I read somewhere in the bowels of Docker's GitHub issues that when a managed volume is created, it inherits the UID and GID from the image that first used it, and that this is the blessed way to handle the problem. Unfortunately, it doesn't seem to work at all: my managed volume is always owned by root. Is it because I'm defining the volumes in my docker-compose.yml?

# this doesn't make the buildcache volume owned by myuser
ARG HOST_UID
ARG HOST_GID
RUN useradd myuser
RUN usermod -u ${HOST_UID} myuser
RUN usermod -g ${HOST_GID} myuser

RUN mkdir /path/to/buildcache
RUN chown myuser /path/to/buildcache

USER myuser
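
For what it's worth, this is how I've been checking what ownership the volume actually ends up with (alpine is just a convenient throwaway image):

# show the numeric owner of the volume's mount point
docker run --rm -v buildcache:/path/to/buildcache alpine ls -ldn /path/to/buildcache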

Thanks in advance for any advice. Sorry if this came across as a whine; I have been banging my head against this for a while now.

Hi.

Found this thread, might be useful:


This really took some doing, and I wasn't able to achieve one of my goals, which was making this work in "pure Docker": I had to write a script to wrap the invocation of docker-compose. But I eventually did meet all the must-have requirements.

To run with the correct UID and GID for the bindmount, I exported those values from the host when building the image. A potential issue is that every developer has to build their own image, so the image can't be shared from a registry. That is not yet a problem for us.

In what follows, passing WHO is not strictly necessary, but doing it this way makes it clear to the user that they are operating as the same user inside and outside of the container, which I think is the least surprising behavior.

develop.sh

#!/bin/bash

export HOST_UID=$(id -u)
export HOST_GID=$(id -g)
export WHO=$(whoami)

docker-compose run play

A few tricks I learned about docker-compose.yml made things easier.
First of all, being able to pass args from docker-compose to the Dockerfile with the args attribute made a big difference. This way docker-compose's default build behavior still works.

For convenience, I made a point of keeping bash history on a separate persistent volume; you usually want to keep your command history when developing, even if you blow away a container or rebuild an image. I also mounted Rust's toolchain and build caches on a volume, and any dev environment that has a build step will need something similar.

docker-compose.yml

version: "3.2"

services:
  play:
    build: 
      args:
        # run the container as the current user so that permissions will work across the 
        # bindmount and so that the container doesn't have escalated privs.
        HOST_UID: "${HOST_UID}"
        HOST_GID: "${HOST_GID}"
        WHO: "${WHO}"
      context: "."
      dockerfile: "play/Dockerfile"
    volumes:
      - '.:/seraphim'
      - 'rust:/rust' # rust toolchain & build cache
      - 'bash:/bash' # bash history
      - 'data:/data'
    environment:
      - SERAPHIM=/data # seraphim binaries look for and put their data in $SERAPHIM
      - HISTFILE=/bash/history
    working_dir: /seraphim

volumes:
  rust:
  bash:
  data:

play/Dockerfile

FROM rust:1.32

ARG HOST_UID
ARG HOST_GID
ARG WHO

RUN groupadd ${WHO} -g ${HOST_GID}
RUN useradd -d /home/${WHO} -ms /bin/bash -g ${WHO} ${WHO} 
RUN usermod -u ${HOST_UID} ${WHO}
RUN usermod -g ${HOST_GID} ${WHO}

# all of this chowning is necessary so that the volumes that will be created to mount 
# in these locations inherit the correct ownership bits
RUN mkdir /bash
RUN chown -R ${WHO}:${WHO} /bash 

RUN mkdir /data
RUN chown -R ${WHO}:${WHO} /data

RUN mkdir /rust
RUN mkdir /rust/rustup
RUN mkdir /rust/cargo
RUN chown -R ${WHO}:${WHO} /rust

# this is where Rust puts 3rd party packages and the actual toolchain binaries, respectively
# we want them to go into our persistent mount
ENV CARGO_HOME /rust/cargo
ENV RUSTUP_HOME /rust/rustup

USER ${WHO}

RUN rustup install nightly
RUN rustup default nightly
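
With all of that in place, the day-to-day workflow looks roughly like this (docker-compose builds the image the first time the script is run):

# drop into a shell inside the play container as your own user
./develop.sh

# inside the container: build artifacts land in /seraphim/target on the bindmount,
# crates and the toolchain live on the rust volume, program data on the data volume
cargo build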

Thanks for this link. I knew that this was the only endorsed way to change volume ownership, but it was convenient to have the reference.

In the end I was able to do this by using build args to pass the correct UID from docker-compose to the Dockerfile - that was the missing piece. Before that, I had no way to cause the Dockerfile to build the correct image the first time docker-compose run was invoked.