Trying to separate the dev stage image from the prod stage build using multi-stage builds

PLEASE follow this link to see the images, I’m a new user and can’t use more than 2 links or 1 embedded media: docker images · GitHub

Hey, I’ve been looking for a guide or article on the internet covering a similar situation, but found nothing.

Well, there’s this Node.js app that I built using Typescript + ExpressJS, and now I’m trying to setup docker to have both a working development container and a production one.

I’d like to have only one Dockerfile, since that is the recommended approach, and two docker-compose files: docker-compose.yml for development and docker-compose.prod.yml for production.

My project consists of the root folder, inside of it there are all these dockerfile and compose files, and also my package.json, src/ folder, yarn.lock and etc.

When I start the development container using compose, the goal is to “bind” my local src/ folder with the in container src/ folder, so when I make changes to the local source files, ts-node-dev that is running inside the container would update the app automatically. I’m not sure if it’s good to make a volume to the container node_modules, should I?

The problem starts when it comes to the production image.
I’m trying to use a multi-stage build, even though I don’t completely understand how it works.
For now I have this:

Dockerfile

FROM node:14.18.1-alpine3.14 as base
ENV APP=/home/node/app
WORKDIR $APP
RUN chown node:node $APP
USER node
COPY --chown=node:node package.json yarn.lock $APP/
RUN yarn install --frozen-lockfile &&\
    rm -rf "$(yarn cache dir)"
COPY --chown=node:node . $APP/
RUN yarn build

FROM base as prod
ENV APP=/home/node/app
WORKDIR $APP
RUN chown node:node $APP
USER node
COPY --chown=node:node --from=base "$APP/package.json" "$APP/yarn.lock" "$APP/dist/" $APP/
RUN NODE_ENV=production yarn install --frozen-lockfile

My development docker-compose is:

version: "3.8"

services:
    web-dev:
        container_name: selfapi
        build:
            context: .
            dockerfile: Dockerfile
            target: base
        ports:
            - ${PORT}:${PORT}
        environment:
            MONGODB_URL: ${MONGODB_URL}
            NODE_ENV: development
            AUTH_TOKEN: ${AUTH_TOKEN}
        volumes:
            - node_modules:/home/node/app/node_modules
            - ./src/:/home/node/app/src
            - ./tests/:/home/node/app/tests
        command: yarn dev2
volumes:
    node_modules:

Firstly, I’m not sure why it works without EXPOSE instruction in the Dockerfile, and without the key PORT in the environment in the compose file.
But anyways, if I run docker-compose up -d --build, this is what I get:

Image: Dev Image
As you see, 358,13 MB.

Volumes: Dev Volume
Acceptable size for the node_modules volume, but I’m not sure whether this volume is necessary or whether I should just leave node_modules inside the container image. Any opinion?

Container: Dev Container
If I make changes to the source files locally, the container does update the running app using ts-node-dev, but only if I pass the --poll flag in the command, which is ts-node-dev --poll --respawn --transpile-only --inspect -- ./src/index.ts. Is there any way to avoid this flag and keep it working?

Now, the production part.
This is the docker-compose.prod.yml:

version: "3.8"

services:
    web:
        container_name: selfapi
        build:
            context: .
            dockerfile: Dockerfile
            target: prod
        ports:
            - ${PORT}:${PORT}
        environment:
            MONGODB_URL: ${MONGODB_URL}
            NODE_ENV: production
            AUTH_TOKEN: ${AUTH_TOKEN}
        command: yarn start

Running docker-compose -f ./docker-compose.prod.yml up -d --build, this is what I get:

Image: Prod image
Why is the production image double the size of the development image? I know the node_modules folder is not in a separate volume anymore, but it seems that the dev image is somehow mixed into the production image. Is it?

Here is the problem, if I check the node_modules folder inside the prod container, the devDependencies are there!!! I want to NOT have them in a production environment. With that multi-stage building, shouldn’t this work as I am thinking?

Also, couldn’t this be unified into a single COPY instruction?
COPY --chown=node:node package.json yarn.lock $APP/ and COPY --chown=node:node . $APP/

And also (again), isn’t this (in the production stage) unnecessary?
COPY --chown=node:node --from=base "$APP/package.json" "$APP/yarn.lock" "$APP/dist/" $APP/
I’m looking at the prod container folders, and apparently those files AND the dist folder already exist in that directory. I thought that, since this is a multi-stage build, the stuff from the dev stage wouldn’t be in the prod stage image.

Please, help me with that!!!

I felt there were so many unorganized questions that I couldn’t find the time to try to understand them. Now I can see I was not alone with this feeling (stackoverflow).

And this is one of the problems. Let me give you some friendly advice. When you don’t understand something that is a very important part of your solution, learn it before you continue. Multi-stage builds are documented, and people have written many articles about them. If that is not enough, ask about that specifically; then you don’t have to write such a long question, and people can answer more quickly.

Let’s see what the following code means:

FROM myimage as base
COPY myfile /myfile

FROM base as prod

1.) Use myimage to build a new image
2.) Copy myfile into the image
3.) Name the newly built image base internally so you can refer to it later

4.) Use the previously built base image to build a new image
5.) Name the new image prod internally even though you will never use it later

OK, then what does the following code mean?

FROM myimage as base
COPY myfile /myfile

FROM base as prod
COPY --from=base /myfile /myfile

1.) Create a base image containing /myfile
2.) Create a prod image containing everything that base contains
3.) Copy myfile from the base image which is already in the prod since everything in base is also in prod

Well, I think you got the answer to this. You just copied $APP/dist into $APP again. Even if you copy the same files, it won’t override those; it will create a new layer hiding the original files. Don’t use FROM base as prod. base is not an alias of node:14.18.1-alpine3.14. It’s an alias of the image you built in that stage. Of course you knew that, since you copied files which were not in the node image. If you want to create an alias for the node image without extra content, you can do it this way:

FROM "node:14.18.1-alpine3.14" as base

FROM myimage as dev
COPY myfile /myfile

FROM base as prod
COPY --from=dev /myfile /myfile

The same reason as I explained above. Everything in base will be in prod the way you built the image.

I know I didn’t answer all of your questions, but some of the answers depend on the use case. I think you need to think through what you are doing and what you want, and then ask a more specific question. Otherwise we would have to analyse unknown code and investigate what different node modules do in order to give you multiple answers and explain whether different parts of your code are necessary or not.


Hey @rimelek I hope you’re doing great!

Please follow this to see the images (still new user): images reply · GitHub

Well, you are actually totally correct. I did read the multi-stage documentation, but it’s not clear how it applies to a TypeScript project, since I don’t know how other languages handle deploying a production app.

Well, let’s see if I can diminish the size of the question:

Having a single Dockerfile with those stages, can’t I make the production image NOT have the node_modules from the previous stage?

I’m not sure if you are familiar (probably you are) with Typescript/Javascript, but in Nodejs we have the dependencies and devDependencies.
Looking at this first stage:

FROM node:14.18.1-alpine3.14 as base

ENV APP=/app
WORKDIR $APP
COPY . $APP/
RUN yarn install --frozen-lockfile &&\
    rm -rf "$(yarn cache dir)"
RUN yarn build

I’m installing all my dependencies and devDependencies into the image, creating the node_modules folder with all of them, and then building the raw JavaScript code into the dist folder.
In the second stage:
Dockerfile
I am just copying that dist folder, and then installing only the necessary dependencies for production.
As you will see, this is how big the node_modules is for development and production, respectively:
Image 1
Image 2

If there is a way to have only the final node_modules folder in the production image, that’s what I’d like to know, because using dive in both my images, that’s what I see:
Image 3
On development image, there are 237 MB in the node_modules, shown in the last RUN before yarn build.

Now in the production:
Image 4
You can see that the production image contains the 237 MB from that earlier installation plus only 35 MB for the production node_modules, so the dependencies shrink from 237 MB to 35 MB between dev and production. BUT, as you said, everything in the base image is present in the prod build, and as the image below shows, more than 90% of the production image size comes from the first installation with all those bloated dependencies.
Image 5

Please let me know if I have written a lot of unnecessary stuff again. From my perspective, all of this is kind of important to know.

It was alright, but I feel I have already answered the question you asked again. It doesn’t matter which programming language you use. The multi-stage build is the same. Don’t inherit the previous stage with FROM if you don’t want the next stage to contain everything from the previous stage. This is the only case I can think of in which it is helpful:

FROM image1 as production
RUN installing common softwares

FROM production as development
RUN installing debugging tools

If this logic suits your needs, you can use it. If there is some stuff you want to use in production but not in development, and some other stuff you want to use in development but not in production, you can do what I suggested in my previous message. Use at least three stages: one common stage, one for development and one for production.
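A minimal sketch of that three-stage layout, applied to the project described in the question. The stage names common, development and production are only illustrative, and the sketch assumes that yarn build writes its output to dist/ and that Yarn 1 is in use, as in the original Dockerfile:

```dockerfile
# Common stage: only an alias for the Node image plus settings every stage shares.
FROM node:14.18.1-alpine3.14 AS common
WORKDIR /app

# Development stage: all dependencies (including devDependencies) and the build output.
FROM common AS development
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile
COPY . .
RUN yarn build

# Production stage: starts again from the common stage, so nothing from the
# development stage is inherited. Only what is copied explicitly ends up here.
FROM common AS production
COPY package.json yarn.lock ./
RUN NODE_ENV=production yarn install --frozen-lockfile
COPY --from=development /app/dist/ ./dist/
```

With a layout like this, the compose files could keep selecting a stage through build.target, for example development in docker-compose.yml and production in docker-compose.prod.yml.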

If that is not enough either, then multi-stage builds might not be for you.

I always think of a multi-stage build as multiple stages of the build process, not as a split between development and operations.


Hey, I guess you opened my mind here… Looking at your example Dockerfile, I just realized that I don’t actually need to start the second stage from a previous stage; I could just start it from an image. So this is the result:

FROM node:14.18.1-alpine3.14 as base
ENV APP=/app
WORKDIR $APP
COPY . $APP/
RUN yarn install --frozen-lockfile &&\
    rm -rf "$(yarn cache dir)"
RUN yarn build

FROM node:14.18.1-alpine3.14 as prod
ENV APP=/home/node/app
WORKDIR $APP
RUN chown node:node $APP
USER node
COPY --chown=node:node --from=base ./app/dist/ $APP/dist/
COPY --chown=node:node package.json yarn.lock $APP/
RUN NODE_ENV=production yarn install --frozen-lockfile &&\
    rm -rf "$(yarn cache dir)"

Now the development image (created in the base stage) is 355 MB, and the production image (the final build) is only 155 MB!

thanks for that, dude!