How to create a MongoDB image and initialize data via a Dockerfile

Hi everyone,
I tried to build a MongoDB image with initialization data using a Dockerfile, but the imported database is not found when I run the final image.
Generate the data to import:

mongodump -d Test -o dump
tar zcvf db.tar.gz dump

Dockerfile

FROM mongo
MAINTAINER Ding
VOLUME /data/db

ADD ./db.tar.gz /home
WORKDIR /home
RUN mongod --fork --dbpath /data/db --logpath mongo.log && mongorestore --nsInclude 'Test.*' dump/ --drop

EXPOSE 27017

docker build

ding@VM-0-13-ubuntu:~$ docker build -f Dockerfile -t mongodb:test .
Sending build context to Docker daemon 59.59MB
Step 1/7 : FROM mongo
 ---> ccf4b4ee3bee
Step 2/7 : MAINTAINER Ding
 ---> Running in 8943389d180f
Removing intermediate container 8943389d180f
 ---> e076456d2b9d
Step 3/7 : VOLUME /data/db
 ---> Running in 7c5827a01419
Removing intermediate container 7c5827a01419
 ---> 812949e1d62f
Step 4/7 : ADD ./db.tar.gz /home
 ---> f8d296d114f6
Step 5/7 : WORKDIR /home
 ---> Running in 90adbd40bcee
Removing intermediate container 90adbd40bcee
 ---> 120da328f9cd
Step 6/7 : RUN mongod --fork --dbpath /data/db --logpath mongo.log && mongorestore --nsInclude 'Test.*' dump/ --drop
 ---> Running in 4ab4e61aa3b3
about to fork child process, waiting until server is ready for connections.
forked process: 8
child process started successfully, parent exiting
2021-09-25T06:33:00.443+0000 preparing collections to restore from
2021-09-25T06:33:00.443+0000 reading metadata for Test.users from dump/Test/users.metadata.json
2021-09-25T06:33:00.456+0000 restoring Test.users from dump/Test/users.bson
2021-09-25T06:33:00.467+0000 finished restoring Test.users (0 documents, 0 failures)
2021-09-25T06:33:00.468+0000 no indexes to restore for collection Test.users
2021-09-25T06:33:00.468+0000 0 document(s) restored successfully. 0 document(s) failed to restore.
Removing intermediate container 4ab4e61aa3b3
 ---> 9499467a37e0
Step 7/7 : EXPOSE 27017
 ---> Running in f7148454faf7
Removing intermediate container f7148454faf7
 ---> 3b7ce739d419
Successfully built 3b7ce739d419
Successfully tagged mongodb:qnc

The build seems to succeed, as the output shows.
mongod is started in the background:

about to fork child process, waiting until server is ready for connections.
forked process: 8
child process started successfully, parent exiting

mongorestore

2021-09-25T06:33:00.443+0000 preparing collections to restore from
2021-09-25T06:33:00.443+0000 reading metadata for Test.users from dump/Test/users.metadata.json
2021-09-25T06:33:00.456+0000 restoring Test.users from dump/Test/users.bson
2021-09-25T06:33:00.467+0000 finished restoring Test.users (0 documents, 0 failures)
2021-09-25T06:33:00.468+0000 no indexes to restore for collection Test.users
2021-09-25T06:33:00.468+0000 0 document(s) restored successfully. 0 document(s) failed to restore.

I ran the image from step 6 (id 9499467a37e0), but MongoDB did not contain the Test database, which suggests that the RUN step did not take effect. When I then ran mongod --fork --dbpath /data/db --logpath mongo.log && mongorestore --nsInclude 'Test.*' dump/ --drop again in bash inside the container, the Test database appeared in MongoDB.

Why didn't the Dockerfile build work as I expected? How can I solve this problem?

How often have you tested? Any chance that at some point the volume for /data/db/ was created and was no longer empty?

The RUN commands are executed when creating the image, not when running the container for the first time. So, if all is well, your database in the image’s /data/db/ should indeed include the test data. But you’re also using VOLUME /data/db. So, whatever is changed in the image’s /data/db during RUN, will be invisible when that volume is mounted into that same location. Now, Docker may populate new, empty volumes with data from the image, on first run. But maybe testing multiple times already put something in that volume, making Docker skip that copy-to-empty-volume when starting the container, hence ignoring what is in the image’s /data/db/?

You could test by removing the volume and restarting the container?
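A minimal sketch of that check, assuming a container you name yourself; the helper names show_mounts and remove_with_volumes are hypothetical and only wrap standard docker commands. The first one lists the container's mounts, so you can see whether Docker created an anonymous volume for /data/db. The second removes the container together with its anonymous volumes.

```shell
# Hypothetical helper names; "$1" is a container name or id of your choice.

# Print the mounts of a container as JSON. An anonymous volume created for
# VOLUME /data/db would be listed here with "Destination":"/data/db".
show_mounts() {
    docker inspect --format '{{ json .Mounts }}' "$1"
}

# Remove a container together with its anonymous volumes, so that the next
# "docker run" starts from a fresh, empty volume.
remove_with_volumes() {
    docker rm --force --volumes "$1"
}
```

For example, show_mounts mongo-test followed by remove_with_volumes mongo-test, where mongo-test is whatever name you passed to docker run --name.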

In my early tests, I did not use the VOLUME instruction, but the test database still was not imported into MongoDB. Each time docker build succeeded, I ran docker run on two different images: the intermediate one after RUN (such as id 9499467a37e0) and the final, successfully built one (such as id 3b7ce739d419). I just typed docker run -it imageid /bin/bash without -v, so the volume was not actually mounted.
I deleted the VOLUME instruction and rebuilt the image with the new Dockerfile. I used the intermediate image, after RUN, to start the container. Although the build output shows that mongod ran in the background and executed the import command, strangely /data/db is empty.

mongod --fork --dbpath /data/db --logpath mongo.log
about to fork child process, waiting until server is ready for connections.
forked process: 8
child process started successfully, parent exiting

When I then immediately ran mongod --fork --dbpath /data/db --logpath mongo.log && mongorestore --nsInclude 'Test.*' dump/ --drop in bash, many data files were added to /data/db. The same happens when the container is run for the first time.

If “the volume is not actually mounted” is the case (which could be true), then what is the use for VOLUME in the Dockerfile? :thinking:

Anyway, as you seem to have tested without that too, I’m out of ideas. You could import the testdata when actually starting the container for the first time (using a custom ENTRYPOINT), but I feel that the RUN command to put it into the image should work too.
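One way to import at first start without writing a custom ENTRYPOINT: as far as I know, the official mongo image's own entrypoint runs any *.sh and *.js files found in /docker-entrypoint-initdb.d when the container starts with an empty data directory. The script below is only a sketch under that assumption. The file name restore.sh, the DUMP_DIR variable and the function name are made up for illustration, and /home/dump is assumed to be where ADD ./db.tar.gz /home unpacks the dump.

```shell
#!/bin/bash
# restore.sh: hypothetical init script for /docker-entrypoint-initdb.d/.
# DUMP_DIR is an assumption: the directory where the dump was unpacked.
DUMP_DIR="${DUMP_DIR:-/home/dump}"

# Restore the Test database from the given dump directory, if it exists.
restore_test_db() {
    local dump_dir="$1"
    if [ -d "$dump_dir" ]; then
        mongorestore --nsInclude 'Test.*' --drop "$dump_dir"
    else
        echo "no dump found at $dump_dir, skipping restore" >&2
    fi
}

restore_test_db "$DUMP_DIR"
```

In the Dockerfile this would replace the RUN line with something like COPY restore.sh /docker-entrypoint-initdb.d/. The data then ends up in whatever is mounted at /data/db at run time, so it also works together with a volume.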

Thank you for your reply.
I really want to use a volume mount for MongoDB persistence, but I need to understand the Dockerfile mechanism first, because I have no prior experience. That is why I used the VOLUME instruction in the Dockerfile but did not add -v to the docker run command. :sweat_smile:
At present, my solution is that after the MongoDB container is started, another service container remotely creates the core collections and basic documents over the bridge network. The script, which covers the Dockerfile build and the microservice initialization, seems to meet our project requirements for now.
But I still want to know what is wrong with my previous operation.

Are you now saying you’re not sure about your earlier:

I’d start by reading the Dockerfile reference, which says of its VOLUME example:

[…] causes docker run to create a new mount point at /myvol and copy the greeting file into the newly created volume.

I’d interpret that as: a volume is created at runtime regardless of whether -v is used. But I may be wrong.

More interesting, it also says:

Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.

As you have VOLUME /data/db almost on top in your Dockerfile, I guess that explains your problems for the test cases where you did use VOLUME. (But not for your “In my early tests, I did not use the volume field, but the test db was not successfully imported into mongodb.”)

(Aside: I’ve never used VOLUME. And when not using VOLUME when creating the image, one can surely still use docker run -v to mount any part of the container’s file system to a volume.)
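A possible explanation for the tests without VOLUME as well: to my knowledge the official mongo image's own Dockerfile already declares VOLUME /data/db /data/configdb, so the rule quoted above would apply to every image built FROM mongo, whether or not the child Dockerfile repeats the instruction. The sketch below writes a variant of the Dockerfile that restores into a different directory during the build. The path /data/seeded-db and the file name Dockerfile.seeded are assumptions chosen for illustration, and the approach is untested against the image from this thread.

```shell
# Write a variant Dockerfile that seeds a directory which is not a volume.
cat > Dockerfile.seeded <<'EOF'
FROM mongo
ADD ./db.tar.gz /home
WORKDIR /home
# /data/seeded-db is a hypothetical path that is not declared as a volume
# by the base image, so files written here during the build are kept.
RUN mkdir -p /data/seeded-db \
 && mongod --fork --dbpath /data/seeded-db --logpath /tmp/mongo.log \
 && mongorestore --nsInclude 'Test.*' --drop dump/ \
 && mongod --dbpath /data/seeded-db --shutdown \
 && chown -R mongodb:mongodb /data/seeded-db
# Point the server at the seeded directory when the container starts.
CMD ["mongod", "--dbpath", "/data/seeded-db"]
EOF
```

It could then be built with docker build -f Dockerfile.seeded -t mongodb:test . as before. The trade-off is that the data lives in the image and the container's writable layer, so later changes are lost when the container is removed unless a volume is mounted elsewhere.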

That is an important point, and I will use VOLUME correctly in future builds:

Changing the volume from within the Dockerfile : If any build steps change the data within the volume after it has been declared, those changes will be discarded.

By “in my early tests” I mean that I used a Dockerfile without the VOLUME instruction and executed docker run without -v, but when I started /bin/bash to get into the container, /data/db had no data.