Is there a simple pattern I’m missing for using docker-compose and Dockerfiles to populate a data volume container within a VM? I want a Dockerfile to orchestrate the import of raw data files into the database, so that I can distribute the VM with the seeded data volume container and, at a later point, fire up a different database container that mounts that data volume.
I posted a related question on SO that has a bit more detail around what I tried so far:
I am trying to distribute a set of connected applications running in several linked containers that includes a mongo database that is required to:
- be distributed containing some seed data;
- allow users to add additional data.
Ideally the data will also be persisted in a linked data volume container.
I can get the data into the mongo container using a mongo base instance that doesn’t mount any volumes (dockerhub image: psychemedia/mongo_nomount - this is essentially the base mongo Dockerfile without the VOLUME /data/db statement) and a Dockerfile config along the lines of:
ADD . /files
WORKDIR /files
RUN mkdir -p /data/db && mongod --fork --logpath=/tmp/mongodb.log && sleep 20 && \
    mongoimport --db testdb --collection testcoll --type csv --headerline --file ./testdata.csv #&& mongod --shutdown
where ./testdata.csv is in the same directory (./mongo-with-data) as the Dockerfile.
My docker-compose config file includes the following:
mongo:
  #image: mongo
  build: ./mongo-with-data
  ports:
    - "27017:27017"
  #Ideally we should be able to mount this against a host directory
  #volumes:
  #  - ./db/mongo/:/data/db
  #volumes_from:
  #  - devmongodata

#devmongodata:
#  command: echo created
#  image: busybox
#  volumes:
#    - /data/db
Whenever I try to mount a VOLUME it seems as if the original seeded data, which is stored in /data/db, is deleted. I guess that when a volume is mounted to /data/db it replaces whatever is there currently.
That said, the Docker user guide says: “Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.” So I expected the data to persist if I placed the VOLUME command after the seeding RUN command.
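As a sketch of how that passage maps onto the compose file above: the copy-on-initialization step applies to a volume that Docker creates fresh for the container, whereas a host directory mount, or a volume already created by another container, is mounted as-is and hides whatever the image holds at that path. The comments below are a reading of the documented behaviour, not output from a test run.

```yaml
mongo:
  build: ./mongo-with-data
  ports:
    - "27017:27017"
  volumes:
    # New anonymous volume: created together with this container, so the
    # seeded files already in the image at /data/db are copied into it.
    - /data/db
    # Host directory mount: nothing is copied; the host directory's
    # contents (possibly empty) hide the seeded files in the image.
    # - ./db/mongo/:/data/db
  # Volume owned by a busybox container: it was initialized from the
  # busybox image, which has nothing at /data/db, so it starts out empty.
  # volumes_from:
  #   - devmongodata
```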
So what am I doing wrong?
The long view is that I want to automate the build of several linked containers, and then distribute a Vagrantfile/docker-compose YAML file that will fire up a set of linked apps, that includes a pre-seeded mongo database with a (partially pre-populated) persistent data container.
What I ultimately want is a recipe for constructing a VM box that contains a seeded data volume container I can mount against a new mongo database container. (It would also be useful to have a way to save backups of the data volume container to host as well as a simple, non-command line recipe to restore those backups.)
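One way the backup and restore half could look, following the tar-through-a-helper-container approach described in the Docker volumes documentation. The service names backup and restore, the ./backup host directory, and the mongo-data.tar filename are made up for illustration. The sketch assumes the commented-out devmongodata service from the compose file above is enabled, and that the database container is stopped while the files are copied.

```yaml
backup:
  image: busybox
  volumes_from:
    - devmongodata            # exposes the data volume at /data/db
  volumes:
    - ./backup:/backup        # hypothetical host directory for archives
  # Archive the database files onto the host.
  command: tar cvf /backup/mongo-data.tar /data/db

restore:
  image: busybox
  volumes_from:
    - devmongodata
  volumes:
    - ./backup:/backup
  # Unpack the archive back into /data/db inside the data volume.
  command: tar xvf /backup/mongo-data.tar -C /
```

Each service would be run on demand with something like docker-compose run --rm backup, so this is still a command-line step rather than the non-command-line recipe asked for. Both would also start on a plain docker-compose up, so they probably belong in a separate compose file.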
The seeded data volume container will contain database files generated by a DBMS (mongo, or perhaps postgresql); the seeded database files themselves need to be constructed from raw data files imported into the DBMS.
What I had in mind was:
- use vagrant/docker-compose to fire up a DBMS container and import the data, saving it to /data/db in a linked data volume container;
- package up the VM box that contains the populated db container;
- use a separate docker-compose script to fire up a mongo container and link it to the prepopulated docker volume container.
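A sketch of how these steps might collapse into a single compose file. The assumption, untested here, is that the data volume container is created from the seeded image rather than from busybox, so that its /data/db volume is initialized from the imported files. It also assumes the stock mongo image can read database files written by the mongod in the seeded image, and that the seeded image will accept echo created as its command.

```yaml
devmongodata:
  build: ./mongo-with-data    # seeded image, built without a VOLUME statement
  command: echo created       # exits at once; the container only owns the volume
  volumes:
    - /data/db                # new volume, populated from the image's /data/db

mongo:
  image: mongo
  volumes_from:
    - devmongodata            # mounts the seeded volume at /data/db
  ports:
    - "27017:27017"
```

Building the data container from the seeded image keeps the import step inside the image build, at the cost of making the data container's image as large as the database image.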