Is there a simple pattern I’m missing for using docker-compose and Dockerfiles to populate a data volume container within a VM? I’d like a Dockerfile to orchestrate the import of the raw data files into the database, so that I can distribute the VM with the seeded data volume container and, at a later point, fire up a different database container that mounts that data volume.
I posted a related question on SO that has a bit more detail around what I tried so far:
I am trying to distribute a set of connected applications running in several linked containers that includes a mongo database that is required to:
- be distributed containing some seed data;
- allow users to add additional data.
Ideally the data will also be persisted in a linked data volume container.
I can get the data into the `mongo` container using a `mongo` base instance that doesn’t mount any volumes (dockerhub image: `psychemedia/mongo_nomount` - this is essentially the base mongo Dockerfile without the `VOLUME /data/db` statement) and a `Dockerfile` config along the lines of:

    ADD . /files
    WORKDIR /files
    RUN mkdir -p /data/db && mongod --fork --logpath=/tmp/mongodb.log && sleep 20 && \
        mongoimport --db testdb --collection testcoll --type csv --headerline --file ./testdata.csv #&& mongod --shutdown

`./testdata.csv` is in the same directory (`./mongo-with-data`) as the Dockerfile.
My docker-compose config file includes the following:

    mongo:
      #image: mongo
      build: ./mongo-with-data
      ports:
        - "27017:27017"
      #Ideally we should be able to mount this against a host directory
      #volumes:
      #  - ./db/mongo/:/data/db
      #volumes_from:
      #  - devmongodata

    #devmongodata:
    #  command: echo created
    #  image: busybox
    #  volumes:
    #    - /data/db

Whenever I try to mount a VOLUME it seems as if the original seeded data - which is stored in `/data/db` - is deleted. I guess that when a volume is mounted to `/data/db` it replaces whatever is there currently.
That said, the Docker user guide suggests that: "Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization." So I expected the data to persist if I placed the VOLUME command after the seeding step.
So what am I doing wrong?
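
One way to narrow this down might be to compare what a container sees at `/data/db` with a freshly created volume versus a bind-mounted host directory. The copy-on-initialization behaviour quoted above applies to a new volume being populated from the image that creates it; a host directory mounted at the same path hides the image content instead, and a volume created by a different image (such as busybox) starts out with whatever that image has at the path. The sketch below is only illustrative: the image tag `mongo-with-data` is a hypothetical name for the image built from `./mongo-with-data`, and it assumes the image's entrypoint passes arbitrary commands such as `ls` through.

```shell
#!/bin/sh
# Sketch only: IMAGE is a hypothetical tag for the seeded image.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-mongo-with-data}"

# New anonymous volume at /data/db: on container creation Docker copies
# the files the image already has at that path into the volume.
list_with_new_volume() {
    "$DOCKER" run --rm -v /data/db "$IMAGE" ls /data/db
}

# Host directory ($1) bind-mounted at /data/db: the host directory's
# content is shown instead of the image's files; nothing is copied.
list_with_bind_mount() {
    "$DOCKER" run --rm -v "$1:/data/db" "$IMAGE" ls /data/db
}
```

Calling `list_with_new_volume` and then `list_with_bind_mount "$(pwd)/db/mongo"` should show whether the seeded files are present in each case.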
The long view is that I want to automate the build of several linked containers, and then distribute a `Vagrantfile`/docker-compose YAML file that will fire up a set of linked apps, including a pre-seeded `mongo` database with a (partially pre-populated) persistent data container.
What I ultimately want is a recipe for constructing a VM box that contains a seeded data volume container I can mount against a new mongo database container. (It would also be useful to have a way to save backups of the data volume container to host as well as a simple, non-command line recipe to restore those backups.)
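
For the backup part, the usual command-line approach is to run a throwaway container that mounts both the data volume and a host directory and runs `tar` between them. The sketch below follows that pattern; the container name `devmongodata` (docker-compose typically generates a longer, project-prefixed name), the archive name `mongo-data.tar` and the helper function names are assumptions for illustration. It is still a command-line recipe, so it would only be a starting point for a friendlier restore mechanism.

```shell
#!/bin/sh
# Sketch only: DATA_CONTAINER and the archive name are assumptions.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
DATA_CONTAINER="${DATA_CONTAINER:-devmongodata}"

# Archive /data/db from the data volume container into host directory $1.
backup_volume() {
    "$DOCKER" run --rm --volumes-from "$DATA_CONTAINER" -v "$1:/backup" busybox \
        tar cf /backup/mongo-data.tar /data/db
}

# Unpack the archive from host directory $1 back into the volume.
# tar stores the paths without the leading "/", so extract relative to /.
restore_volume() {
    "$DOCKER" run --rm --volumes-from "$DATA_CONTAINER" -v "$1:/backup" busybox \
        tar xf /backup/mongo-data.tar -C /
}
```

For example, `backup_volume "$(pwd)/backups"` followed later by `restore_volume "$(pwd)/backups"`. It is probably safest to stop the database container first so that the files are not copied mid-write.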
The seeded data volume container will contain database files generated by a DBMS (mongo, or perhaps postgresql); the seeded database files themselves need to be constructed from raw data files imported into the DBMS.
What I had in mind was:
- use vagrant/docker-compose to fire up a DBMS container and import the data, saving it to /data/db in a linked data volume container;
- package up the VM box that contains the populated data volume container;
- use a separate docker-compose script to fire up a mongo container and link it to the prepopulated data volume container.
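
Expressed as plain docker commands rather than docker-compose, the steps above might look roughly like the following sketch. The container names and helper functions are hypothetical, and it assumes the data is imported at run time into a running container (reading the CSV from standard input) rather than during the image build, so that the files land in the volume rather than in an image layer.

```shell
#!/bin/sh
# Sketch only: names are hypothetical.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
DATA_CONTAINER="${DATA_CONTAINER:-devmongodata}"

# Step 1a: create (but do not run) a container that owns the /data/db volume.
create_data_container() {
    "$DOCKER" create -v /data/db --name "$DATA_CONTAINER" busybox
}

# Steps 1b and 3: start a mongo container ($1 = name) that stores its
# files in the data container's volume.
start_mongo() {
    "$DOCKER" run -d --name "$1" --volumes-from "$DATA_CONTAINER" -p 27017:27017 mongo
}

# Step 1c: import a CSV file from the host ($2) into the running
# container ($1); mongoimport reads standard input when no --file is given.
import_csv() {
    "$DOCKER" exec -i "$1" mongoimport --db testdb --collection testcoll \
        --type csv --headerline < "$2"
}
```

A seeding run would then be `create_data_container`, `start_mongo seedmongo`, a short wait for mongod to start, and `import_csv seedmongo ./testdata.csv`. After stopping and removing `seedmongo`, the box could be packaged (for example with `vagrant package`), and `start_mongo` run again under a new name inside the distributed VM.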