Seeding data volume containers - mongodb

Is there a simple pattern I’m missing for using docker-compose and Dockerfiles to populate a data volume container inside a VM? I want a Dockerfile to orchestrate the import of raw data files into the database, so that I can distribute the VM with the seeded data volume container and, at a later point, fire up a different database container that mounts that data volume.


I posted a related question on SO that has a bit more detail around what I tried so far:

I am trying to distribute a set of connected applications running in several linked containers that includes a mongo database that is required to:

  • be distributed containing some seed data;
  • allow users to add additional data.

Ideally the data will also be persisted in a linked data volume container.

I can get the data into the mongo container using a mongo base instance that doesn’t mount any volumes (dockerhub image: psychemedia/mongo_nomount - this is essentially the base mongo Dockerfile without the VOLUME /data/db statement) and a Dockerfile config along the lines of:

ADD . /files
WORKDIR /files
RUN mkdir -p /data/db && mongod --fork --logpath=/tmp/mongodb.log && sleep 20 && \
mongoimport  --db testdb --collection testcoll  --type csv --headerline --file ./testdata.csv  #&& mongod --shutdown

where ./testdata.csv is in the same directory (./mongo-with-data) as the Dockerfile.

My docker-compose config file includes the following:

mongo:
  #image: mongo
  build: ./mongo-with-data
  ports:
    - "27017:27017"
  #Ideally we should be able to mount this against a host directory
  #volumes:
  #  - ./db/mongo/:/data/db
  #volumes_from:
  #  - devmongodata

#devmongodata:
#    command: echo created
#    image: busybox
#    volumes: 
#       - /data/db

Whenever I try to mount a VOLUME it seems as if the original seeded data - which is stored in /data/db - is deleted. I guess that when a volume is mounted to /data/db it replaces whatever is there currently.

That said, the Docker user guide suggests that: “Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.” So I expected the data to persist if I placed the VOLUME instruction after the seeding RUN command.
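A quick way to test this expectation is to list /data/db under each kind of mount. The following is only a sketch: the image tag `mongo-with-data`, the `DOCKER` override and the function names are assumptions made for illustration, and `devmongodata` is the busybox data volume container from the compose file above. If the quoted rule only applies to volumes that are newly created for the container, only the first case should show the seeded files.

```shell
#!/bin/sh
# Sketch only: compare what ends up in /data/db under three kinds of mount.
# IMAGE is a hypothetical tag for the image built from ./mongo-with-data.
# Set DOCKER="echo docker" to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-mongo-with-data}"

ls_new_volume() {
  # Case 1: a new anonymous volume, the case the user guide text describes.
  $DOCKER run --rm -v /data/db "$IMAGE" ls /data/db
}

ls_host_mount() {
  # Case 2: a host directory ($1, absolute path) mounted over /data/db.
  $DOCKER run --rm -v "$1":/data/db "$IMAGE" ls /data/db
}

ls_volumes_from() {
  # Case 3: a volume borrowed from an existing data volume container ($1).
  $DOCKER run --rm --volumes-from "$1" "$IMAGE" ls /data/db
}
```

Running `ls_new_volume`, `ls_host_mount "$PWD/db/mongo"` and `ls_volumes_from devmongodata` in turn shows which mounts keep the seeded files visible.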

So what am I doing wrong?

The long view is that I want to automate the build of several linked containers, and then distribute a Vagrantfile/docker-compose YAML file that will fire up a set of linked apps, that includes a pre-seeded mongo database with a (partially pre-populated) persistent data container.


What I ultimately want is a recipe for constructing a VM box that contains a seeded data volume container I can mount against a new mongo database container. (It would also be useful to have a way to save backups of the data volume container to the host, as well as a simple, non-command-line recipe to restore those backups.)
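For the backup part, one common approach is to run a throwaway container that mounts both the data volume and a host directory, and to tar the data across. A minimal sketch, assuming a data volume container that exposes /data/db; the archive name `mongo-data.tar`, the `DOCKER` override and the function names are illustrative choices, not part of any existing setup. It is still command-line based, so it covers only the backup and restore mechanics.

```shell
#!/bin/sh
# Sketch only: back up and restore /data/db via a throwaway busybox container.
# Container name, host directory and archive name are illustrative assumptions.
# Set DOCKER="echo docker" to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

backup_volume() {
  # $1 = data volume container, $2 = absolute host directory for the archive
  $DOCKER run --rm --volumes-from "$1" -v "$2":/backup busybox \
    tar cf /backup/mongo-data.tar /data/db
}

restore_volume() {
  # Unpacks the archive back into /data/db of the volume exposed by $1
  $DOCKER run --rm --volumes-from "$1" -v "$2":/backup busybox \
    tar xf /backup/mongo-data.tar -C /
}
```

For example, `backup_volume devmongodata "$PWD/backups"` would write the archive to ./backups on the host, and `restore_volume devmongodata "$PWD/backups"` would unpack it again. It is probably safest to stop the mongo container first, so that the database files are not changing while they are archived.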

The seeded data volume container will contain database files generated by a DBMS (mongo, or perhaps postgresql); the seeded database files themselves need to be constructed from raw data files imported into the DBMS.

What I had in mind was:

  • use vagrant/docker-compose to fire up a DBMS container and import the data, saving it to /data/db in a linked data volume container;
  • package up the VM box that contains the populated data volume container;
  • use a separate docker-compose script to fire up a mongo container and link it to the prepopulated data volume container.
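The steps above could be sketched with plain docker commands along the following lines. This is an outline under assumptions rather than a tested recipe: the container names, the `testdb`/`testcoll` names (borrowed from the Dockerfile above), the /files mount and the `DOCKER` override are all illustrative, and it assumes the stock mongo image ships `mongoimport`.

```shell
#!/bin/sh
# Sketch only: seed a data volume container, then reuse it from a new container.
# Set DOCKER="echo docker" to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

create_data_container() {
  # $1 = name of the data volume container; it only has to exist, not run
  $DOCKER create -v /data/db --name "$1" busybox /bin/true
}

start_seeding_mongo() {
  # $1 = mongo container, $2 = data volume container,
  # $3 = absolute host directory holding the raw data files
  $DOCKER run -d --name "$1" --volumes-from "$2" -v "$3":/files mongo
}

seed_mongo() {
  # $1 = mongo container, $2 = CSV path as seen inside the container
  $DOCKER exec "$1" mongoimport --db testdb --collection testcoll \
    --type csv --headerline --file "$2"
}

start_fresh_mongo() {
  # After packaging the box: a new mongo container on the existing volume
  $DOCKER run -d --name "$1" --volumes-from "$2" -p 27017:27017 mongo
}
```

In the box being built, the sequence would be `create_data_container devmongodata`, `start_seeding_mongo seedmongo devmongodata "$PWD/mongo-with-data"`, then `seed_mongo seedmongo /files/testdata.csv` once mongod is accepting connections. In the distributed box, only `start_fresh_mongo mongo devmongodata` would be needed.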

So… to answer my own question:

  • simple YAML to create a mongodb container and a linked data volume container

  • use vagrant docker compose to create the containers

  • in Vagrantfile, lines of the form:

    config.vm.provision :shell, :inline => <<-SH
    docker exec -it -d vagrant_mongo_1 mongoimport --db a5 --collection roads --type csv --headerline --file /files/AADF-data-minor-roads.csv
    SH

to import the data.

Then package the box…

For the distribution, use the box as the base box, with a simple YAML file to launch the containers and mount the data volume container…