Docker Community Forums


Need design help on DB approach: named volume vs data containers


(Alexandre) #1

Hi,

I am in the process of Dockerizing some applications and their dependencies. One of these dependencies is a Neo4j database (including some bootstrap data inside the DB).

My general approach for this project is to provide the developers with a self-sufficient docker-compose.yml file. What I mean by that is that upon running “docker-compose up”, the applications should be up, running and fully working, which is nothing unusual.

So, as I said, Neo4j is a dependency for one of these applications. There’s a ~4MB graph.db required in order to start the application. I would also like developers to be able to override this graph.db file if they need to do so.

My plan for tackling this problem was:

  • Build a neo4jdbdata image and push it to our private registry
  • The Neo4j instance (DB runtime process) uses volumes_from neo4jdbdata
  • If a developer needs to overwrite the DB, they can use the -v flag to mount their own graph.db into the Neo4j instance

Now my questions are:

  • Would that really work (especially the overwrite with -v part)?
  • Is it a good practice?
  • I’ve read that named volumes are preferred now. How can docker-compose pull an already built named volume that would be… somewhere?

Thanks!


(Alexandre) #2

If anyone is interested, here is the approach I have implemented:

I publish an image called neo4j-data-bootstrap on our private registry. This image contains the data I want to provide to developers who don’t have a local Neo4j instance. The data is stored within the image at /bootstrap-files. When the image is run, it checks whether the /opt/neo4j/data directory exists and, if not, initializes this directory with the bootstrapping data in /bootstrap-files.

The docker-compose.yml has two services related to Neo4j:
1- The neo4j-data-bootstrap service, which runs a container based on the image of the same name. It also binds the neo4j-data named volume to /opt/neo4j.
2- The neo4j-db service, which runs a simple Neo4j instance. This service also binds the neo4j-data named volume to /opt/neo4j, and it depends_on neo4j-data-bootstrap.
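
For readers who want to see the two services side by side, here is a minimal sketch of what such a compose file could look like, written out by a small shell snippet. The registry host (registry.example.com) and the neo4j-db image name are placeholders that do not appear in this thread; only the service names, the neo4j-data volume, the /opt/neo4j mount point and the depends_on relationship come from the description above.

```shell
#!/bin/sh
# Sketch only: writes a docker-compose.yml matching the description above
# into the current directory (it overwrites any existing file of that name).
# registry.example.com and the neo4j-db image name are hypothetical placeholders.
cat > docker-compose.yml <<'EOF'
version: "2"

services:
  neo4j-data-bootstrap:
    image: registry.example.com/neo4j-data-bootstrap:latest
    volumes:
      - neo4j-data:/opt/neo4j

  neo4j-db:
    image: registry.example.com/neo4j-db:latest
    depends_on:
      - neo4j-data-bootstrap
    volumes:
      - neo4j-data:/opt/neo4j

volumes:
  neo4j-data:
EOF
```

Because neo4j-data is declared as a named volume, its contents survive "docker-compose down" and are reused on the next "docker-compose up" unless the volume is removed explicitly.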

The net result looks like this:

  • A developer runs docker-compose up.
  • Docker-compose computes the dependencies and establishes that the neo4j-data-bootstrap service should be started first.
  • The neo4j-data-bootstrap:latest image is pulled from the remote or local registry.
  • The neo4j-data-bootstrap image is instantiated as a container, which populates the /opt/neo4j directory (bound to the neo4j-data named volume) if it’s empty.
  • Shortly after neo4j-data-bootstrap starts, the neo4j-db service is started by docker-compose. This service is also bound to the neo4j-data named volume and sees either the initialization data from the previous step (on the first run) or whatever data was left in the volume by previous runs.

Now, I know that depends_on only controls start order; it does not wait for the bootstrap container to be “done” before starting the neo4j-db container. In practice, the bootstrap container unzips two files (total size of about 50MB) and those unzip operations are done long before Neo4j tries to read its config file.
If there’s ever a concurrency problem, I’ll simply use an entrypoint on the neo4j-db service to delay the startup of the process within the container until the data is done unzipping.
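
A minimal sketch of such a delaying entrypoint, assuming it polls for the /opt/neo4j/ready.txt marker that the bootstrap container writes as its last step (see the Dockerfile in this post). The function name wait_for_file and the READY_FILE and READY_TIMEOUT variables are made up for illustration; this is not the script actually used in the project.

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper for the neo4j-db container (a sketch).
# It polls for the marker file written by the bootstrap container, then
# hands control to whatever command the container was given.

# wait_for_file PATH TIMEOUT_SECONDS
# Returns 0 as soon as PATH exists, 1 if TIMEOUT_SECONDS elapse first.
wait_for_file() {
    path=$1
    remaining=$2
    while [ ! -e "$path" ]; do
        if [ "$remaining" -le 0 ]; then
            echo "Timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# READY_FILE and READY_TIMEOUT are assumed names; the default path matches
# the marker written by the bootstrap Dockerfile.
READY_FILE=${READY_FILE:-/opt/neo4j/ready.txt}
READY_TIMEOUT=${READY_TIMEOUT:-60}

# Only wait and exec when a command was passed as arguments, so that the
# function can also be loaded on its own (for example, for testing).
if [ "$#" -gt 0 ]; then
    wait_for_file "$READY_FILE" "$READY_TIMEOUT" || exit 1
    exec "$@"
fi
```

The wrapper would be set as the entrypoint of the neo4j-db service, with the usual Neo4j start command passed as its arguments. Since ready.txt lives in the named volume, it is already present on later runs and the wait returns immediately.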

Dockerfile-neo4j-data-bootstrap:

# The FROM line was not shown in the original post; the base image must provide sh, unzip and tar.
COPY neo4j.zip /bootstrap-files/neo4j_bootstrap.zip
COPY graph.db.tar.gz /bootstrap-files/neo4j_data.tar.gz

WORKDIR /opt
# Unpack the configuration and the data into /opt/neo4j only if they are not
# already present, then write ready.txt as a completion marker.
CMD if [ ! -d "neo4j/conf" ]; then \
      unzip -qq -n /bootstrap-files/neo4j_bootstrap.zip 'neo4j/*' -d /opt && \
      echo "Unzipped Neo4j configuration"; \
    else \
      echo "Neo4j configuration found."; \
    fi && \
    if [ ! -d "neo4j/data/graph.db" ]; then \
      tar -xzf /bootstrap-files/neo4j_data.tar.gz --directory /opt/neo4j/data && \
      echo "Unzipped Neo4j data"; \
    else \
      echo "Neo4j data found."; \
    fi && \
    echo "yes" > /opt/neo4j/ready.txt && \
    echo "Data-only container for workjam-services bootstrapped"

