If anyone is interested, here is the approach I have implemented:
I publish an image called neo4j-data-bootstrap on our private registry. This image contains the data I want to provide to developers who don’t have a local Neo4j instance. The data is stored within the image at /bootstrap-files. When the image is run, it checks whether the /opt/neo4j/data directory exists and, if not, initializes it with the bootstrapping data from /bootstrap-files.
The docker-compose file has two services related to Neo4j:
1- neo4j-data-bootstrap, which runs a container based on the image of the same name. It binds the neo4j-data named volume to /opt/neo4j.
2- neo4j-db, which runs a simple Neo4j instance. This service also binds the neo4j-data named volume to /opt/neo4j, and it depends_on neo4j-data-bootstrap.
The net result looks like this:
- A developer runs docker-compose up.
- Docker-compose computes the dependencies and establishes that the neo4j-data-bootstrap service should be started first.
- The neo4j-data-bootstrap:latest image is pulled from the remote or local registry.
- The neo4j-data-bootstrap image is instantiated as a container, which populates the /opt/neo4j directory (bound to the neo4j-data named volume) if it’s empty.
- Shortly after neo4j-data-bootstrap starts up, docker-compose starts the neo4j-db service. This service is also bound to the neo4j-data named volume and sees either the initialization data from the previous step, if this is the first run, or whatever data was left in the volume by previous runs.
Now, I know that depends_on does not wait for the bootstrap container to be “done” before starting the neo4j-db container. In practice, the bootstrap container unpacks two archives (about 50MB in total), and that finishes long before Neo4j tries to read its config file.
If there’s ever a concurrency problem, I’ll simply use an entrypoint on the neo4j-db container to delay the startup of the Neo4j process until the data is done unpacking.
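For anyone who does hit that race, here is a minimal sketch of what such an entrypoint could look like. It is not something I run today. It assumes the bootstrap container writes /opt/neo4j/ready.txt as its last step; the wait_for_file function name, the 60 second timeout, and the idea of passing the real start command as arguments are all illustrative choices.

```shell
#!/bin/sh
# Hypothetical wrapper entrypoint for the neo4j-db container (a sketch, not the
# official Neo4j entrypoint): wait for the bootstrap marker file, then hand
# over to whatever command was passed as arguments.

# wait_for_file PATH [TIMEOUT_SECONDS]
# Returns 0 once PATH exists, 1 if it has not appeared after TIMEOUT_SECONDS.
wait_for_file() {
    file=$1
    timeout=${2:-60}
    waited=0
    while [ ! -f "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out waiting for $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Only wait and exec when a command was given, e.g. the image's usual start command.
if [ "$#" -gt 0 ]; then
    wait_for_file /opt/neo4j/ready.txt 60 || exit 1
    exec "$@"
fi
```

Since ready.txt lives in the named volume, it is still there on later runs, so the wait returns immediately once the volume has been bootstrapped.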
Here is the relevant part of the neo4j-data-bootstrap Dockerfile:
COPY neo4j.zip /bootstrap-files/neo4j_bootstrap.zip
COPY graph.db.tar.gz /bootstrap-files/neo4j_data.tar.gz
# The relative paths in the checks below assume the working directory is /opt.
CMD if [ ! -d "neo4j/conf" ]; then \
      unzip -qq -n /bootstrap-files/neo4j_bootstrap.zip 'neo4j/*' -d /opt && \
      echo "Unzipped Neo4j configuration"; \
    else \
      echo "Neo4j configuration found."; \
    fi; \
    if [ ! -d "neo4j/data/graph.db" ]; then \
      tar -xzf /bootstrap-files/neo4j_data.tar.gz --directory /opt/neo4j/data && \
      echo "Unzipped Neo4j data"; \
    else \
      echo "Neo4j data found."; \
    fi; \
    echo "yes" > /opt/neo4j/ready.txt && \
    echo "Data-only container for workjam-services bootstrapped"