Docker Community Forums

Data Volume Recommendation

(Safaci2000) #1

I have a large Postgres database that I’m building into a Docker container.

Right now, I’m downloading a compressed database dump and restoring it during the docker build process. So it takes a 1.7 GB compressed file and generates a Docker image of about 40 GB.

The downside, of course, is that deploying the Docker image or pulling it from DTR is super slow.

I was wondering if anyone had any suggestions on how to better manage the volume data.

This is my current Dockerfile.

FROM internal_db:base

COPY *.Fc /tmp/sql/
COPY /tmp/sql/

RUN /usr/pgsql-10/bin/pg_ctl -D /postgresql/pg_data/10/ start && \
  /tmp/sql/ /tmp/sql && \
  /usr/pgsql-10/bin/pg_ctl -D /postgresql/pg_data/10/ stop

USER root

RUN /usr/bin/rm -Rf /tmp/sql

USER postgres

The script it calls does a few sanity checks and then ends up running this line, which does most of the work:

pg_restore -p 5432 --dbname=postgres -Fc --create --verbose --jobs=4 --no-tablespaces /tmp/sql/akdb-extract-for-docker.Fc

On the one hand, I like the fact that the data is in the image, because I can take advantage of the image reset to restore the DB data to its original state. At the same time, having an image that large seems like an anti-pattern.

Does anyone have a better suggestion on how to do this? (This has also caused serious issues in the past, with Docker VMs running out of disk space, memory, and so on while building the image.)

(Xtrasimplicity) #2

Is there any particular reason why you aren’t storing the Postgres data in a persistent volume/mount? In this situation, I would usually have the image provide just a basic database server, and I’d import the data by calling the Postgres import commands (via docker run).

Then, to add persistence, I would mount a persistent volume/host folder to the postgres data directory (SHOW data_directory;).
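A minimal sketch of that setup, assuming the image's default command starts the Postgres server and that the dump sits in a local ./sql directory. The volume name akdb_data and container name akdb are made up for illustration; the image name, data directory, and pg_restore flags are the ones from post #1.

```shell
#!/bin/sh
# Values taken from post #1.
IMAGE="internal_db:base"
PGDATA_DIR="/postgresql/pg_data/10"
# Hypothetical names, chosen only for this sketch.
VOLUME="akdb_data"
CONTAINER="akdb"

# Start the server with a named volume mounted over the data directory.
# Assumption: the image's default command starts Postgres.
# When a new, empty named volume is first mounted, Docker copies whatever
# the image already has at that path into the volume.
start_db() {
  docker run -d --name "$CONTAINER" \
    -v "$VOLUME:$PGDATA_DIR" \
    -v "$PWD/sql:/tmp/sql:ro" \
    "$IMAGE"
}

# Restore the dump into the running server, using the pg_restore
# invocation quoted in post #1. The data lands in the volume, not the image.
restore_dump() {
  docker exec "$CONTAINER" pg_restore -p 5432 --dbname=postgres \
    -Fc --create --verbose --jobs=4 --no-tablespaces \
    /tmp/sql/akdb-extract-for-docker.Fc
}
```

Running start_db once and then restore_dump would pay the restore cost a single time; later containers that mount the same volume reuse the restored data, and the image itself stays small.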

Or have I misunderstood what you’re trying to do?

(Safaci2000) #3

So, a few things. Please correct me on any of this if I’m just being naive or misunderstanding your suggestion.

  1. Loading the data can take anywhere from 20-30, which is why it’s currently part of the image.

  2. I assume a persistent volume just means declaring it in the image with the VOLUME keyword and then mounting it via the -v parameter or the appropriate docker-compose tag.

I know Docker has added more refinement to volumes, but I believe it’s still essentially mapping a local directory to a folder in the container.

My issue with volumes is that if I want to ‘reset’ the data to the original state, I would have to run the restore all over again. Is there support for an immutable volume? Basically:

volume state 0 + delta, where a reset wipes the delta and reverts to state 0?
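One way the "state 0 + delta" idea might be approximated with plain named volumes is sketched below. It is not a true copy-on-write volume: it keeps a pristine seed volume and re-copies it into a working volume on reset, so a reset costs a file copy rather than a full pg_restore. The volume names akdb_seed and akdb_work are hypothetical, and the alpine image is used only as a throwaway copy helper.

```shell
#!/bin/sh
# Hypothetical volume names, chosen only for this sketch.
SEED_VOLUME="akdb_seed"   # pristine "state 0", populated once by a restore
WORK_VOLUME="akdb_work"   # the volume the database container actually mounts

# Throw away the working copy (the "delta") and re-create it from the seed.
# Assumption: the database container using WORK_VOLUME has already been
# stopped and removed, otherwise the volume is still in use.
reset_volume() {
  docker volume rm -f "$WORK_VOLUME" &&
  docker volume create "$WORK_VOLUME" &&
  docker run --rm \
    -v "$SEED_VOLUME:/from:ro" \
    -v "$WORK_VOLUME:/to" \
    alpine sh -c 'cp -a /from/. /to/'
}
```

The seed is mounted read-only so the copy step cannot modify state 0, and cp -a preserves file ownership and permissions, which Postgres is strict about for its data directory.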

Thanks for any help and I hope this clears things up a bit?