Docker commit creates large images

mihab · October 6, 2016, 11:31am

We are using Docker to handle our test database.
We have a DB2 database that is about 100 GB in size. Our integration tests depend on the database.

The goals are:

1. Every developer should have his own copy of the database, so he can run tests on it. Running an instance of the database should be fast and cheap in terms of memory.
2. If the database gets corrupted due to developer error, or goes stale (because another developer changed tests), it should be easy to get another (or updated) copy of the database.
3. If a developer changes the database because he changed the tests, it should be fast and cheap in terms of memory to commit the changes to an image so others can use the updated image.
4. The image handling process should be as simple as possible (one image that contains everything you need, if possible).

We made an image with the database instance and all the data inside. The size of the image is about 100 GB.

A running container made from this image takes up only an additional 2 GB of disk. That means if we need 10 test databases, the cost is 100 GB + 10 x 2 GB = 120 GB.
Starting the database takes under 3 minutes. So we’ve got goal #1 covered.
A developer can easily start a new container from the image and remap it to another port. The cost is 2 GB and 3 minutes. So that covers goal #2.
Committing to an image is one instruction (docker commit) and creating a container from the image is one instruction (docker run). That is all an average developer needs to handle his database. So goal #4 is covered.

The problem lies in goal #3. When we want to commit (docker commit) the changes in the container to an image, the cost is too high. It seems that Docker has no way to store only the difference between the image and the container.
What we end up with after a commit is a 100 GB + 70 GB = 170 GB image, as if Docker copied the entire data files of the database and not only the blocks that changed.
So committing changes is neither fast nor cheap. If every commit takes up 70 GB, we’ll run out of space soon.
We’ve found a workaround: export the container to a .tar and import it as an image. That produces an image of 100 GB, which is OK in terms of size, but it is too slow to do often.
Another drawback is that we lose history (you only get one commit, which is the initial import).
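
A rough way to picture why a small logical change can end up as a ~70 GB layer: image layers record changes per file rather than per block, so a data file that was touched at all is stored again in full. The sketch below only models that arithmetic. The file names, sizes and change amounts are hypothetical, picked to resemble the 100 GB + 70 GB case described above; it is not a measurement of what Docker did on this host.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    name: str        # hypothetical data file name
    size_gb: int     # full size of the file
    changed_mb: int  # amount of data actually modified inside the file


def file_level_layer_gb(files):
    """Layer size if every modified file is stored again in full."""
    return sum(f.size_gb for f in files if f.changed_mb > 0)


def changed_data_mb(files):
    """Amount of data that really changed, summed over all files."""
    return sum(f.changed_mb for f in files)


# Assumed layout: three large data files, two of them lightly modified.
files = [
    DataFile("tablespace_a", 40, 500),
    DataFile("tablespace_b", 30, 250),
    DataFile("tablespace_c", 30, 0),
]

print("layer size (GB):", file_level_layer_gb(files))
print("data changed (MB):", changed_data_mb(files))
```

Under these assumptions a few hundred MB of real changes still produce a layer as large as the two touched files combined, which would match the behaviour seen with docker commit on a database whose data lives in a handful of big files.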

I’ve seen advice to use data volumes and data volume containers, but I don’t really see the upside.
As far as I understand, they keep data outside the union file system, but they have several disadvantages for our use case:
a) Data persists after the container is finished (we usually don’t want to save changes; changes should be saved only when a developer decides he wants to change the test).
b) We must handle these volumes separately and back them up. That violates goal #4. This is not simple.
c) As far as I understand, if we ran 10 databases, each would need its own data volume. That means 100 GB x 10. That is a lot more than the current 120 GB for 10 databases.
The time to copy this data can’t be much better than the time currently needed to save the changes of the container.
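
To put numbers on point c), here is a small sketch of the two disk-cost estimates. It assumes each data volume holds a full, unshared copy of the database and ignores the smaller image that would still be needed in the volume setup; the function names are made up for illustration.

```python
def shared_image_cost_gb(image_gb, per_container_gb, containers):
    """One shared image plus a thin writable layer per container."""
    return image_gb + per_container_gb * containers


def volume_copy_cost_gb(data_gb, containers):
    """One full copy of the data per container (assumed: no sharing)."""
    return data_gb * containers


containers = 10
layered = shared_image_cost_gb(100, 2, containers)
volumes = volume_copy_cost_gb(100, containers)

print("shared image:", layered, "GB")
print("one volume per database:", volumes, "GB")
```

With the figures from this post that is 120 GB for the shared image against 1000 GB for per-database volumes.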

I’ve seen advice to use Flocker. I’m not sure if that would solve our problem. Either way, we don’t want to depend on a technology that we are not sure will survive.
Docker has its share of users and it’s growing, so that means somebody is working on bugs and features.

Do you have any advice on how to solve our problem with goal #3?
Is there another way, or a storage driver that handles commits more efficiently?

History of our database image:
[root@docker2 /]# docker history be31572cc1f7
IMAGE          CREATED        CREATED BY   SIZE       COMMENT
be31572cc1f7   22 hours ago   db2start     78.94 GB   Saved container running for 5 days
11da702b2a52   9 days ago                  109.7 GB   Imported from -

Our settings:

docker info
Containers: 6
 Running: 6
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.1
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 161.1 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 173.2 GB
 Data Space Total: 237.3 GB
 Data Space Available: 64.04 GB
 Metadata Space Used: 29.34 MB
 Metadata Space Total: 1.929 GB
 Metadata Space Available: 1.9 GB
 Thin Pool Minimum Free Space: 23.73 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge overlay null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.1.12-61.1.10.el7uek.x86_64
Operating System: Oracle Linux Server 7.2
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 31.17 GiB
Name: docker2
ID: 223J:GCMH:LJ4K:LIJ2:ALXL:UYBZ:EULI:TTX6:6W2D:7ITO:74GS:3B4V
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:

sidatdocker543 · May 9, 2019, 5:44pm

@mihab
Did docker export give you any advantage? I had the same issue with docker commit.

When I spawn an instance from my Dockerfile, the initial volume is just 400M, but docker commit gives a volume of 5G. docker export creates a huge volume as well. I'm wondering what could be the cause!