Some way to clean up / identify contents of /var/lib/docker/overlay

Hi there.

Well, first of all, I would like to say that this kind of bug is the reason many people complain and say bad things about open source / free software. Problems like this, which stick around forever with no solution, often make the software impractical to use.

Anyway, I am facing this problem and have been doing some investigation, and I discovered a few things about this issue. In my case, I have a Dockerfile based on the WildFly image that copies in a .war application; on every deploy the image is rebuilt, the old container is removed, and a new container is started from the new image.
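For context, here is a minimal sketch of the kind of setup I mean (the base image, paths and names below are illustrative, not my exact files):

# Dockerfile
FROM jboss/wildfly:latest
COPY target/myapp.war /opt/jboss/wildfly/standalone/deployments/

# redeploy script, run on every build
docker build -t myapp:latest .
docker stop myapp && docker rm myapp
docker run -d --name myapp myapp:latest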

I tried the clean-up commands mentioned above, but nothing worked. In the docker inspect output I found this section:

"GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/fa8c221a6be224f62cc89b01ed2e3332d039a973315497a1fd08eeadbfd8e89e-init/diff:/var/lib/docker/overlay2/4b6ccf942c76c0ea2c228afaff989f000847be1fed361ed2dabc38f15df2fe1f/diff:/var/lib/docker/overlay2/c309a80b5965c0b65308cd766fc1a996da08f96bf920c4b7ff69f05337f61535/diff:/var/lib/docker/overlay2/005195bc5bfb582da94a6054a2f0fdb1e214075916a2d759a42c801ad2175897/diff:/var/lib/docker/overlay2/1e8844c07e7e3b2ae6504bd85d4e3cfa2bcdfb73b9c54107e56404ecb66c784f/diff:/var/lib/docker/overlay2/0ade4795c69216431991716ea03448a9b79a2e2a72c03513a87b3b8e30f7fe79/diff:/var/lib/docker/overlay2/866e1f845c6ba69501b16e24441e74e80291f03b29446d10977fcd4f27afd1fc/diff:/var/lib/docker/overlay2/438f88a38ba85e10106e971ba966085b37770dfc547dc4b43fe424bf1aad4480/diff",
                "MergedDir": "/var/lib/docker/overlay2/fa8c221a6be224f62cc89b01ed2e3332d039a973315497a1fd08eeadbfd8e89e/merged",
                "UpperDir": "/var/lib/docker/overlay2/fa8c221a6be224f62cc89b01ed2e3332d039a973315497a1fd08eeadbfd8e89e/diff",
                "WorkDir": "/var/lib/docker/overlay2/fa8c221a6be224f62cc89b01ed2e3332d039a973315497a1fd08eeadbfd8e89e/work"
            },
            "Name": "overlay2"
        },

So, in the overlay2 folder you have many folders with the content of different containers. It seems Docker keeps the diff folders from all runs of the same image:version plus container name. Every time I redeploy, I use the same container name, image name and tag (latest), so the diff folders from previous runs stick around.

When I found the container that owned the 40 GB folder inside overlay2, I stopped and removed it with docker stop and docker rm, and the folder shrank to 0 bytes. I redeployed everything and the initial size was small again. It turned out to be a problem with my app, which let its log grow too much, but removing the container and redeploying solved the problem.
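For anyone trying to do the same, this is roughly how such a container can be tracked down (the exact commands are my own reconstruction, adjust as needed):

# largest per-layer disk usage under overlay2
du -sh /var/lib/docker/overlay2/*/diff | sort -hr | head -5

# print each container's name next to its writable (upper) layer path,
# then match it against the big directory found above
docker ps -aq | while read -r C; do
    docker inspect -f '{{ .Name }} {{ .GraphDriver.Data.UpperDir }}' "$C"
done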

I suspect this problem is related to the images themselves, not necessarily to Docker. Still, there is no easy way to find and clean this up. Let's do some more investigation.

4 Likes

I was able to clean it up while upgrading the Docker version.

1. Stopped Docker (sudo systemctl stop docker)
2. Uninstalled Docker (sudo yum remove docker-ce)
3. Nuked the directory (sudo rm -r /var/lib/docker/overlay2)
4. Re-created the directory (sudo mkdir /var/lib/docker/overlay2)
5. Re-installed Docker (sudo yum install docker-ce)
6. Restarted the daemon (sudo systemctl start docker)

You need to shut down everything before doing this, and you may lose containers/volumes/etc. I have no idea if you can just do this without uninstalling and re-installing.

I was wondering the same thing some time ago.
It’s not a bug, it’s a feature.

du -sh /var/lib/docker/overlay2
is not showing a meaningful value, because the merged folders are mounted using the overlay driver and the du output is not the actual disk allocation size.

You can see the actual disk allocation size by examining only the diff folders, like:
du -shc /var/lib/docker/overlay2/*/diff

You can test this in your environment like this:
run
df -h /dev/sd*
du -shc /var/lib/docker/overlay2/*/diff
du -sh /var/lib/docker/overlay2
Now start 20 centos containers and observe what has changed:
for i in {1..20}; do docker run -itd centos bash; done
df -h /dev/sd*
du -shc /var/lib/docker/overlay2/*/diff
du -sh /var/lib/docker/overlay2

You can see that the actual disk allocation (the df command) is only about 200 MB more than before, but du on the whole folder reports 4.2 GB.
du on the diff folders shows 212 MB, which is correct.
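If you want to see why du double-counts (my addition, not needed for the test above): the merged folders are overlay mounts whose lowerdir entries point at the same shared image layers, which you can list with:

mount -t overlay
# or
grep overlay /proc/mounts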

This is how Docker works and what makes it great!

4 Likes

So we had the same issue (using Gitlab + private registry) and found this solution to work.

sudo gitlab-ctl registry-garbage-collect

This command will remove images in the directory
/var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/blobs

We reclaimed our disk space usage from 89% to 17%.

Note: we also removed old unused images (tags) from the GitLab web UI under project settings > registry > list containers in repo.

Edit: more info to be found here https://github.com/sciapp/gitlab-registry-cleanup

1 Like

Good to hear, but this topic is about Docker itself, not about the registry, which is a separate application.

1 Like

@dpatekar

Thank you very much for your reply!


Thank you for your explanation.
I tested it on my project and it's true.

But the problem is that my MySQL stops working because of a full disk… even though the disk is not really full.

With du -sh I get 4.8 GB, with df -h 18 GB, and my MySQL container stops working because there is no free space…

1 Like

I’m having the same problem. Any solution?

Using Docker version 18.09.3, the general solution works properly:
docker system prune --all --volumes --force
It just freed 80+ GB on the server.
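If you want a preview of how much it will reclaim before running it (my addition, standard Docker CLI):

docker system df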

8 Likes

Thanks to @renatorro, I was able to identify what caused that disk leak.
For me, it was because my images were writing a jar onto disk, and every time I pushed a new version with something like:

sudo kubectl set image deployment/imageName imageName=imageName:0-1-X

the overlay2 folder grew.
I was able to clean it up by cordoning my node, draining it, and uncordoning it:

kubectl get nodes

Choose your node showing DiskPressure in the list, then cordon it:

kubectl cordon <node-name>

(It should appear as SchedulingDisabled if you run kubectl get nodes again.)
Then drain all pods on the node:

kubectl drain <node-name> --ignore-daemonsets

This could take a while if, like me, you have a lot of evicted pods…
You can now check the overlay2 folder size on the node:

cd /var/lib/docker
du -h --max-depth=1 | sort -hr

You should see the size shrinking while the drain is in progress…
When the drain is finished, you can re-attach your node:

kubectl uncordon <node-name>

The overlay2 folder should now grow back to its normal size (while your pods are loading).

Hope this can help someone!

So I just went through this. I read a lot of posts about how overlay doesn't use space, and that is not an entirely accurate statement. Depending on what you are doing in your container and how it and the application it runs are configured, you can end up with a large overlay directory. Overlay becomes large because data written to disk by a process inside a container goes into the overlay unless you mount a volume at the location in the container where you are writing. If you start a container with just ssh and no volume mounts, and use dd to write a 1 GB file in the container, running du will only show you the top layer and will not include the size of that file, even though ls -lh does. This means running through directories and using du like I did today won't help you.
If seeing is believing, head over to /var/lib/docker/overlay2 on the docker host and run ls | xargs -I {} du -shx {}. Go into the largest directory and then the associated diff directory. It should look super familiar. Run that command again and chase directories until you find what is using the data. That data is likely necessary for the workload in the container.
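A minimal way to reproduce the dd demonstration described above (the image name and file path are just examples):

# throwaway container with no volume mounts
docker run -d --name overlay-test alpine sleep 3600

# write a 1 GB file inside the container; it lands in the container's upper (diff) layer
docker exec overlay-test dd if=/dev/zero of=/bigfile bs=1M count=1024

# the container's UpperDir under overlay2 now holds that 1 GB
du -sh $(docker inspect -f '{{ .GraphDriver.Data.UpperDir }}' overlay-test)

# clean up
docker rm -f overlay-test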

Julien's approach works, but after the same amount of run time and the same amount and type of activity in the container, I'd imagine those steps will need to be taken again. Going back and looking through the diff (again, don't delete stuff in there; your application probably needs it) will help you determine the best approach to resolving the issue long term.

2 Likes

I was having some issues with a large overlay directory, and the thread I'm replying to helped me identify the diff that was chewing up a lot of disk space. In my case it had some large backup files sitting inside it. I just deleted those backup files and freed up a ton of space. Thanks!

Hello,

I just want to share a bit of experience I gathered today with this issue, which will hopefully help some people here understand what might be going on.

We used a Docker server for continuously deploying new software commits to a testing environment. After some time I found that around 25-30 GB were tied up in a place in the file system where I had no explanation for what had allocated the space (this is how I found out about this post).

Therefore I checked the system with du -shc /var/lib/docker/overlay2/*/diff as dpatekar suggested (see his answer from September 2018)

Our next step after discovering this issue was docker system df, which showed that most of the space taken by Docker was in images, with a value of 91% reclaimable.
We did some additional research and found out that docker image prune and docker container prune did not work for us because we were having issues with dangling images (a quick resource on the difference between unused and dangling images: https://stackoverflow.com/a/45143234).
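For reference, a quick way to see that distinction on your own host (commands added here, standard Docker CLI):

docker images --filter dangling=true   # dangling: untagged <none>:<none> layers left behind by rebuilds
docker image prune                     # removes only dangling images
docker image prune -a                  # also removes images not referenced by any container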

Therefore we used docker system prune -a, which cleared up 25 GB of space.

Edit: Using this command has been fine for our test server, but you might not want to use it in production (see the response below the Stack Overflow answer I linked; it also argues for a better approach to clearing dangling images).

Therefore, if you are having issues on a production server: Please read the documentation before you use this and make sure you won’t delete elements you need: https://docs.docker.com/engine/reference/commandline/system_prune/

I hope this helps.

Cheers

Hi,

Just joining to add something to this discussion. I had a similar problem with my servers. I had about 10 apps running with two containers each. Even with all of their volumes and data, I should have had only 30-35 GB of used storage, but instead the disk was full at 155 GB.

What I did to solve this, besides the traditional docker system prune --all, was removing all Docker log files from my containers…

My containers mostly run webservers and my access logs are printed to stdout, so they became huge after a couple of months.

So, what I did was:

# to remove all log files
find /var/lib/docker/containers/ -type f -name "*.log" -delete

Then I executed docker-compose down && docker-compose up -d in all my applications to have those log files created again.

104GB was freed by doing this.
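To keep the logs from growing unbounded again, one option (not part of what I did above, and it only affects containers created after the daemon is restarted) is to enable log rotation for the json-file logging driver in /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}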

Hope this helps someone.

Cheers,

Matheuscmpm

2 Likes

Thanks,
I can confirm that the solution of deleting the log files works:

# to remove all log files
find /var/lib/docker/containers/ -type f -name "*.log" -delete

Do not forget to restart the docker containers:

docker-compose down && docker-compose up -d

or reboot the server to complete the clean-up process:

shutdown -r now


Worked like a charm for me! Thank you!

To identify which overlay directory belongs to which image or container, you may use: https://gist.github.com/epcim/cbe1e51b1f8ae011d84ce7a754401398

1 Like

I can confirm that in my case docker system prune --all --volumes --force fixed the problem. Running the interactive docker system prune didn't help. Either the --force flag or --volumes was needed (not sure which).

Thanks, I'll add that to the clean-up list and see what happens. It'd still be nice to be able to map stuff from the overlay dir to what "owns" it, though.

This is a handy little one-liner I came up with to identify which image(s) own a particular folder in the overlay2 directory:

for I in $(docker image ls | grep -v IMAGE | awk '{print $3}' | sort | uniq); do
    F=$(docker image inspect "$I" | grep "ed420aa193d1533d2be0b6799af7434805b990ea963c7ae282ae067dbd1f2b95")
    if [ -n "$F" ]; then echo "$I"; fi
done

That gives you the image ID, which you can then grep for in docker image ls to find the image name and tag (replace the long hash in the grep with the overlay2 folder name you are investigating).
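A similar loop (my variation on the above) checks containers as well, since the folder may belong to a container's writable layer rather than an image layer:

# replace FOLDER with the overlay2 directory name you are investigating
FOLDER=fa8c221a6be224f62cc89b01ed2e3332d039a973315497a1fd08eeadbfd8e89e
for C in $(docker ps -aq); do
    if docker container inspect "$C" | grep -q "$FOLDER"; then
        docker ps -a --filter "id=$C" --format '{{.ID}}  {{.Image}}  {{.Names}}'
    fi
done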