Some way to clean up / identify contents of /var/lib/docker/overlay

So we had the same issue (using Gitlab + private registry) and found this solution to work.

sudo gitlab-ctl registry-garbage-collect

This command will remove images in the directory
/var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/blobs
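
To verify the effect, you can check the size of that blobs directory before and after, e.g.:

sudo du -sh /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/blobs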

We reclaimed our disk space usage from 89% to 17%.

Note: we also removed old, unused image tags from the GitLab web UI under project settings > registry > list containers in repo.

Edit: more info to be found here https://github.com/sciapp/gitlab-registry-cleanup


Good to hear, but this topic is about Docker itself, not the registry, which is a separate application.


@dpatekar

Thank you very much for your reply!


Thank you for your explanation.
I tested it on my project and it's true.

But the problem is that my MySQL container stops working because of a full disk … even though the disk is not really full.

With du -sh I see 4.8 GB, but df -h reports 18 GB, and my MySQL container stops working because there is no free space …
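
When du and df disagree like this, one thing worth checking (just a guess on my side) is whether some process is still holding deleted files open; du doesn't count those, but df does. If lsof is installed:

sudo lsof +L1

Anything listed there keeps using disk space until the owning process is restarted.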


I’m having the same problem. Any solution?

Using Docker version 18.09.3, the general solution works properly:
docker system prune --all --volumes --force
Just freed 80+ GB on the server.
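
If you want to see how much is reclaimable before running a prune (and keep in mind that --volumes also deletes every volume not used by a container, so data can be lost), docker system df gives a summary first:

docker system df

Add -v for a per-image, per-container and per-volume breakdown.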


Thanks to @renatorro, I was able to identify what caused that disk leak.
For me it was because my images were writing JARs to disk, and every time I pushed a new version with something like:

sudo kubectl set image deployment/imageName imageName=imageName:0-1-X

So the overlay2 folder grew each time.
I was able to clean it up by cordoning my node, draining it, and uncordoning it:

kubectl get nodes
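
If the list alone doesn't tell you which node is under DiskPressure, one way to check a specific node's conditions is something like:

kubectl describe node <node-name> | grep DiskPressure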

Choose your node under DiskPressure in the list, then cordon it:

kubectl cordon <node-name>

(It should appear as SchedulingDisabled if you run kubectl get nodes again)
Then drain all pods off the node:

kubectl drain <node-name> --ignore-daemonsets

This could take a while if, like me, you have a lot of evicted pods…
You can now check the overlay2 folder size on the node:

cd /var/lib/docker
du -h --max-depth=1 | sort -hr

You should see the size diminishing while the drain is in progress…
When the drain is finished, you can re-attach your node:

kubectl uncordon <node-name>

The overlay2 folder should now grow back to its normal size while your pods are loading.

Hope this can help someone !!

So I just went through this. I read a lot of posts claiming that overlay doesn't use space. That is not entirely accurate. Depending on what you are doing in your container and how it and the application it runs are configured, you can end up with a large overlay directory. Overlay becomes large because data written to disk by a process inside a container goes into the overlay unless you mount a volume at the location in the container being written to.

If you start a container with just ssh and no volume mounts, and use dd to write a 1 GB file in the container, running du will only show you the top layer and will not include the size of that file, even though ls -lh does. This means running through directories and using du like I did today won't help you.

If seeing is believing, head over to /var/lib/docker/overlay2 on the Docker host and run ls | xargs -I {} du -shx {}. Go into the largest directory and then the associated diff directory; it should look super familiar. Run that command again and chase directories until you find what is using the data. That data is likely necessary for the workload in the container.
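
If you want to see this behaviour in isolation, here is a rough sketch (the image and container names are just placeholders, and it assumes the overlay2 storage driver):

# start a throwaway container and write a 1 GB file inside it
docker run -d --name overlay-demo ubuntu sleep infinity
docker exec overlay-demo dd if=/dev/zero of=/bigfile bs=1M count=1024
# the file lands in the container's writable layer (its diff/upper dir)
sudo du -sh $(docker inspect overlay-demo --format '{{ .GraphDriver.Data.UpperDir }}')
# clean up
docker rm -f overlay-demo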

Julien’s approach works, but after the same amount of run time and the same amount and type of activity in the container, I’d imagine those steps will need to be taken again. Going back and looking through the diff (again, don’t delete stuff in there; your application probably needs it) will help you determine the best approach to resolve the issue long term.


I was having some issues with a large overlay and the thread I’m replying to helped me identify the diff which was chewing up a lot of disk space. In my case it had some large back-up files sitting inside it. I just deleted those back-up files and freed up a ton 'o space. Thanks!

Hello,

I just want to share a tad of experience I gathered today with this issue, which might hopefully help some people here to understand what might be going on.

We used a Docker server for continuously deploying new software commits to a testing environment. After some time I found that around 25-30 GB were blocked in a place in the file system where I had no explanation for what resource was allocating the space (this is how I found out about this post).

Therefore I checked the system with du -shc /var/lib/docker/overlay2/*/diff as dpatekar suggested (see his answer from September 2018)

Our next step after discovering this issue was docker system df, which showed that most of the space taken by Docker was in images, with a value of 91% reclaimable.
We did some additional research and found out that our "docker image prune" or "docker container prune" did not work because we were having issues with dangling images (quick resource here for the difference between unused and dangling images: https://stackoverflow.com/a/45143234).

Therefore we used docker system prune -a which cleared up 25 GB of space.

Edit: Using this command has been fine for our test server, but you might not want to use it in production (see the response below the Stack Overflow answer I linked; it also argues for a better approach to clearing dangling images).

Therefore, if you are having issues on a production server: please read the documentation before you use this and make sure you won’t delete elements you need: https://docs.docker.com/engine/reference/commandline/system_prune/
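
For reference, a more conservative route than docker system prune -a is to remove only dangling (untagged) images; something like:

# list dangling images first to see what would go
docker image ls --filter dangling=true
# remove only the dangling ones
docker image prune

docker image prune -a goes further and removes every image not referenced by a container, which is closer to what docker system prune -a does.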

I hope this helps.

Cheers

Hi,

Just joining to add something to this discussion. I had a similar problem with my servers. I had about 10 apps running with two containers each. Even with all of their volumes and data, I should have had only 30-35 GB of used storage, but instead the disk was full at 155 GB.

What I did to solve this, besides the traditional docker system prune --all, was removing all Docker log files from my containers…

My containers mostly run webservers and my access logs are printed to stdout, so they became huge after a couple of months.

So, what I did was:

# to remove all log files
find /var/lib/docker/containers/ -type f -name "*.log" -delete

Then I executed a docker-compose down && docker-compose up -d in all my applications to have those log files created again.

104GB was freed by doing this.
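
To keep the json-file logs from growing unbounded again, the json-file logging driver supports rotation options. A minimal sketch (careful: this overwrites any existing /etc/docker/daemon.json, the 10m / 3 values are just examples, and it only applies to containers created after the restart):

sudo tee /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
EOF
sudo systemctl restart docker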

Hope this helps someone.

Cheers,

Matheuscmpm


Thanks,
I can confirm that the solution of deleting the log files works:

# to remove all log files
find /var/lib/docker/containers/ -type f -name "*.log" -delete

Do not forget to restart the docker containers:

docker-compose down && docker-compose up -d

or reboot the server to complete the clean-up process:

shutdown -r now


Worked like a charm for me! Thank u! :vulcan_salute::love_you_gesture:

To identify which overlay2 folder belongs to which image or container, you can use: https://gist.github.com/epcim/cbe1e51b1f8ae011d84ce7a754401398


I can confirm that in my case docker system prune --all --volumes --force fixed the problem. Running the interactive docker system prune didn’t help. Either the --force flag or --volumes was needed (not sure which).

Thanks, I’ll add that to the clean-up list and see what happens. It’d still be nice to be able to map stuff from the overlay dir to what “owns” it, though.

This is a handy little one-liner I came up with to identify which image(s) own a particular folder in the overlay2 directory:

for I in $(docker image ls |grep -v IMAGE |awk '{print $3}' |sort |uniq); do F=$(docker image inspect $I | grep "ed420aa193d1533d2be0b6799af7434805b990ea963c7ae282ae067dbd1f2b95"); if [ -n "$F" ]; then echo $I; fi; done

That prints the image ID(s), which you can then grep for in docker image ls. (Replace the long hash in the command with the name of the overlay2 folder you’re investigating.)

Actually, you can get all the details from docker inspect.

This one-liner lists the mapping of overlay2 folders to exact RepoDigest information (this helps to distinguish folders even for mutable tags like "latest"):

docker image inspect $(docker image ls -q)  --format '{{ .GraphDriver.Data.MergedDir}} -> {{.RepoDigests}}' | sed 's|/merged||g'
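
A similar mapping for containers (rather than images), assuming the overlay2 driver, could be:

docker container inspect $(docker ps -aq) --format '{{ .Name }} -> {{ .GraphDriver.Data.MergedDir }}' | sed 's|/merged||g'

(This errors out if there are no containers at all, since $(docker ps -aq) expands to nothing.)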

I found that the easiest way to clean up that directory (which in my case grew to 52 GB in about 2 months) was to clear the builder cache by issuing:
docker builder prune

If you want to go one step further, use:
docker builder prune --all

Docs: https://docs.docker.com/engine/reference/commandline/builder_prune/
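
To see how much of that cache is reclaimable before pruning, docker system df prints a Build Cache line alongside images, containers and volumes:

docker system df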


Cleaning the log files freed up 10 GB for me :stuck_out_tongue:

truncate -s 0 /var/lib/docker/containers/*/*-json.log

You may need sudo

sudo sh -c "truncate -s 0 /var/lib/docker/containers/*/*-json.log"
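
To truncate the log of a single container instead of all of them, docker inspect exposes the log path (<container-name> is a placeholder for your container's name or ID):

sudo truncate -s 0 $(docker inspect --format '{{ .LogPath }}' <container-name>)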