Some way to clean up / identify contents of /var/lib/docker/overlay

This!

for i in $(find /var/lib/docker/containers/ -type f -name "*.log"); do > "$i"; done

because I don’t care about the logs :slight_smile:
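A variant of the same idea that avoids word-splitting issues if any path contains spaces (just a sketch, assuming GNU coreutils truncate is available):

find /var/lib/docker/containers/ -type f -name "*.log" -exec truncate -s 0 {} +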

Thanks for taking the time to share this, you’re a lifesaver. Managed to free up 117GB of space from the logs that I really didn’t need.

Total reclaimed space: 248.6GB

Thanks

In my case it was a gitlab-runner host used for building Docker images. I remembered a problem I had when building: after I removed the older images, Docker still held on to a lot of build cache (intermediate images), not caring that their final image had been removed.
docker builder prune
cleaned up:
Total: 29.01GB
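If you want to see how much of the data root is build cache before pruning, docker system df breaks usage down by images, containers, local volumes and build cache (a sketch; the output details vary a bit between Docker versions):

docker system df            # summary of images, containers, local volumes and build cache
docker system df -v         # verbose per-item sizes
docker builder prune        # removes dangling build cache only
docker builder prune -a -f  # removes all unused build cache without prompting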

This is the solution and it didn’t seem to break anything (as far as I noticed).

It worked, thanks. Reclaimed space: 80GB.

I encountered a similar problem where /var was reaching the 99% mark along with overlay2. In our system this was due to the messages file in the /var/log directory: when you delete that file and stop or restart rsyslog, you will clear most of the space taken from /var and overlay2 where docker prune was not working. This is most likely a misconfiguration in our system, but you can try it; it could solve your problem.
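A minimal sketch of that cleanup on a systemd host, assuming rsyslog is the service writing /var/log/messages (truncating instead of deleting means rsyslog is not left holding an open, unlinked file):

truncate -s 0 /var/log/messages   # empty the file in place
systemctl restart rsyslog         # make rsyslog reopen its log files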

If anyone is still having issues, this article is helpful: Optimizing Docker Storage

I was able to reclaim:

  • 62.8GB with docker system prune -a -f
  • another 13GB with docker volume rm $(docker volume ls -qf dangling=true)
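Note that the dangling filter matches any volume not attached to a container, named ones included, so it is worth reviewing the list before deleting (a sketch):

docker volume ls -f dangling=true                        # review these first; named volumes show up here too
docker volume rm $(docker volume ls -qf dangling=true)   # removes everything in that list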

Been running Docker for a couple of years without cleanup, with a variable number of containers, but typically just over 80. The 45 TB drive got up to 95% full, and while looking for logs and things to clean up I saw the MASSIVE size of overlay2. Searching brought me here. No amount of pruning, nor any other friendly cleanup method suggested here, worked beyond cleaning up a few gig. Ended up doing the following:

  1. Made sure I had a backup of everything, including “mapped” drives.
  2. Made sure I had a copy of all the dockerfiles for each stack.
  3. Stopped all containers.
  4. Removed all containers.
  5. Removed all images.
  6. Stopped docker.
  7. Removed the docker software.
  8. Nuked the /var/docker/root/overlay2 directory.
  9. Rebooted the server.
  10. Reinstalled the docker software.
  11. Deployed the portainer container.
  12. Redeployed all the containers from the dockerfiles.

Ended up recovering 23 TB of the nearly 45 TB consumed.
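For reference, a rough shell sketch of steps 3 through 9 on a systemd host (assuming the default data root /var/lib/docker, which may differ from the path above; only run this after backups, as it destroys all containers, images and overlay2 data):

docker stop $(docker ps -q)         # stop all running containers
docker rm $(docker ps -aq)          # remove all containers
docker rmi -f $(docker images -q)   # remove all images
systemctl stop docker               # stop the daemon
# remove the Docker packages with your distro's package manager, then:
rm -rf /var/lib/docker/overlay2     # adjust if you use a custom data-root
reboot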

So, this worked for me; notable downtime, but less than an hour. I can accept this process and I’ll add it to the list of things to do when upgrading the OS to a new version (seems like a good time to do it).

But a couple things that bother me:

  1. I had NO idea this was a thing. We’ve been watching the RAID grow over the years and had assumed it was legit data. We’d even purchased drives to upgrade the RAID from 45TB to 132TB. How do we get the word out that this “feature” of Docker really sucks?

  2. How can I monitor what is “bloat” and what is “legit” data? I can’t seem to figure out a way to distinguish between the two.

Thanks y’all. And great thread, I’m glad I found it, saved me about $3000. :wink:

If you were using an old Docker version, who knows what bugs it had, but let’s assume there was no bug and that maybe you even kept Docker up to date.

docker system prune -a -f removes only “unused data”, including stopped containers and unused images. By adding the --volumes flag you can also remove anonymous volumes, which the other suggested command, docker volume prune -f, does as well, but both keep named volumes. Those are volumes you may want to reuse later even after deleting the container, which is why you assigned a name to them.
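To make the difference concrete, a sketch of how those commands relate (assuming a reasonably recent Docker version; since Docker 23 docker volume prune only touches anonymous volumes unless --all is given):

docker system prune -a -f             # stopped containers, unused images, unused networks, build cache
docker system prune -a -f --volumes   # the same, plus unused anonymous volumes
docker volume prune -f                # unused anonymous volumes only; named volumes are kept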

  • So maybe there were some named volumes forgotten
  • or somehow an existing and running container wrote a huge amount of data to the container filesystem
  • or the container logs didn’t have proper size limits and grew huge (a sketch of setting limits follows after this list).
  • if at some point over the years an upgrade changed a setting such as the storage driver, any cleanup you ran afterwards would only affect the active storage driver, leaving the old driver’s data untouched
  • Maybe the database files of Docker were corrupted so it didn’t know about some files
  • Maybe other files were corrupted which confused Docker. It is not something I ever experienced so this one is just an idea.
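On the log point, a minimal sketch of capping container log size globally in /etc/docker/daemon.json (assuming the default json-file logging driver; restart the daemon afterwards, and note the limits only apply to containers created after the change):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}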

It is hard to tell now what caused it. Bugs can happen, but most of the time there is an explanation, even if it sometimes goes beyond the well-known and well-documented cases.

Depends on what you mean by legit, but I don’t know of any tool that tells you that. If you have a monitoring system which alerts you when the used disk size increases beyond a limit without any obvious reason, you can investigate; and if your monitoring system tracks volume sizes, container filesystems and so on, you can find out if something is bigger than it should be.

If you want to recognize a corrupted Docker data dir, that is a hard one. It’s like a corrupted disk which can’t be saved by a normal user, but specialists might be able to recover the data. You or a tool would need to know exactly how the Docker data root works: which file stores what, in what format, refers to what and so on. Then follow the references, save the filenames and eventually show only the files that were not referred to anywhere. It is actually not impossible, so maybe someone has already done it. I don’t know.
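As a starting point, these read-only commands show where the space actually goes (a sketch, assuming the default data root /var/lib/docker):

docker system df -v                              # per-image, per-container, per-volume and build-cache usage
du -xh --max-depth=1 /var/lib/docker | sort -h   # raw on-disk usage per subdirectory; -x skips the overlay merged mounts of running containers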