Some way to clean up / identify contents of /var/lib/docker/overlay

This!

for i in $(find /var/lib/docker/containers/ -type f -name "*.log"); do > "$i"; done

because I don’t care about the logs :slight_smile:
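A variant of the same idea that avoids word-splitting issues if any path contains spaces (just a sketch, assuming GNU coreutils truncate is available):

find /var/lib/docker/containers/ -type f -name "*.log" -exec truncate -s 0 {} +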

Thanks for taking the time to share this, you’re a lifesaver. Managed to free up 117GB of space from the logs that I really didn’t need.

Total reclaimed space: 248.6GB

Thanks

In my case it was a gitlab-runner host used for building Docker images. I remembered a problem I had when building: after I removed the older images, Docker still held on to a lot of build cache (intermediate images), not caring that their final image had been removed.
docker builder prune
cleaned up:
Total: 29.01GB
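If you want to see how much of the data root is build cache before pruning, docker system df breaks usage down by images, containers, local volumes and build cache (a sketch; the output details vary a bit between Docker versions):

docker system df            # summary of images, containers, local volumes and build cache
docker system df -v         # verbose per-item sizes
docker builder prune        # removes dangling build cache only
docker builder prune -a -f  # removes all unused build cache without prompting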

This is the solution and it didn’t seem to break anything (as far as I noticed).

It worked, thanks. Reclaimed space: 80GB.

I encountered a similar problem where /var was reaching the 99% mark along with overlay2. In our system this was due to the messages file in the /var/log directory: when you delete that file and stop or restart rsyslog, you will clear most of the space taken from /var and overlay2 where docker prune was not working. This is most likely a misconfiguration in our system, but you can try it; it could solve your problem.
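A minimal sketch of that cleanup on a systemd host, assuming rsyslog is the service writing /var/log/messages (truncating instead of deleting means rsyslog is not left holding an open, unlinked file):

truncate -s 0 /var/log/messages   # empty the file in place
systemctl restart rsyslog         # make rsyslog reopen its log files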

If anyone is still having issues, this article is helpful: Optimizing Docker Storage

I was able to reclaim:

  • 62.8GB with docker system prune -a -f
  • another 13GB with docker volume rm $(docker volume ls -qf dangling=true)
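Note that the dangling filter matches any volume not attached to a container, named ones included, so it is worth reviewing the list before deleting (a sketch):

docker volume ls -f dangling=true                        # review these first; named volumes show up here too
docker volume rm $(docker volume ls -qf dangling=true)   # removes everything in that list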

Been running Docker for a couple of years without cleanup, with a variable number of containers, but typically just over 80. The 45 TB drive got up to 95% full, and while looking for logs and things to clean up I saw the MASSIVE size of overlay2. Searching brought me here. No amount of pruning, nor any other friendly cleanup method suggested here, worked beyond cleaning up a few gig. Ended up doing the following:

  1. Made sure I had a backup of everything, including “mapped” drives.
  2. Made sure I had a copy of all the dockerfiles for each stack.
  3. Stopped all containers.
  4. Removed all containers.
  5. Removed all images.
  6. Stopped docker.
  7. Removed the docker software.
  8. Nuked the /var/docker/root/overlay2 directory.
  9. Rebooted the server.
  10. Reinstalled the docker software.
  11. Deployed the portainer container.
  12. Redeployed all the containers from the dockerfiles.

Ended up recovering 23 TB of the nearly 45 TB consumed.
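For reference, a rough shell sketch of steps 3 through 9 on a systemd host (assuming the default data root /var/lib/docker, which may differ from the path above; only run this after backups, as it destroys all containers, images and overlay2 data):

docker stop $(docker ps -q)         # stop all running containers
docker rm $(docker ps -aq)          # remove all containers
docker rmi -f $(docker images -q)   # remove all images
systemctl stop docker               # stop the daemon
# remove the Docker packages with your distro's package manager, then:
rm -rf /var/lib/docker/overlay2     # adjust if you use a custom data-root
reboot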

So, this worked for me; notable downtime, but less than an hour. I can accept this process and I’ll add it to the list of things to do when upgrading the OS to a new version (seems like a good time to do it).

But a couple things that bother me:

  1. I had NO idea this was a thing. We’ve been watching the RAID grow over the years and had assumed it was legit data. We’d even purchased drives to upgrade the RAID from 45TB to 132TB. How do we get the word out that this “feature” of Docker really sucks?

  2. How can I monitor what is “bloat” and what is “legit” data? I can’t seem to figure out a way to distinguish between the two.

Thanks y’all. And great thread, I’m glad I found it, saved me about $3000. :wink:

If you were using an old Docker version, who knows what bugs it had, but let’s assume there was no bug and that maybe you even kept Docker up to date.

docker system prune -a -f removes only “unused data”, including stopped containers and unused images. By adding the --volumes flag you can also remove anonymous volumes, which the other suggested command, docker volume prune -f, does as well, but both keep named volumes. Those are volumes you may want to reuse later even after deleting the container, which is why you assigned a name to them.
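To make the difference concrete, a sketch of how those commands relate (assuming a reasonably recent Docker version; since Docker 23 docker volume prune only touches anonymous volumes unless --all is given):

docker system prune -a -f             # stopped containers, unused images, unused networks, build cache
docker system prune -a -f --volumes   # the same, plus unused anonymous volumes
docker volume prune -f                # unused anonymous volumes only; named volumes are kept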

  • So maybe there were some named volumes forgotten
  • or somehow an existing and running container wrote a huge amount of data to the container filesystem
  • or the container logs didn’t have proper size limits and grew huge (a sketch of setting limits follows after this list).
  • if at some point over the years an upgrade changed a setting such as the storage driver, any cleanup you ran afterwards would only affect the active storage driver, leaving the old driver’s data untouched
  • Maybe the database files of Docker were corrupted so it didn’t know about some files
  • Maybe other files were corrupted which confused Docker. It is not something I ever experienced so this one is just an idea.
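On the log point, a minimal sketch of capping container log size globally in /etc/docker/daemon.json (assuming the default json-file logging driver; restart the daemon afterwards, and note the limits only apply to containers created after the change):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}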

It is hard to tell now what caused it. Bugs can happen, but most of the time there is an explanation, even if it sometimes goes beyond the well-known and well-documented cases.

Depends on what you mean by legit, but I don’t know of any tool that tells you that. If you have a monitoring system which alerts you when the used disk size increases beyond a limit without any obvious reason, you can investigate; and if your monitoring system tracks volume sizes, container filesystems and so on, you can find out if something is bigger than it should be.

If you want to recognize a corrupted Docker data dir, that is a hard one. It’s like a corrupted disk which can’t be saved by a normal user, but specialists might be able to recover the data. You or a tool would need to know exactly how the Docker data root works: which file stores what, in what format, refers to what and so on. Then follow the references, save the filenames and eventually show only the files that were not referred to anywhere. It is actually not impossible, so maybe someone has already done it. I don’t know.
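As a starting point, these read-only commands show where the space actually goes (a sketch, assuming the default data root /var/lib/docker):

docker system df -v                              # per-image, per-container, per-volume and build-cache usage
du -xh --max-depth=1 /var/lib/docker | sort -h   # raw on-disk usage per subdirectory; -x skips the overlay merged mounts of running containers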