I have my laptop and a server running Docker, both using the ZFS storage driver. Having spent a bit of time recently learning more about how containers and images are organised on the file system, I realise that both machines have large numbers of orphaned datasets - by that I mean datasets that are not referenced, directly or indirectly via a snapshot or clone, by any container or image.
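The clone/snapshot relationship I mean shows up in each dataset's origin property; a quick way to eyeball the whole tree (using my system/docker layout - adjust the path for a different pool) is something like:

# Show what each dataset under the Docker root was cloned from.
# A value of "-" means the dataset is not a clone; anything else is
# <parent-dataset>@<snapshot>, i.e. a clone of a snapshot of its parent layer.
zfs get -Hr -o name,value origin system/docker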
Questions of this ilk come up from time to time - I've done my research and I understand that a small number of images can give rise to a large number of datasets. That's fine, but there are significantly more datasets than the containers and images reference.
I assume something happens that breaks the relationship - whether that's system reboots, upgrades or reinstalls, I don't know; all of these happen from time to time.
I can easily write a script to destroy the orphaned datasets but I’d like to be sure that what I plan to do is correct and that I am not missing something.
Here are some figures:
|        | C [1] | I [2] | D [3] | DR [4] | O [5] |
|--------|-------|-------|-------|--------|-------|
| Laptop | 14    | 6     | 117   | 44     | 73    |
| Server | 20    | 18    | 576   | 128    | 448   |
1. Containers: `docker container ls -qa | wc -l`
2. Images: `docker image ls -q | wc -l`
3. Datasets: `zfs list -Hr system/docker | awk 'NR>1' | wc -l` (doesn't count the root dataset)
4. Datasets referenced - see the bash script listing below.
5. Orphans = D - DR
# Get dataset ancestors: walk the origin property up the clone/snapshot chain
# until a dataset has no origin ("-")
dataset_ancestors() {
  local d="$1"
  until [[ "$d" == '-' ]]; do
    echo "$d"
    # origin is reported as <parent-dataset>@<snapshot>; keep just the dataset part
    d="$(zfs get -H origin "$d" | awk -F"[\t@]" '{print $3}')"
  done
}
export -f dataset_ancestors

# All datasets associated with a Docker image and/or container
# (bash -c rather than sh -c, so the exported function is visible in the subshell)
( docker image ls -q      | xargs -I {} bash -c "dataset_ancestors \$(docker image inspect \$1 | jq -r '.[].GraphDriver.Data.Dataset')" _ {}
  docker container ls -qa | xargs -I {} bash -c "dataset_ancestors \$(docker container inspect \$1 | jq -r '.[].GraphDriver.Data.Dataset')" _ {}
) | sort -u | wc -l
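Dropping the final wc -l gives the actual orphan list rather than just a count - a sketch, reusing the dataset_ancestors function above; the /tmp file names are just placeholders I've picked here:

# All datasets under the Docker root, names only, excluding the root itself
zfs list -Hr -o name system/docker | awk 'NR>1' | sort > /tmp/all.txt

# Datasets referenced by images and containers (the subshell above, minus wc -l)
( docker image ls -q      | xargs -I {} bash -c "dataset_ancestors \$(docker image inspect \$1 | jq -r '.[].GraphDriver.Data.Dataset')" _ {}
  docker container ls -qa | xargs -I {} bash -c "dataset_ancestors \$(docker container inspect \$1 | jq -r '.[].GraphDriver.Data.Dataset')" _ {}
) | sort -u > /tmp/referenced.txt

# Orphans = D - DR: comm -23 prints lines that appear only in the first file
comm -23 /tmp/all.txt /tmp/referenced.txt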
I've already done a `docker system prune -a` and I don't believe there's any build cache (I think those datasets have shorter names, so they're easily recognisable). All the orphaned datasets are named like the example below; some have `-init` appended, most don't:
ff9f4eb9964469d8399b93c705509306d3c18a7de368c53b74546c754557fa06
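As a quick sanity check on that, something like this should print nothing if every orphan really is a 64-hex layer dataset with or without the -init suffix (again reusing the /tmp files from the sketch above):

# Print any orphan whose name does NOT match <64 hex chars>, optionally with -init;
# anything that shows up here (e.g. build cache) deserves a closer look before destroying
comm -23 /tmp/all.txt /tmp/referenced.txt | grep -Ev '^system/docker/[0-9a-f]{64}(-init)?$'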
Am I missing anything? Is there anything else in Docker that could own those orphan datasets that I should check? If not, I should be able to just `zfs destroy` the orphaned datasets (in clone/snapshot order), right?
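For what it's worth, this is the sort of destroy loop I have in mind - just a sketch, reusing the /tmp files from above; it retries so that clones are destroyed before their origins, and swapping in `zfs destroy -n -v` first gives a dry run:

# Destroy orphan datasets, retrying so that clones go before their origins
# (zfs refuses to destroy a snapshot that still has dependent clones)
mapfile -t orphans < <(comm -23 /tmp/all.txt /tmp/referenced.txt)

while ((${#orphans[@]})); do
  failed=()
  for d in "${orphans[@]}"; do
    # -r also takes out the dataset's own snapshots
    zfs destroy -r "$d" 2>/dev/null || failed+=("$d")  # likely still has dependent clones
  done
  # bail out if a full pass made no progress, rather than looping forever
  ((${#failed[@]} == ${#orphans[@]})) && break
  orphans=("${failed[@]}")
done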