How to effectively cache an expensive operation?

nuc1eon · January 28, 2019, 8:34pm

My image is meant to be a game server for Counter-Strike: GO. There is a tool called steamcmd that automatically downloads and installs the game and server files. Unfortunately, a full rebuild pulls ~18 GB of data, which takes a long time. Normally, though, steamcmd only pulls delta updates based on the previous game files, and it even validates the output.

There is a (not very Docker-like) workaround for this: install the game data on our host server and use it as a source for our Dockerfile. However, since we’re using Docker, we also need a “Docker solution” for this, or the whole Docker paradigm wouldn’t make any sense…

So far I have considered the following methods:

1) Using multi-stage builds. Unfortunately, it’s not really a solution: the base image basically needs to be “kept out of date” to play out its perks and avoid pulling 18 GB of files every time. But if it is kept out of date for too long, the delta gets bigger and bigger, which also slows down the build. So either way this solution sucks.

2) The second alternative that came to mind is using “rolling release containers” (my own terminology, since I don’t know what you’d call this), which basically hold all the game server data inside a dedicated volume and keep it constantly up to date. This volume would basically be used as a “cache” for building the new image. The current content of the volume would simply need to be copied during the build. Considering it is a dedicated Docker container just for this one task, we can be quite sure the data is valid. In the rare case that something goes wrong, one can still restart the container and refill the volume from scratch.
One downside is that the data needs to be copied, which takes time. The other downside is that volumes cannot simply be mounted at build time, so it would require a more complex mechanism than that. Lastly, using volumes as a “cache” for building images seems to be discouraged by the Docker developers. I guess there are reasons for that, but currently I would probably be forced to use this method.
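
As a rough, untested sketch of approach 1: the image name `csgo-data:snapshot`, the base image `my-steamcmd-base` and the path `/opt/csgo` are placeholders, and the snapshot image is assumed to have steamcmd on its PATH. 740 is the Steam app ID of the CS:GO dedicated server.

```dockerfile
# Stage 1: start from a (hypothetical) snapshot image that already contains
# an older full install under /opt/csgo, so steamcmd only fetches the delta.
FROM csgo-data:snapshot AS gamedata
RUN steamcmd +force_install_dir /opt/csgo \
             +login anonymous \
             +app_update 740 validate \
             +quit

# Stage 2: the actual server image (hypothetical base) receives the updated files.
FROM my-steamcmd-base:latest
COPY --from=gamedata /opt/csgo /opt/csgo
```

The older the snapshot is, the more the RUN step has to download, which is exactly the trade-off described in 1).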

To be honest, both approaches seem like rather overcomplicated workarounds for something that can be achieved in a simpler fashion.

Would it be possible to use an already built image as a build cache? Or is there some other, better way? I’m not quite sure, but as I understand it the new BuildKit might bring this feature with RUN --mount. I am already using BuildKit, but since it is still badly documented, I don’t know how I would implement something like this.
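
For illustration, a cache mount might look roughly like the following. This is an untested sketch, not a confirmed solution: `my-steamcmd-base`, `/steamcache` and `/opt/csgo` are placeholder names, the base image is assumed to have steamcmd on its PATH and to run as root, and depending on the BuildKit version the syntax line may need to point at the experimental frontend instead.

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical base image with steamcmd on PATH, running as root.
FROM my-steamcmd-base:latest

# /steamcache is kept by BuildKit between builds and is NOT part of the image,
# so steamcmd only downloads the delta against whatever is already in the cache.
# The files are then copied out of the cache mount into the image layer.
# 740 is the Steam app ID of the CS:GO dedicated server.
RUN --mount=type=cache,target=/steamcache \
    steamcmd +force_install_dir /steamcache/csgo \
             +login anonymous \
             +app_update 740 validate \
             +quit \
    && mkdir -p /opt/csgo \
    && cp -a /steamcache/csgo/. /opt/csgo/
```

This would be built with BuildKit enabled, e.g. `DOCKER_BUILDKIT=1 docker build -t csgo-server .` (the tag is a placeholder). The copy step is still needed because the contents of a cache mount never end up in the image. Also, the normal layer cache can still skip the RUN step when nothing before it has changed, so forcing an update may need something like `--no-cache` or a changing build argument.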

Am I on the right track here or am I totally wrong?
My current Dockerfile is here if you wanna take a look; the game data gets pulled on line #27. Currently it uses none of the above methods, as I am still looking for the best solution.
