I have set up some automated builds off some “docker” folders in my GitHub repo, and they are working fine, but they rebuild the images every time I make a commit anywhere on my master branch, even if there’s no change to the /docker/myimage folder or /docker/myimage/Dockerfile.
This isn’t strictly a problem, but it does mean my images get rebuilt far more often than necessary (not a burden for me, but your servers are doing a lot of unnecessary work and repeatedly downloading the same things), and any images that are “FROM” or linked to them also get rebuilt. It also makes it tricky to know which version of an image is running on each server, because they all have different image IDs despite being identical. Is there any reason to trigger a rebuild on a GitHub commit trigger if the monitored folder (i.e. /docker/myimage) didn’t change? Are there situations where a change in a parent/sibling folder could alter the image build?
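For anyone wiring this up themselves in the meantime: a commit hook could compare the changed paths against the monitored folder before firing a build. The helper below is purely hypothetical (it is not part of Docker Hub); the changed-file list would come from the GitHub push-webhook payload or from `git diff-tree --no-commit-id --name-only -r <sha>`.

```python
# Sketch: decide whether a commit touched the monitored image folder.
# `changed_files` is assumed to be the list of paths changed by the commit,
# e.g. taken from the GitHub push-webhook payload.

def needs_rebuild(changed_files, watched_dir="docker/myimage"):
    """Return True if any changed path falls under watched_dir."""
    prefix = watched_dir.rstrip("/") + "/"
    return any(f == watched_dir or f.startswith(prefix) for f in changed_files)

print(needs_rebuild(["README.md", "src/app.py"]))    # unrelated change -> False
print(needs_rebuild(["docker/myimage/Dockerfile"]))  # Dockerfile edited -> True
```

Note the trailing-slash check: it keeps a sibling folder like docker/myimage2 from matching the docker/myimage prefix.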
Obviously I could move my /docker folder into a different GitHub repo as a workaround, but I’ve found it handy to have everything required to build & deploy my project in a single repo.
A second question: why don’t the automated builds appear to take advantage of caching the way they do when I build images myself? When I run “sudo docker build .” locally, it usually uses the cached layers up to the point where my Dockerfile has changed. Since I have some steps that take >10 minutes (downloading lots of stuff), this makes the build process MUCH faster. Curious as to why this doesn’t seem to happen on a Docker Hub automated build.
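For context on why local builds can resume mid-Dockerfile: roughly speaking, each instruction’s cache entry depends on the parent layer plus the instruction itself, so everything up to the first changed line is reusable. This is a toy model of that chaining, not Docker’s actual implementation:

```python
import hashlib

def cache_keys(instructions, base="scratch"):
    """Toy model of Docker's build cache: each step's key chains the
    parent key with the instruction text, so changing one line
    invalidates that step and every step after it."""
    keys, parent = [], base
    for line in instructions:
        parent = hashlib.sha256((parent + "\n" + line).encode()).hexdigest()[:12]
        keys.append(parent)
    return keys

v1 = cache_keys(["RUN apt-get update", "RUN download-big-things", "COPY . /app"])
v2 = cache_keys(["RUN apt-get update", "RUN download-big-things", "COPY src /app"])
# the slow download step keeps its key (cache hit); only the final step rebuilds
print(v1[:2] == v2[:2], v1[2] == v2[2])  # True False
```

This is exactly why a >10-minute download step only hurts once locally: its key is unchanged until an earlier instruction changes.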
The behavior you’re describing is expected right now (October 2014). You’re right, it is not optimal.
Is there any reason to trigger a rebuild on a GitHub commit trigger if the monitored folder (i.e. /docker/myimage) didn’t change? Are there situations where a change in a parent/sibling folder could alter the image build?
I think you’re right – the build context should not go any further up than the directory containing the Dockerfile. But we’re not doing a lot of analysis on the GitHub webhook. We only recently started paying attention to the branch where the change occurred, for example. We’ll keep adding optimizations as we go, and if the webhook tells us which files changed, we’ll be able to be smarter about the build some day.
why don’t the automated builds appear to take advantage of caching the way they do when I build images myself?
It is a problem of scale. We do thousands of builds per day, and keeping all those intermediate layers around on the build servers is difficult. We have some ideas about optimizing this, but it remains a hard problem.
As a new user I also ran into this behavior, and to be honest I find it very problematic. It means that even a fixed tag like myrepo/myproject:1.2.3 does not guarantee that the corresponding image will stay the same, at least not for an active project with many releases.
So say you have a project based on such an image, and you use a carefully crafted workflow to make the best use of caching and layer reuse. All of this is rendered useless if the base image constantly changes: every time a new release is built, all the layers in your chain get invalidated, and there are massive downloads across the whole team.
You could now say, “Well, then don’t use automated builds to create a tagged version” - but that’s what most projects on Docker Hub do! So this already is a big problem. IMO there should be a big warning sign accompanied by a “DON’T DO THAT!” somewhere on the site whenever someone creates an automated build for a semver tag, or something.
Actually I’m still somewhat shaking my head at this situation, and I still hope I’m missing something.
So is this a known issue? And will something be done about it?
Are there any plans for Docker Hub to change this, i.e. to trigger a build only for directories that hold files affected by a git commit? I have a single git repository with 25 template-generated Dockerfiles, and obviously I don’t want to automatically trigger 25 builds when I change only one of them. See also https://github.com/docker/docker/issues/7480 for a similar issue.
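Until the Hub can filter by path, one self-hosted workaround is to do the filtering yourself and only hit the build triggers for the images that changed. A sketch, assuming a layout where each image lives in its own subfolder under docker/ (the folder names here are made up):

```python
def affected_images(changed_files, docker_root="docker"):
    """Map changed paths to the image directories under docker_root
    that actually need a rebuild (e.g. 1 of the 25, not all 25)."""
    dirty = set()
    prefix = docker_root.rstrip("/") + "/"
    for path in changed_files:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            if "/" in rest:  # ignore files sitting directly in docker_root
                dirty.add(rest.split("/", 1)[0])
    return sorted(dirty)

changed = ["docker/alpha/Dockerfile", "docker/alpha/run.sh", "README.md"]
print(affected_images(changed))  # ['alpha'] - only one image needs rebuilding
```

The resulting names could then be fed to whatever rebuild mechanism you use per image (e.g. one Docker Hub build trigger per repo); that last step is left out here since it depends on your setup.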
I’m also still waiting for a solution. In my case it would be sufficient if we had a button on Docker Hub to trigger a build for a defined tag.
Hi Mike – There is a “Trigger Build” button on your repo, though it will rebuild all the tags.
@Rufus Thanks, I know that button. As you mentioned, it will rebuild all tags. Correct me if I’m wrong, but this will also generate new hash IDs for all layers in each and every tag. So users of an old, unchanged tag will see huge downloads if they do a “docker pull” on that tag, even though the content has not changed at all.
That is correct. Since no cache is used on the automated builds, each layer is created anew on every build, so the layer IDs are new as well. The result can be an all-new download when pulling the same tag again.
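To illustrate the cost: a pull only skips layers whose IDs the client already has, so freshly generated IDs defeat that check even when the underlying bytes are identical. A toy pull model (purely illustrative; real registry behavior is more involved):

```python
def layers_to_download(remote_layer_ids, local_layer_ids):
    """A pull only skips layers whose IDs the client already has."""
    have = set(local_layer_ids)
    return [lid for lid in remote_layer_ids if lid not in have]

local       = ["aaa", "bbb", "ccc"]  # layers from the previous pull
same_build  = ["aaa", "bbb", "ccc"]  # cached rebuild: IDs unchanged
fresh_build = ["d01", "d02", "d03"]  # uncached rebuild: new IDs, same content

print(layers_to_download(same_build, local))   # [] - nothing to fetch
print(layers_to_download(fresh_build, local))  # ['d01', 'd02', 'd03'] - full re-download
```

So every uncached rebuild of an otherwise-unchanged tag turns the next pull into a full download for everyone tracking that tag.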
How is it looking today? How far away are we from builds that are only triggered by context-path changes?
Is there anything I could do to help achieve this? I’d love to contribute, as this missing feature has caused multiple issues in our organisation.