Docker on Windows with Azure DevOps Server degrades server connections (DNS?)

Shot in the dark here, maybe someone knows something I don’t.

We’ve got a Docker server on Windows (unfortunately) because we need to build Windows-based images and software. Our server was recently refreshed, and after only a few days we saw the same issues again, so it isn’t tied to that particular system.

What we see is a period of stability followed by progressively worse failures, which eventually get so bad that I’m forced to reboot the server. To me it acts like some form of rate limit on API access, but the one thing I can’t shake is that it only occurs in jobs that use a Docker container.

So, for jobs which attempt REST API queries to Azure DevOps (which we’re running on…), we suddenly get all sorts of errors: can’t access, EAI_AGAIN, Timeout to //api/Location, etc. Except for our corporate-hosted DNS, which we don’t have access to, I’ve investigated everything I can think of to try to determine why this happens and how to fix it. Our corporate DNS seems to work for everything else, and the fact that it works some of the time suggests it isn’t a setup issue.

Does anyone know of any Docker Engine issue involving gradual degradation, or anything else that would behave this way? I can’t share specific examples or code, but it really is as simple as using Azure DevOps Server 2022, any version of the Docker Engine CLI, and running images through Azure DevOps’ built-in ability to create a container on the fly (which seems to be a glorified docker run --rm).

Worth a shot…

If it is rate limiting, isn’t it an Azure DevOps Server rate limit? But I don’t know why it would occur only when using Docker containers.

You added DNS to the title. EAI_AGAIN does point to DNS, but I have never run into any DNS limit in Docker. Docker forwards DNS requests to external DNS servers, so if those servers were the problem, it wouldn’t happen only with Docker containers.

To be honest, I don’t use Windows containers often, so I’m not exactly sure how differently they handle DNS.

Can you check the Docker daemon logs on the Azure DevOps Server machine?

Nothing that’s super helpful on the server. I see Windows Event Viewer entries: lots of infos, and a few errors involving DNS timeouts. This is why I included DNS in the title… it does seem related, except that the errors are inconsistent and don’t necessarily result in a failed build step. I really haven’t been able to get any reliable correlation between finding an error and seeing a current build fail. And even in a scenario I could reproduce, say, 5 retries where the first one failed, I don’t get granular enough info to specifically see the first attempt fail and the subsequent ones succeed. Instead I just get a generic failure to query the external DNS server, the IP, the “question”, and that’s it.
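The per-attempt detail described above could be captured with a small probe run inside the build container. This is only a minimal sketch, assuming Python is available in the image; `probe_dns` and the hostname `devops.example.internal` are hypothetical names for illustration, and port 443 is an assumption.

```python
import socket
import time
from datetime import datetime, timezone


def probe_dns(hostname, attempts=5, delay_s=1.0, resolver=socket.getaddrinfo):
    """Resolve `hostname` several times and record each attempt on its own."""
    records = []
    for attempt in range(1, attempts + 1):
        started = time.monotonic()
        try:
            # Port 443 is an assumption; only the name lookup matters here.
            answers = resolver(hostname, 443)
            outcome = "ok"
            detail = sorted({answer[4][0] for answer in answers})
        except socket.gaierror as exc:
            # EAI_AGAIN and similar lookup failures surface as gaierror.
            outcome = "dns_error"
            detail = f"{exc.errno}: {exc.strerror}"
        records.append({
            "attempt": attempt,
            "utc": datetime.now(timezone.utc).isoformat(),
            "elapsed_s": round(time.monotonic() - started, 3),
            "outcome": outcome,
            "detail": detail,
        })
        if attempt < attempts:
            time.sleep(delay_s)
    return records


if __name__ == "__main__":
    # Hypothetical hostname; substitute the real Azure DevOps Server name.
    for record in probe_dns("devops.example.internal"):
        print(record)
```

Running the same probe on the host and inside a container at the same time would show whether only the container side degrades, and the timestamps can be lined up against the Event Viewer entries.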

Container logs would also be helpful, except that these containers are automatically destroyed when the build fails. I wish they could be set to stay around for some kind of delay.

Still, grasping at straws, maybe someone on here has some neat tricks to help diagnose or found a solution…

Just to try to find some similar issues, I looked for what can cause DNS resolution issues in containers on Windows and found this, which is about Kubernetes rather than Docker. It could still be related.

Except it says DNS resolution failed both internally and externally.

Then I found this in the moby repo, which also mentions “HNS = Host Networking Service”.

So I would look into these kinds of issues.

Thanks much for looking these up for me. The first one is a no-go: we’ve seen issues with a single agent running a single build before, so it isn’t strictly load based. But the second one, on DNS forwarding, has a ton of information and led to some things to try. We absolutely do not have a local DNS resolver on the same server/gateway. Very much appreciate it; I’ll try to respond with some info as we try some things.