Windows server 2019 based image crashes host when run

Hello,
I am getting some very strange errors when running gitlab-runner jobs with the docker-windows runner.
Current behavior:
After cleaning everything Docker-related from the machine, rebuilding the images, and pulling them onto the machine, jobs run just fine (after rebuilding the images I ran close to 40 gitlab-runner jobs in a row from 2 Windows Server Core based images). All jobs ran without problems. A few days later (no other jobs were run on the server in that time), when I try to run the same jobs, the whole host OS crashes instantly, with no logs available. After the first crash, no Windows-based image is runnable anymore, not even the base mcr.microsoft.com/dotnet/framework/sdk:4.8-windowsservercore-ltsc2019 image from which the images I’m running (and crashing) are built.
The problems started after a Windows update to build 17763.2931; before this update we had no crashes.
Docker was installed on the server by downloading the 20.10.17 static binaries, registering Docker as a service, and creating a config in C:/ProgramData/Docker/config/daemon.json to put the dockerdata folder on another partition.
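For anyone reproducing this setup, here is a rough sketch of what such a daemon.json might contain. The original config file was not posted, so this is an assumption: "data-root" is the daemon.json key that moves Docker's data folder, D:\dockerdata is taken from the "Docker Root Dir" line in the output further down, and "debug" is guessed from "Debug Mode: true". The helper name build_daemon_config is made up for illustration.

```python
import json


def build_daemon_config(data_root: str, debug: bool = False) -> str:
    """Return daemon.json text that points dockerd at a custom data folder.

    Hypothetical helper: the real config from this thread was not shown.
    """
    config = {"data-root": data_root}  # folder where images and layers are stored
    if debug:
        config["debug"] = True  # assumption, based on "Debug Mode: true"
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Print the text that would go into C:/ProgramData/Docker/config/daemon.json
    print(build_daemon_config("D:\\dockerdata", debug=True))
```

The Docker service has to be restarted after changing daemon.json for the new location to take effect.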
Host OS version: Windows Server 2019 Standard Version 1809 (OS Build 17763.2931)
Image OS version: Windows Server 2019 10.0.17763.3046
docker version
Client:
 Version: 20.10.17
 API version: 1.41
 Go version: go1.17.11
 Git commit: 100c701
 Built: Mon Jun 6 23:09:02 2022
 OS/Arch: windows/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.17
  API version: 1.41 (minimum version 1.24)
  Go version: go1.17.11
  Git commit: a89b842
  Built: Mon Jun 6 23:03:58 2022
  OS/Arch: windows/amd64
  Experimental: false

docker system info
Client:
 Context: default
 Debug Mode: false

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 32
 Server Version: 20.10.17
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: process
 Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
 Operating System: Windows Server 2019 Standard Version 1809 (OS Build 17763.2931)
 OSType: windows
 Architecture: x86_64
 CPUs: 36
 Total Memory: 383.6GiB
 Name: mdc-bld01
 ID: KHPX:QZ7Q:WKPR:CNVP:UALT:Z2XB:DLVU:ZR3H:OWTG:HLS3:ZRP5:5LTQ
 Docker Root Dir: D:\dockerdata
 Debug Mode: true
  File Descriptors: -1
  Goroutines: 24
  System Time: 2022-06-27T06:01:14.4909031-04:00
  EventsListeners: 0
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Any idea why this is happening, how I can debug it, and how to fix it?
Thank you.

Returning with some extra information: it seems the image just can’t be run once a day or so has passed since it was pulled. No antivirus is active on the server (disabled for testing).
Yesterday I pulled an older version from https://mcr.microsoft.com/v2/dotnet/framework/sdk/ and was able to run the image in a container with “winpty docker run -it image hash”.
Today I executed the same command and the OS crashed.

Hello, it turns out my problem was caused by the deduplication functionality of Windows Server Core. If you have something like this active, make sure it doesn’t touch the dockerdata folder.
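To make that check concrete, here is a small sketch, under some assumptions, of how one might verify that the Docker data folder is covered by a deduplication exclusion list. On Windows Server the Data Deduplication cmdlets (Get-DedupVolume to inspect a volume, Set-DedupVolume with -ExcludeFolder to exclude a folder) are the usual tools, as far as I know. The function name is_excluded and the example paths other than D:\dockerdata are hypothetical, and the exclusion list is assumed to have been collected already as full paths.

```python
from pathlib import PureWindowsPath


def is_excluded(docker_root: str, excluded_folders: list[str]) -> bool:
    """Return True if docker_root is one of the excluded folders or sits inside one.

    Hypothetical helper: excluded_folders is assumed to hold full Windows paths.
    PureWindowsPath compares case-insensitively, like the Windows file system.
    """
    root = PureWindowsPath(docker_root)
    for folder in excluded_folders:
        excluded = PureWindowsPath(folder)
        if root == excluded or excluded in root.parents:
            return True
    return False


if __name__ == "__main__":
    # D:\dockerdata is the "Docker Root Dir" reported by docker system info.
    print(is_excluded("D:\\dockerdata", ["D:\\dockerdata"]))
    print(is_excluded("D:\\dockerdata", ["D:\\backups"]))
```

If the check comes back false, the folder is still a deduplication candidate and should be excluded (or deduplication disabled on that volume) before pulling images again.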
