Issue: Cannot invoke web request after some time within docker swarm - Windows Server 1709

Description

I have set up 5 docker services on two nodes, all based on microsoft/windowsservercore:1709. In principle they work fine.

docker service ls
ID                  NAME                        MODE                REPLICAS            IMAGE                                                                    PORTS
1kkrdzrsjkh3        admin-vz-next-8501          replicated          1/1                 qontiscommoncontainers.azurecr.io/qontis/vega-admin:1709-latest
8yv0mi3zcsk2        frontend-vz-next-9000       replicated          1/1                 qontisvzcontainers.azurecr.io/qontis/vega-frontend-vz:1709-latest
fgo57964aslj        frontend-vz-next-9001       replicated          1/1                 qontisvzcontainers.azurecr.io/qontis/vega-frontend-vz:1709-latest
rr5akqkgmaky        loadbalancer-vz-next-8001   replicated          1/1                 qontiscommoncontainers.azurecr.io/qontis/vega-loadbalancer:1709-latest
s8zsbh9snrcq        backend-vz-next             replicated          2/2                 qontiscommoncontainers.azurecr.io/qontis/vega-backend:1709-latest

Services frontend-vz-next-9000 and frontend-vz-next-9001 run on physical server A, while the other services are placed on physical server B. frontend-vz-next-9000, frontend-vz-next-9001 and loadbalancer-vz-next-8001 are all built on nginx.

The frontend services pass web requests on to the loadbalancer service, which in turn forwards them to the backend.

This generally works, but after about one hour the frontend services stop passing requests on to the loadbalancer. The frontend log then shows errors like:

WSARecv() failed (10054: An existing connection was forcibly closed by the remote host) while reading response header from upstream, client: 138.201.135.82, server: localhost, request: "GET /user/v1/public/settings HTTP/1.0", upstream: "http://88.198.8.115:8001/user/v1/public/settings", host: "vz-next.qontis.io", referrer: "https://vz-next.qontis.io/login"

If I run Invoke-WebRequest "http://88.198.8.115:8001/user/v1/public/settings" -UseBasicParsing directly inside the frontend container, I get:

The underlying connection was closed: An unexpected error occurred on a receive.
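To see whether every failure hits the same upstream with the same Winsock code, the nginx error lines can be parsed and counted. The following is only a sketch, assuming Python is available somewhere the log can be read; the function names `parse_upstream_error` and `summarize` are made up for illustration, and the regular expressions are written against the single log line quoted above, so other nginx message layouts may not match.

```python
import re
from collections import Counter

# "(10054: An existing connection was forcibly closed ...)"
ERROR_RE = re.compile(r"\((?P<code>\d+): (?P<message>[^)]*)\)")
# upstream: "http://host:port/path"
UPSTREAM_RE = re.compile(r'upstream: "(?P<upstream>[^"]+)"')


def parse_upstream_error(line):
    """Return code, message and upstream URL of one error line, or None."""
    err = ERROR_RE.search(line)
    up = UPSTREAM_RE.search(line)
    if err is None or up is None:
        return None
    return {
        "code": int(err.group("code")),
        "message": err.group("message"),
        "upstream": up.group("upstream"),
    }


def summarize(lines):
    """Count parsed errors per (code, upstream) pair, skipping other lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_upstream_error(line)
        if parsed is not None:
            counts[(parsed["code"], parsed["upstream"])] += 1
    return counts
```

Feeding the lines of the frontend error.log (under c:/nginx/logs in the setup described here) into `summarize` gives one counter entry per distinct error code and upstream URL.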

Note that when I use the same command from the host of the frontend containers (not within the container), it works ok and I receive an answer from the loadbalancer container.

After removing and recreating all services and the overlay network, it works ok again, but only for around 1 hour.
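To pin down the "around 1 hour" more precisely, a small probe could be left running inside a frontend container right after the services are recreated. This is a rough sketch, not something I have run in this environment: it assumes Python is installed in the container (the images described here may not include it), and `probe` and `watch` are illustrative names.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Issue one GET and return (ok, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, "HTTP %d" % resp.status
    except urllib.error.HTTPError as exc:
        # The upstream answered with an error status, so the path is alive.
        return True, "HTTP %d" % exc.code
    except (OSError, http.client.HTTPException) as exc:
        # Connection resets, timeouts and URLError end up here.
        return False, repr(exc)


def watch(url, interval=60.0, max_checks=None,
          probe_fn=probe, sleep=time.sleep, clock=time.monotonic):
    """Probe url every `interval` seconds.

    Returns the seconds elapsed when the first failure is seen, or None
    if `max_checks` probes succeed without a failure.
    """
    start = clock()
    checks = 0
    while max_checks is None or checks < max_checks:
        ok, detail = probe_fn(url)
        elapsed = clock() - start
        print("%8.0fs %s %s" % (elapsed, "ok" if ok else "FAIL", detail))
        if not ok:
            return elapsed
        checks += 1
        sleep(interval)
    return None
```

Calling `watch("http://88.198.8.115:8001/user/v1/public/settings")` in the container and, in parallel, on the host would show whether the two paths fail at the same moment or only the container path does.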

Any advice would be helpful, since we intend to go live with this environment soon.

Additional information you deem important (e.g. issue happens only occasionally):

All docker containers are based on microsoft/windowsservercore:1709.

Code used to create the docker objects:

overlay network

docker network create --driver=overlay --attachable vnet-vz-next

loadbalancer service

docker volume create --name backend_loadbalancer_logs_vz-next
docker service create `
            --endpoint-mode dnsrr <# DNS roundrobin load balancing (mandatory for Windows Server docker services) #> `
            --publish published=8001,target=80,mode=host `
            --constraint node.labels.qontis.vega.loadbalancer==1 <# place service on nodes with label qontis.vega.loadbalancer=1 #> `
            --with-registry-auth <# save docker login information #> `
            --name loadbalancer-vz-next-8001 <# name of service #> `
            --network vnet-vz-next <# docker overlay network #> `
            --detach <# run service detached from console #> `
            --mount type=volume,src=backend_loadbalancer_logs_vz-next,dst=c:/nginx/logs `
            --label qontis.vega.productInstance=vz-next `
            qontiscommoncontainers.azurecr.io/qontis/vega-loadbalancer:latest <# docker image #> `
                <# the following are parameters for startup.ps1 running inside the docker container #> `
                -backendName backend-vz-next `
                -backendPublishPort 80 `
                -urlWhiteList '/' <# forward all requests to the backend #> `
                -protocol http `
                -serveSwaggerUi `
                -debugLog

frontend service

docker service create `
            --endpoint-mode dnsrr <# DNS roundrobin load balancing (mandatory for Windows Server docker services) #> `
            --publish published=9000,target=80,mode=host `
            --constraint node.labels.qontis.vega.frontend==1 <# place service on nodes with label qontis.vega.frontend=1 #> `
            --with-registry-auth <# save docker login information #> `
            --name frontend-vz-next-9000 <# name of service #> `
            --network vnet-vz-next <# docker overlay network #> `
            --detach <# run service detached from console #> `
            --mount type=volume,src=frontend_9000_logs_vz-next,dst=c:/nginx/logs `
            --label qontis.vega.productInstance=vz-next `
            qontisvzcontainers.azurecr.io/qontis/vega-frontend-vz:latest <# docker image #> `
                <# the following are parameters for startup.ps1 running inside the docker container #> `
                -backendName 88.198.8.115 `
                -backendPublishPort 8001 `
                -urlWhiteList '/user/' <# allow only requests in the 'user' namespace #> `
                -protocol http `
                -serveWwwRoot <# serve www root #> `
                -debugLog

Output of docker version:

Client:
 Version:      17.06.2-ee-15
 API version:  1.30
 Go version:   go1.8.7
 Git commit:   64ddfa6
 Built:        Mon Jul  9 23:33:36 2018
 OS/Arch:      windows/amd64

Server:
 Engine:
  Version:      17.06.2-ee-15
  API version:  1.30 (minimum version 1.24)
  Go version:   go1.8.7
  Git commit:   64ddfa6
  Built:        Mon Jul  9 23:45:29 2018
  OS/Arch:      windows/amd64
  Experimental: false

Output of docker info:

Containers: 6
 Running: 0
 Paused: 0
 Stopped: 6
Images: 7
Server Version: 17.06.2-ee-15
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: active
 NodeID: vmt7ji8ro8ky028hq52s3cmuq
 Is Manager: true
 ClusterID: 5h1zhy7h2zvkmwderln9bw0w0
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 88.198.8.115
 Manager Addresses:
  88.198.8.115:2377
Default Isolation: process
Kernel Version: 10.0 16299 (16299.431.amd64fre.rs3_release_svc_escrow.180502-1908)
Operating System: Windows Server Datacenter
OSType: windows
Architecture: x86_64
CPUs: 8
Total Memory: 63.79GiB
Name: HETZNER-S002
ID: KW6X:DVU6:32HG:ZJ3C:C77C:RT4J:4MZO:6W2K:6LGH:HRO7:N7CA:7GQW
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

Windows Server Core, Version 1709, with updates KB4339420 and KB4343897