Description
I have set up five Docker services on two nodes, all based on microsoft/windowsservercore:1709, and they basically work OK.
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
1kkrdzrsjkh3 admin-vz-next-8501 replicated 1/1 qontiscommoncontainers.azurecr.io/qontis/vega-admin:1709-latest
8yv0mi3zcsk2 frontend-vz-next-9000 replicated 1/1 qontisvzcontainers.azurecr.io/qontis/vega-frontend-vz:1709-latest
fgo57964aslj frontend-vz-next-9001 replicated 1/1 qontisvzcontainers.azurecr.io/qontis/vega-frontend-vz:1709-latest
rr5akqkgmaky loadbalancer-vz-next-8001 replicated 1/1 qontiscommoncontainers.azurecr.io/qontis/vega-loadbalancer:1709-latest
s8zsbh9snrcq backend-vz-next replicated 2/2 qontiscommoncontainers.azurecr.io/qontis/vega-backend:1709-latest
Services frontend-vz-next-9000 and frontend-vz-next-9001 are on physical server A, while the other services are placed on physical server B. The frontend-vz-next-9000 and frontend-vz-next-9001 services, as well as loadbalancer-vz-next-8001, are built using nginx.
Frontend services generally pass web requests on to the loadbalancer service, which in turn gives them to the backend.
This generally works OK. However, after around one hour, the frontend services no longer pass requests on to the loadbalancer. In the frontend log, I see errors like: WSARecv() failed (10054: An existing connection was forcibly closed by the remote host) while reading response header from upstream, client: 138.201.135.82, server: localhost, request: "GET /user/v1/public/settings HTTP/1.0", upstream: "http://88.198.8.115:8001/user/v1/public/settings", host: "vz-next.qontis.io", referrer: "https://vz-next.qontis.io/login". If I run Invoke-WebRequest "http://88.198.8.115:8001/user/v1/public/settings" -UseBasicParsing directly within the frontend container, I get: The underlying connection was closed: An unexpected error occurred on a receive.
Note that when I use the same command from the host of the frontend containers (not within the container), it works ok and I receive an answer from the loadbalancer container.
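To pin down when exactly the connection breaks, the manual Invoke-WebRequest check above could be repeated on a timer. The following is only a minimal sketch: the function names Test-UpstreamOnce and Watch-Upstream, the 10-second timeout, and the 60-second interval are assumptions made for illustration and are not part of the original setup.

```powershell
# Hypothetical helper (not part of the original setup): probes the URL once and
# returns an object with Ok/Detail instead of throwing on failure.
function Test-UpstreamOnce {
    param(
        [Parameter(Mandatory = $true)][string]$Url,
        [int]$TimeoutSec = 10
    )
    try {
        $response = Invoke-WebRequest $Url -UseBasicParsing -TimeoutSec $TimeoutSec
        return [pscustomobject]@{ Ok = $true; Detail = "HTTP $($response.StatusCode)" }
    } catch {
        # Connection errors and non-success HTTP responses both end up here.
        return [pscustomobject]@{ Ok = $false; Detail = $_.Exception.Message }
    }
}

# Hypothetical helper: repeats the probe $Count times and prints one timestamped
# line per attempt, so the first FAILED line shows how long the connection survived.
function Watch-Upstream {
    param(
        [Parameter(Mandatory = $true)][string]$Url,
        [int]$IntervalSeconds = 60,
        [int]$Count = 120
    )
    $start = Get-Date
    for ($i = 0; $i -lt $Count; $i++) {
        $result = Test-UpstreamOnce -Url $Url
        $minutes = [math]::Floor(((Get-Date) - $start).TotalMinutes)
        $status = if ($result.Ok) { 'OK' } else { 'FAILED' }
        Write-Output ("{0:s} +{1}min {2} {3}" -f (Get-Date), $minutes, $status, $result.Detail)
        # Sleep between attempts, but not after the last one.
        if ($i -lt $Count - 1) { Start-Sleep -Seconds $IntervalSeconds }
    }
}
```

Running Watch-Upstream -Url "http://88.198.8.115:8001/user/v1/public/settings" inside a frontend container and on its host at the same time would give two comparable logs, one from each side of the container network.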
After removing and recreating all services and the overlay network, it works ok again, but only for around 1 hour.
Any advice would be helpful, since we intend to go live with this environment soon.
Additional information you deem important (e.g. issue happens only occasionally):
All Docker containers are based on microsoft/windowsservercore:1709.
Code used to create the docker objects:
overlay network
docker network create --driver=overlay --attachable vnet-vz-next
loadbalancer service
docker volume create --name backend_loadbalancer_logs_vz-next
docker service create
--endpoint-mode dnsrr <# DNS roundrobin load balancing (mandatory for Windows Server docker services) #>
--publish published=8001,target=80,mode=host
--constraint
node.labels.qontis.vega.loadbalancer==1 <# place service on nodes with label qontis.vega.loadbalancer=1 #>
--with-registry-auth <# save docker login information #>
--name loadbalancer-vz-next-8001 <# name of service #>
--network vnet-vz-next <# docker overlay network #>
--detach <# run service detached from console #>
--mount type=volume,src=backend_loadbalancer_logs_vz-next,dst=c:/nginx/logs
--label qontis.vega.productInstance=vz-next
qontiscommoncontainers.azurecr.io/qontis/vega-loadbalancer:latest <# docker image #>
<# the following are parameters for startup.ps1 running inside the docker container #>
-backendName backend-vz-next
-backendPublishPort 80
-urlWhiteList '/' <# forward all requests to the backend #>
-protocol http
-serveSwaggerUi
-debugLog
frontend service
docker service create
--endpoint-mode dnsrr <# DNS roundrobin load balancing (mandatory for Windows Server docker services) #>
--publish published=9000,target=80,mode=host
--constraint
node.labels.qontis.vega.frontend==1 <# place service on nodes with label qontis.vega.frontend=1 #>
--with-registry-auth <# save docker login information #>
--name frontend-vz-next-9000 <# name of service #>
--network vnet-vz-next <# docker overlay network #>
--detach <# run service detached from console #>
--mount type=volume,src=frontend_9000_logs_vz-next,dst=c:/nginx/logs
--label qontis.vega.productInstance=vz-next
qontisvzcontainers.azurecr.io/qontis/vega-frontend-vz:latest <# docker image #>
<# the following are parameters for startup.ps1 running inside the docker container #>
-backendName 88.198.8.115
-backendPublishPort 8001
-urlWhiteList '/user/' <# allow only requests in the 'user' namespace #>
-protocol http
-serveWwwRoot <# serve www root #>
-debugLog
Output of docker version:
Client:
Version: 17.06.2-ee-15
API version: 1.30
Go version: go1.8.7
Git commit: 64ddfa6
Built: Mon Jul 9 23:33:36 2018
OS/Arch: windows/amd64
Server:
Engine:
Version: 17.06.2-ee-15
API version: 1.30 (minimum version 1.24)
Go version: go1.8.7
Git commit: 64ddfa6
Built: Mon Jul 9 23:45:29 2018
OS/Arch: windows/amd64
Experimental: false
Output of docker info:
Containers: 6
Running: 0
Paused: 0
Stopped: 6
Images: 7
Server Version: 17.06.2-ee-15
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: active
NodeID: vmt7ji8ro8ky028hq52s3cmuq
Is Manager: true
ClusterID: 5h1zhy7h2zvkmwderln9bw0w0
Managers: 1
Nodes: 2
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 88.198.8.115
Manager Addresses:
88.198.8.115:2377
Default Isolation: process
Kernel Version: 10.0 16299 (16299.431.amd64fre.rs3_release_svc_escrow.180502-1908)
Operating System: Windows Server Datacenter
OSType: windows
Architecture: x86_64
CPUs: 8
Total Memory: 63.79GiB
Name: HETZNER-S002
ID: KW6X:DVU6:32HG:ZJ3C:C77C:RT4J:4MZO:6W2K:6LGH:HRO7:N7CA:7GQW
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
Windows Server Core, Version 1709, with updates KB4339420 and KB4343897