Solved: Swarm service cannot access the host network

Expected behavior

I am trying to deploy this image on a swarm. The application uses SQL Server to store its data, and I need to use an existing SQL Server instance hosted on a machine on the Docker host's network (192.168.100.0). From the Docker host (192.168.100.10), I can connect to this SQL Server (192.168.100.2) the same way the container does.
I want to deploy it on a swarm by running:

docker stack deploy -c .\docker-stack.yml test

Actual behavior

When I deploy it with compose, it works. But when I deploy it on the swarm, the created container cannot reach the SQL Server. I used the same compose file for both.

Information

Docker-stack.yml

version: '3.3'

services:
    efficient5: 
        image: kronosefficient/efficient:5.3.1-ltsc2016
        deploy:
            replicas: 1
            restart_policy:
                condition: on-failure
                delay: 10s
                max_attempts: 3
                window: 120s
            resources:
                limits:
                    memory: 2000M
                reservations:
                    memory: 200M
        environment:
            - EFFICIENT_URL=http://127.0.0.1
            - SQLSERVER_NAME=192.168.100.2
            - SQLSERVER_DBNAME=database
            - SQLSERVER_USER=user
            - SQLSERVER_PWD=pwd
        volumes:
            - document:c:\efficient5\App\Document
        ports:
            - "8080:80"
        networks:
            - ke5network

volumes:
    document:

networks:
    ke5network:

Docker info

PS C:\> docker info
Containers: 6
 Running: 0
 Paused: 0
 Stopped: 6
Images: 6
Server Version: 18.09.3
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd gelf json-file local logentries splunk syslog
Swarm: active
 NodeID: -
 Is Manager: true
 ClusterID: -
 Managers: 1
 Nodes: 1
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.100.10
 Manager Addresses:
  192.168.100.10:2377
Default Isolation: process
Kernel Version: 10.0 14393 (14393.2791.amd64fre.rs1_release.190205-1511)
Operating System: Windows Server 2016 Standard Version 1607 (OS Build 14393.2791)
OSType: windows
Architecture: x86_64
CPUs: 12
Total Memory: 16GiB
Name: DOCKER01
ID: -
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

I think it is related to the network. I noticed that docker_gwbridge is not present; maybe it doesn’t exist on Windows.
I don’t know whether I need to add a bridge network to connect the swarm overlay network to the host’s network.

Can you help me, please?

When troubleshooting networks, you’ll want to start with what works and then slowly build up your solution.

  1. Try from a standard alpine non-swarm container: can you ping or telnet to the SQL server port as expected?
  2. If that works, try from the same standard alpine container as a swarm service and see if the connection works.
  3. Then you can try a stack file, but remove a lot of your complexity. You don’t need to define the network, and for now remove restart and resource configs.

Networking problems can come from all sorts of things (network proxies, duplicate IP subnets, etc.), so you’ll want to rule those out first before trying to connect your app.
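As a rough illustration of the first two checks, here is a minimal Python sketch of a TCP reachability probe that could be run on the host and then inside a container to compare results. The function name can_connect is hypothetical, and port 1433 in the usage comment is an assumption (the SQL Server default); the thread does not say which port is actually in use.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks
        return False


# Example usage (assumed port, adjust to your SQL Server configuration):
#   print(can_connect("192.168.100.2", 1433))
```

If the probe succeeds on the host but fails in the container, the problem is on the container networking side rather than on the SQL Server.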

Hi Bret,

Thanks for the answer (and your amazing courses by the way)

I think something broke when I created the swarm. The issue started after the swarm was created and now occurs on all containers (including standalone ones).

I followed your advice. I left the swarm,

docker swarm leave -f

rebooted, and ran a simple PowerShell container:

docker run -it mcr.microsoft.com/windows/servercore powershell

The SQL Server does not answer pings from the container, but it does from the host.

So I inspected the container and found a clue: it has no IP address :(

 "NetworkSettings": {
   "Bridge": "",
   "SandboxID": "**removed**",
   "HairpinMode": false,
   "LinkLocalIPv6Address": "",
   "LinkLocalIPv6PrefixLen": 0,
   "Ports": {},
   "SandboxKey": "**removed**",
   "SecondaryIPAddresses": null,
   "SecondaryIPv6Addresses": null,
   "EndpointID": "",
   "Gateway": "",
   "GlobalIPv6Address": "",
   "GlobalIPv6PrefixLen": 0,
   "IPAddress": "",
   "IPPrefixLen": 0,
   "IPv6Gateway": "",
   "MacAddress": "",
   "Networks": {
     "nat": {
       "IPAMConfig": null,
       "Links": null,
       "Aliases": null,
       "NetworkID": "**removed**",
       "EndpointID": "",
       "Gateway": "",
       "IPAddress": "",
       "IPPrefixLen": 0,
       "IPv6Gateway": "",
       "GlobalIPv6Address": "",
       "GlobalIPv6PrefixLen": 0,
       "MacAddress": "",
       "DriverOpts": null
     }
   }
 }
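
This kind of check can be scripted. Here is a minimal Python sketch that flags attached networks with an empty IPAddress, assuming the JSON printed by docker inspect has already been loaded and one container object is passed in. The helper name networks_without_ip is hypothetical.

```python
import json


def networks_without_ip(container: dict) -> list:
    """Return the names of attached networks whose IPAddress is empty."""
    networks = container.get("NetworkSettings", {}).get("Networks") or {}
    return [name for name, cfg in networks.items() if not cfg.get("IPAddress")]


# Trimmed-down sample mirroring the output above
sample = json.loads('{"NetworkSettings": {"Networks": {"nat": {"IPAddress": ""}}}}')
print(networks_without_ip(sample))  # prints ['nat']
```

docker inspect prints a JSON array, so with real output you would pass its first element to the function.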

docker network ls gives me this:

PS C:\Windows\system32> docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
--000000000--        nat                 nat                 local
--000000000--        none                null                local

and docker network inspect nat gives me:

[
{
    "Name": "nat",
    "Id": "**removed**",
    "Created": "2019-04-09T09:20:23.1456257+02:00",
    "Scope": "local",
    "Driver": "nat",
    "EnableIPv6": false,
    "IPAM": {
        "Driver": "windows",
        "Options": null,
        "Config": [
            {
                "Subnet": "172.23.64.0/20",
                "Gateway": "172.23.64.1"
            }
        ]
    },
    "Internal": false,
    "Attachable": false,
    "Ingress": false,
    "ConfigFrom": {
        "Network": ""
    },
    "ConfigOnly": false,
    "Containers": {},
    "Options": {
        "com.docker.network.windowsshim.hnsid": "**removed**",
        "com.docker.network.windowsshim.networkname": "nat"
    },
    "Labels": {}
}
]
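
One thing that can be ruled out from this output is a subnet clash between the nat network and the host LAN. Here is a minimal sketch using Python's ipaddress module. The /24 prefix for the host network is an assumption (the thread only gives 192.168.100.0), and subnets_overlap is a hypothetical helper name.

```python
import ipaddress


def subnets_overlap(a: str, b: str) -> bool:
    """Return True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


nat_subnet = "172.23.64.0/20"   # from docker network inspect nat
swarm_pool = "10.0.0.0/8"       # Default Address Pool from docker info
host_lan = "192.168.100.0/24"   # ASSUMED prefix length for the host network

print(subnets_overlap(nat_subnet, host_lan))  # prints False
print(subnets_overlap(swarm_pool, host_lan))  # prints False
```

Under that assumption neither range collides with the host network, so duplicate subnets do not look like the cause here.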

I compared that with a fresh install on Windows 10: there I have an additional network named “Default switch”, but the output of inspect nat is the same. I will try to uninstall and reinstall (or repair) Docker.

After uninstalling and reinstalling Docker, standalone containers now have an IP address and can ping machines on the host’s network.

I installed Docker like this:

Add-WindowsFeature Containers
Install-Module DockerMsftProvider -Force
Install-Package -Name Docker -ProviderName DockerMsftProvider -Force
Install-Package ContainerImage -Force

Is Swarm supported in this mode on Windows Server 2016?

With this compose file, I get a network error:

version: '3.3'

services:
    efficient5: 
        image: kronosefficient/efficient:5.3.1-ltsc2016
        environment:
            - EFFICIENT_URL=http://127.0.0.1
            - SQLSERVER_NAME=192.168.100.2
            - SQLSERVER_DBNAME=db
            - SQLSERVER_USER=user
            - SQLSERVER_PWD=pwd
        volumes:
            - document:c:\App\Document
        ports:
            - "8080:80"

volumes:
    document:

The error is:

Creating network "testkronosefficientcom_default" with the default driver
ERROR: HNS failed with error : The parameter is incorrect.

I remember that I had fixed this error earlier by running the following command (from https://stackoverflow.com/questions/45394360/hns-failed-with-error-the-parameter-is-incorrect):

Get-NetNat | Remove-NetNat

And that broke all networking except compose.

So if you run into this issue, remove the NAT only if compose is all you need.

Thanks again, Bret, for your help.