Overlay network not working in a multi-host environment

Hi experts,
I am trying to follow a simple overlay network tutorial in my development environment with two macOS hosts, and I still cannot get it to work.
I am following this tutorial: Networking with overlay networks | Docker Documentation

My Environment

Mac1: Catalina with en0 at 192.168.0.189 and Docker Desktop for Mac 4.1.1, Engine 20.10.8
Mac2: Monterey with en0 at 192.168.0.150 and Docker Desktop for Mac 4.3.2, Engine 20.10.11

My goal is to have a Postgres server container running on Mac1 that is accessible from a Postgres client container running on Mac2.

I have made the ports accessible by temporarily turning the macOS firewall off.

I can execute step 1 on Mac1 (I would like this host to be the manager):

MAC1> $ docker swarm init

But when I try step 2 on Mac2 to add a worker node with

MAC2> $ docker swarm join --token <your_token> <your_ip_address>:2377

I get this error:

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.0.189:2377: connect: no route to host"

Both Macs are on the same network and can ping each other by IP and by name.

I checked a lot of swarm documents and none of them mentions any other setup steps, so I would like to check whether someone has faced a similar scenario.

Update: after setting the Docker daemon network subnet back to the default 192.168.65.0/24, I got a different error during join:
desc = "transport: Error while dialing dial tcp 192.168.65.3:2377: connect: connection refused"

But this is crazy, because the macOS firewall is turned off.

Are you trying to join the swarm cluster using the IP address of the virtual machine on the manager Mac? That won't work. You have to use an IP address that is reachable from other machines. It is not clear to me which IP address belongs to which of your networks, but based on the interface name (en0) I guess 192.168.0.189 is an IP on your LAN, and yet you got "no route to host" when you used that IP address. Then you wrote that you reset the Docker daemon network to the default and the IP address was 192.168.65.3, which is indeed in the virtual machine's default IP range.


Hey @rimelek, thanks for your interest.

Well, 192.168.0.189 (Mac1) and 192.168.0.150 (Mac2) are the LAN IP addresses (from the cabled Ethernet interface).

What I mentioned about the subnet is what I can see in Docker Desktop on the Preferences → Network screen (I hope the Mac1 screenshot upload works).

I can see that, even without any change, the subnet value (192.168.65.0/24) is the same in Docker Desktop on Mac2.

I think that setting is just there to avoid using the same IP range as your local network. Use your host IP address to join the cluster without changing the Docker subnet, unless you have another existing network (virtual or physical) with a colliding IP range.


Yes, that is my current status, but for some reason I am getting "connection refused" even with the firewall off.
Still researching.

Can you show the join command? You can replace the token with a placeholder.

Yes. Here is my init on Mac1 (which has the LAN address 192.168.0.189 on en0):

MAC1> $ docker swarm init

I get the following output:

Swarm initialized: current node (i3llbiiyoh1mr9578mlv543u5) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-5hr6p8nzxvt8xoe32sh2s40qyzdajltb6lsje31pdsflqnf20y-4h9mlhbof6k2s9kvg8gyhwww9 192.168.0.189:2377

Then I ran exactly this command on Mac2 (192.168.0.150):
MAC2> $ docker swarm join --token SWMTKN-1-5hr6p8nzxvt8xoe32sh2s40qyzdajltb6lsje31pdsflqnf20y-4h9mlhbof6k2s9kvg8gyhwww9 192.168.0.189:2377
and got this:
Error response from daemon: dial unix docker.raw.sock: connect: connection refused


The same happens if I try this way :

MAC1
docker swarm init
Swarm initialized: current node (nyrftxe6vemwns0jxytu7336h) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-4b4e1dlz9cqg7kzxoy0kaflh1focr59k0mlf9giekxsn1de8kj-dmrdy07cv83s4urgwhqkjlybz 192.168.65.3:2377

MAC2
docker swarm join --token SWMTKN-1-4b4e1dlz9cqg7kzxoy0kaflh1focr59k0mlf9giekxsn1de8kj-dmrdy07cv83s4urgwhqkjlybz 192.168.65.3:2377
Error response from daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.65.3:2377: connect: connection refused"

These are not the same.

The latter will never work. The former message indicates that Docker Desktop is not running on the machine that you want to join to the cluster.
One thing I don't understand is the join command.

Is this really what docker swarm init shows after running? Or did you change it because this was what I asked for?

In the meantime I tried to join one Docker Desktop machine to another, but it refuses to join because it "thinks" the node is already part of the swarm cluster. I don't have two Macs, so one of my machines is a MacBook and the other is a Windows 10 machine.
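
To keep the three failures seen so far in this thread apart, here is a small shell sketch that maps each error text to the explanation given above. The function name `diagnose_join_error` and the labels it prints are made up for illustration; the matching is plain substring matching on the messages quoted in this thread, and the 192.168.65.x check assumes Docker Desktop's default VM subnet.

```shell
#!/bin/sh
# Hypothetical helper: classify a `docker swarm join` error message.
# The labels are illustrative only; they are not Docker output.
diagnose_join_error() {
    case "$1" in
        *"dial unix"*"connection refused"*)
            # The local Docker socket refused the connection, which suggests
            # Docker Desktop is not running on the joining machine.
            echo "local-daemon-unreachable" ;;
        *"dial tcp 192.168.65."*)
            # Assumption: 192.168.65.0/24 is the Docker Desktop VM subnet,
            # which other machines cannot reach.
            echo "vm-internal-address" ;;
        *"no route to host"*)
            # The thread does not establish a cause for this one.
            echo "host-unreachable" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, passing the `dial unix docker.raw.sock` message prints `local-daemon-unreachable`, which matches the remark that Docker Desktop was not running on the joining machine.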

update: I fixed the quote with the code


Hi @rimelek,
yes, that output is exactly what I got after running the swarm init command on Mac1, but using the --advertise-addr option.

If I run only docker swarm init, I get the following output:

Swarm initialized: current node (z10r0yr5jtd4ivmm8e7ghrbsn) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-0boqr5zsu31mtws5o1771ssoi6wfd05bcmu78ghaks7207263u-4vtrmnm0mvxy662mkq94o5vqp 192.168.65.3:2377

Note that the IP address to use in the join command has now changed to 192.168.65.3.

And I have this:
sudo pfctl -vnf /etc/pf.conf
Password:
pfctl: Use of -f option, could result in flushing of rules
present in the main ruleset added by the system at startup.
See /etc/pf.conf for further details.

scrub-anchor "/*" all fragment reassemble
nat-anchor "/*" all
rdr-anchor "/*" all
anchor "/*" all
pass in proto tcp from any to any port = 2377 flags S/SA keep state
pass in proto tcp from any to any port = 7496 flags S/SA keep state
pass in proto udp from any to any port = 7496 keep state
pass in proto udp from any to any port = 4789 keep state
dummynet-anchor "/*" all

Loading anchor com.apple from /etc/pf.anchors/com.apple
anchor "/*" all
anchor "/*" all
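
One detail in the listing above may be worth a second look: the port set documented for swarm mode is 2377/tcp for cluster management, 7946/tcp and 7946/udp for node communication, and 4789/udp for overlay traffic, while the rules open 7496. The sketch below is a hypothetical helper (`missing_swarm_ports` is not an existing tool) that reads pf rules on stdin and prints every port from that set that has no matching pass rule. The port list is an assumption taken from the Docker documentation rather than something confirmed in this thread, and with Docker Desktop the host firewall may not be the deciding factor anyway.

```shell
#!/bin/sh
# Hypothetical helper: read pf rules on stdin and print each swarm
# port (as proto/port) that has no matching "pass" rule.
# Assumption: the required set is 2377/tcp, 7946/tcp, 7946/udp, 4789/udp.
missing_swarm_ports() {
    rules="$(cat)"
    for want in tcp:2377 tcp:7946 udp:7946 udp:4789; do
        proto="${want%%:*}"
        port="${want##*:}"
        # Match lines such as:
        #   pass in proto tcp from any to any port = 2377 flags S/SA keep state
        if ! printf '%s\n' "$rules" |
            grep -Eq "^pass .*proto ${proto} .*port = ${port}( |\$)"; then
            echo "${proto}/${port}"
        fi
    done
}
```

Feeding it the four pass rules from the listing above prints `tcp/7946` and `udp/7946`, which hints that 7496 may be a transposition of 7946.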

Well, I tried to use my Mac as a Swarm manager but could not. Strange because somehow I thought I had already done it before. Then I started to think.

  • You can't connect to the swarm node using the virtual machine's internal IP address.
  • I don't think you can use --advertise-addr=LAN_IP because the virtual machine will not have an interface with this IP address.
  • I found some clues that macOS cannot be a swarm manager: https://www.reddit.com/r/docker/comments/p0pr1n/docker_swarm_on_mac_not_listening_on_port/
  • There is no service listening on port 2377 on the Mac, so the port forwarding (from the host to the VM) is missing.
  • The host machine does not have an interface to connect to the VM directly.
  • We can forward ports from the host to a container using Docker.
  • Docker containers can communicate with the host using its IP address.
  • socat can forward TCP traffic.
  • Let's publish a port using Docker and forward the requests to the Docker VM.
  • We can't forward port 2377 because it is already used inside the VM, which is the real host of the containers.

So I did this on the Mac:

docker run --rm -d -p 2378:2378 --name swarm-manager-proxy alpine/socat tcp-l:2378,fork,reuseaddr tcp:192.168.65.3:2377

Then I ran this on another machine to join the cluster:

docker swarm join --token <token> 192.168.4.141:2378

I could join a Linux machine and my other Docker Desktop on Windows 10.

ID                            HOSTNAME         STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
k3rqygomduzvyo8utdvgeko52     docker-desktop   Ready     Active                          20.10.11
olxulfdw70r4kzqhfhcz6tteu *   docker-desktop   Ready     Active         Leader           20.10.11
xbbmnhfgbi2k2d2pce1syhl0x     ta-lteg          Down      Active                          20.10.12

update: I realized my Windows host is "Down", but I haven't configured the Windows firewall, so maybe it is not impossible to solve. We just have to make sure the proper port forwarding is working.

update2: No, the two desktops are ready and my Linux machine is down, probably because it went to sleep in the meantime.


Hey, I admit that I do not know what socat does, but I will read about it.
The only reason I am trying all this is that, as I understand it, this is the way to reach my final goal, which is:

Mac1 runs a container with a Postgres database
Mac2 runs a container with a Node.js application that needs to connect to the database running in the container on Mac1

I will need to check how to add this TCP traffic forwarding to my use case.
Thanks a lot for your help.

You can just run the socat container on the macOS swarm manager and use --restart=always, which I forgot about. Then it should work. socat will do the port forwarding; you just need to use port 2378 instead of port 2377 to join the cluster.
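
Putting the two posts together, the proxy command might look like the sketch below. One caveat: Docker rejects `--rm` combined with `--restart`, so the sketch drops `--rm`. The function name `start_swarm_proxy` and the `DRY_RUN` switch are additions for illustration only; the image, ports, and VM address come from the command earlier in the thread.

```shell
#!/bin/sh
# Sketch: start the socat proxy from this thread with a restart policy.
#   $1 = Docker Desktop VM address (192.168.65.3 in this thread)
#   $2 = port to publish on the Mac (2378 in this thread)
# With DRY_RUN=1 the command is only printed, not executed.
start_swarm_proxy() {
    vm_ip="$1"
    proxy_port="$2"
    # --rm is left out because it conflicts with --restart.
    set -- docker run -d --restart=always \
        -p "${proxy_port}:${proxy_port}" \
        --name swarm-manager-proxy \
        alpine/socat \
        "tcp-l:${proxy_port},fork,reuseaddr" \
        "tcp:${vm_ip}:2377"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

On the manager Mac this would be called as `start_swarm_proxy 192.168.65.3 2378`, and the other machines would then join with the manager's LAN address and port 2378 instead of 2377.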
