Docker Community Forums

Share and learn in the Docker community.

Network - HA design for Docker


(Jeremeywise) #1

The design goal is Docker in production with no single point of failure (NSPOF), load-balancing work to spread out the load.

Within the /cluster/config.yaml file there is a section for HA cluster VIPs: one for management and one for proxy.
#####

High Availability Settings for master nodes

vip_iface: eth0
cluster_vip: 127.0.1.1
cluster_vip: 172.20.4.100

High Availability Settings for Proxy nodes

proxy_vip_iface: eth0
proxy_vip_iface: ens192
proxy_vip: 127.0.1.1
proxy_vip: 172.20.14.102

To provide NSPOF, I enabled these settings and pointed them at a commonly named interface used by all master and proxy nodes. But when I deploy, I get an error that the interface is missing.
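For comparison, here is a minimal sketch of the HA section that should pass the interface check, assuming ens192 is the actual NIC name on every master and proxy node (the proxy_vip_iface line above and the "eth0 does not exist" error below both point that way; note that with duplicated YAML keys, most loaders keep only the last value):

```yaml
## High Availability Settings for master nodes
## assumption: ens192 is the NIC name on all master nodes
vip_iface: ens192
cluster_vip: 172.20.4.100

## High Availability Settings for Proxy nodes
## assumption: ens192 is also the NIC name on all proxy nodes
proxy_vip_iface: ens192
proxy_vip: 172.20.14.102
```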

Below is the output from the deployment attempt. (I am confused by the failure recap: .93 is the node the deploy runs from, and .95 also worked fine for passwordless SSH, so I am not sure what the “failed=1” and “unreachable=1” in the recap are about, because both systems are up and fine.)
###########################

TASK [check : Validating master HA configuration] ******************************
skipping: [172.20.14.93]

TASK [check : Validating proxy HA configuration] *******************************
skipping: [172.20.14.93]

TASK [check : Validating HA VIP configuration] *********************************
skipping: [172.20.14.93]

TASK [check : Validating HA Master node interface configuration] ***************
fatal: [172.20.14.93 -> localhost]: FAILED! => changed=false
msg: The network interface eth0 does not exist on your node.

NO MORE HOSTS LEFT *************************************************************

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
172.20.14.87 : ok=17 changed=6 unreachable=0 failed=0
172.20.14.88 : ok=17 changed=6 unreachable=0 failed=0
172.20.14.89 : ok=17 changed=6 unreachable=0 failed=0
172.20.14.93 : ok=19 changed=7 unreachable=0 failed=1
172.20.14.94 : ok=17 changed=6 unreachable=0 failed=0
172.20.14.95 : ok=12 changed=3 unreachable=1 failed=0
172.20.14.96 : ok=17 changed=6 unreachable=0 failed=0
172.20.14.97 : ok=17 changed=6 unreachable=0 failed=0
172.20.14.98 : ok=17 changed=6 unreachable=0 failed=0

Playbook run took 0 days, 0 hours, 16 minutes, 37 seconds

[root@icpmaster01 cluster]# ping 172.20.14.93
PING 172.20.14.93 (172.20.14.93) 56(84) bytes of data.
64 bytes from 172.20.14.93: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 172.20.14.93: icmp_seq=2 ttl=64 time=0.065 ms
64 bytes from 172.20.14.93: icmp_seq=3 ttl=64 time=0.063 ms
^C
— 172.20.14.93 ping statistics —
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.063/0.064/0.065/0.000 ms
[root@icpmaster01 cluster]# ping 172.20.14.95
PING 172.20.14.95 (172.20.14.95) 56(84) bytes of data.
64 bytes from 172.20.14.95: icmp_seq=1 ttl=64 time=0.465 ms
^C
— 172.20.14.95 ping statistics —
2 packets transmitted, 1 received, 50% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.465/0.465/0.465/0.000 ms
[root@icpmaster01 cluster]# cat /etc/hosts |grep 172.20.14.93
172.20.14.93 icpmaster01 icpmaster01
[root@icpmaster01 cluster]# ssh icpmaster01
Last login: Thu Nov 15 19:57:13 2018 from 192.168.59.27
[root@icpmaster01 ~]# exit
logout
Connection to icpmaster01 closed.
[root@icpmaster01 cluster]# cat /etc/hosts |grep 172.20.14.95
172.20.14.95 icpmaster03 icpmaster03
[root@icpmaster01 cluster]# ssh icpmaster03
Last login: Thu Nov 15 17:10:51 2018 from icpmaster01
[root@icpmaster03 ~]# exit
logout
Connection to icpmaster03 closed.
[root@icpmaster01 cluster]#
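As a sanity check (assuming the validator simply looks for the configured name among the node's kernel interfaces), the interface names a node actually has can be listed like this:

```shell
# List the network interfaces the kernel knows about on this node.
# vip_iface / proxy_vip_iface must match one of these names exactly;
# on VMs with predictable interface naming the NIC is often ens192, not eth0.
ls /sys/class/net
```

Running this on every master and proxy node would confirm whether a single vip_iface value is valid everywhere.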

So am I mistaken, and does HA on the proxy and master nodes assume you already have the VIPs (or those IPs) bound to some kind of external load balancer? Or does the installer stage this itself as part of its control plane, i.e. the mechanism to bind and move the IP as needed among the master or proxy nodes?

https://docs.docker.com/ee/ucp/admin/configure/join-nodes/use-a-load-balancer/