While preparing for my DCA exam, I started experimenting with Docker Swarm and ingress publishing in particular.
Let’s imagine the following scenario. I have a single-container web application that I have to publish to my users. For example, my application will be running in an nginx container, published on port tcp/80. If I deploy it to my swarm cluster, every cluster node will service requests for this application, regardless of whether or not the node is currently hosting the application, which is awesome.
My question is: how should I publish such an application to my users in a production environment? What IP address should I point my A record to?
I can imagine using some sort of load balancer in front of the app (however, I am not quite sure how accurate the health probes will be). Another option I can think of is using the IP address of a single Docker node in the A record. This approach will work, but if this particular node is down for some reason, I guess the whole app will be unavailable.
Could you guys share some production best practices for publishing web apps to customers in a non-cloud infrastructure?
The normal case would be to put a highly available load balancer (n>1 instances) in front. There are two common ways to point DNS at it:
Set the IP of each lb as part of a multivalued A record. As a result, not every client ends up on the same lb, and after the cached DNS entry expires, a different instance can be picked. This is most useful with a short TTL.
Use a single A record pointing to a failover IP. A healthy instance holds the failover IP until it becomes unhealthy, at which point another instance takes it over (e.g. with keepalived or CARP).
In both cases the lb preferably forwards traffic to the swarm nodes in a round-robin manner and runs health checks against the target nodes to determine whether they are healthy. Of course, the health checks need short intervals and thresholds so that a node failure or recovery is detected quickly.
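To make the lb part a bit more concrete, here is a rough sketch of what the forwarding and health checking could look like if the fronting lb happened to be Traefik v2 with a file-provider dynamic configuration. That choice, the node IPs, and the names `web` and `swarm-nodes` are assumptions made up for illustration; any lb with health checks does the same job.

```yaml
# Dynamic configuration for a fronting Traefik v2 instance (file provider).
# Assumption: three swarm nodes at 10.0.0.11-13 publish the app on tcp/80
# through the ingress routing mesh. The IPs are placeholders.
http:
  routers:
    web:
      rule: "PathPrefix(`/`)"   # send everything to the swarm nodes
      service: swarm-nodes
  services:
    swarm-nodes:
      loadBalancer:
        healthCheck:
          path: /               # probe the published port on each node
          interval: "5s"        # short interval to notice failures quickly
          timeout: "2s"
        servers:                # traffic is spread across healthy servers
          - url: "http://10.0.0.11:80"
          - url: "http://10.0.0.12:80"
          - url: "http://10.0.0.13:80"
```

Keep in mind that with ingress publishing, a probe sent to one node may be answered by a task running on another node, so a check like this mostly tells you whether the node itself is reachable and part of the mesh, not whether a replica is running on that exact node.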
I have never seen a multivalued A record point directly at swarm nodes. That said, I have also never seen swarm nodes exposed directly to the internet; they were always operated in private subnets behind a public load balancer. Since Docker tends to punch holes in the firewall for published ports, I would be cautious in this scenario.
One more thing: if you need to preserve the source IP, you will need to bypass ingress and publish that particular port with mode: host. As a side effect, the routing mesh can’t be used for this published port. You can combine node labels, placement constraints that require those labels, and global deployments to restrict where a particular service runs. This would let you run Traefik in the swarm cluster on specific nodes, which could then be used as target nodes for the fronting load balancer.
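As a minimal sketch of that last setup, a stack file could look roughly like this. The label name `edge`, the image tag, and the service name are assumptions picked for the example, and the actual Traefik configuration (providers, entrypoints, and so on) is left out.

```yaml
# Stack file sketch: one Traefik task on every node labeled edge=true,
# with port 80 published directly on the host instead of through ingress.
version: "3.8"
services:
  traefik:
    image: traefik:v2.11        # assumed tag, use whatever version you run
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host              # bypasses the routing mesh, keeps the source IP
    deploy:
      mode: global              # one task per node that matches the constraint
      placement:
        constraints:
          - node.labels.edge == true   # hypothetical label, see below
```

The label would be set beforehand on each node that should receive traffic, e.g. with `docker node update --label-add edge=true <node-name>`. Since the port is published in host mode, only the labeled nodes listen on port 80, so those are the addresses the fronting load balancer should target.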