Swarm ingress/routing mesh and keepalived interactions - different IPs seen by containers depending on which node received the traffic

I have a tiny Docker Swarm setup with keepalived running on the nodes, so that applications connecting to containers in the swarm don’t need to know which host a container happens to be running on, and nothing needs to be updated if one of the nodes fails.
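For context, the keepalived side is a standard VRRP setup. A minimal sketch of the kind of config involved (the interface name, router ID, priorities and the VIP itself are illustrative placeholders, not my real values):

    # /etc/keepalived/keepalived.conf on the primary node
    vrrp_instance VI_1 {
        state MASTER              # BACKUP on the other node(s)
        interface eth0            # host NIC that should carry the VIP
        virtual_router_id 51
        priority 150              # lower on the backup node(s)
        advert_int 1
        virtual_ipaddress {
            192.168.1.100/24      # the VIP that clients connect to
        }
    }

The backup node runs the same config with state BACKUP and a lower priority, so it picks up the VIP as soon as the master stops sending VRRP advertisements.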

However, when a node actually fails, existing TCP sessions still break: when the VIP is picked up by one of the other nodes, the internal IP seen by the container changes. Is there any way to have the Swarm routing mesh ‘obfuscate’ that and present a single IP address regardless of which node the traffic came in on, so that TCP sessions aren’t broken by the handover?

Can you detail your keepalived setup and what happens when a node fails?

Keepalived is working fine as far as I can tell. I see the VIP show up on node2, and I can see traffic being received on node2 when node1 fails.

The issue, as far as I can tell, is that the incoming IP on the overlay network changes because traffic is now arriving via a different node: when traffic comes in on node1 it has one IP, and when it comes in on node2 it has a different one.

In the container application’s logs I can see the following; note the source address at the start of each line:

Using the VIP on node1:
10.0.0.13 [02/Jan/2024:20:53:37 +0000] TCP 200 671 2 3.457 "10.0.7.93:8000" "2" "671" "0.001"
Using the real IP of node1:
10.0.0.13 [02/Jan/2024:20:54:08 +0000] TCP 200 671 5 5.085 "10.0.7.93:8000" "5" "671" "0.001"

Using the VIP on node2 (node1 failure simulated by stopping its keepalived service):
10.0.0.8 [02/Jan/2024:20:56:11 +0000] TCP 200 837 2 5.145 "10.0.7.93:8000" "2" "837" "0.000"
Using the real IP of node2:
10.0.0.8 [02/Jan/2024:20:56:38 +0000] TCP 200 671 2 2.657 "10.0.7.93:8000" "2" "671" "0.001"
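Those 10.0.0.x source addresses look like the per-node sandbox endpoints on the ingress network: the routing mesh SNATs an incoming connection to the ingress address of whichever node accepted it, which is why the container sees a different source IP after failover. To confirm which address belongs to which node, something along these lines (run on each node; assumes the default ingress network) lists the local ingress endpoints:

    docker network inspect ingress \
      --format '{{range .Containers}}{{.Name}}: {{.IPv4Address}}{{println}}{{end}}'

One of the listed endpoints on each node is that node’s ingress sandbox; its address should match the source IP the container logs when traffic enters through that node.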

Can you explain what exactly you mean by breaking the TCP session? Just because a request comes from a different IP address, you should still get the response, and subsequent requests should arrive as well. Existing connections will break, of course, because the original machine will not be there. If your application has a session to identify the user and it breaks because requests arriving from different IP addresses are not allowed, that is another issue, and the application should check the real client IP in the HTTP header in which proxies can save it (for example X-Forwarded-For). You could also have the proxy listen on the VIP (or on all IP addresses) and send requests to the proxy, which could run on all nodes.
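For the HTTP case, the proxy part of that could look roughly like this with nginx (addresses are hypothetical; proxy_add_x_forwarded_for appends the client address to any X-Forwarded-For header that is already present):

    # nginx as HTTP proxy: pass the original client address upstream
    server {
        listen 80;
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }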

Can you explain what exactly you mean by breaking the TCP session?

I mean exactly that. The TCP session is broken because the destination no longer sees the incoming traffic as part of the same session once it arrives from a different IP: an established TCP connection is identified by its source IP/port and destination IP/port 4-tuple, so when the source address changes mid-stream, the container can’t match the packets to the existing connection. Hence my question: is there a way to have all incoming traffic on the overlay network carry the same IP, regardless of which node it comes in on?

Existing connections will break, of course, because the original machine will not be there.

If the container happens to be running on node3 (or 4 or 5), or on any node other than the one that failed, the ‘machine’ (the container) is still alive.

HTTP header

There is no HTTP protocol in use.

I would still recommend a proxy, just a TCP proxy instead of an HTTP proxy.
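To sketch what that could look like with HAProxy in TCP mode (the ports and backend address are made-up placeholders; a real config also needs a global section):

    # haproxy.cfg -- illustrative sketch only
    defaults
        mode tcp
        timeout connect 5s
        timeout client  1m
        timeout server  1m

    frontend tcp_in
        bind *:8000                         # listens on every address, VIP included
        default_backend swarm_service

    backend swarm_service
        server app1 192.0.2.10:8000 check   # replace with the service address

Run the same proxy on every node (for example as a global Swarm service), and the VIP can land on any of them while clients always find a listener.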

It seems you have a similar problem:

Don’t let yourself be misled by the title. It is about connecting to a DB from a virtual IP rather than the node IP.

The Proxy Protocol could also help:

The Proxy Protocol adds a header to a TCP connection to preserve the client’s IP address. This method solves the lost-client-IP problem for any application-layer protocol that transmits its messages over TCP/IP.

But that has to be supported by both sides: the proxy has to send the header, and whatever terminates the TCP connection on the other end has to be able to parse it.
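As an illustration of what ‘both sides’ means, with HAProxy as the sender and an nginx stream listener as the receiver (addresses are hypothetical), each end has to opt in explicitly:

    # HAProxy side: send-proxy prepends the PROXY protocol header
    backend swarm_service
        mode tcp
        server app1 192.0.2.10:8000 send-proxy

    # nginx side (stream context): accept and parse the PROXY header
    stream {
        server {
            listen 8000 proxy_protocol;
            proxy_pass 127.0.0.1:9000;   # the actual application
        }
    }

On the nginx side the original client address is then available as $proxy_protocol_addr. If the application terminates the TCP connection itself, it has to parse the header natively.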