TCP timeout that occurs only in Docker Swarm, not simple "docker run"

jjfraney · September 18, 2018, 1:53pm

According to the ipvs project’s ‘how-to’ pages, dropping connections on timeout is intended behavior. ipvs assumes connections are stateless and short-lived, which helps it meet high availability goals. Long-held connections can exhaust resources, and don’t help high availability. ipvs cannot distinguish legitimate long-held connections from mismanaged connections that were never explicitly closed.

Your connection pool library ought to be able to recover from connection loss, no matter how the connection was dropped (ipvs, maintenance, network failure, node failure, whatever).
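As a rough illustration of what "recover from connection loss" means in practice, here is a minimal Python sketch. The names with_reconnect, connect and operation are hypothetical and not taken from any particular pool library; a real pool would normally do this for you. It assumes the driver reports a dropped connection as ConnectionError or TimeoutError, which may not hold for every driver.

```python
import time


def with_reconnect(connect, operation, retries=3, delay=0.0):
    """Run operation(conn); if the connection was lost, reconnect and retry.

    connect   -- hypothetical zero-argument callable returning a new connection
    operation -- hypothetical callable taking a connection and returning a result
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except (ConnectionError, TimeoutError):
            # The cause does not matter (ipvs timeout, node failure, ...):
            # the response is the same, so open a new connection and retry.
            if attempt == retries:
                raise
            time.sleep(delay)
            conn = connect()
```

Retrying blindly is only safe when the operation is safe to repeat, so treat this as a sketch rather than something to paste in front of non-idempotent writes.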

To reduce the incidence of connection loss detected by your pool, there are two options:

use dnsrr for the database service to avoid ipvs’s connection timeouts. or,
continue with ipvs, and use a pool option such that connections are dropped by the pool after some idle time shorter than the ipvs timeout.
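For the second option, most pool libraries have a setting for this already, so check yours first. Just to show the idea, here is a minimal Python sketch of a pool that discards connections that have sat idle too long. IdleEvictingPool and its parameters are made up for illustration, the 600 seconds in the usage note is an arbitrary value and not a recommendation, and the sketch is not thread-safe.

```python
import time
from collections import deque


class IdleEvictingPool:
    """Hypothetical pool that never hands out a connection idle too long."""

    def __init__(self, connect, max_idle_seconds, clock=time.monotonic):
        self._connect = connect            # zero-argument connection factory
        self._max_idle = max_idle_seconds  # keep this below the ipvs timeout
        self._clock = clock
        self._idle = deque()               # (connection, time it was released)

    def acquire(self):
        while self._idle:
            # Most recently released connection first.
            conn, released_at = self._idle.pop()
            if self._clock() - released_at < self._max_idle:
                return conn
            # Idle too long: ipvs may already have dropped it, so discard it.
            conn.close()
        return self._connect()

    def release(self, conn):
        self._idle.append((conn, self._clock()))
```

Usage would look something like pool = IdleEvictingPool(open_db_connection, max_idle_seconds=600), where open_db_connection is whatever function creates a connection with your driver. Even with this in place the pool still has to cope with drops on connections that are in use.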

The length of the timeout is not material…connection drops are inevitable.