TCP timeout that occurs only in Docker Swarm, not simple "docker run"

(I uploaded all the code shown here to GitHub: https://github.com/gypark/docker-swarm-tcp-timeout)

Hello,

First of all, I'm sorry that I am not very good at English. I searched this forum and GitHub for articles related to my issue, but there were too many documents and threads about Docker networking for me to check them one by one. This post may therefore be a duplicate; if so, please point me to the corresponding thread.

Here is some information about my environment:

OS’s

  • CentOS7 Linux kernel 3.10.0-693.17.1.el7.x86_64
  • CentOS7 Linux kernel 3.10.0-862.11.6.el7.x86_64
  • Ubuntu 16.04 Linux kernel 4.4.0-134-generic

Docker version:

  • 18.03.0-ce
  • 18.06.1-ce

I experienced strange behavior where containers lose their connection to other containers, for example between a Java application and a PostgreSQL DB. I wanted to test whether it was a Java problem or a general TCP networking problem, so I built a very simple TCP server and client in Perl.

My code:

TCP server:

# This server just receives a message (ex: 'PING (3) at 13:00:25')
# and sends a response (ex: 'PONG (3)')
use strict;
use warnings;
use IO::Socket;
use 5.010;

$| = 1;

$SIG{TERM} = sub {
    say "[@{[scalar localtime]}] >>> recv SIGTERM <<<";
    exit;
};
$SIG{INT} = sub {
    say "[@{[scalar localtime]}] >>> recv SIGINT <<<";
    exit;
};


my $server_port = $ARGV[0] // 24000;

my $server = IO::Socket::INET->new(LocalPort => $server_port,
                                   Type      => SOCK_STREAM,
                                   Reuse     => 1,
                                   Listen    => 10 )   # or SOMAXCONN
    or die "Couldn't be a tcp server on port $server_port : $@\n";

say "[@{[scalar localtime]}] server started at port $server_port...";

while (my $client = $server->accept()) {
    say "[@{[scalar localtime]}] NEW CONNECTION ESTABLISHED.";
    while (1) {
        my $read = <$client>;

        unless (defined $read) {
            say "[@{[scalar localtime]}] FAILED TO RECEIVE MESSAGE...QUIT";
            last;
        }

        chomp $read;
        say "[@{[scalar localtime]}] recv $read";
        my ($count) = $read =~ /\((\d+)\)/;

        print {$client} "PONG ($count)\n";
    }
}

close($server);

TCP client:

# This client repeats
# sending a message (ex: 'PING (5) at 16:23:56') to a server,
# receiving a response (ex: 'PONG (5)') from a server
# and sleeping for a delay(increasing exponentially)
use strict;
use warnings;
use IO::Socket;
use 5.010;

$| = 1;

$SIG{TERM} = sub {
    say "[@{[scalar localtime]}] >>> recv SIGTERM <<<";
    exit;
};
$SIG{INT} = sub {
    say "[@{[scalar localtime]}] >>> recv SIGINT <<<";
    exit;
};

my $remote_host = $ARGV[0] // 'server';
my $remote_port = $ARGV[1] // 24000;

my $socket;

while (1) {
    $socket = IO::Socket::INET->new(PeerAddr => $remote_host,
                                    PeerPort => $remote_port,
                                    Proto    => "tcp",
                                    Type     => SOCK_STREAM);
    last if $socket;

    warn "Couldn't connect to $remote_host:$remote_port : $@ ... wait and retry\n";
    sleep 2;
}


my $delay = 1;
my $count = 0;
while (1) {
    $count++;
    my $now = localtime;
    my $msg = "PING ($count) at $now";
    say "";
    say "[@{[scalar localtime]}] send $msg";

    print $socket "$msg\n";

    my $answer = <$socket>;

    unless (defined $answer) {
        say "[@{[scalar localtime]}] FAILED TO RECEIVE ANSWER...QUIT";
        last;
    }

    chomp $answer;
    say "[@{[scalar localtime]}] recv $answer";

    say "[@{[scalar localtime]}]  sleep $delay secs";
    sleep $delay;
    $delay *= 2;
}

# and terminate the connection when we're done
close($socket);

Dockerfile for server:

FROM perl:5.26
RUN apt-get update && apt-get install -y net-tools
COPY tcp_server.pl /myapp/tcp_server.pl
WORKDIR /myapp
ENTRYPOINT [ "perl", "./tcp_server.pl" ]

Dockerfile for client:

FROM perl:5.26
RUN apt-get update && apt-get install -y net-tools
COPY tcp_client.pl /myapp/tcp_client.pl
WORKDIR /myapp
ENTRYPOINT [ "perl", "./tcp_client.pl" ]

Test 1. Execute as a native application without Docker

# in a terminal
$ perl ./tcp_server.pl
# in a new terminal
$ perl tcp_client.pl localhost

Server log:

[Mon Sep 10 14:39:46 2018] server started at port 24000...
[Mon Sep 10 14:39:57 2018] NEW CONNECTION ESTABLISHED.
[Mon Sep 10 14:39:57 2018] recv PING (1) at Mon Sep 10 14:39:57 2018
[Mon Sep 10 14:39:58 2018] recv PING (2) at Mon Sep 10 14:39:58 2018 <- after 1 sec
[Mon Sep 10 14:40:00 2018] recv PING (3) at Mon Sep 10 14:40:00 2018 <- after 2 secs
...
[Mon Sep 10 23:46:04 2018] recv PING (16) at Mon Sep 10 23:46:04 2018
[Tue Sep 11 08:52:12 2018] recv PING (17) at Tue Sep 11 08:52:12 2018 <- after 32768 secs

Client log:

[Mon Sep 10 14:39:57 2018] send PING (1) at Mon Sep 10 14:39:57 2018
[Mon Sep 10 14:39:57 2018] recv PONG (1)
[Mon Sep 10 14:39:57 2018]  sleep 1 secs

[Mon Sep 10 14:39:58 2018] send PING (2) at Mon Sep 10 14:39:58 2018
[Mon Sep 10 14:39:58 2018] recv PONG (2)
[Mon Sep 10 14:39:58 2018]  sleep 2 secs

[Mon Sep 10 14:40:00 2018] send PING (3) at Mon Sep 10 14:40:00 2018
[Mon Sep 10 14:40:00 2018] recv PONG (3)
[Mon Sep 10 14:40:00 2018]  sleep 4 secs
...
[Mon Sep 10 23:46:04 2018] send PING (16) at Mon Sep 10 23:46:04 2018
[Mon Sep 10 23:46:04 2018] recv PONG (16)
[Mon Sep 10 23:46:04 2018]  sleep 32768 secs

[Tue Sep 11 08:52:12 2018] send PING (17) at Tue Sep 11 08:52:12 2018
[Tue Sep 11 08:52:12 2018] recv PONG (17)
[Tue Sep 11 08:52:12 2018]  sleep 65536 secs
(This is still running now)

As you can see, there is no problem. My server and client have been running fine; the TCP session was not disconnected even after being idle for 32768 seconds.
(I'm not an expert on networking, so I might use incorrect terms when describing networking elements.)

Test 2. Execute using docker run

# In a terminal:
$ docker run --rm --name myserver gypark/tcp_server:1.0
# In a new terminal:
$ docker run --rm --link myserver:server gypark/tcp_client:1.0

I'll omit the logs. These containers are still running without any problem.

Test 3. Execute using docker run, between two Docker hosts using an overlay network

# HOST1 and HOST2 are Linux machines that form a Docker Swarm

# In HOST1(manager)
$ docker network create --attachable --driver overlay tcptest_net
$ docker run --rm --name myserver_overlay --net tcptest_net gypark/tcp_server:1.0
# In HOST2(worker)
$ docker run --rm --link myserver_overlay:server --net tcptest_net gypark/tcp_client:1.0

Again, the server on HOST1 and the client on HOST2 are still running. The client is currently sleeping for 65536 seconds.

Test 4. Execute as Docker services using Swarm

docker-compose.yml:

version: "3.6"
services:

  server:
    image: gypark/tcp_server:1.0
    networks:
      - net
    deploy:
      mode: global
      placement:
        constraints:
          - node.hostname == HOST1
      restart_policy:
        condition: none

  client:
    image: gypark/tcp_client:1.0
    networks:
      - net
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
# I tried 'server and client on one host' and 'on different hosts'
# by changing these constraints
          - node.hostname == HOST2
      restart_policy:
        condition: none

networks:
  net:
    driver: overlay
    attachable: true

My command:

$ docker stack deploy -c ./docker-compose.yml tcptest
$ docker service logs -f tcptest_server
$ docker service logs -f tcptest_client

In this case, the PING (12) that the client sent after sleeping for 1024 seconds never reached the server, and the client then failed to receive an answer:

# client log:
...
[Mon Sep 10 16:02:59 2018] send PING (11) at Mon Sep 10 16:02:59 2018
[Mon Sep 10 16:02:59 2018] recv PONG (11)
[Mon Sep 10 16:02:59 2018]  sleep 1024 secs

[Mon Sep 10 16:20:03 2018] send PING (12) at Mon Sep 10 16:20:03 2018
[Mon Sep 10 16:35:31 2018] FAILED TO RECEIVE ANSWER...QUIT
(client container stopped running)

# server log:
...
[Mon Sep 10 15:54:27 2018] recv PING (10) at Mon Sep 10 15:54:27 2018
[Mon Sep 10 16:02:59 2018] recv PING (11) at Mon Sep 10 16:02:59 2018
(server is still running here. It didn't receive PING(12).)

I modified the client code to find the precise value of $delay that causes a failure; a sketch of the modified client is below. I found that the client fails after sleeping for 901~903 seconds.
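
For reference, the modification essentially replaced the exponential backoff with a fixed delay taken from the command line. A minimal sketch (signal handlers and the connect-retry loop omitted):

# tcp_client_fixed.pl - like tcp_client.pl, but sleeps a fixed number
# of seconds between PINGs so the failing idle time can be bisected
use strict;
use warnings;
use IO::Socket;
use 5.010;

$| = 1;

my $remote_host = $ARGV[0] // 'server';
my $remote_port = $ARGV[1] // 24000;
my $delay       = $ARGV[2] // 900;    # e.g. try 899, 900, 901, ... 903

my $socket = IO::Socket::INET->new(PeerAddr => $remote_host,
                                   PeerPort => $remote_port,
                                   Proto    => 'tcp',
                                   Type     => SOCK_STREAM)
    or die "Couldn't connect to $remote_host:$remote_port : $@\n";

my $count = 0;
while (1) {
    $count++;
    print {$socket} "PING ($count) at @{[scalar localtime]}\n";

    my $answer = <$socket>;
    unless (defined $answer) {
        say "[@{[scalar localtime]}] FAILED TO RECEIVE ANSWER...QUIT";
        last;
    }
    chomp $answer;

    say "[@{[scalar localtime]}] recv $answer ... sleep $delay secs";
    sleep $delay;
}

close($socket);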

Discussion

I googled with keywords like ‘docker swarm tcp timeout’ and found some docs:

They talk about network timeouts, and the second doc in particular mentions 15 minutes. However, I had already seen these docs several months ago and had already set the net.ipv4.tcp_keepalive_time parameter to 600 (the setting is shown below).
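
For reference, that keepalive parameter was applied on the Docker hosts roughly like this:

$ sysctl -w net.ipv4.tcp_keepalive_time=600
# persisted across reboots:
$ echo 'net.ipv4.tcp_keepalive_time = 600' >> /etc/sysctl.conf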

I also tried to change the TCP timeout using ipvsadm, but it did not work:

$ ipvsadm --set 36000 0 0
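
(I suspect one reason this had no effect is that Swarm's service load balancer keeps its own ipvs table inside a per-network namespace under /var/run/docker/netns, so a host-level ipvsadm call never reaches it. Something like the following might be needed instead; the namespace name below is only a placeholder and has to be looked up:)

$ ls /var/run/docker/netns
$ nsenter --net=/var/run/docker/netns/<lb-namespace> ipvsadm --set 36000 0 0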

In addition, I'm confused that this problem occurs only in Swarm services, never with a simple docker run, even over an overlay network.

Trouble in production

Originally, a few months ago, I was in trouble because a Java Spring Boot application occasionally lost its connection to PostgreSQL DB containers. The Java app used the HikariCP connection pool, and HikariCP reported that it failed to validate the connection:

HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@3655ffa3

At that time, I was able to resolve the issue by applying net.ipv4.tcp_keepalive_time = 600.

However, a security engineer updated the Linux kernel of the Docker hosts from 3.10.0-693.17.1 to 3.10.0-862.11.6 last week, and the failures came back! :frowning: Even the keepalive parameter did not help!

He and I decided to downgrade the kernel yesterday, and the error has not appeared since. However, I want to resolve this issue properly if possible.

I'm not sure whether the ‘simple TCP server-client’ example above and the ‘Java-DB’ problem are caused by the same thing. In particular, HikariCP's validation error does not occur exactly 15 minutes after launch every time: sometimes it occurs 30 minutes after launch, other times several hours after launch. But the intervals are always multiples of 30 minutes (30 min, 1.5 hours, etc.). I guess that's because HikariCP's default maxLifetime is 30 minutes.

Anyway, I guess they are related.

I want to know:

  • Whether it is possible to fix this, and how

  • Where I can post this so that the Docker Swarm developers will read it (is a moby/moby GitHub issue the right place?)

Thanks for reading.


You can try changing the endpoint_mode to dnsrr. That avoids using the kernel's virtual server (vs) table (you mentioned ipvsadm). I have a different problem with the vs table on CentOS/Red Hat and now use dnsrr: user namespaces break the vs somehow. My issue is open with RH right now…no answer yet.
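
In the compose file it would look something like this (endpoint_mode needs compose file format 3.3 or newer; a sketch for your server service):

  server:
    image: gypark/tcp_server:1.0
    networks:
      - net
    deploy:
      endpoint_mode: dnsrr   # bypass the VIP / ipvs load balancer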

John


Hi.
I reproduced this problem on a single-node swarm, without the host constraints, using your Docker Hub images.

distribution: “Ubuntu 18.04.1 LTS”
docker engine: 18.06.1-ce

Then, I changed the endpoint_mode to dnsrr.

The timeout did not occur after the 1024- or 2048-second sleeps. I'm letting the 4096-second sleep run to completion.

John


Hello,

Thank you for your reply. I am now testing dnsrr mode with the same services. Just as you said, the client is still connected after being idle for 2048 seconds.

I hope this also solves the problem in my more complicated Docker stack with Java and PostgreSQL. I'll test that as well.

Anyway, I still have no idea whether this timeout is a (customizable) feature or a bug. I hope someone can tell me.

Thanks.

According to the ipvs project's how-to pages, connection disruption by timeout is intended. ipvs assumes connections are stateless and short-lived, which helps meet high-availability goals. Long-held connections can exhaust resources and don't help high availability, and ipvs cannot distinguish legitimate long-held connections from mismanaged connections that were never explicitly closed.
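
(The 901~903 seconds you measured matches ipvs's default TCP idle timeout of 900 seconds, i.e. the 15 minutes mentioned in those docs. On a plain ipvs host the values can be displayed with the command below, though Swarm keeps its ipvs tables in separate network namespaces:)

$ ipvsadm -l --timeout
Timeout (tcp tcpfin udp): 900 120 300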

Your connection pool library ought to be able to recover from connection loss, no matter how the connection was dropped (ipvs, maintenance, network failure, node failure, whatever).

To reduce the incidence of connection loss detected by your pool, there are two options:

  • use dnsrr for the database service to avoid ipvs's connection timeouts, reducing the incidence of connection loss; or
  • continue with ipvs and use a pool option such that connections are dropped by the pool after some shorter idle time (see the sketch below).
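
For example, with HikariCP something like the following keeps pooled connections shorter-lived than ipvs's ~900-second idle timeout (property names assume Spring Boot 2 style configuration; values are in milliseconds):

# drop pooled connections well before the ipvs idle timeout
spring.datasource.hikari.max-lifetime=600000
spring.datasource.hikari.idle-timeout=300000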

The length of the timeout is not material…connection drops are inevitable.


Hello,

Thank you very much for your kind explanation.

I have been testing the connection between the Java application and Postgres with the dnsrr setting for the past few days. It seems to work well.

The biggest problem, which used to drive me mad, is gone :smile:

Thank you!