Service DNS resolution on SLES

Expected behavior

Containers should be able to access each other's services by hostname resolution

Actual behavior

Containers can only access each other's services by container IP

Additional Information

Hi all, we are trying to run Docker within AWS on SLES (AMI ID: suse-sles-12-sp2-v20161214-hvm-ssd-x86_64 (ami-c425e4ab)) and we are facing an issue when trying to access container software running in an active swarm cluster via hostname resolution.

We tested the same setup on Ubuntu (AMI ID: ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20170414 (ami-060cde69)) as well and got the expected behavior there.

Installed Docker version:

Docker version 1.12.6, build 78d1802
Client:
Version: 1.12.6
API version: 1.24
Go version: go1.6.1
Git commit: 78d1802
Built: Wed Feb 15 15:00:28 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Go version: go1.6.1
Git commit: 78d1802
Built: Wed Feb 15 15:00:28 2017
OS/Arch: linux/amd64

We have also upgraded Docker to the latest build on these images, but it did not fix the issue.

Steps to reproduce the behavior

    SWARM_TOKEN=$(docker run --rm swarm create)
    docker-machine create --driver amazonec2 --amazonec2-ami ami-c425e4ab --amazonec2-instance-type t2.micro --amazonec2-region eu-central-1 --amazonec2-ssh-keypath key --amazonec2-security-group development --amazonec2-use-private-address --amazonec2-ssh-user ec2-user --swarm-discovery token://$SWARM_TOKEN New-swarmManager

    docker-machine create --driver amazonec2 --amazonec2-ami ami-c425e4ab --amazonec2-instance-type t2.micro --amazonec2-region eu-central-1 --amazonec2-ssh-keypath key --amazonec2-security-group development --amazonec2-use-private-address --amazonec2-ssh-user ec2-user --swarm-discovery token://$SWARM_TOKEN New-node1

    MANAGER_IP=$(docker-machine ip New-swarmManager)
    eval $(docker-machine env New-swarmManager)
    docker swarm init --advertise-addr $MANAGER_IP
    docker network create -d overlay testNetwork
    docker swarm join-token worker >join.sh
    sed -i '1d' join.sh
    eval $(docker-machine env New-node1)
    sh join.sh

    eval $(docker-machine env New-swarmManager)
    docker service create --name redis --publish 6379:6379 --network testNetwork redis:3.0.6
    docker service create --name kibana --publish 5601:5601 --network testNetwork kibana

With this script we've set up two basic SLES machines (New-swarmManager, New-node1) for testing and configured a new swarm along with an overlay network.
We've installed redis and kibana to test the connectivity.
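
As a quick sanity check (using the service and network names from the script above), task placement and the overlay network can be verified from the manager:

    eval $(docker-machine env New-swarmManager)
    docker service ps redis              # shows which node the redis task is running on
    docker service ps kibana             # same for kibana
    docker network inspect testNetwork   # confirms the network exists and uses the overlay driver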

Kibana is up and running on the swarm manager; redis is running on node1:

~ # docker service ls

 ID            NAME    REPLICAS  IMAGE        COMMAND
7k4qmnu1dq86  kibana  1/1       kibana
drdy220hfytk  redis   1/1       redis:3.0.6

~ # docker ps   # on the swarm manager

CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS               NAMES
cfb14cdaf3d5        kibana:latest       "/docker-entrypoint.s"   About a minute ago   Up About a minute   5601/tcp            kibana.1.0uujtzsfwu8wlfex8rxgnauts

~ # docker ps   # on node1

CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS               NAMES
4576007fb109        redis:3.0.6         "/entrypoint.sh redis"   About a minute ago   Up About a minute   6379/tcp            redis.1.d03r8t2x25elfwode3n6uxncy

I try to access redis from the kibana container:

~ # docker exec -it cfb14cdaf3d5 sh
  curl -i http://redis:6379
curl: (7) Failed to connect to redis port 6379: No route to host

Now I do the same with the container IP, which I got via docker inspect:

# curl -i http://10.0.0.3:6379
-ERR wrong number of arguments for 'get' command
-ERR unknown command 'User-Agent:'
-ERR unknown command 'Host:'
-ERR unknown command 'Accept:'

Now I at least get the error messages I would also expect from the hostname method.
The same works vice versa, with the redis container accessing kibana.
However, when using the Ubuntu images, I can access the services via hostname as well as via IP.
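
For what it's worth, curl's error (7) "No route to host" suggests the name did resolve (a pure DNS failure would be error (6), "Could not resolve host"), so the problem looks more like VIP routing than DNS. A rough way to separate the two from inside the kibana container, assuming getent and nslookup are available in the image:

    getent hosts redis                # should print the service VIP assigned on testNetwork
    nslookup redis 127.0.0.11         # query Docker's embedded DNS server directly
    curl -i http://tasks.redis:6379   # tasks.<service> resolves to the task IPs, bypassing the VIP load balancer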

So I hope someone can tell me what mistake I'm making here and has some advice on how to fix it.

Thank you very much in advance

Could you try upgrading to a more recent Docker version (e.g. the 17.03 or 17.06 releases)?

Hi @friism, I’m also having the same issue, but I’m using Docker Enterprise SLES 2.4.0 (the latest one).

Is there a problem with SLES and routing requests inside the same overlay network?

Best regards,
M

If you’re a Docker EE customer, please open a support ticket

Hi @friism, I can't create a support ticket because I'm in the evaluation process :slight_smile:

My setup is:

1 Manager (SLES)
1 Worker (SLES)
1 Overlay network (traefik-net)
1 Traefik (v1.4.2)
Service Whoami

Since I want to deploy on both my manager and my worker, I'm running 2 replicas, and they land on both machines.
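
Roughly, the setup was created along these lines; the exact flags and labels below are approximations to make the description concrete, not the literal commands used:

    docker network create -d overlay traefik-net
    docker service create --name traefik --network traefik-net \
      --publish 80:80 --publish 8080:8080 \
      --constraint node.role==manager \
      --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
      traefik:v1.4.2 --docker --docker.swarmmode --docker.watch
    docker service create --name whoami --network traefik-net --replicas 2 \
      --label traefik.port=80 emilevauge/whoami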

The problem is that when the swarm manager routes requests to the worker machine, I always get a timeout from my reverse proxy, because it can't reach the container.

I then tried another approach: I entered my container (on the manager node) and pinged the container on the worker node by IP address, but that didn't work either.
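
One thing that might be worth checking here (just a guess, since the security group isn't shown): swarm overlay networking needs a few ports open between the nodes, and a blocked VXLAN port would also explain containers on different nodes not reaching each other by IP.

    # Ports that must be open between swarm nodes (check the AWS security group):
    #   2377/tcp      - cluster management
    #   7946/tcp+udp  - container network discovery (gossip)
    #   4789/udp      - overlay network (VXLAN) data traffic
    # Quick TCP check from the manager towards the worker (placeholder IP):
    nc -zv <worker-private-ip> 7946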

Do you have any clue about my issue?

@mayconbeserracomplev - Last I checked, SLES on AWS did not come with the HA extensions out of the box. That includes the IPVS module, which I would guess is what is breaking the experience on SLES. You might see errors in the daemon logs about the kernel modules failing to load; if you do not have the cluster-network-kmp-default package installed, that would explain why.
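
A quick way to check this on the SLES hosts (a rough sketch; package names and log locations may differ between SLES versions):

    lsmod | grep ip_vs                          # is the IPVS kernel module loaded?
    sudo modprobe ip_vs && echo "ip_vs loaded"  # try loading it manually
    zypper se -i cluster-network-kmp-default    # is the HA kernel-module package installed?
    sudo journalctl -u docker | grep -i ip_vs   # daemon errors about the missing module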

Further references:
https://www.suse.com/products/highavailability/
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha/book_sleha.html

Thanks for your answer, @mbentley.

Is that only for SLES on AWS, or does it also apply to SLES on bare metal?

It applies everywhere.