Description
I have a docker-compose file for ZooKeeper. It runs fine on machines s17, s18 and s19, but it doesn't work on machines s27, s52 and s53 because of a network problem.
Here are the IPs of the services:
bash-4.4# nslookup zoo1
nslookup: can't resolve '(null)': Name does not resolve
Name: zoo1
Address 1: 10.0.0.5
bash-4.4# nslookup tasks.zoo1
nslookup: can't resolve '(null)': Name does not resolve
Name: tasks.zoo1
Address 1: 10.0.0.6 0971a1b2357a
bash-4.4#
bash-4.4# nslookup zoo2
nslookup: can't resolve '(null)': Name does not resolve
Name: zoo2
Address 1: 10.0.0.7
bash-4.4# nslookup tasks.zoo2
nslookup: can't resolve '(null)': Name does not resolve
Name: tasks.zoo2
Address 1: 10.0.0.8 zookeeper_zoo2.1.ai49thvfj522s7h9057u2umq5.zookeeper_default
bash-4.4#
bash-4.4# nslookup zoo3
nslookup: can't resolve '(null)': Name does not resolve
Name: zoo3
Address 1: 10.0.0.9
bash-4.4# nslookup tasks.zoo3
nslookup: can't resolve '(null)': Name does not resolve
Name: tasks.zoo3
Address 1: 10.0.0.10 zookeeper_zoo3.1.2ub2pz1daxbkr56jmmj2ju7ts.zookeeper_default
10.0.0.5 is the VIP of service zoo1, and 10.0.0.6 is the real IP of the container behind service zoo1.
10.0.0.7 is the VIP of service zoo2, and 10.0.0.8 is the real IP of the container behind service zoo2.
10.0.0.9 is the VIP of service zoo3, and 10.0.0.10 is the real IP of the container behind service zoo3.
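To repeat the lookups above for all three services in one pass, something like the following could be used. It is only a sketch: `service_names` and `resolve_all` are hypothetical helper names, and it assumes `nslookup` is available in the container, as in the session above.

```shell
# Hypothetical helper: print the two DNS names looked up above for a service,
# the VIP name (<service>) and the per-task name (tasks.<service>).
service_names() {
    printf '%s\n' "$1" "tasks.$1"
}

# Resolve both names for every service given as an argument.
# Meant to run inside a container attached to the stack's overlay network.
resolve_all() {
    for svc in "$@"; do
        for name in $(service_names "$svc"); do
            nslookup "$name"
        done
    done
}

# Example:
#   resolve_all zoo1 zoo2 zoo3
```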
Here are the logs of the container behind service zoo3:
[zhujipeng@s53 ~]$ sudo docker logs ddbcd737511f | tail -n 30
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2018-03-13 02:56:18,186 [myid:3] - WARN [/0.0.0.0:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.io.EOFException
2018-03-13 02:56:21,640 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener@743] - Received connection request /10.0.0.10:37186
2018-03-13 02:56:21,641 [myid:3] - WARN [/0.0.0.0:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.io.EOFException
2018-03-13 02:56:43,852 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.5:3888
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:845)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2018-03-13 02:56:43,853 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: zoo1 to address: zoo1/10.0.0.5
2018-03-13 02:56:48,858 [myid:3] - WARN [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumCnxManager@584] - Cannot open channel to 2 at election address zoo2/10.0.0.7:3888
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:610)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:845)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:957)
2018-03-13 02:56:48,859 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer@184] - Resolved hostname: zoo2 to address: zoo2/10.0.0.7
2018-03-13 02:56:48,860 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@854] - Notification time out: 60000
The container of service zoo3 can't reach port 3888 of services zoo1 and zoo2.
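To pull the unreachable peers out of a long log without reading the stack traces, a filter along these lines may help. This is a sketch, not part of ZooKeeper: `unreachable_peers` is a hypothetical name, and the pattern is taken from the WARN lines shown above.

```shell
# Read ZooKeeper log text on stdin and print "<server id> <host>/<ip>:<port>"
# once for each peer named in a "Cannot open channel to ..." warning.
unreachable_peers() {
    sed -n 's/.*Cannot open channel to \([0-9][0-9]*\) at election address \([^ ]*\).*/\1 \2/p' | sort -u
}

# Example (container ID taken from the log command above):
#   sudo docker logs ddbcd737511f 2>&1 | unreachable_peers
```

Run against the excerpt above, this should list server 1 at zoo1/10.0.0.5:3888 and server 2 at zoo2/10.0.0.7:3888.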
Here is the network status of the containers.
The container of service zoo1:
bash-4.4# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0A:00:00:06
inet addr:10.0.0.6 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:10 errors:0 dropped:0 overruns:0 frame:0
TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:420 (420.0 B) TX bytes:3380 (3.3 KiB)
eth1 Link encap:Ethernet HWaddr 02:42:AC:12:00:03
inet addr:172.18.0.3 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:794 (794.0 B) TX bytes:497 (497.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:97 errors:0 dropped:0 overruns:0 frame:0
TX packets:97 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:6248 (6.1 KiB) TX bytes:6248 (6.1 KiB)
bash-4.4# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:3888 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:40944 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.11:36659 0.0.0.0:* LISTEN -
bash-4.4# nc -z -v -w 3 10.0.0.5:3888
10.0.0.7:3888 (10.0.0.7:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.6:3888
10.0.0.8:3888 (10.0.0.8:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.7:3888
nc: 10.0.0.7:3888 (10.0.0.7:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.8:3888
nc: 10.0.0.8:3888 (10.0.0.8:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.9:3888
nc: 10.0.0.9:3888 (10.0.0.9:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.10:3888
nc: 10.0.0.10:3888 (10.0.0.10:3888): Operation timed out
The container of service zoo2:
bash-4.4# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0A:00:00:08
inet addr:10.0.0.8 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:16 errors:0 dropped:0 overruns:0 frame:0
TX packets:88 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:672 (672.0 B) TX bytes:6000 (5.8 KiB)
eth1 Link encap:Ethernet HWaddr 02:42:AC:12:00:03
inet addr:172.18.0.3 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3 errors:0 dropped:0 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:223 (223.0 B) TX bytes:148 (148.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:34 errors:0 dropped:0 overruns:0 frame:0
TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:2095 (2.0 KiB) TX bytes:2095 (2.0 KiB)
bash-4.4# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.11:34171 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:33384 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:3888 0.0.0.0:* LISTEN -
bash-4.4# nc -z -v -w 3 10.0.0.5:3888
nc: 10.0.0.5:3888 (10.0.0.5:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.6:3888
nc: 10.0.0.6:3888 (10.0.0.6:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.7:3888
10.0.0.7:3888 (10.0.0.7:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.8:3888
10.0.0.8:3888 (10.0.0.8:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.9:3888
nc: 10.0.0.9:3888 (10.0.0.9:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.10:3888
nc: 10.0.0.10:3888 (10.0.0.10:3888): Operation timed out
The container of service zoo3:
bash-4.4# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:0A:00:00:0A
inet addr:10.0.0.10 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:19 errors:0 dropped:0 overruns:0 frame:0
TX packets:110 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:798 (798.0 B) TX bytes:7532 (7.3 KiB)
eth1 Link encap:Ethernet HWaddr 02:42:AC:12:00:03
inet addr:172.18.0.3 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:42 errors:0 dropped:0 overruns:0 frame:0
TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:2520 (2.4 KiB) TX bytes:2520 (2.4 KiB)
bash-4.4# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:43588 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:3888 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.11:44435 0.0.0.0:* LISTEN -
bash-4.4# nc -z -v -w 3 10.0.0.5:3888
nc: 10.0.0.5:3888 (10.0.0.5:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.6:3888
nc: 10.0.0.6:3888 (10.0.0.6:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.7:3888
nc: 10.0.0.7:3888 (10.0.0.7:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.8:3888
nc: 10.0.0.8:3888 (10.0.0.8:3888): Operation timed out
bash-4.4# nc -z -v -w 3 10.0.0.9:3888
10.0.0.9:3888 (10.0.0.9:3888) open
bash-4.4# nc -z -v -w 3 10.0.0.10:3888
10.0.0.10:3888 (10.0.0.10:3888) open
Port 3888 is listening in every container. Each container can reach its own port, but no container can reach the port of another container.
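The per-address checks above can be condensed into one loop. The following is a sketch under the assumption that the container's `nc` accepts `-z` and `-w` (as it does in the sessions above) and the usual `HOST PORT` argument form; `probe_all` is a hypothetical helper name.

```shell
# Print "<address>:<port> open" or "<address>:<port> unreachable" for each address.
# Usage: probe_all PORT ADDRESS...
probe_all() {
    port=$1
    shift
    for addr in "$@"; do
        # -z: connect without sending data; -w 3: give up after 3 seconds
        if nc -z -w 3 "$addr" "$port" >/dev/null 2>&1; then
            echo "$addr:$port open"
        else
            echo "$addr:$port unreachable"
        fi
    done
}

# Example, using the VIPs and task IPs listed earlier:
#   probe_all 3888 10.0.0.5 10.0.0.6 10.0.0.7 10.0.0.8 10.0.0.9 10.0.0.10
```

Running the same command in each of the three containers gives a small reachability matrix that is easy to compare between the working and the failing hosts.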
Here is the compose file:
version: '3.3'
services:
  zoo1:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_MASTER_HOSTNAME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog
  zoo2:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE1_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog
  zoo3:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE2_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog
volumes:
  zoo_data:
    driver: local
  zoo_logs:
    driver: local
This compose file runs fine on machines s17, s18 and s19, but it doesn't work on machines s27, s52 and s53.
- I see a warning on s27 with docker info:
WARNING: bridge-nf-call-ip6tables is disabled
but s52 and s53 don't have the warning.
- I also see the same warning on s19 with docker info:
WARNING: bridge-nf-call-ip6tables is disabled
but s17 and s18 don't have the warning.
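Since the warning shows up on one host in each group (s27 and s19), it does not obviously line up with the failure, but the underlying setting can be compared directly on every host. The sketch below reads the bridge netfilter toggles from `/proc/sys/net/bridge`; `bridge_nf_setting` is a hypothetical helper name, and the optional directory argument exists only to make the function easy to test.

```shell
# Print the value of a bridge netfilter toggle (1 = enabled, 0 = disabled),
# or "missing" when the file is absent (for example, br_netfilter not loaded).
# Usage: bridge_nf_setting KEY [DIRECTORY]
bridge_nf_setting() {
    dir=${2:-/proc/sys/net/bridge}
    if [ -r "$dir/$1" ]; then
        cat "$dir/$1"
    else
        echo missing
    fi
}

# Example, to run on each of s17, s18, s19, s27, s52 and s53:
#   for key in bridge-nf-call-iptables bridge-nf-call-ip6tables; do
#       echo "$key=$(bridge_nf_setting "$key")"
#   done
```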
But when I also publish ports 2888 and 3888 in host mode and point ZOO_SERVERS at the node hostnames, it runs fine on machines s27, s52 and s53:
version: '3.3'
services:
  zoo1:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_MASTER_HOSTNAME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=${MY_SLAVE1_HOSTANME}:2888:3888 server.3=${MY_SLAVE2_HOSTANME}:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog
  zoo2:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE1_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=${MY_MASTER_HOSTNAME}:2888:3888 server.2=0.0.0.0:2888:3888 server.3=${MY_SLAVE2_HOSTANME}:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog
  zoo3:
    image: zookeeper
    deploy:
      placement:
        constraints:
          - node.hostname==${MY_SLAVE2_HOSTANME}
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=${MY_MASTER_HOSTNAME}:2888:3888 server.2=${MY_SLAVE1_HOSTANME}:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - zoo_data:/data
      - zoo_logs:/datalog
volumes:
  zoo_data:
    driver: local
  zoo_logs:
    driver: local
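Both compose files depend on three environment variables, two of which are spelled `HOSTANME` rather than `HOSTNAME`, so a mistyped export would leave a placement constraint empty. A small guard before deploying might look like this. It is a sketch: `check_compose_vars` is a hypothetical helper, the file name `docker-compose.yml` and the host-to-role mapping in the example are assumptions, and the stack name `zookeeper` is inferred from the `zookeeper_default` network name seen in the DNS output.

```shell
# Fail early if any variable referenced by the compose file is unset or empty.
# The names (including the HOSTANME spelling) are copied from the compose file.
check_compose_vars() {
    for var in MY_MASTER_HOSTNAME MY_SLAVE1_HOSTANME MY_SLAVE2_HOSTANME; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var" >&2
            return 1
        fi
    done
}

# Example (host-to-role mapping and file name are assumptions):
#   export MY_MASTER_HOSTNAME=s27 MY_SLAVE1_HOSTANME=s52 MY_SLAVE2_HOSTANME=s53
#   check_compose_vars && docker stack deploy -c docker-compose.yml zookeeper
```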
Output of docker version:
[zhujipeng@s27 volume]$ sudo docker version
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:15:20 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:54 2018
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
[zhujipeng@s27 volume]$ sudo docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 3
Server Version: 17.12.1-ce
Storage Driver: devicemapper
Pool Name: docker-8:1-135990890-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data file: /dev/loop0
Metadata file: /dev/loop1
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 1.028GB
Data Space Total: 107.4GB
Data Space Available: 14.28GB
Metadata Space Used: 1.438MB
Metadata Space Total: 2.147GB
Metadata Space Available: 2.146GB
Thin Pool Minimum Free Space: 10.74GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: rgqy6joqk9a0gpof105c9v59l
Is Manager: true
ClusterID: 5u5jbd9r1cyn94jf0ci7nq9vc
Managers: 1
Nodes: 3
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.26.2.el7.v7.4.qihoo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 29.45GiB
ID: RYT6:PETV:OY2B:QVG7:DPEE:LJUU:NNVX:7PD6:WDGN:LBYF:CXOS:4G3Q
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
registry.my.com:5000
127.0.0.0/8
Registry Mirrors:
https://registry.docker-cn.com/
Live Restore Enabled: false
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: bridge-nf-call-ip6tables is disabled
Additional environment details (AWS, VirtualBox, physical, etc.):
Here is the network status:
The containers can ping each other on machines s17, s18 and s19.
The containers can't ping each other on machines s27, s52 and s53.
The containers can ping the VIPs on machines s27, s52 and s53, but the ports can't be reached through the VIPs on those machines.
My Kafka cluster runs into the same problem on machines s27, s52 and s53.
@thaJeztah replied that this may be because I use a non-standard kernel.
The issue on GitHub is here
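To follow up on the kernel suggestion, the two groups of hosts can be compared with a short report like the one below. Docker's overlay driver carries container traffic between nodes over VXLAN, so the sketch prints the kernel release and whether `modinfo` can see a `vxlan` module. `overlay_kernel_report` is a hypothetical name, and the second line is only a hint: "not found" can also mean that VXLAN is built into the kernel or that `modinfo` is not installed.

```shell
# Print the kernel release and whether a vxlan module is visible to modinfo.
# Intended to be run on every swarm node so the outputs can be compared.
overlay_kernel_report() {
    echo "kernel=$(uname -r)"
    if modinfo vxlan >/dev/null 2>&1; then
        echo "vxlan=module found"
    else
        echo "vxlan=module not found"
    fi
}

# Example:
#   overlay_kernel_report
```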