Query: Network communication among Docker containers is forbidden by Linux server

When I deploy 82 Docker containers on a 32-CPU server through docker-compose, TCP communication between them fails. When I print the docker logs, the following errors appear.

However, when I deploy the same 82 containers on my PC in the same way, they run and communicate with each other without any problem.

This is my docker-compose.yaml:

networks:
    eth0:
        driver: bridge
        ipam:
            config:
                - aux_addresses:
                    node0: 172.20.1.44
                    node1: 172.20.1.69
                    node2: 172.20.1.33
                    node3: 172.20.1.29
                    node4: 172.20.1.41
                    node5: 172.20.1.38
                    node6: 172.20.1.18
                    node7: 172.20.1.42
                    node8: 172.20.1.35
                    node9: 172.20.1.39
                    node10: 172.20.1.28
                    node11: 172.20.1.80
                    node12: 172.20.1.11
                    node13: 172.20.1.70
                    node14: 172.20.1.36
                    node15: 172.20.1.45
                    node16: 172.20.1.22
                    node17: 172.20.1.73
                    node18: 172.20.1.32
                    node19: 172.20.1.64
                    node20: 172.20.1.71
                    node21: 172.20.1.53
                    node22: 172.20.1.3
                    node23: 172.20.1.4
                    node24: 172.20.1.57
                    node25: 172.20.1.81
                    node26: 172.20.1.82
                    node27: 172.20.1.46
                    node28: 172.20.1.15
                    node29: 172.20.1.47
                    node30: 172.20.1.60
                    node31: 172.20.1.54
                    node32: 172.20.1.7
                    node33: 172.20.1.8
                    node34: 172.20.1.40
                    node35: 172.20.1.58
                    node36: 172.20.1.12
                    node37: 172.20.1.55
                    node38: 172.20.1.56
                    node39: 172.20.1.83
                    node40: 172.20.1.74
                    node41: 172.20.1.13
                    node42: 172.20.1.37
                    node43: 172.20.1.16
                    node44: 172.20.1.49
                    node45: 172.20.1.30
                    node46: 172.20.1.23
                    node47: 172.20.1.75
                    node48: 172.20.1.24
                    node49: 172.20.1.65
                    node50: 172.20.1.5
                    node51: 172.20.1.31
                    node52: 172.20.1.34
                    node53: 172.20.1.6
                    node54: 172.20.1.9
                    node55: 172.20.1.77
                    node56: 172.20.1.78
                    node57: 172.20.1.25
                    node58: 172.20.1.2
                    node59: 172.20.1.61
                    node60: 172.20.1.72
                    node61: 172.20.1.59
                    node62: 172.20.1.66
                    node63: 172.20.1.50
                    node64: 172.20.1.79
                    node65: 172.20.1.10
                    node66: 172.20.1.14
                    node67: 172.20.1.27
                    node68: 172.20.1.62
                    node69: 172.20.1.19
                    node70: 172.20.1.43
                    node71: 172.20.1.67
                    node72: 172.20.1.20
                    node73: 172.20.1.68
                    node74: 172.20.1.51
                    node75: 172.20.1.21
                    node76: 172.20.1.17
                    node77: 172.20.1.63
                    node78: 172.20.1.26
                    node79: 172.20.1.48
                    node80: 172.20.1.52
                    node81: 172.20.1.76
                  gateway: 172.20.1.1
                  ip_range: 172.20.1.0/24
                  subnet: 172.20.1.0/24
        name: net1
services:
    node0:
        container_name: node0
        entrypoint: ./bin/serv -test.run TestServer -test.v
        expose:
            - "6550"
        hostname: node0
        image: myimage:0.1
        networks:
            - eth0
        ports:
            - 8070:6550
    node1:
        container_name: node1
        entrypoint: ./bin/serv -test.run TestServer -test.v
        expose:
            - "6550"
        hostname: node1
        image: myimage:0.1
        networks:
            - eth0
        ports: []
    node2:
        container_name: node2
        entrypoint: ./bin/serv -test.run TestServer -test.v
        expose:
            - "6550"
        hostname: node2
        image: myimage:0.1
        networks:
            - eth0
        ports: []
    node3:
        container_name: node3
        entrypoint: ./bin/serv -test.run TestServer -test.v
        expose:
            - "6550"
        hostname: node3
        image: myimage:0.1
        networks:
            - eth0
        ports: []
    node4:
        container_name: node4
        entrypoint: ./bin/serv -test.run TestServer -test.v
        expose:
            - "6550"
        hostname: node4
        image: myimage:0.1
        networks:
            - eth0
        ports: []
    node5:
        container_name: node5
        entrypoint: ./bin/serv -test.run TestServer -test.v
        expose:
            - "6550"
        hostname: node5
        image: myimage:0.1
        networks:
            - eth0
        ports: []

Just to be sure: the IP range in your “eth0” network is not used by your LAN, and is not reachable via a route, correct?

Please share the outputs of:

ip addr show scope global
ip route

The aux_addresses are used to inform the bridge’s IPAM that those IPs are already in use outside of IPAM’s reach. What sense does it make to add those aux_addresses to a Docker bridge network? Do you assume they can be used to assign fixed IPs to your service containers?

You need to use ipv4_address in the networks section of your service to assign a fixed IPv4 address:

services:
    node0:
        container_name: node0
        entrypoint: ./bin/serv -test.run TestServer -test.v
        expose:
            - "6550"
        hostname: node0
        image: myimage:0.1
        networks:
            eth0:
              ipv4_address: 10.5.0.5
        ports:
            - 8070:6550
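
Note that 10.5.0.5 above is just an example address; with the 172.20.1.0/24 subnet defined in your network, the fixed address has to come from that range. A minimal sketch, reusing the node0 address from your aux_addresses list:

    services:
        node0:
            networks:
                eth0:
                    ipv4_address: 172.20.1.44   # must lie inside the network’s subnet (172.20.1.0/24)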

This is the output of ‘ip addr show scope global’ (posted as a screenshot):

This is the output of ‘ip route’ (posted as a screenshot):

Hello, meyay. I deleted the IPAM configuration, but the containers still show the following errors.

This is the network section of my docker-compose.yaml file:

networks:
    eth0:
        driver: bridge
        name: net1

Something is not adding up here!

You don’t have a bridge for the 172.18.0.0/24 subnet, and no route to it, but your containers are still attached to the network? This doesn’t make sense. Something must be missing in your output.

We only see the user-defined bridge for the 172.20.1.0/24 subnet and, of course, a route to that subnet.

Please stop sharing screenshots of text content; share it as text in a preformatted text block instead (use three backticks on the line before and after the text). Screenshots make the text harder to read.

However, when I create a network with subnet 172.20.1.0/24, the containers still have the same problem.

  • The results of “ip route” on my host are:
    default via 10.0.2.1 dev enp0s3 proto static metric 100
    10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.4 metric 100
    169.254.0.0/16 dev enp0s3 scope link metric 1000
    172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
    172.20.1.0/24 dev br-11bdb7531dfa proto kernel scope link src 172.20.1.1
  • The logs in container node0 are:
    === RUN TestServer
    Server (BLFBRB) will be started at 0.0.0.0:6550…
    2023/09/11 02:43:50 Start other process of node
    2023/09/11 02:43:50 Start BLFBRB broadcast function
    2023/09/11 02:43:50 Start MsgBroadcast Channel
    2023/09/11 02:43:50 Start MsgEntrance Channel
    2023/09/11 02:43:50 Start resolveMsg Function
    2023/09/11 02:44:29 Start 0th testing process!
    2023/09/11 02:44:29 Post "http://node18:6550/rec": dial tcp: lookup node18 on 127.0.0.11:53: write udp 127.0.0.1:59578->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "http://node17:6550/rec": dial tcp: lookup node17 on 127.0.0.11:53: no such host
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node11 on 127.0.0.11:53: write udp 127.0.0.1:51532->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node19 on 127.0.0.11:53: write udp 127.0.0.1:48174->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node12 on 127.0.0.11:53: write udp 127.0.0.1:42614->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node20 on 127.0.0.11:53: write udp 127.0.0.1:52528->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node22 on 127.0.0.11:53: write udp 127.0.0.1:51287->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node2 on 127.0.0.11:53: write udp 127.0.0.1:38090->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node55 on 127.0.0.11:53: write udp 127.0.0.1:35670->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node14 on 127.0.0.11:53: write udp 127.0.0.1:57459->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node24 on 127.0.0.11:53: write udp 127.0.0.1:39099->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:29 Post "****": dial tcp: lookup node49 on 127.0.0.11:53: write udp 127.0.0.1:44555->127.0.0.11:53: write: invalid argument
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.46:6550: i/o timeout
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.36:6550: i/o timeout
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.32:6550: i/o timeout
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.35:6550: i/o timeout
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.60:6550: i/o timeout
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.21:6550: i/o timeout
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.13:6550: i/o timeout
    2023/09/11 02:44:59 Post "****": dial tcp 172.20.1.12:6550: i/o timeout

Remark: “****” stands for the HTTP POST request URLs.

  • Docker network information:
    NETWORK ID     NAME      DRIVER    SCOPE
    230251a5412c   bridge    bridge    local
    6227197835c0   host      host      local
    11bdb7531dfa   net1      bridge    local
    84bede4ed15b   none      null      local
  • ip addr show scope global:
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 08:00:27:9b:30:f8 brd ff:ff:ff:ff:ff:ff
        inet 10.0.2.4/24 brd 10.0.2.255 scope global noprefixroute enp0s3
        valid_lft forever preferred_lft forever
    3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:37:11:17:03 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
        valid_lft forever preferred_lft forever
    497: br-11bdb7531dfa: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 02:42:50:c8:1c:cd brd ff:ff:ff:ff:ff:ff
        inet 172.20.1.1/24 brd 172.20.1.255 scope global br-11bdb7531dfa
        valid_lft forever preferred_lft forever

I hadn’t noticed it before, because I thought I saw the IP of the Docker network’s internal DNS resolver; on closer look, it appears the host system is configured to use a DNS resolver that is only accessible from the host, but not from the containers.
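
To confirm which resolver the host and the containers actually use, a quick check (a sketch; node0 is one of the containers from the compose file above):

    # on the host
    cat /etc/resolv.conf
    resolvectl status        # only if systemd-resolved is in use

    # inside one of the containers
    docker exec node0 cat /etc/resolv.conf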

Usually this is either the systemd-resolved stub resolver or dnsmasq. If that is the case, you can either deactivate it or make sure it binds to a host IP that is reachable from both the host and the containers, such as the IP of the docker0 interface.

If you google for it, you should find plenty of blog posts and articles that show how to deactivate the systemd-resolved stub resolver, or how to configure dnsmasq to bind to a specific IP.
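
For reference, a minimal sketch of both approaches; the option names and file paths assume a stock systemd-resolved or dnsmasq setup and may differ on your distribution:

    # Option A: disable the systemd-resolved stub listener
    # In /etc/systemd/resolved.conf set:
    #   [Resolve]
    #   DNSStubListener=no
    sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf   # point /etc/resolv.conf at the real upstream resolvers
    sudo systemctl restart systemd-resolved

    # Option B: keep dnsmasq, but bind it to an IP the containers can reach,
    # e.g. the docker0 address from your output (172.17.0.1)
    # In /etc/dnsmasq.conf set:
    #   listen-address=172.17.0.1
    #   bind-interfaces
    sudo systemctl restart dnsmasq

After either change, you may need to recreate the containers so they pick up the new resolver configuration.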