IPVLAN L3 driver, forwarding to external DNS resolvers not working

Trying to get the IPVLAN L3 mode driver working in my test Docker environment. Striking out pretty hard, and I have no idea why. (Apologies, YAML seems to be messing with the reddit editor.)

TL;DR: My Ubuntu Docker host has known-good, working DNS configuration. If I do an nslookup (without specifying a server) inside a bridge-network container, it works fine. If I do the same thing from within a container configured with the IPVLAN L3 driver, the DNS query fails, yet all other network traffic inside that container works fine and it's reachable/routable from other subnets.

Environment info:

Ubuntu 22.04 inside an ESXi 7 VM
Docker Host IP: 192.168.200.12
My internal DNS server IP: 192.168.100.2
IPVLAN L3 subnet: 192.168.201.0/24
Docker v25.0.5
Portainer v2.19.4 (I'm using Stacks inside of Portainer for container builds)

The host's netplan config:

network:
  ethernets:
    ens160:
      addresses:
      - 192.168.200.12/24
      nameservers:
        addresses:
        - 192.168.100.2
        - 2.2.2.2
        search:
        - home.myfqdn.com
      routes:
      - to: default
        via: 192.168.200.1
  version: 2

I created an IPVLAN L3 network on the Docker host:

docker network create -d ipvlan \
--subnet=192.168.201.0/24 \
-o ipvlan_mode=l3 -o parent=ens160 \
IPVLAN_L3_201
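
A quick way to sanity-check the mode and parent after creating it (optional, not part of the original setup):

docker network inspect IPVLAN_L3_201 --format '{{json .Options}}'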

I have a Ubiquiti UDM-Pro with a static route for 192.168.201.0/24 pointing at 192.168.200.12.

If I SSH to the Docker host and test networking, specifically nslookup, everything works.

Docker host test results - ping, nslookup against the system default, and nslookup with a manually specified DNS server:

areaman@dockerhost:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=52.7 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=57 time=43.2 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=57 time=38.6 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 38.635/44.828/52.666/5.844 ms

areaman@dockerhost:~$ nslookup google.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   google.com
Address: 142.250.191.110
Name:   google.com
Address: 2607:f8b0:4009:803::200e

areaman@dockerhost:~$ nslookup google.com 192.168.100.2
Server:         192.168.100.2
Address:        192.168.100.2#53

Non-authoritative answer:
Name:   google.com
Address: 142.250.191.110
Name:   google.com
Address: 2607:f8b0:4009:803::200e

No issues here. I see both DNS queries hitting my DNS server without issue. DNS works as I'd expect.

So, now let's go create a container using the IPVLAN L3 driver.
nginx container YAML:

version: "3"
services:
  nginx_test:
    container_name: nginx_test
    image: linuxserver/nginx:latest
    networks:
      IPVLAN_L3_201:
        ipv4_address: 192.168.201.118

networks:
  IPVLAN_L3_201:
    external: true

The container starts up just fine, and if I hop inside it we can run the same tests as we did against the host:
Ping, nslookup against the system default, and nslookup with a manually specified DNS server

root@20e97f08c7db:/# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=113 time=39.784 ms
64 bytes from 8.8.8.8: seq=1 ttl=113 time=42.886 ms
64 bytes from 8.8.8.8: seq=2 ttl=113 time=44.664 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 39.784/42.444/44.664 ms
root@20e97f08c7db:/# nslookup google.com 
Server:         127.0.0.11 
Address:        127.0.0.11:53
** server can't find google.com: SERVFAIL
** server can't find google.com: SERVFAIL

root@20e97f08c7db:/# nslookup google.com 192.168.100.2
Server:         192.168.100.2
Address:        192.168.100.2:53

Non-authoritative answer:
Name:   google.com
Address: 142.250.191.110

Non-authoritative answer:
Name:   google.com
Address: 2607:f8b0:4009:803::200e

You can see the SERVFAIL messages when the nslookup command is run against the system default DNS. To me, at least, this suggests the driver is failing to redirect the DNS query to the host's DNS. Is that correct?
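
One way to confirm where the query dies (a sketch; assumes tcpdump is installed on the Docker host and ens160 is the parent interface):

# Run nslookup inside the container while watching for forwarded DNS
# traffic leaving the host; a working embedded resolver should produce
# queries to the upstream server (192.168.100.2) here.
sudo tcpdump -ni ens160 udp port 53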

Also, to make sure I wasn't crazy, I tested pinging into the container from another subnet (192.168.100.0/24) that requires routing to reach the Docker VM. That works fine; I get a reply like I'd expect.

So, to see if this was specific to the ipvlan l3 driver mode, I created another container using the same image but using the default bridge networking:

version: "3"
services:
  nginx_test:
    container_name: nginx_bridge
    image: linuxserver/nginx:latest

This container also starts up just fine, and if I hop inside it to run the same tests we did inside the IPVLAN L3 driver container (previous tests), it works fine.
Once again we're testing with a ping to 8.8.8.8, an nslookup against the system default, and an nslookup with a manually specified DNS server

root@90b63fdd7260:/# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes 
64 bytes from 8.8.8.8: seq=0 ttl=56 time=43.751 ms 
64 bytes from 8.8.8.8: seq=1 ttl=56 time=42.334 ms 
64 bytes from 8.8.8.8: seq=2 ttl=56 time=39.053 ms 
^C 
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 39.053/41.712/43.751 ms 

root@90b63fdd7260:/# nslookup google.com
Server:         127.0.0.11 
Address:        127.0.0.11:53

Non-authoritative answer: 
Name:   google.com 
Address: 142.250.191.110
Non-authoritative answer: 
Name:   google.com 
Address: 2607:f8b0:4009:80b::200e

root@90b63fdd7260:/# nslookup google.com 192.168.100.2 
Server:         192.168.100.2 
Address:        192.168.100.2:53

Non-authoritative answer: 
Name:   google.com 
Address: 142.250.191.110
Non-authoritative answer: 
Name:   google.com 
Address: 2607:f8b0:4009:80b::200e

So, as expected, you can see the non-authoritative answer coming back from my internal DNS server. I can also see that query hit my internal DNS, so I know it's not using some other DNS server on the internet. The ping and the nslookup with a manually specified server also work fine.

I'm baffled. I didn't see anything in the documentation for the IPVLAN L3 driver that explains this or hints at something I missed configuring.

What am I missing here? I know the upstream networking/routing/dns is fine.

I really don't want to abandon the IPVLAN L3 driver (unless we know of a bug or something). I have other things going on in my network that make each container being routable hugely useful, and it has dramatically simplified other aspects of my lab.

Any insight is appreciated, thanks!

Just to be sure: you configured your host to use the DNS stub resolver, which runs in the network namespace of your host and has an IP known only to your host.

Then you run a container attached to the ipvlan l3 network, which uses its own network namespace, and expect it to reach the DNS stub resolver in the host's network namespace?

From my point of view this is not a bug: Docker behaves in an expected way. Your host DNS configuration is just not compatible with it. Deactivate the systemd DNS stub resolver and configure a resolver in your /etc/resolv.conf that is reachable from your LAN, and it will work.
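
On Ubuntu that amounts to something like this (a sketch; assumes systemd-resolved, and that the real upstream servers end up in /run/systemd/resolve/resolv.conf):

# Stop systemd-resolved from listening on the host-only stub address 127.0.0.53
sudo sed -i 's/^#\?DNSStubListener=.*/DNSStubListener=no/' /etc/systemd/resolved.conf
# Point /etc/resolv.conf at the file that lists the real upstream resolvers
sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
sudo systemctl restart systemd-resolved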

What you're saying makes sense, but I couldn't find jack explaining it. That's the reason I used the bridge network driver to do some testing: to eliminate issues outside of the network driver in Docker. Coming from a traditional virtualization-centric background, I too would assume containers would behave the way you're describing, but if that were the case I'd expect multiple examples and documentation discussing this, and there's nothing specific to L3 that I can find.

I read this: How Docker Desktop Networking Works Under the Hood | Docker, and I've read the entire Docker documentation on the network drivers, and I'm still uncertain as to what should be happening.

The documentation explains that DNS queries are redirected out of the container via the networking driver to, essentially, be proxied by the Docker host, which then uses the DNS configured on the host to do the query. Nowhere in the configuration of the container can I find a requirement to specify a DNS server. I have specified DNS servers and it still doesn't work when using the L3 driver mode. However, none of the documentation makes any differentiation between the network modes when discussing how Docker handles DNS queries.
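
For reference, this is the kind of per-service DNS override I mean (a sketch based on my stack YAML above; the dns key is standard compose):

version: "3"
services:
  nginx_test:
    container_name: nginx_test
    image: linuxserver/nginx:latest
    dns:
      - 192.168.100.2
    networks:
      IPVLAN_L3_201:
        ipv4_address: 192.168.201.118

networks:
  IPVLAN_L3_201:
    external: true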

If you look at the various tutorials and blogs on using the L3 driver mode, none of them seem to require additional configuration beyond what I've done for things to work. Additionally, I've watched a few tutorials where people go through the entire process without any issue.

Hello,

Same issue here with Debian 12 after I upgraded yesterday. This was working before the upgrade. Now, external FQDNs are not reachable anymore. Only the internal Docker DNS resolver is working with IPVLAN L3. External DNS requests are no longer forwarded out of the Docker network.

Here's a busybox example:

/ # cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
search lan.mydomain.ch
options ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [212.25.1.1 212.25.3.3 172.16.100.3]
# Overrides: []
# Option ndots from: internal

/ # ping zabbix-agent
PING zabbix-agent (172.16.101.20): 56 data bytes
64 bytes from 172.16.101.20: seq=0 ttl=64 time=0.074 ms
64 bytes from 172.16.101.20: seq=1 ttl=64 time=0.211 ms
^C
--- zabbix-agent ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.074/0.142/0.211 ms
/ # ping www.google.com
ping: bad address 'www.google.com'

/ # nslookup www.google.om
Server:		127.0.0.11
Address:	127.0.0.11:53

** server can't find www.google.om: SERVFAIL

** server can't find www.google.om: SERVFAIL


I can change the DNS server in the container from 127.0.0.11 to another one; then external resources are reachable but internal ones are not.

/ # cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 172.16.100.3
search lan.mdomain.com
options ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [212.25.1.1 212.25.3.3 172.16.100.3]
# Overrides: []
# Option ndots from: internal

/ # nslookup www.google.com
Server:		172.16.100.3
Address:	172.16.100.3:53

Non-authoritative answer:
Name:	www.google.com
Address: 2a00:1450:400a:808::2004

Non-authoritative answer:
Name:	www.google.com
Address: 142.250.203.100

/ # ping www.google.com
PING www.google.com (142.250.203.100): 56 data bytes
64 bytes from 142.250.203.100: seq=0 ttl=118 time=2.092 ms
64 bytes from 142.250.203.100: seq=1 ttl=118 time=2.530 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 2.092/2.311/2.530 ms
/ # ping zabbix-agent
ping: bad address 'zabbix-agent'
/ # nslookup zabbix-agent
Server:		172.16.100.3
Address:	172.16.100.3:53

** server can't find zabbix-agent.lan.josoko.ch: NXDOMAIN

** server can't find zabbix-agent.lan.josoko.ch: NXDOMAIN

This was working before this version:

docker info
Client: Docker Engine - Community
 Version:    26.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.25.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 8
  Running: 4
  Paused: 0
  Stopped: 4
 Images: 8
 Server Version: 26.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.0-18-amd64
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 31.24GiB
 Name: geko
 ID: 31466756-f345-41ef-8e38-4c1e517fd994
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The network inspect:

docker network inspect 60_zbx_net_backend
[
    {
        "Name": "60_zbx_net_backend",
        "Id": "aef0aeb73b06c5c0a3125c420288b937580a25d1fed7d21fae96f6b883617aa4",
        "Created": "2024-03-30T00:18:26.205715177+01:00",
        "Scope": "local",
        "Driver": "ipvlan",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.16.101.16/28"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "025b19d33527aca787065de649f6cd0b9b9dc3eb08505ad2c39d3a523e85764e": {
                "Name": "60-zabbix-web-nginx-pgsql-1",
                "EndpointID": "9963cd234c39156675ff5880f82c834d0706311c32ec0ad570b719243d5ca16c",
                "MacAddress": "",
                "IPv4Address": "172.16.101.18/28",
                "IPv6Address": ""
            },
            "63f894fae9efb4f7d62204824716df66a8162797b62506dd53ad6dab676039c4": {
                "Name": "60-zabbix-server-1",
                "EndpointID": "0b48d688304826978b7193d9c27cad7eb5752994dc79edc0f76a45b4d39b4101",
                "MacAddress": "",
                "IPv4Address": "172.16.101.19/28",
                "IPv6Address": ""
            },
            "bdd102c370cc414e507176c7205ac4e1b3b4864b300ec5ac9ce17533b54074cc": {
                "Name": "60-postgres-server-1",
                "EndpointID": "b7f62cf53ec0b77f09153c80cf93b0a84e29d85657751abdf10edd471189b5d6",
                "MacAddress": "",
                "IPv4Address": "172.16.101.21/28",
                "IPv6Address": ""
            },
            "e1a99d4846a4f6199382a4ae1b80d09b30bace78ae8d48afed7762b28176a32c": {
                "Name": "60-zabbix-agent-1",
                "EndpointID": "397fca45e45bbd96a83010ce599f5833a90874c3702145dbd9fcce7c99172e3b",
                "MacAddress": "",
                "IPv4Address": "172.16.101.20/28",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.enable_ipv6": "false",
            "ipvlan_mode": "l3",
            "parent": "eno1"
        },
        "Labels": {
            "com.docker.compose.network": "zbx_net_backend",
            "com.docker.compose.project": "60",
            "com.docker.compose.version": "2.25.0"
        }
    }
]

The native network interface:

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 94:c6:91:18:80:59 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
    inet 172.16.100.66/24 brd 172.16.100.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::96c6:91ff:fe18:8059/64 scope link
       valid_lft forever preferred_lft forever
3: wlp58s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 40:a3:cc:77:37:51 brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:d6:d8:42:58 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
24: br-f226a0c18786: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:ff:a3:5f:16 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.1/16 brd 172.19.255.255 scope global br-f226a0c18786
       valid_lft forever preferred_lft forever
    inet6 fe80::42:ffff:fea3:5f16/64 scope link
       valid_lft forever preferred_lft forever

I'm 99.99% certain this is a bug, or a feature change that I can't find in the documentation.

I got fed up and decided to take my existing host, which at this point was a mess, and did a Docker downgrade to v5:24.0.9. After that downgrade, I redeployed a test container and nslookup worked without any issue. I was floored.

I had to verify this was Docker-version related, so I nuked everything and went back to scratch. I also took the chance to build a whole new Debian-based VM rather than Ubuntu (for other reasons I won't get into). I did a CLI-only Debian install, and got the networking configured and everything else working as I needed it.

Same as before, the Docker host VM had no problems with DNS or routing to the internet.

I took a file-system snapshot in VMware prior to even adding the Docker repo, so I could pull it via APT.

I began the Docker install, following the process specified in the documentation. Since I'm intending to manage this with Portainer + Stacks, I made sure to specify Docker version v5:25.0.5 for my install.

I did the Docker install, created the ipvlan l3 mode network, then created an nginx container using that network. Ping to 8.8.8.8 worked fine, but once again I couldn't get 'nslookup google.com' to work inside the nginx container. I even tried another image (Alpine) just to rule out my image as a problem source.

Then I went and reverted to my pre-docker install snapshot.

This time, I decided to drop the version back to v5:24.0.9. I did the install, created the L3 network, created the nginx container, hopped into that container, verified ping to 8.8.8.8, and then did an 'nslookup google.com':

root@087a60a93173:/# nslookup google.com
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
Name:   google.com
Address: 172.217.5.14

No problems, everything works exactly like I'd expect.

The only thing I know for sure is that my initial host was built in late January. At the time I pulled the newest version of Docker in apt, I believe v5:25.0.2, but I can't say with certainty. The host I had built blew up when I did an apt-get update/upgrade (and forgot to snapshot the VM first) on Mar 26th. I quickly discovered the bug in Portainer where it won't play nice with v5:26 of Docker, so I rolled the install back to 25.0.5. I should have noted the version I was on, but I didn't, so I just figured I'd go back to the newest non-26 build.

But yeah, I'm pretty confident this is a bug that showed up somewhere between 5:25.0.3 and 5:25.0.5. I've already sunk too much time into this; I'm not spending any more trying to find the specific version that caused the issue.

Time to hard-code my repos to not pull anything past the v5:24.x train, and to get my VMware snapshots automated for the new VM I built.
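
Something like this apt pin should hold it there (a sketch; the preferences file name is arbitrary, and this assumes the Debian/Ubuntu docker-ce packages):

# /etc/apt/preferences.d/docker-ce
Package: docker-ce docker-ce-cli docker-ce-rootless-extras
Pin: version 5:24.*
Pin-Priority: 1001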

It seems to be a change in the DNS behavior due to a vulnerability:

A workaround has been described here:

Run containers intended to be solely attached to internal networks with a custom upstream address (--dns argument to docker run, or API equivalent), which will force all upstream DNS queries to be resolved from the container network namespace.
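
In docker run terms, that workaround looks something like this (a sketch using the OP's network name and internal DNS server):

docker run --rm --network IPVLAN_L3_201 --dns 192.168.100.2 busybox nslookup google.com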

So, reading through all the info over on the moby GitHub, it sure feels like this should work without issue unless you set the -o internal=false value, right? Did I misread the workaround info?

Anyway, I tried the workaround, but no luck.

You had

"Internal": false,

in your inspect which, to me, implies the network driver should not have been preventing external DNS usage. It's not an internal-only network or container, right?

I moved my VM to 25.0.5 and tried this all again. It still fails.

I added the -o=internal=false flag and I specified a parent interface in theipvlan l3 network I created.

[
    {
        "Name": "ipvlanl3",
        "Id": "6aa182eb5eae018d941abf52d423248f7ac5a3a15d54f32eecebfa2cd6bb3707",
        "Created": "2024-04-01T15:38:32.204692837Z",
        "Scope": "local",
        "Driver": "ipvlan",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "192.168.202.0/24"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "cbb566c9eba356084bd21edd95ffd00809e2518c8c8bf95713f84d78a37008db": {
                "Name": "serene_clarke",
                "EndpointID": "99df1e1871f7ea70f98ea0db829a90767baa0c55d2914f1465298a3809309640",
                "MacAddress": "",
                "IPv4Address": "192.168.202.99/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "internal": "false",
            "ipvlan_mode": "l3",
            "parent": "ens160"
        },
        "Labels": {}
    }
]

And this is the nginx container I spun up to test DNS:

sudo docker run --net=ipvlanl3 --ip=192.168.202.99 --dns=192.168.100.2 -itd linuxserver/nginx /bin/sh

I've also tried it without the --dns option defined. Neither will allow DNS queries to work. All of the attempts have the same resolv.conf file in the container. If I manually edit the resolv.conf file inside the container and give it my local DNS server (192.168.100.2), it works fine.

This feels like they made the change but something got missed, because from where I'm sitting there's no way to get DNS to work inside an IPVLAN L3 container without doing something to avoid using the 127.0.0.11 DNS server defined in resolv.conf, i.e. without avoiding the network driver's DNS redirection entirely.

Also, I just tried making an ipvlan L3 network with the -o internal=true flag set, and then created another container with --dns=192.168.100.2 specified for the container. Still didn't work.

I changed the /etc/resolv.conf file on the affected Docker container by adding my own DNS server (172.16.100.3):

cat /etc/resolv.conf 
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
nameserver 172.16.100.3
search lan.mydomain.com
options ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [212.25.1.1 212.25.3.3 172.16.100.3]
# Overrides: []
# Option ndots from: internal

My apps are working again now, resolving internal and external DNS, having just reordered things (1. Docker resolver, 2. my DNS server). But I don't see the point of this restriction. In an L3 network, every user should protect their network with firewalls etc. anyway. Network protection shouldn't be part of a Docker instance by default if it's unwanted. There is a reason I want to use IPVLAN L3 instead of the ugly port forwarding towards the Docker apps. This new feature/bugfix/whatever causes more issues than benefits it brings. Vulnerability prevention is OK, but not at these costs.

If this is not reverted soon, I will need to rewrite my Dockerfiles to somehow bake these resolv.conf changes into the Docker containers.
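
Something like this entrypoint wrapper is what I have in mind (just a sketch; 172.16.100.3 is my internal DNS server, and resolv.conf has to be rewritten in place because Docker bind-mounts it):

#!/bin/sh
# Prepend our own nameserver ahead of whatever Docker generated,
# then hand off to the container's real command.
if ! grep -q '172.16.100.3' /etc/resolv.conf; then
    { echo 'nameserver 172.16.100.3'; cat /etc/resolv.conf; } > /tmp/resolv.conf
    cat /tmp/resolv.conf > /etc/resolv.conf
fi
exec "$@"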

I agree with you 100%

If your expectation is that a feature requires upstream network routing to work, then leave the network routing and security to me, or at least give me the option.

I still think there's a bug here. The way I read the write-up on this issue, they saw that if a container is flagged as internal-only, there was still a way to exfiltrate data out of the container via DNS queries. The default behavior, though, is internal=false, so why is the daemon acting this way now?
Making this change to default behavior to address a sub-6-severity CVE feels like an overreach. And I say that as someone who works for a very large infosec vendor with a prolific research team. We're pretty paranoid, and the cloud architects I've talked to about this think it's sort of over the top.

The CVE score is sort of low, and maybe not severe enough to justify changing:

a) the default behavior of the daemon/driver, and
b) not having a very clear explanation in the documentation of how to disable or work around this new security enhancement.

Every tutorial and bit of documentation I've read on L3 mode assumes that DNS will just work like it does in bridge and other modes of operation. The docs should have been updated before making a change this consequential.

@areaman
May I please ask you to change the title of this topic from:
IPVLAN L3 mode driver confusion

to:
IPVLAN L3 driver, forwarding to external DNS resolvers not working

My account can't seem to do that. I can't edit my OP, but you're right, it should be changed.

This is exactly the point.

And what happens now? One cannot even do an apt upgrade inside a container without a DNS "workaround". This now blocks upgrading a container with a simple apt upgrade, which delivers a bunch of vulnerability fixes every day.

Someone else posted this issue over on the moby GitHub: IPVLAN L3 is not forwarding non Docker DNS requests outside of the container anymore · moby/moby · Discussion #47655 · GitHub

I'm going to keep an eye on that, because I think, even if we're wrong about how this should work, the confusion comes from the fact that no one, right now, is certain how things should even be working and how we should update our environments and compose files to handle it.

Hi all - it's a bug … we'll track it as IPVLAN L3 is not forwarding non Docker DNS requests outside of the container anymore · Issue #47662 · moby/moby · GitHub

Additional information for anyone relying on the resolv.conf workaround on existing Docker IPVLAN L3 containers.

These two files are overwritten by Docker in containers after every system reboot/stop/start/restart (bare-metal or VM reboot, and plain container stop/start/restart):

  • /etc/resolv.conf
  • /etc/hosts

This is by design and valid. Just keep that in mind until the fix is released.

The Docker moby team fixed it. It now works in Docker 26.0.1; tested a few minutes ago. Thanks, Docker moby team!! And thanks @areaman for opening the discussion.

Hello, I am facing a similar problem with a Rocky Linux host. I have written my post to mirror the OP's values to make comparison simpler.

Docker package versions

docker-buildx-plugin.x86_64                                        0.14.0-1.el9                                     @docker-ce-stable
docker-ce.x86_64                                                   3:26.1.0-1.el9                                   @docker-ce-stable
docker-ce-cli.x86_64                                               1:26.1.0-1.el9                                   @docker-ce-stable
docker-ce-rootless-extras.x86_64                                   26.1.0-1.el9                                     @docker-ce-stable
docker-compose-plugin.x86_64                                       2.26.1-1.el9                                     @docker-ce-stable
containerd.io.x86_64                                               1.6.31-3.1.el9                                   @docker-ce-stable

ip route Rocky Linux Host

default via 192.168.200.1 dev ens160 proto dhcp src 192.168.200.12 metric 100
192.168.200.0/24 dev ens160 proto kernel scope link src 192.168.200.12 metric 100
172.12.0.0/16 dev docker0 proto kernel scope link src 172.12.0.1 linkdown
172.27.0.0/16 dev br-ef2143e7d302 proto kernel scope link src 172.27.0.1 linkdown

Rocky Linux host /etc/resolv.conf

search lan
nameserver 192.168.200.1

IPVLAN network create command
docker network create -d ipvlan --subnet 192.168.201.0/24 -o parent=ens160 -o ipvlan_mode=l3 IPVLAN_L3_201
Static route on OpenWRT for 192.168.201.0/24 pointing at 192.168.200.12
Everything works in the host as far as DNS is concerned. nslookup, pings, the works

Create a container with an IPVLAN L3 driver without specifying a DNS server
docker run -itd --rm --network IPVLAN_L3_201 --ip 192.168.201.118 --name testIPVLAN busybox
cat /etc/resolv.conf inside container

nameserver 127.0.0.11
search lan
options ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [192.168.200.1]
# Overrides: [nameservers]
# Option ndots from: internal

Mirroring the OP's results: I can ping outside my network but DNS doesn't work

/ # ping 9.9.9.9
PING 9.9.9.9 (9.9.9.9): 56 data bytes
64 bytes from 9.9.9.9: seq=0 ttl=59 time=10.582 ms
64 bytes from 9.9.9.9: seq=1 ttl=59 time=9.898 ms
64 bytes from 9.9.9.9: seq=2 ttl=59 time=10.196 ms
^C
--- 9.9.9.9 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.898/10.225/10.582 ms
/ # nslookup quad9.net
Server:         127.0.0.11
Address:        127.0.0.11:53

;; connection timed out; no servers could be reached

/ # nslookup quad9.net 9.9.9.9
Server:         9.9.9.9
Address:        9.9.9.9:53

Non-authoritative answer:
Name:   quad9.net
Address: 216.21.3.77

Non-authoritative answer:
Name:   quad9.net
Address: 2620:0:871:9000::77

The results mirror the OP's very closely.
In contrast to the OP, though, I get ;; connection timed out; no servers could be reached instead of SERVFAILs.
I can also successfully ping 192.168.200.1 (the Docker host's gateway) from inside the container.

Create a container with an IPVLAN L3 driver with a specified DNS server
docker run -itd --rm --network IPVLAN_L3_201 --ip 192.168.201.118 --dns 9.9.9.9 --name testIPVLAN busybox

cat /etc/resolv.conf

nameserver 127.0.0.11
search lan
options ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [9.9.9.9]
# Overrides: [nameservers]
# Option ndots from: internal

DNS works

/ # ping 9.9.9.9
PING 9.9.9.9 (9.9.9.9): 56 data bytes
64 bytes from 9.9.9.9: seq=0 ttl=59 time=12.117 ms
64 bytes from 9.9.9.9: seq=1 ttl=59 time=12.013 ms
^C
--- 9.9.9.9 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 12.013/12.065/12.117 ms
/ # nslookup quad9.net
Server:         127.0.0.11
Address:        127.0.0.11:53

Non-authoritative answer:
Name:   quad9.net
Address: 2620:0:871:9000::77

Non-authoritative answer:
Name:   quad9.net
Address: 216.21.3.77

So, if I have understood this correctly, the workaround works now. I thought that once the fix arrived --dns would no longer be necessary, but maybe I have misunderstood.

I have a couple more questions:

  1. When we say version, which version are we referring to? docker-ce and docker-ce-cli? What about containerd.io?
  2. Can someone explain in simple terms why the container can't communicate with the 192.168.200.1 DNS server?
  3. I can't seem to ping 192.168.200.12 from inside the container. Is this normal?
  4. This is specific to my OS, but could it be because of Rocky Linux's firewall? Should I open the DNS port? (Sketch below.)

Thank you for all your help in advance.
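
PS, for question 4, this is how I was planning to check it (a sketch; assumes firewalld is the active firewall on the Rocky host):

# Show what the current zone allows
sudo firewall-cmd --list-all
# Allow DNS through and reload, just to rule the firewall out
sudo firewall-cmd --permanent --add-service=dns
sudo firewall-cmd --reload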