Various problems, but I think it relates to either DNS or Docker sock being inaccessible

(To begin with, I apologise for the lack of links, but new members can only have two links in their posts, so I’ve had to remove a heap of them)

Hi there. I’ve recently installed Debian 12 onto an Apple Mac Mini (I’m not running MacOS at all), and have installed Docker onto it. I’m doing everything through the CLI (controlling via SSH in VSCode on my iMac). I’m not using Docker Desktop at all.

I have a few dozen containers running. Most are working fine. However, as I work my way through Jim’s Garage’s Homelab playlist on YouTube, some of the services I’m trying to set up simply aren’t working. I’ve followed the steps in these videos exactly, so I’m assuming the problem is me/my system, not the services.

I realise this isn’t the place for support for individual services, but the problems I’m encountering don’t seem to be documented anywhere else, so I’m wondering if my issues are being caused by some Docker-specific configuration (or an issue with the Docker host computer).

Specifically, I’m having problems with:

  • Homepage, and getting it to retrieve data from the Traefik API. I have requested support from the Homepage developer here. When I posted about this same issue in the Traefik support forum, someone suggested posting about it here.
    To summarise: Homepage is meant to get data from Traefik and display it on a webpage. I can load the Traefik API directly in my web browser (on my iMac), and I can ping the Docker host’s external IP address from within the Homepage container. However, I cannot curl the Traefik API from within the Homepage container.
    curl https://traefik-dashboard.mydomain.com/api/overview
    curl: (6) Could not resolve host: traefik-dashboard.mydomain.com
    I also noted these lines in the Homepage log file (as seen in Portainer):
[2024-07-16T01:59:01.109Z] error: <httpProxy> Error calling https://traefik-dashboard.mydomain.com/api/overview...
[2024-07-16T01:59:01.110Z] error: <httpProxy> [
  500,
  Error: getaddrinfo ENOTFOUND traefik-dashboard.mydomain.com
      at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:118:26) {
    errno: -3008,
    code: 'ENOTFOUND',
    syscall: 'getaddrinfo',
    hostname: 'traefik-dashboard.mydomain.com'
  }"
  • Using Authentik (SSO) to log into my Portainer installation. I was able to get it to work just fine when I had Portainer set up with IPaddress:port, but once I set up an FQDN for it, Authentik stopped working (despite configuring Authentik to suit the FQDN). Ultimately, when I go to log into Portainer, if I click the “OAuth” button, all I get is an “Unauthorized” error, with no useful information provided. There’s nothing useful in the logs (for either Portainer or Authentik).
  • Netbird (self-hosted VPN) using Authentik for SSO. Again, I followed their official documentation exactly. However, when I reached step 5 of their installation guide, I had to run a script that was meant to access the Authentik API and it was then meant to generate a docker-compose file for me:
./configure.sh 
using provided server's public IP
loading OpenID configuration from https://authentik.mydomain.com/application/o/netbird/.well-known/openid-configuration to the openid-configuration.json file
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: authentik.mydomain.com

Again, I can access that API (https://authentik.mydomain.com/application/o/netbird/.well-known/openid-configuration) directly in a web browser from my iMac without any issues.

It seems to me that these issues have one thing in common: DNS.
So I’m wondering if - for some reason - these problems are being caused by an inability of the various Docker containers/services to resolve the FQDNs.

cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver 192.168.4.1

…I don’t really understand if that’s the right thing or not. My network’s gateway (a UniFi Dream Machine SE) is at 192.168.1.1. My Docker host (the Mac Mini) is in a VLAN 192.168.4.*, and its IP is 192.168.4.7.

It may or may not be relevant, but I’ve also got Pi-Hole installed (in another Docker container), acting as a DNS server. That’s where I’ve got all my custom DNS records for my various services. That appears to be working fine, as far as I can tell. That Pi-Hole instance is accessible via 192.168.4.7:500.

As a test, I just added the Docker host’s IP to that file:

cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver 192.168.4.1
nameserver 192.168.4.7

…and then I retried the Netbird script, and got the same error: curl: (6) Could not resolve host: authentik.davesservers.com, so this doesn’t seem to have fixed things. (I didn’t reboot or do anything after changing /etc/resolv.conf… should I have?)

Also, these containers are all on the same network, called “proxy”.

docker network inspect proxy 
[
    {
        "Name": "proxy",
        "Id": "b7c9bbbe3655e262291575826d2915eb398fa11a540ac649f8eb32b8800b1afc",
        "Created": "2024-07-13T17:13:53.904712745+10:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.22.0.0/16",
                    "Gateway": "172.22.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
.... (list of containers here, which definitely includes Authentik, Traefik, and Homepage)

Sorry for the lengthy post but hopefully someone out there knows what I’m doing wrong.

Thank you in advance for any ideas anyone may have.

Since you know that the DNS server cannot resolve the host, you could test if the DNS server that is supposed to resolve it is accessible or not. If there is nslookup in the container, try

nslookup authentik.davesservers.com 192.168.4.1
nslookup authentik.davesservers.com 192.168.4.7

My first guess would be a firewall issue. It can happen when you want to access a container on the same host but using a hostname that points to the host’s DNS. Since you mentioned you could ping the host, if the pinged IP was exactly the IP to which the used hostname pointed, it is possible that Docker’s internal DNS server cannot forward the request to the external DNS servers. How you should solve it, I’m not sure, but you can use the netshoot image to debug network issues.

You can also use it to run nslookup if it is not present in your original containers: just attach the proxy network to the netshoot container, or use the original container’s network namespace to get exactly the same network (even localhost):

docker run --rm -it --network container:CONTAINER_NAME nicolaka/netshoot

Hi @rimelek and thank you for taking the time to reply.

nslookup authentik.davesservers.com 192.168.4.1
nslookup authentik.davesservers.com 192.168.4.7

Thanks for that suggestion. I actually tried nslookup a few days ago, but forgot to include it in my post, so I have manually installed it in the Homepage container…

I’ve tried a few things:

docker exec -it homepage /bin/sh

nslookup authentik.davesservers.com
Server:         127.0.0.11
Address:        127.0.0.11:53

** server can't find authentik.davesservers.com: NXDOMAIN
** server can't find authentik.davesservers.com: NXDOMAIN
nslookup authentik.davesservers.com 192.168.1.1
Server:         192.168.1.1
Address:        192.168.1.1:53

** server can't find authentik.davesservers.com: NXDOMAIN
** server can't find authentik.davesservers.com: NXDOMAIN
nslookup authentik.davesservers.com 192.168.4.1
Server:         192.168.4.1
Address:        192.168.4.1:53

** server can't find authentik.davesservers.com: NXDOMAIN
** server can't find authentik.davesservers.com: NXDOMAIN

The three attempts above were almost instant. The next one took several seconds:

nslookup authentik.davesservers.com 192.168.4.7
;; connection timed out; no servers could be reached
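
For reference, the two failure modes above mean different things. NXDOMAIN is an answer: the server at that address was reached and reported that it has no record for the name. A timeout means nothing answered DNS queries at that address and port at all. A rough sketch of telling them apart automatically, assuming the output wording shown above (`classify_lookup` is a hypothetical helper name, not part of any tool):

```shell
# classify_lookup: read nslookup output on stdin and print a one-word verdict.
# Assumption: the wording matches the busybox/BIND nslookup output seen above.
classify_lookup() {
  output=$(cat)
  case $output in
    *"connection timed out"*) echo "unreachable" ;;  # no server answered at all
    *NXDOMAIN*)               echo "no-record" ;;    # server answered, name unknown
    *)                        echo "other" ;;        # anything else: inspect by hand
  esac
}

# Example usage (run inside the container being debugged):
#   nslookup authentik.davesservers.com 192.168.4.7 2>&1 | classify_lookup
```

The timeout is checked first because it indicates a connectivity problem that has to be fixed before any question about records is meaningful.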

you can use the netshoot image to debug network issues

Sorry for the silly question, but do you have any suggestions as to how, exactly? I have been able to run netshoot with the proxy network: docker run -it --net proxy nicolaka/netshoot, and ran all of the commands that are documented in that link, but honestly… I have no idea what any of the output means.

For instance:

fortio load https://homepage.davesservers.com
11:21:56.810 r1 [INF] scli.go:125> Starting, command="Φορτίο", version="1.63.7 h1:S6e+z36nV6o8RYQSUI9EWYxhCoPJy4VdAB2HQROUqMg= go1.22.2 amd64 linux", go-max-procs=6
Fortio 1.63.7 running at 8 queries per second, 6->6 procs, for 5s: https://homepage.davesservers.com
11:21:56.811 r1 [INF] httprunner.go:121> Starting http test, run=0, url="https://homepage.davesservers.com", threads=4, qps="8.0", warmup="parallel", conn-reuse=""
11:21:56.819 r1 [ERR] network.go:430> Unable to lookup "homepage.davesservers.com": lookup homepage.davesservers.com on 127.0.0.11:53: no such host
Aborting because of lookup homepage.davesservers.com on 127.0.0.11:53: no such host

There was similar output for fortio load https://traefik.davesservers.com.

Am I right in understanding this means that it’s using 127.0.0.11:53 as its DNS, rather than 192.168.4.7:500 (Pi-Hole)?

Assuming that line of thought is correct, I went down the rabbit hole of trying to change the DNS for Docker.
I did the following:

  1. On the Mac (Docker host) itself, I edited /etc/resolv.conf, and added in 192.168.4.7. I then rebooted and noted it had removed that reference in /etc/resolv.conf. I gathered from Googling that some other service must be updating that file on boot. I then went into the Debian desktop UI (GNOME? It’s been over a decade since I’ve used a Linux UI, so I assume it’s GNOME), and opened the “Settings” program, and from there I went to network settings, and manually specified the DNS as 192.168.4.7, and again rebooted. That was successful; the correct IP was now in /etc/resolv.conf.
  2. Within the homepage container’s shell, I noted the contents of /etc/resolv.conf:
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 127.0.0.11
options ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [192.168.4.7 192.168.4.1]
# Overrides: []
# Option ndots from: internal
  3. I have previously tried specifying the DNS in the Homepage docker-compose.yml file:
services:
  homepage:
    dns: 192.168.4.7
    image: ghcr.io/gethomepage/homepage:latest
    container_name: homepage
    restart: unless-stopped
    ...

…but that doesn’t seem to change anything.

  4. Found this, and created /etc/docker/daemon.json:
{
    "dns": ["192.168.4.7", "192.168.1.1", "8.8.8.8"]
}

followed by sudo service docker restart
…nup. Still no luck.

  5. Tried this:
docker run --dns 192.168.4.7 busybox nslookup https://traefik.davesservers.com
;; connection timed out; no servers could be reached

…this is creating an entirely new container, manually specifying the correct DNS, and then trying. As far as I can tell, this should just work.

  6. Similar to #5:
docker run --net proxy busybox nslookup https://traefik.davesservers.com
Server:         127.0.0.11
Address:        127.0.0.11:53

** server can't find https://traefik.davesservers.com: NXDOMAIN
** server can't find https://traefik.davesservers.com: NXDOMAIN
  7. Interestingly, the same problem occurs when using the “host” network…
docker run --net host busybox nslookup https://traefik.davesservers.com
Server:         192.168.4.7
Address:        192.168.4.7:53

** server can't find https://traefik.davesservers.com: NXDOMAIN
** server can't find https://traefik.davesservers.com: NXDOMAIN
docker run --net host busybox nslookup https://frigate.davesservers.com
Server:         192.168.4.7
Address:        192.168.4.7:53

** server can't find https://frigate.davesservers.com: NXDOMAIN
** server can't find https://frigate.davesservers.com: NXDOMAIN
docker run --dns 192.168.4.7 --net host busybox nslookup https://traefik.davesservers.com
Server:         192.168.4.7
Address:        192.168.4.7:53

** server can't find https://traefik.davesservers.com: NXDOMAIN
** server can't find https://traefik.davesservers.com: NXDOMAIN

It was at this point I realised I had been making a monumental mistake, but correcting it had the same result, anyway…

docker run --dns 192.168.4.7 --net host busybox nslookup https://traefik-dashboard.davesservers.com
Server:         192.168.4.7
Address:        192.168.4.7:53

** server can't find https://traefik-dashboard.davesservers.com: NXDOMAIN
** server can't find https://traefik-dashboard.davesservers.com: NXDOMAIN

(The correct URL I should’ve been using is https://traefik-dashboard.davesservers.com, not https://traefik.davesservers.com)

In any case…

docker exec -it homepage /bin/sh
nslookup https://traefik-dashboard.davesservers.com
Server:         127.0.0.11
Address:        127.0.0.11:53

** server can't find https://traefik-dashboard.davesservers.com: NXDOMAIN
** server can't find https://traefik-dashboard.davesservers.com: NXDOMAIN

nslookup https://traefik-dashboard.davesservers.com 192.168.4.7
;; connection timed out; no servers could be reached
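
A side note on the lookups above: nslookup takes a bare hostname rather than a URL, so with the `https://` prefix the name being queried is literally `https://traefik-dashboard.davesservers.com` (the NXDOMAIN lines echo that exact string back), which is not the name the DNS record was created for. A minimal sketch of stripping a URL down to its hostname first; `host_from_url` is a hypothetical helper, and it does not handle userinfo or IPv6 literals:

```shell
# host_from_url: print only the hostname part of a URL (or of a bare hostname).
host_from_url() {
  url=$1
  host=${url#*://}   # drop the scheme ("https://") if there is one
  host=${host%%/*}   # drop the path ("/api/overview")
  host=${host%%:*}   # drop an explicit port (":8443")
  printf '%s\n' "$host"
}

# Example usage:
#   nslookup "$(host_from_url https://traefik-dashboard.davesservers.com)" 192.168.4.7
```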

…that’s me out of ideas. I would appreciate any further suggestions you or anyone else reading this may have.

Thanks again for your time.

User-defined networks use the built-in DNS resolver on 127.0.0.11, which is used for DNS-based service discovery.
Whatever is configured in the host’s `/etc/resolv.conf` file is used as the upstream of the built-in DNS resolver.
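
One way to see which upstream servers the built-in resolver will forward to is the `# ExtServers:` comment that Docker Engine writes into the container’s `/etc/resolv.conf` (visible in the file contents posted earlier in this thread). A small sketch that extracts that list, assuming the comment keeps the format shown there; `ext_servers` is a hypothetical helper name:

```shell
# ext_servers: read a Docker-generated resolv.conf on stdin and print the
# upstream servers from its "# ExtServers: [...]" comment, one per line.
# Assumption: the comment looks like "# ExtServers: [192.168.4.7 192.168.4.1]".
ext_servers() {
  sed -n 's/^# ExtServers: \[\(.*\)]$/\1/p' | tr ' ' '\n'
}

# Example usage (on the Docker host):
#   docker exec homepage cat /etc/resolv.conf | ext_servers
```

If the list printed here does not include the server that holds the custom records, the built-in resolver has nowhere to forward those names.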

Can you explain why any service should be able to use Pi-Hole if it’s listening on port 500 instead of 53?

It isn’t. The web interface for Pi-Hole is on port 500, but the actual DNS service (TCP/UDP) runs on port 53, the default. Here’s part of my Pi-Hole docker compose file:

services:
  pihole:
    container_name: pihole
    image: pihole/pihole:latest
    ports:
      - "53:53/tcp"
      - "53:53/udp"
      - "67:67/udp"
      - "500:80/tcp"

That said, I went into Portainer > Networks > Proxy, and noted the IP address for the Pi-Hole container was 172.22.0.14. I realised that this is the IP address for Pi-Hole as far as that network is concerned, and so that’s the IP address I should be using in the Homepage docker-compose file:

services:
  homepage:
    image: ghcr.io/gethomepage/homepage:latest
    container_name: homepage
    restart: unless-stopped
    dns: 172.22.0.14

I then restarted the container and tried the nslookup idea again from within the Homepage container:

docker exec -it homepage /bin/sh
nslookup traefik-dashboard.davesservers.com
Server:         127.0.0.11
Address:        127.0.0.11:53

traefik-dashboard.davesservers.com      canonical name = docker.davesservers.com
Name:   docker.davesservers.com
Address: 192.168.4.7
traefik-dashboard.davesservers.com      canonical name = docker.davesservers.com

That appears to be getting somewhere. I then updated the Pi-Hole compose file to include:

services:
  pihole:
    container_name: pihole
    image: pihole/pihole:latest
    ports:
      - "53:53/tcp"
      - "53:53/udp"
      - "67:67/udp"
      - "500:80/tcp"
    networks:
      pihole_internal:
        ipv4_address: 172.70.9.3
      proxy:
        ipv4_address: 172.22.0.14

(so that the IP address 172.22.0.14 would persist across a system or container restart)

I then checked the various symptoms I listed in the OP:

  • Homepage is still unable to access the Traefik API:
curl https://traefik-dashboard.mydomain.com/api/overview
curl: (6) Could not resolve host: traefik-dashboard.mydomain.com
  • Logging into Portainer with Authentik still fails in the same manner.
  • I tried the Netbird configure script and that worked! It generated the docker-compose file for me as expected, I could then run that, but then when trying to log into it with Authentik, it gives a very unhelpful and vague 404 error (in the browser). In the logs seen in Portainer (for the “management” service), I note: lookup authentik.davesservers.com on 127.0.0.11:53: no such host

…so, a little progress, but certainly no solution as yet…

Ok, figured out the Traefik API issue for Homepage.
After having set the static IP for Pi-Hole as per my last comment, I could then use the FQDN in the labels section of the Traefik compose file:

      - homepage.widget.url=https://traefik-dashboard.davesservers.com

That then works: I’m now able to see all of the Traefik API data being displayed in Homepage.

One problem down, two to go. :slight_smile:

I’ve solved the second (of three) issues.

It turns out I hadn’t added the Launch URL for the application created in Authentik. That prevented it from working.

I’ll try to figure out the third issue tomorrow.

…getting there. :slight_smile: