Docker Community Forums

Intermittent DNS lookup failure of other containers

I have a Node.js app running in a Docker container, set up using the following (simplified) Docker Compose configuration:

x-restart-policy: &restart_policy
  restart: unless-stopped
x-sails-app-defaults: &sails_defaults
  << : *restart_policy
  image: registry.example.com/group/webapp/snapshots
  depends_on:
    - postgres

services:

  postgres:
    << : *restart_policy
    image: postgis/postgis:11-3.1-alpine
    environment:
      POSTGRES_PASSWORD_FILE: /postgres-password
    volumes:
      - postgres:/var/lib/postgresql/data
      - /root/postgres-setup-password:/postgres-password:ro
      - ./postgres/setup.sh:/docker-entrypoint-initdb-resources/001-setup.sh:ro

  webapp:
    << : *sails_defaults
    volumes:
      - ./apps/webapp.sailsrc:/usr/src/app/.sailsrc:ro
    labels:
      - traefik.enable=true
      - traefik.http.routers.sbdev.entrypoints=https
      - traefik.http.routers.sbdev.rule=Host(`app.example.com`)
      - traefik.http.routers.sbdev.tls=true
      - traefik.http.routers.sbdev.tls.certresolver=letsencrypt

The Dockerfile for the app looks like this:

FROM node:fermium-alpine3.12 AS builderbase

# Need git and some other tools for npm install
RUN apk add --no-cache \
    git \
    python3 \
    make \
    openssh-client \
    g++

WORKDIR /usr/src/app

COPY package*.json ./

# Don't download Chromium for Puppeteer,
# since we will install it ourselves
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

FROM builderbase AS builderdev

# Build development dependencies
RUN npm set progress=false && \
    npm config set depth 0 && \
    npm install

FROM node:fermium-alpine3.12 AS base

# Point Puppeteer at the Chromium binary
# that is installed from the Alpine package below
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Install Chrome for Puppeteer.
# See https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-on-alpine
RUN apk add --no-cache \
    # Make sure version here is compatible with version of puppeteer
    chromium=86.0.4240.111-r0 \
    nss \
    freetype \
    freetype-dev \
    harfbuzz \
    ca-certificates \
    ttf-freefont

# Install tools we need
RUN apk add --no-cache \
    bash \
    jq \
    postgresql-client \
    su-exec \
    tini

COPY ./lib/docker/entrypoint.sh /docker-entrypoint.sh

ENTRYPOINT ["/sbin/tini", "--", "/docker-entrypoint.sh"]

EXPOSE 1337
WORKDIR /usr/src/app
COPY --chown=node . .

FROM base AS development

COPY --chown=node --from=builderdev /usr/src/app/node_modules node_modules

Every few minutes, I get errors like this in the log:

Exception: `getConnection` failed ("failed").  Could not acquire a connection to the database using the specified manager.
 Additional data:

 {
   error: Error: getaddrinfo ENOTFOUND postgres
       at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:67:26) {
     errno: -3008,
     code: 'ENOTFOUND',
     syscall: 'getaddrinfo',
     hostname: 'postgres'
   },
   meta: undefined
 }
     at flaverr (/usr/src/app/node_modules/flaverr/index.js:94:15)
     at Function.handlerCbs.<computed> [as failed] (/usr/src/app/node_modules/machine/lib/private/help-build-machine.js:879:31)
     at PendingItem.cb [as callback] (/usr/src/app/node_modules/machinepack-postgresql/machines/get-connection.js:76:22)
     at /usr/src/app/node_modules/pg-pool/index.js:237:23
     at Connection.connectingErrorHandler (/usr/src/app/node_modules/machinepack-postgresql/node_modules/pg/lib/client.js:213:14)
     at Connection.emit (events.js:315:20)
     at Connection.EventEmitter.emit (domain.js:467:12)
     at Socket.reportStreamError (/usr/src/app/node_modules/machinepack-postgresql/node_modules/pg/lib/connection.js:57:10)
     at Socket.emit (events.js:315:20)
     at Socket.EventEmitter.emit (domain.js:467:12)
     at emitErrorNT (internal/streams/destroy.js:106:8)
     at emitErrorCloseNT (internal/streams/destroy.js:74:3)
     at processTicksAndRejections (internal/process/task_queues.js:80:21)

Suspecting a problem with how Docker Compose generates container hostnames, I added hostname: postgres to the postgres service in docker-compose.yml, but that did not help.

I tried to reproduce the failure directly by running docker-compose exec webapp node -pe 'require("dns").lookup("postgres",function(){console.dir(arguments)})', but it is intermittent enough that I have never managed to hit it.
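A single lookup is unlikely to land on a failure that only shows up every few minutes, so a probe that runs continuously has a better chance of catching it. The following is a minimal sketch of such a probe, not something from the app itself: the file name dns-probe.js, the PROBE_HOST environment variable, and the function names probe and run are all hypothetical. It uses Node's dns.lookup, which goes through getaddrinfo just like the failing connection attempt in the log, and prints only the failed lookups with a timestamp.

```javascript
// dns-probe.js (hypothetical helper, not part of the original app)
const dns = require("dns");

// Resolve `hostname` once. Never rejects, so a failed lookup does not stop the loop.
// `lookup` is injectable so the logic can be exercised without real DNS.
function probe(hostname, lookup = dns.lookup) {
  return new Promise((resolve) => {
    const started = Date.now();
    lookup(hostname, (err, address) => {
      resolve({
        ok: !err,
        address: err ? null : address,
        code: err ? err.code : null,
        ms: Date.now() - started,
      });
    });
  });
}

// Probe `attempts` times, waiting `intervalMs` between lookups.
// Logs each failure and returns the number of failures seen.
async function run(hostname, attempts, intervalMs, lookup = dns.lookup) {
  let failures = 0;
  for (let i = 0; i < attempts; i++) {
    const result = await probe(hostname, lookup);
    if (!result.ok) {
      failures++;
      console.error(
        `${new Date().toISOString()} lookup of ${hostname} failed: ` +
          `${result.code} after ${result.ms} ms`
      );
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return failures;
}

// Assumption: the probe only starts when PROBE_HOST is set,
// e.g. PROBE_HOST=postgres node dns-probe.js
if (process.env.PROBE_HOST) {
  run(process.env.PROBE_HOST, Infinity, 1000);
}
```

Run inside the webapp container with PROBE_HOST set to postgres, this would loop once per second until interrupted. Comparing the timestamps of any logged failures with the timestamps of the application errors should show whether the resolver fails for every process in the container or only for the app.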

I see these problems with Docker 20.10 (I have tested every patch release from 20.10.0 to 20.10.3) running on Ubuntu 20.04. However, if I downgrade Docker to 19.03.15 (using apt install --allow-downgrades docker-ce=5:19.03.15~3-0~ubuntu-focal docker-ce-cli=5:19.03.15~3-0~ubuntu-focal), everything works perfectly.

I found this issue, which has similar symptoms (though that one involves Rust, not Node.js), but its solution is language-specific and does not carry over to my situation.

What changed in 20.10 to break this, and how can I fix it?

(Note: this question is cross-posted on Stack Overflow.)