Help with Docker Swarm and Nginx Configuration

Hello, I am having network issues within my Docker Swarm cluster.

In my stack, I have an nginx container that handles SSL termination and a backend container that hosts my Laravel PHP application. When I run the containers on just my manager node, I am able to access the application fine with no errors. However, as soon as I scale up my service, the containers successfully start running on the worker node, but when I try to access the app from my browser, I start getting 504 Gateway Timeout errors.

My Docker Swarm Setup:

  • Two CentOS 9 nodes - one manager and one worker
  • Both machines are on the same network and can communicate just fine.
  • I have also disabled the firewalls on both machines to rule out any port issues with internode communication (the ports Swarm needs are listed below for reference).
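For reference, in case I re-enable firewalld later, these are the ports the Docker docs say Swarm needs open between nodes (a sketch using firewalld, since these are CentOS machines):

# Swarm ports per the Docker docs:
#   2377/tcp      cluster management traffic (manager node)
#   7946/tcp+udp  node-to-node communication (gossip)
#   4789/udp      overlay network (VXLAN) data traffic
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload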

Flow of Traffic: Web Traffic → Manager Node Entrypoint → Nginx Container on Manager Node → Backend Container

This has led me to believe the issue may be with my Nginx configuration, but I specify the service name for the backend container, and I thought the Swarm load balancer would take care of routing traffic across the different machines. Source.

I just did the basic Docker Swarm setup and didn’t see any other configurations I needed to make to get the multi-node setup working. Any insight or help would be appreciated.

Here are my files below:

docker-compose.yml

version: "3.8"
services:
  app:
    image: dev/my-app:prod
    volumes:
      - ./app-logs:/var/www/storage/logs
    networks:
      - k-overlay
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
    configs:
      - source: app_config
        target: /var/www/.env
    secrets:
      - app-key
      - azure-client-secret
      - db-name

  web:
    image: dev/my-nginx-container:prod
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - app
    volumes:
      - ./web-logs:/var/log/nginx/
    networks:
      - k-overlay
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure
    configs:
      - source: app_config
        target: /var/www/.env
      - source: nginx_config
        target: /etc/nginx/conf.d/default.conf
    secrets:
      - source: dhparam
        target: /etc/nginx/ssl/dhparam.pem
      - source: ssl-bundle
        target: /etc/nginx/ssl/ssl-bundle.pem
      - source: ssl-key
        target: /etc/nginx/ssl/ssl.key

networks:
  k-overlay:
    driver: overlay

configs:
  app_config:
    external: true
  nginx_config:
    external: true

secrets:
  app-key:
    external: true
  azure-client-secret:
    external: true
  db-name:
    external: true

(More secrets not listed)

nginx_config:

# https redirect
server {
  listen 80 default_server;
  server_name myapp.company.com;

  return 301 https://$host$request_uri;
}


server {
  listen 443 ssl;
  listen [::]:443 ssl;

  server_name myapp.company.com;

  ssl_certificate /etc/nginx/ssl/ssl-bundle.pem;
  ssl_certificate_key /etc/nginx/ssl/ssl.key;
  ssl_dhparam /etc/nginx/ssl/dhparam.pem;

  ssl_protocols TLSv1.2 TLSv1.3;
  ssl_prefer_server_ciphers on;
  ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
  ssl_ecdh_curve secp384r1;
  ssl_session_cache shared:SSL:10m;
  ssl_session_tickets off;

  #resolver 8.8.8.8 8.8.4.4 valid=300s;
  resolver 127.0.0.11 ipv6=off valid=10s;
  resolver_timeout 5s;

  add_header Strict-Transport-Security "max-age=63072000; includeSubdomains";
  add_header X-Frame-Options DENY;
  add_header X-Content-Type-Options nosniff;
  add_header X-XSS-Protection "1; mode=block";

  client_max_body_size 50M;

  access_log /var/log/nginx/access.log;
  error_log /var/log/nginx/error.log;

  index index.php index.html;
  root /var/www/public;
  location ~ \.php$ {
    try_files $uri =404;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass app:9000;
    fastcgi_index index.php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO $fastcgi_path_info;
  }

  location / {
    try_files $uri $uri/ /index.php?$query_string;
    gzip_static on;
  }
}

If you set up the proxy config manually, then nginx needs to run in Docker, the target should be the service name, and Docker DNS does round-robin.

To use a reverse proxy in Docker Swarm, I recommend Traefik, as it is the only proxy AFAIK that supports configuration discovery out of the box: you just add labels to new services and Traefik handles the rest. See simple Traefik example and Traefik Swarm config.
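A minimal sketch of what that looks like in a stack file (service name, hostname, and port here are placeholders; in Swarm mode the labels must live under deploy, and Traefik and the service must share an overlay network):

# Hypothetical snippet: Traefik discovers this service via its deploy labels.
services:
  app:
    image: dev/my-app:prod
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.app.rule=Host(`myapp.company.com`)"
        - "traefik.http.routers.app.tls=true"
        - "traefik.http.services.app.loadbalancer.server.port=80"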

I have my nginx container set up as a docker service. It is the one labeled “web”. I just have a custom image, but the base is nginx:latest.

I also have the “target” set to be the service name “app” in the nginx config if that’s what you mean.

Docker is able to resolve the virtual IP of the “app” service but then it cannot send traffic to the worker node for some reason. Why would nginx need to support configuration discovery if Docker DNS handles the load balancing aspect? I also saw that traefik can only be run on the manager node, but I want to be able to run my nginx containers on both nodes in case one goes down.

Of course you can set it up manually.

To debug, go into the nginx container and try to ping app.
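Something along these lines (the container name is whatever docker ps shows; getent ships with glibc, while ping/nslookup may not be installed in the nginx image):

# On the node running the web task:
docker ps --filter name=web
docker exec -it <web-container-id> sh

# Inside the container:
getent hosts app        # should resolve the service VIP
getent hosts tasks.app  # should list the individual task IPs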

The system needs to use the internal Docker network resolver (DNS). Maybe this line breaks it: resolver 127.0.0.11. Try removing the line.

Traefik configuration discovery only works on managers. You can just promote the worker to a manager. For Docker Swarm HA you should have 3 managers.
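Promoting is a one-liner run from an existing manager (the node name is whatever docker node ls shows):

docker node ls
docker node promote <worker-node-name>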

That IS actually the IP of the internal resolver 🙂

If traffic on an overlay network does not work, it's usually one of these:

  • the firewall prevents the communication
  • the nodes have different MTU sizes
  • Docker runs in a VM on ESXi with NSX; NSX is known to prevent overlay communication

Though, I am surprised the forum search did not yield any results on this, as it's a recurring problem that has been asked and answered a couple of times.

Make sure to assign app to a variable and use the variable with fastcgi_pass instead, so you don't suffer DNS caching of the resolved container IP.
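A sketch of the relevant location block (the variable name is arbitrary; this relies on the resolver directive already in the config):

location ~ \.php$ {
  try_files $uri =404;
  fastcgi_split_path_info ^(.+\.php)(/.+)$;
  # A variable forces nginx to re-resolve "app" through the configured
  # resolver at request time instead of caching the IP from config load.
  set $upstream_app app;
  fastcgi_pass $upstream_app:9000;
  fastcgi_index index.php;
  include fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
  fastcgi_param PATH_INFO $fastcgi_path_info;
}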

@meyay thank you so much for pointing out the issues with VMware.

I am running a VM on ESXi, but I don't think I'm running NSX. I tried following this guide to change the data path port, but it did not work for me.
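For anyone else trying that route: as far as I can tell, the VXLAN data path port can only be set when the swarm is first initialized, so changing it means tearing the swarm down and re-creating it (the port number here is just an example):

# Re-initialize the swarm with a non-default data path port (default is 4789):
docker swarm init --data-path-port 7789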

Ultimately, the issue ended up being a CentOS incompatibility with Docker Swarm. The guide I followed is here. For some reason, the source node drops UDP packets when checksum offloading is enabled on CentOS machines. Essentially, I had to turn off checksum offloading on my CentOS machines and the internode communication started working!
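The fix, in case it helps someone, was along these lines (the interface name will differ per machine; check ip addr first):

# Disable TX checksum offloading so the kernel stops dropping the
# outgoing VXLAN/UDP packets (ens192 is just my VM's NIC name):
sudo ethtool -K ens192 tx-checksum-ip-generic off

Note that this does not survive a reboot, so it needs to be persisted, e.g. via a NetworkManager dispatcher script or a systemd unit.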
