Code review for this compose stack?

Looking for comments on how to improve this compose file to follow tighter security and best practices.

The objective of this project is to be able to spawn “private” and anonymized search engines on a VPS. This is to allow searching without tracking or limits based on location or ISP.

The project works well as is, but there are two things I want to add to it in the future:

  • Make the service available through the Lokinet network (similar to tor in some ways)
  • Be able to deploy this on swarm (multiple app containers, several routes out)

I have no experience with swarm, but I am wary that there may be some major differences in how the network/routing works.

Your compose file looks good to me, and I have to say you use more “best practices” than I usually do :slight_smile: You have a non-root user, you use capabilities instead of a privileged container, and you have healthchecks, limits, variables, and default values for those variables. The compose file is readable and well-structured, so I really can’t tell you much more, although I am not so good at security, if that is what you are interested in.

The “latest” tag is usually not recommended, but that is just a default value, so it could be okay.

Your README file is clear too, documenting the variables, so well done.


Thanks mate! I’m here to learn. :smile:

I run my daemon with user namespaces, which means the containers aren’t able to do anything with the host. However, the VPN container needs to do things that require more access, and I wanted to restrict that to an absolute minimum.

You are right about the tagging part; however, in my private config, since it’s pointing to my private repo/image, it only gets updated when I push. I will admit that my development environment is very barebones. I’m still getting my bearings with Makefiles, and eventually I’ll figure out a better way to do automated versioning.

What @rimelek said :slight_smile:

Some additional notes:

  • I would use the key/value approach for everything where it’s applicable, e.g. environments.
  • You could use a recent compose schema version like 3.8 or 3.9 (to my knowledge, these only added Windows-container-specific config items). Though docker compose v2 will ignore the schema version and use the most recent version of the schema instead.
  • Configuration items, like restart, mem_limit and memswap_limit will not work with swarm, as they moved underneath the deploy item. They will be ignored when the stack is deployed with docker stack deploy. Afair, docker compose v2 is able to leverage the same deploy items, while docker compose v1 might not (at least old versions definitely don’t).
  • Are there really no container folders worth persisting as a volume?
  • Using cap_add and cap_drop instead of privileged mode is indeed needed for swarm stack deployments, as swarm doesn’t support privileged containers at all.

Generally, you will want to comply with the compose specification.
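A minimal sketch of what those deploy-style equivalents look like (the memory value mirrors the example limit; swarm has no direct `memswap_limit` counterpart):

```yaml
services:
  protonvpn:
    deploy:
      # swarm-compatible replacement for "restart: always"
      restart_policy:
        condition: any
      # swarm-compatible replacement for "mem_limit: 50mb"
      resources:
        limits:
          memory: 50M
```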


Hey @meyay! Thanks for your input, I definitely appreciate your comments on this, my friend!

I would use the key/value approach for everything where it’s applicable, e.g. environments.

I am not sure I follow on that point. Almost everything is set in environment variables /w defaults. Typically changes are made on the .env file. Am I missing something?

You could use a recent compose schema version like 3.8 or 3.9 (to my knowledge, these only added Windows-container-specific config items). Though docker compose v2 will ignore the schema version and use the most recent version of the schema instead.

This is a Linux-specific project aimed at privacy; Windows is not an operating system a privacy-conscious person would consider. That said, there’s no harm in bumping the version number as you suggested, so I’ll do that!

Configuration items, like restart, mem_limit and memswap_limit will not work with swarm, as they moved underneath the deploy item. They will be ignored when the stack is deployed with docker stack deploy. Afair, docker compose v2 is able to leverage the same deploy items, while docker compose v1 might not (at least old versions definitely don’t).

Right! I don’t have a swarm live yet, but one of the things I am particularly concerned about is the network stack, network_mode in particular. The app container gets routed through the vpn container. If I want replicas of both the front end and the VPN, what’s the right way/best practice for doing that? I would like to discuss the compose-to-swarm migration further - would you recommend opening another post specifically for that?

  • Are there really no container folders worth persisting as a volume?
    None at all. Whoogle is a privacy-focused, ephemeral front end for Google search: aside from the CSS styling (which I use an env variable for), nothing else needs to be saved. The app container takes care of removing ads and cleaning up the junk from the searches, while the VPN container makes sure that the queries get routed through a randomized IP.

This project in particular saves nothing at all. However, I am also working on a VPN-routed Invidious (YouTube front end) stack, which actually needs a Postgres database to run. This makes networking and isolation a little strange, and I’m still figuring things out. I’ll make a separate post on that stack eventually.

@meyay probably meant that you are using this syntax for the env variables:

    environment:
      # Credentials needed OR fail
      - PROTONVPN_USERNAME=${PROTONVPN_USERNAME:?error}
      - PROTONVPN_PASSWORD=${PROTONVPN_PASSWORD:?error}
      - PROTONVPN_TIER=${PROTONVPN_TIER:?error}

but you could define env variables as YAML key-value pairs:

    environment:
      # Credentials needed OR fail
      PROTONVPN_USERNAME: ${PROTONVPN_USERNAME:?error}
      PROTONVPN_PASSWORD: ${PROTONVPN_PASSWORD:?error}
      PROTONVPN_TIER: ${PROTONVPN_TIER:?error}

This way you can also do something like this if you ever need it:

x-tier: &tier ${PROTONVPN_TIER:?error}

services:
  protonvpn:
    environment:
      # ...
      PROTONVPN_TIER: *tier
  otherservice:
    environment:
      TIER: *tier

I don’t know why you would need it, but it can be useful sometimes. For example you can define the default value of the variables once if you need to use them multiple times. And this is just one YAML feature.

Or if you have a cli tool to parse yaml, you could get the variables like this:

docker compose config --format json | jq '.services.protonvpn.environment.PROTONVPN_TIER'

Again, you wouldn’t need it in this project, since you could just read the env file.

Update:

Okay, the second example works even if you are not using YAML key-value pairs in the environment section, since docker compose config will return them as actual key-value pairs instead of a list of strings…
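For illustration, the rendered fragment for the list-style environment above would look roughly like this (the values are placeholders for whatever the .env file provides):

```yaml
services:
  protonvpn:
    environment:
      PROTONVPN_USERNAME: someuser      # placeholder value
      PROTONVPN_PASSWORD: somepassword  # placeholder value
      PROTONVPN_TIER: "2"               # placeholder value
```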

      - traefik.http.routers.${APP_SVC_NAME:-whoggle}.rule=Host(`${HOST_NAME:?error}`, `www.${HOST_NAME:?error}`)
      - traefik.http.routers.${APP_SVC_NAME:-whoggle}.entrypoints=websecure
      - traefik.http.routers.${APP_SVC_NAME:-whoggle}.tls.certresolver=myresolver
      - traefik.http.services.${APP_SVC_NAME:-whoggle}.loadbalancer.server.port=${APP_SVC_PORT:-5000}

This syntax seems interesting - can I use it for labels too? APP_SVC_NAME is the repeating item here, and the one thing that needs to be modified when running several of these container stacks.

Indeed, I was referring to what @rimelek wrote: instead of the array style, I prefer the map style for readability. Technically, it is neither required nor beneficial, unless you start using YAML anchors.

environment, labels and sysctls can be either array or map style.
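For example, the static labels could be written in map style; a small sketch (I would test labels whose keys embed variables like ${APP_SVC_NAME} before converting those, since interpolation is defined for values):

```yaml
labels:
  # map style: label name as key, label value as value
  traefik.enable: "true"
  traefik.docker.network: web
  com.centurylinklabs.watchtower.enable: ${WATCHTOWER_ENABLED:-true}
```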

I doubt that YAML aliases (which point to an anchor) can be rendered within a string.
An anchor can be a map, a list, or a scalar, and the alias will be replaced with exactly that value.
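A minimal sketch of that distinction (the x-host field and its value are made up for the example):

```yaml
x-host: &host example.com

services:
  app:
    environment:
      HOST: *host            # works: the alias stands in for the whole scalar value
      # URL: https://*host/  # does not work: an alias cannot be embedded inside a larger string
```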

The map style also allows convenient merges:

x-default-envs: &default-envs
  key1: value1
  key2: value2
  key3: value3

services:
  test:
    environment:
      <<: *default-envs
      key1: x
      key4: value4

This would result in the following environment variables: key1=x (the explicit key overrides the merged value), key2=value2, key3=value3 and key4=value4.

I just used environments as an example.

Since swarm deployment was within the scope of your first post, following up on swarm topics is in scope for this topic. I don’t feel that we necessarily have to create a new topic for it.

Since you already use network_mode: service:protonvpn, it should work like this in swarm as well. Though setting network_mode ultimately means the container cannot be part of any other docker network.

It really depends on what the application needs and supports. Just because containers make it easy to run replicas doesn’t mean every application can cope with being replicated.

What happens if a request uses a different replica each time? Will the frontend and vpn connection be able to cope with it? Do the replicas need to share state?

It can be hard, brittle or even impossible to run an application that is supposed to be operated as a highlander (as in “there can only be one”).

For this particular situation (Whoogle), it is stateless. At its core, it’s only querying a search engine and re-rendering the results. There are no dependencies, so I could in theory have traefik load balance between instances. I haven’t tried it yet, but it’s most certainly possible.

However, I have another situation where I want to run an Invidious instance (YouTube front end) - this one depends on a Postgres DB, and I am not sure how it handles state. I wanted to separate networks, so I have a backend network for the DB which is mapped onto the VPN container (weird, tbh, but that seems to be the network_mode way). Resolving didn’t work, so I had to use static IP addressing for it to connect to the Postgres instance. I am not sure if I am doing it correctly. It works, but it requires more tweaking, and I’m not sure how to scale it. It feels somewhat hacky.

# Invidious proxied through ProtonVPN

version: "3"
services:

  invidious_protonvpn:
    container_name: ${APP_SVC_NAME:-invidious}_protonvpn
    environment:
      # Credentials
      - PROTONVPN_USERNAME=${PROTONVPN_USERNAME:?error}
      - PROTONVPN_PASSWORD=${PROTONVPN_PASSWORD:?error}
      # Override these where applicable
      - PROTONVPN_CHECK_INTERVAL=${PROTONVPN_CHECK_INTERVAL:-5}
      - PROTONVPN_FAIL_THRESHOLD=${PROTONVPN_FAIL_THRESHOLD:-3}
      - PROTONVPN_SERVER=${PROTONVPN_SERVER:-US}
      - PROTONVPN_TIER=${PROTONVPN_TIER:?error}
      - PROTONVPN_EXCLUDE_CIDRS=${EXCLUDE_CIDR:-}
      - PROTONVPN_DNS_LEAK_PROTECT=1
    # Always use semver tags, avoid using tag latest!
    image: ${PROTONVPN_IMAGE:-ghcr.io/tprasadtp/protonvpn}:${PROTONVPN_VERSION:-latest}
    restart: always
    mem_limit: 50mb
    memswap_limit: 100mb
    security_opt:
      - no-new-privileges
    cap_drop:
      - ALL
    cap_add:
      - NET_ADMIN
    networks:
      - web
      - proxy
      - invidious_backend
    devices:
      - /dev/net/tun:/dev/net/tun
    labels:
      # Host name must be specified or fail
      - traefik.http.routers.${APP_SVC_NAME:-invidious}.rule=Host(`${HOST_NAME:?error}`, `www.${HOST_NAME:?error}`)
      - traefik.http.routers.${APP_SVC_NAME:-invidious}.entrypoints=websecure
      - traefik.http.routers.${APP_SVC_NAME:-invidious}.tls.certresolver=myresolver
      - traefik.http.services.${APP_SVC_NAME:-invidious}.loadbalancer.server.port=${APP_SVC_PORT:-3000}
      # Authelia Middleware for MFA
      - traefik.http.routers.${APP_SVC_NAME:-invidious}.middlewares=${AUTHELIA_MIDDLEWARE:-}
      - traefik.docker.network=web
      - traefik.enable=${TRAEFIK_ENABLED:-true}
      # Reconnect to fastest server every hour
      - chadburn.enabled=${CHADBURN_ENABLED:-true}
      - chadburn.job-exec.${APP_SVC_NAME:-invidious}_protonvpn.schedule=@hourly
      - chadburn.job-exec.${APP_SVC_NAME:-invidious}_protonvpn.command=protonvpn connect -f
      - chadburn.job-exec.${APP_SVC_NAME:-invidious}_protonvpn.container=${APP_SVC_NAME:-invidious}_protonvpn
      # Enable auto updates
      - com.centurylinklabs.watchtower.enable=${WATCHTOWER_ENABLED:-true}
    healthcheck:
      test: /usr/bin/healthcheck || exit 1
      interval: 5s
      timeout: 1.5s
      retries: 3
      start_period: 15s

  invidious:
    image: quay.io/invidious/invidious:${INVIDIOUS_VERSION:-latest}
    restart: unless-stopped
    container_name: ${APP_SVC_NAME:-invidious}
    environment:
      # Please read the following file for a comprehensive list of all available
      # configuration options and their associated syntax:
      # https://github.com/iv-org/invidious/blob/master/config/config.example.yml
      # Forcing static IP for the DB since resolving will not work
      INVIDIOUS_CONFIG: |
        db:
          dbname: ${DB_NAME:-invidious}
          user: ${DB_USER:?error}
          password: ${DB_PASS:?error}
          host: ${DB_IP:-10.202.37.2}
          port: 5432
        check_tables: true
    healthcheck:
      test: wget -nv --tries=1 --spider http://127.0.0.1:3000/api/v1/search?q=`date +%A` || exit 1
      interval: 4s
      timeout: 1.5s
      retries: 5
    depends_on:
      invidious_postgres:
        condition: service_healthy
      invidious_protonvpn:
        condition: service_healthy
    pids_limit: 50
    mem_limit: 1024mb
    memswap_limit: 1024mb
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges
    network_mode: service:invidious_protonvpn
    labels:
      - autoheal_vpn=${AUTOHEAL_VPN:-true}


  invidious_postgres:
    image: docker.io/library/postgres:${PG_VERSION:-13}
    restart: unless-stopped
    container_name: ${APP_SVC_NAME:-invidious}_postgres
    mem_limit: 150mb
    memswap_limit: 150mb
    security_opt:
      - no-new-privileges
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - DAC_OVERRIDE
      - SETGID
      - SETUID

    volumes:
      - data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=${DB_NAME:?error}
      - POSTGRES_USER=${DB_USER:?error}
      - POSTGRES_PASSWORD=${DB_PASS:?error}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 15s
      timeout: 5s
      retries: 4
    labels:
      - com.centurylinklabs.watchtower.enable=${WATCHTOWER_ENABLED:-true}
    networks:
      invidious_backend:
        ipv4_address: ${DB_IP:-10.202.37.2} # Forcing static IP for the DB since resolving will not work
      db_backend:


networks:
  web:
    external: true
  proxy:
    internal: true
    driver_opts:
      com.docker.network.driver.mtu: 1300
  invidious_backend:
    internal: true
    driver: bridge
    ipam:
      driver: default
      config: # Forcing static IP for the DB since resolving will not work
        - subnet: ${DB_SUBNET:-10.202.37.0/24}
          gateway: ${DB_GATEWAY:-10.202.37.1}
  db_backend:
    external: true


volumes:
  data:
    name: ${APP_SVC_NAME:-invidious}_postgres_vol

Is this the correct way to handle networking?