Bus Error Starting PostgreSQL in Docker: Fix Script or Switch to Dockerfile?

Hello,

I’m running a bash script (run_postgresql_and_upload.sh) to start a PostgreSQL container and upload Parquet files to a database using a Python script. However, I encounter a Bus error during initdb, and the process exits with code 135. The logs show:

The files belonging to this database system will be owned by user "postgres".
...
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
Bus error
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
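
For reference, exit code 135 follows the usual shell convention of 128 plus a signal number, so it corresponds to signal 7, which is SIGBUS on x86-64 Linux and matches the "Bus error" line above. A minimal sketch of that arithmetic (the helper name exit_status_to_signal is made up for illustration):

```shell
#!/bin/bash

# Decode a shell-style exit status: values above 128 conventionally mean
# "terminated by signal number (status - 128)".
exit_status_to_signal() {
    local status="$1"
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        echo 0   # not a signal-style exit status
    fi
}

sig=$(exit_status_to_signal 135)
echo "signal number: $sig"

# kill -l maps a signal number to its name on the current platform.
echo "signal name: $(kill -l "$sig")"
```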

System Details:

  • OS: Ubuntu 20.04 LTS server
  • Architecture: x86-64
  • Hardware: 32 CPUs, 64GB RAM
  • Docker and Python 3 are installed on the host.

Current Script:
Here’s the current bash script (run_postgresql_and_upload.sh):

#!/bin/bash

POSTGRES_IMAGE="postgres:13"
CONTAINER_NAME="postgres-server"
PARQUET_FOLDER="data/"
TABLE_NAME="user_portraits"
HOST="localhost"
PORT=5431
DATABASE="user_portraits"

# PostgreSQL credentials per official documentation
POSTGRES_USER="postgres"
POSTGRES_PASSWORD="postgres"

# Resource limits
DOCKER_MEMORY="32g"
DOCKER_CPUS="16.0"

check_docker() {
    if ! command -v docker &> /dev/null; then
        echo "Docker is not installed. Please install Docker first."
        exit 1
    fi
}

start_postgresql() {
    echo "Checking PostgreSQL container..."
    
    # Check if container exists
    if docker ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
        echo "📦 Container $CONTAINER_NAME already exists"
        
        # Check if container is running
        if docker ps --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
            echo "✅ Container is already running"
        else
            echo "🚀 Starting existing container..."
            docker start $CONTAINER_NAME
            sleep 5
        fi
    else
        echo "🐘 Creating new PostgreSQL container..."
        
        # Create new container
        docker run -d \
            --name $CONTAINER_NAME \
            -p $PORT:5432 \
            -e POSTGRES_USER=$POSTGRES_USER \
            -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
            -e POSTGRES_DB=$DATABASE \
            --memory=$DOCKER_MEMORY \
            --cpus=$DOCKER_CPUS \
            --shm-size=256m \
            $POSTGRES_IMAGE
        
        echo "Waiting for PostgreSQL to be ready..."
        sleep 15
    fi
    
    # Wait for PostgreSQL to be ready
    for i in {1..10}; do
        if docker exec $CONTAINER_NAME pg_isready -U $POSTGRES_USER -d $DATABASE 2>/dev/null; then
            echo "✅ PostgreSQL is ready!"
            return 0
        fi
        echo "⏳ Waiting for PostgreSQL to start... ($i/10)"
        sleep 3
    done
    
    echo "❌ PostgreSQL failed to start within 30 seconds"
    echo "Checking container logs..."
    docker logs $CONTAINER_NAME
    exit 1
}

check_postgresql_connection() {
    echo "🔍 Testing PostgreSQL connection..."
    
    # Test connection with psql in container
    if docker exec $CONTAINER_NAME psql -U $POSTGRES_USER -d $DATABASE -c "SELECT 1;" 2>/dev/null; then
        echo "✅ Connection test successful"
        return 0
    else
        echo "❌ Connection test failed"
        return 1
    fi
}

run_python_script() {
    echo "🚀 Running Python script to upload parquet files..."
    
    # Check for Python
    if ! command -v python3 &> /dev/null; then
        echo "Python3 is not installed. Please install Python3 first."
        exit 1
    fi
    
    # Install required packages if missing.
    # Use "python3 -m pip" so packages go to the same interpreter that runs the script.
    if ! python3 -c "import psycopg2" 2>/dev/null; then
        echo "Installing psycopg2..."
        python3 -m pip install psycopg2-binary
    fi
    
    if ! python3 -c "import polars" 2>/dev/null; then
        echo "Installing polars..."
        python3 -m pip install polars
    fi
    
    # Run Python script
    python3 ../utils/upload_parquet_to_postgresql.py \
        --parquet_folder "$PARQUET_FOLDER" \
        --table_name "$TABLE_NAME" \
        --host "$HOST" \
        --port "$PORT" \
        --database "$DATABASE" \
        --user "$POSTGRES_USER" \
        --password "$POSTGRES_PASSWORD"
}

# Main script
check_docker
start_postgresql

# Check connection
if check_postgresql_connection; then
    run_python_script
else
    echo "❌ Cannot connect to PostgreSQL. Please check the container."
    echo "💡 Try: docker logs $CONTAINER_NAME"
    echo "💡 Try: docker restart $CONTAINER_NAME"
    exit 1
fi

echo "🎉 Process completed!"

Previous Script (for context):
I previously used a similar script with these differences:

  • Image: postgres:13.22-trixie (vs. postgres:13 in the current script)
  • Port: 5432 (vs. 5431)
  • Database: userdb (vs. user_portraits)
  • Shared memory: --shm-size=2g (vs. 256m)
  • Python dependencies: mltool and pandas (vs. psycopg2 and polars)
  • Included --ulimit memlock=-1:-1 and a volume (postgres-data)

Both scripts fail with the same Bus error during initdb.

Context:

  • The script checks for Docker, starts a PostgreSQL container with resource limits (--memory=32g, --cpus=16.0, --shm-size=256m), waits for the database to be ready, and runs a Python script (../utils/upload_parquet_to_postgresql.py) to load data from a data/ folder into a user_portraits table.
  • Settings: POSTGRES_USER=postgres, POSTGRES_PASSWORD=postgres, POSTGRES_DB=user_portraits, port 5431.
  • The script installs Python dependencies (psycopg2, polars) on the host and executes the Python script.
  • The Bus error likely relates to huge pages, as my system is x86-64 (Ubuntu 20.04 LTS, 32 CPUs, 64GB RAM, so no architecture emulation issues).
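
A quick way to test the huge pages theory is to check whether the host has any huge pages reserved at all. The sketch below reads HugePages_Total from /proc/meminfo. The function name hugepages_total and its optional file argument are assumptions added for illustration, and a value of 0 would only make the huge pages explanation less likely, not rule it out.

```shell
#!/bin/bash

# Print the HugePages_Total value from a meminfo-style file.
# Defaults to /proc/meminfo; the optional path argument is only there
# so the function can be tried against a sample file.
hugepages_total() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^HugePages_Total:/ { print $2; found = 1 }
         END { if (!found) print 0 }' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    total=$(hugepages_total)
    if [ "$total" -gt 0 ]; then
        echo "Host has $total huge pages reserved"
    else
        echo "No huge pages reserved on this host"
    fi
fi
```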

What I’ve Tried:

  • I suspect the Bus error is due to huge pages. A suggested fix is to create a custom image with huge_pages = off in postgresql.conf.sample or use -e POSTGRES_INITDB_ARGS="--set huge_pages=off" (though this may not work with postgres:13 or 13.22-trixie).
  • I’m considering replacing the script with a Dockerfile to bundle PostgreSQL, Python, and the upload logic into one image for better reproducibility.
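
The huge_pages = off edit mentioned above can be sketched as a small sed rewrite. This is only an illustration on a throwaway file: the function name set_huge_pages_off is hypothetical, and in a custom image the same substitution would presumably run as a build step against /usr/share/postgresql/postgresql.conf.sample (the path I understand the official image to use, so treat it as an assumption). As far as I can tell, initdb only gained the -c/--set option in PostgreSQL 16, which would explain why the POSTGRES_INITDB_ARGS variant may not work on postgres:13.

```shell
#!/bin/bash

# Rewrite the huge_pages line (commented out or not) in a
# postgresql.conf-style file so that it reads "huge_pages = off".
set_huge_pages_off() {
    local conf="$1"
    sed -i -E 's/^#?[[:space:]]*huge_pages[[:space:]]*=.*/huge_pages = off/' "$conf"
}

# Demo on a temporary file rather than a real configuration file.
sample=$(mktemp)
printf '#huge_pages = try\t\t\t# on, off, or try\n' > "$sample"
set_huge_pages_off "$sample"
cat "$sample"
rm -f "$sample"
```

Because initdb builds postgresql.conf from the sample file, changing the sample before the first start is what lets the setting apply during initdb itself.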

Questions:

  1. How can I fix the Bus error in the current script? Would a custom image with huge_pages = off or upgrading to postgres:16 resolve it on my Ubuntu 20.04 LTS (x86-64, 32 CPUs, 64GB RAM) system?
  2. Should I switch to a Dockerfile for this scenario (running PostgreSQL and uploading Parquet files)? If so, how should I structure the Dockerfile to include PostgreSQL, Python, and the upload script while mounting the data/ folder?
  3. If I keep the bash script, how can I make it more robust (e.g., handle the Bus error or optimize for my system’s resources)?
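
On question 3, one direction would be to replace the fixed sleep calls with a bounded retry helper, so the script waits only as long as needed and fails clearly otherwise. A minimal sketch, where the retry function is hypothetical and not part of the current script:

```shell
#!/bin/bash

# Run a command until it succeeds, at most max_attempts times,
# sleeping delay seconds between failed attempts.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Trivial demo that does not need Docker.
if retry 3 0 true; then
    echo "command succeeded"
fi

# Possible use with the variables from the script (not executed here):
# retry 10 3 docker exec "$CONTAINER_NAME" pg_isready -U "$POSTGRES_USER" -d "$DATABASE"
```

This would not fix the Bus error itself, since the container exits during initdb, but it would make that failure show up as a clear timeout instead of depending on sleep durations.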

Any advice, sample Dockerfile, or fixes for the script would be greatly appreciated! I can provide more details (e.g., Docker version, Python version) if needed.

I don’t know, but the best way to find out is to try it. You have already tried changing the huge_pages setting, so that alone probably won’t help, although I think I found the same recommendation you did, or a similar one: https://github.com/docker-library/postgres/issues/451#issuecomment-871109581

Since you already set the shared memory size, which is what I would have recommended, this looks more like a PostgreSQL configuration issue than a Docker issue, so it would be worth asking more experienced PostgreSQL users as well.

https://www.postgresql.org/community/

Unless everything works outside of containers.

I also found this, which might help. I haven’t finished reading it yet, but it looks like an interesting article about huge pages and Postgres.
