Hello,
I’m running a bash script (run_postgresql_and_upload.sh) to start a PostgreSQL container and upload Parquet files to a database using a Python script. However, I encounter a Bus error during initdb, and the process exits with code 135. The logs show:
The files belonging to this database system will be owned by user "postgres".
...
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
Bus error
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
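For reference, my reading of exit code 135 is 128 + 7, meaning the child process was killed by signal 7, which is SIGBUS on Linux x86-64. Here is a small sketch for decoding it; the function name signal_from_exit_code is my own, purely for illustration:

```shell
# Map a shell exit status above 128 back to the signal that ended the process.
signal_from_exit_code() {
    local code=$1
    if [ "$code" -gt 128 ]; then
        # Statuses above 128 mean "killed by signal (status - 128)".
        kill -l "$((code - 128))"
    else
        echo "none"
    fi
}

signal_from_exit_code 135
```

On my Linux host this reports BUS, which matches the "Bus error" line in the initdb output.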
System Details:
- OS: Ubuntu 20.04 LTS server
- Architecture: x86-64
- Hardware: 32 CPUs, 64GB RAM
- Docker and Python 3 are installed on the host.
Current Script:
Here’s the current bash script (run_postgresql_and_upload.sh):
#!/bin/bash
POSTGRES_IMAGE="postgres:13"
CONTAINER_NAME="postgres-server"
PARQUET_FOLDER="data/"
TABLE_NAME="user_portraits"
HOST="localhost"
PORT=5431
DATABASE="user_portraits"
# PostgreSQL credentials per official documentation
POSTGRES_USER="postgres"
POSTGRES_PASSWORD="postgres"
# Resource limits
DOCKER_MEMORY="32g"
DOCKER_CPUS="16.0"
check_docker() {
    if ! command -v docker &> /dev/null; then
        echo "Docker is not installed. Please install Docker first."
        exit 1
    fi
}
start_postgresql() {
    echo "Checking PostgreSQL container..."
    
    # Check if container exists
    if docker ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
        echo "📦 Container $CONTAINER_NAME already exists"
        
        # Check if container is running
        if docker ps --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
            echo "✅ Container is already running"
        else
            echo "🚀 Starting existing container..."
            docker start $CONTAINER_NAME
            sleep 5
        fi
    else
        echo "🐘 Creating new PostgreSQL container..."
        
        # Create new container
        docker run -d \
            --name $CONTAINER_NAME \
            -p $PORT:5432 \
            -e POSTGRES_USER=$POSTGRES_USER \
            -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
            -e POSTGRES_DB=$DATABASE \
            --memory=$DOCKER_MEMORY \
            --cpus=$DOCKER_CPUS \
            --shm-size=256m \
            $POSTGRES_IMAGE
        
        echo "Waiting for PostgreSQL to be ready..."
        sleep 15
    fi
    
    # Wait for PostgreSQL to be ready
    for i in {1..10}; do
        if docker exec $CONTAINER_NAME pg_isready -U $POSTGRES_USER -d $DATABASE 2>/dev/null; then
            echo "✅ PostgreSQL is ready!"
            return 0
        fi
        echo "⏳ Waiting for PostgreSQL to start... ($i/10)"
        sleep 3
    done
    
    echo "❌ PostgreSQL failed to start within 30 seconds"
    echo "Checking container logs..."
    docker logs $CONTAINER_NAME
    exit 1
}
check_postgresql_connection() {
    echo "🔍 Testing PostgreSQL connection..."
    
    # Test connection with psql in container
    if docker exec $CONTAINER_NAME psql -U $POSTGRES_USER -d $DATABASE -c "SELECT 1;" 2>/dev/null; then
        echo "✅ Connection test successful"
        return 0
    else
        echo "❌ Connection test failed"
        return 1
    fi
}
run_python_script() {
    echo "🚀 Running Python script to upload parquet files..."
    
    # Check for Python
    if ! command -v python3 &> /dev/null; then
        echo "Python3 is not installed. Please install Python3 first."
        exit 1
    fi
    
    # Install required packages if missing
    # (python3 -m pip targets the same interpreter that runs the upload script)
    if ! python3 -c "import psycopg2" 2>/dev/null; then
        echo "Installing psycopg2..."
        python3 -m pip install psycopg2-binary
    fi
    
    if ! python3 -c "import polars" 2>/dev/null; then
        echo "Installing polars..."
        python3 -m pip install polars
    fi
    
    # Run Python script
    python3 ../utils/upload_parquet_to_postgresql.py \
        --parquet_folder "$PARQUET_FOLDER" \
        --table_name "$TABLE_NAME" \
        --host "$HOST" \
        --port "$PORT" \
        --database "$DATABASE" \
        --user "$POSTGRES_USER" \
        --password "$POSTGRES_PASSWORD"
}
# Main script
check_docker
start_postgresql
# Check connection
if check_postgresql_connection; then
    run_python_script
else
    echo "❌ Cannot connect to PostgreSQL. Please check the container."
    echo "💡 Try: docker logs $CONTAINER_NAME"
    echo "💡 Try: docker restart $CONTAINER_NAME"
    exit 1
fi
echo "🎉 Process completed!"
Previous Script (for context):
I previously used a similar script with these differences:
- Image: postgres:13.22-trixie (vs. postgres:13 in the current script)
- Port: 5432 (vs. 5431)
- Database: userdb (vs. user_portraits)
- Shared memory: --shm-size=2g (vs. 256m)
- Python dependencies: mltool and pandas (vs. psycopg2 and polars)
- Included --ulimit memlock=-1:-1 and a volume (postgres-data)
Both scripts fail with the same Bus error during initdb.
Context:
- The script checks for Docker, starts a PostgreSQL container with resource limits (--memory=32g, --cpus=16.0, --shm-size=256m), waits for the database to be ready, and runs a Python script (../utils/upload_parquet_to_postgresql.py) to load data from a data/ folder into a user_portraits table.
- Settings: POSTGRES_USER=postgres, POSTGRES_PASSWORD=postgres, POSTGRES_DB=user_portraits, port 5431.
- The script installs Python dependencies (psycopg2, polars) on the host and executes the Python script.
- The Bus error likely relates to huge pages. The host is native x86-64 (Ubuntu 20.04 LTS, 32 CPUs, 64GB RAM), so architecture emulation should not be a factor.
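To back up the huge pages suspicion, this is roughly how I plan to check whether the host has huge pages reserved. It is only a sketch: the helper names hugepage_counters and hugepages_reserved are my own, and it looks only at the explicit huge page counters in /proc/meminfo, not at transparent huge pages.

```shell
# Print the huge-page counters from a meminfo-style file (default: /proc/meminfo).
hugepage_counters() {
    local meminfo_file=${1:-/proc/meminfo}
    grep -E '^(HugePages_Total|HugePages_Free|Hugepagesize):' "$meminfo_file"
}

# Succeed only when the file reports a non-zero HugePages_Total.
hugepages_reserved() {
    local meminfo_file=${1:-/proc/meminfo}
    local total
    total=$(awk '/^HugePages_Total:/ {print $2}' "$meminfo_file" 2>/dev/null)
    [ "${total:-0}" -gt 0 ]
}

if hugepages_reserved; then
    echo "huge pages are reserved on this host:"
    hugepage_counters
else
    echo "no huge pages reserved (HugePages_Total is 0 or missing)"
fi
```

If HugePages_Total is non-zero, I assume that makes the huge pages explanation more plausible, but I have not confirmed the link yet.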
What I’ve Tried:
- I suspect the Bus error is due to huge pages. A suggested fix is to create a custom image with huge_pages = off in postgresql.conf.sample, or to use -e POSTGRES_INITDB_ARGS="--set huge_pages=off". As far as I can tell, initdb's --set option only exists from PostgreSQL 16 onward, so the second approach would not work with postgres:13 or 13.22-trixie.
- I’m considering replacing the script with a Dockerfile to bundle PostgreSQL, Python, and the upload logic into one image for better reproducibility.
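To make the custom image idea concrete, here is a sketch of what I understand the suggested fix to look like, written as a bash helper that generates the Dockerfile. The function name write_nohugepages_dockerfile and the image tag below are hypothetical. The path /usr/share/postgresql/postgresql.conf.sample is my assumption about where the official image keeps the sample config, and I have not verified that this resolves the Bus error.

```shell
# Write a Dockerfile that sets huge_pages = off in the sample config
# that initdb copies into the new data directory.
# ASSUMPTION: the sample lives at /usr/share/postgresql/postgresql.conf.sample.
write_nohugepages_dockerfile() {
    local build_dir=$1
    local base_image=${2:-postgres:13}
    mkdir -p "$build_dir"
    printf 'FROM %s\n' "$base_image" > "$build_dir/Dockerfile"
    # Quoted heredoc: the sed expression is written out literally.
    cat >> "$build_dir/Dockerfile" <<'EOF'
RUN sed -ri 's/^#?(huge_pages)\s*=.*/\1 = off/' /usr/share/postgresql/postgresql.conf.sample
EOF
}

# Generate the file in a scratch directory and show it.
build_dir=$(mktemp -d)
write_nohugepages_dockerfile "$build_dir" "postgres:13"
cat "$build_dir/Dockerfile"

# Build step (needs Docker, so left as a comment here):
#   docker build -t postgres-nohugepages:13 "$build_dir"
```

The bash script would then set POSTGRES_IMAGE to the locally built tag instead of postgres:13.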
Questions:
- How can I fix the Bus error in the current script? Would a custom image with huge_pages = off or upgrading to postgres:16 resolve it on my Ubuntu 20.04 LTS (x86-64, 32 CPUs, 64GB RAM) system?
- Should I switch to a Dockerfile for this scenario (running PostgreSQL and uploading Parquet files)? If so, how should I structure the Dockerfile to include PostgreSQL, Python, and the upload script while mounting the data/ folder?
- If I keep the bash script, how can I make it more robust (e.g., handle the Bus error or optimize for my system’s resources)?
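On the robustness question, one direction I am considering is replacing the fixed sleeps with a generic retry helper and reporting the container state when readiness fails. This is a sketch only: wait_until, container_status, and wait_for_postgres are names I made up, and the Docker-dependent functions are defined but not called here.

```shell
# Retry a command up to N times, sleeping between failed attempts.
wait_until() {
    local attempts=$1
    local delay=$2
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Print the container's state and exit code (needs a running Docker daemon).
container_status() {
    docker inspect -f '{{.State.Status}} {{.State.ExitCode}}' "$1"
}

# Hypothetical replacement for the sleep-based wait in start_postgresql.
wait_for_postgres() {
    local container=$1 user=$2 db=$3
    if ! wait_until 10 3 docker exec "$container" pg_isready -U "$user" -d "$db"; then
        echo "PostgreSQL not ready; container status: $(container_status "$container")" >&2
        docker logs --tail 50 "$container" >&2
        return 1
    fi
}

# Quick self-check of the retry helper without Docker.
wait_until 3 0 true && echo "retry helper ok"
```

In the script, wait_for_postgres "$CONTAINER_NAME" "$POSTGRES_USER" "$DATABASE" would stand in for the sleep 15 plus the for loop. A container that has already exited, as with the initdb crash, would then show up in the status line rather than only as a timeout.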
Any advice, sample Dockerfile, or fixes for the script would be greatly appreciated! I can provide more details (e.g., Docker version, Python version) if needed.