Hello,
I’m running a bash script (run_postgresql_and_upload.sh) to start a PostgreSQL container and upload Parquet files to a database using a Python script. However, I encounter a Bus error during initdb, and the process exits with code 135. The logs show:
The files belonging to this database system will be owned by user "postgres".
...
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
Bus error
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
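A side note on the exit code: as far as I understand, a shell reports 128 + N when a process is killed by signal N, so 135 should correspond to signal 7. A minimal sketch of that arithmetic (signal_number is a helper name I made up for illustration):

```bash
# Hypothetical helper: map a shell exit code to the number of the signal that
# terminated the process (shells report 128 + N for signal N).
signal_number() {
    local code="$1"
    if [ "$code" -gt 128 ]; then
        echo $((code - 128))
    else
        echo 0   # not terminated by a signal
    fi
}

signal_number 135   # prints 7, which is SIGBUS on Linux x86-64
```

That matches the "Bus error" line printed just before the exit code.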
System Details:
- OS: Ubuntu 20.04 LTS server
- Architecture: x86-64
- Hardware: 32 CPUs, 64GB RAM
- Docker and Python 3 are installed on the host.
Current Script:
Here’s the current bash script (run_postgresql_and_upload.sh):
#!/bin/bash
POSTGRES_IMAGE="postgres:13"
CONTAINER_NAME="postgres-server"
PARQUET_FOLDER="data/"
TABLE_NAME="user_portraits"
HOST="localhost"
PORT=5431
DATABASE="user_portraits"
# PostgreSQL credentials per official documentation
POSTGRES_USER="postgres"
POSTGRES_PASSWORD="postgres"
# Resource limits
DOCKER_MEMORY="32g"
DOCKER_CPUS="16.0"
check_docker() {
    if ! command -v docker &> /dev/null; then
        echo "Docker is not installed. Please install Docker first."
        exit 1
    fi
}
start_postgresql() {
    echo "Checking PostgreSQL container..."
    # Check if container exists
    if docker ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
        echo "📦 Container $CONTAINER_NAME already exists"
        # Check if container is running
        if docker ps --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
            echo "✅ Container is already running"
        else
            echo "🚀 Starting existing container..."
            docker start "$CONTAINER_NAME"
            sleep 5
        fi
    else
        echo "🐘 Creating new PostgreSQL container..."
        # Create new container
        docker run -d \
            --name "$CONTAINER_NAME" \
            -p "$PORT:5432" \
            -e POSTGRES_USER="$POSTGRES_USER" \
            -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
            -e POSTGRES_DB="$DATABASE" \
            --memory="$DOCKER_MEMORY" \
            --cpus="$DOCKER_CPUS" \
            --shm-size=256m \
            "$POSTGRES_IMAGE"
        echo "Waiting for PostgreSQL to be ready..."
        sleep 15
    fi
    # Wait for PostgreSQL to be ready
    for i in {1..10}; do
        if docker exec "$CONTAINER_NAME" pg_isready -U "$POSTGRES_USER" -d "$DATABASE" 2>/dev/null; then
            echo "✅ PostgreSQL is ready!"
            return 0
        fi
        echo "⏳ Waiting for PostgreSQL to start... ($i/10)"
        sleep 3
    done
    echo "❌ PostgreSQL failed to start within 30 seconds"
    echo "Checking container logs..."
    docker logs "$CONTAINER_NAME"
    exit 1
}
check_postgresql_connection() {
    echo "🔍 Testing PostgreSQL connection..."
    # Test connection with psql in container
    if docker exec "$CONTAINER_NAME" psql -U "$POSTGRES_USER" -d "$DATABASE" -c "SELECT 1;" 2>/dev/null; then
        echo "✅ Connection test successful"
        return 0
    else
        echo "❌ Connection test failed"
        return 1
    fi
}
run_python_script() {
    echo "🚀 Running Python script to upload parquet files..."
    # Check for Python
    if ! command -v python3 &> /dev/null; then
        echo "Python3 is not installed. Please install Python3 first."
        exit 1
    fi
    # Install required packages if missing
    # (python3 -m pip installs into the same interpreter that runs the script)
    if ! python3 -c "import psycopg2" 2>/dev/null; then
        echo "Installing psycopg2..."
        python3 -m pip install psycopg2-binary
    fi
    if ! python3 -c "import polars" 2>/dev/null; then
        echo "Installing polars..."
        python3 -m pip install polars
    fi
    # Run Python script
    python3 ../utils/upload_parquet_to_postgresql.py \
        --parquet_folder "$PARQUET_FOLDER" \
        --table_name "$TABLE_NAME" \
        --host "$HOST" \
        --port "$PORT" \
        --database "$DATABASE" \
        --user "$POSTGRES_USER" \
        --password "$POSTGRES_PASSWORD"
}
# Main script
check_docker
start_postgresql
# Check connection
if check_postgresql_connection; then
    run_python_script
else
    echo "❌ Cannot connect to PostgreSQL. Please check the container."
    echo "💡 Try: docker logs $CONTAINER_NAME"
    echo "💡 Try: docker restart $CONTAINER_NAME"
    exit 1
fi
echo "🎉 Process completed!"
Previous Script (for context):
I previously used a similar script with these differences:
- Image: postgres:13.22-trixie (vs. postgres:13 in the current script)
- Port: 5432 (vs. 5431)
- Database: userdb (vs. user_portraits)
- Shared memory: --shm-size=2g (vs. 256m)
- Python dependencies: mltool and pandas (vs. psycopg2 and polars)
- Included --ulimit memlock=-1:-1 and a volume (postgres-data)
Both scripts fail with the same Bus error during initdb.
Context:
- The script checks for Docker, starts a PostgreSQL container with resource limits (--memory=32g, --cpus=16.0, --shm-size=256m), waits for the database to be ready, and runs a Python script (../utils/upload_parquet_to_postgresql.py) to load data from a data/ folder into a user_portraits table.
- Settings: POSTGRES_USER=postgres, POSTGRES_PASSWORD=postgres, POSTGRES_DB=user_portraits, port 5431.
- The script installs Python dependencies (psycopg2, polars) on the host and executes the Python script.
- The Bus error likely relates to huge pages, as my system is x86-64 (Ubuntu 20.04 LTS, 32 CPUs, 64GB RAM), so architecture emulation is not a factor.
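To check the huge pages theory on the host, I was planning to look at the huge page counters in /proc/meminfo. A minimal sketch, assuming a Linux host (show_huge_pages is a hypothetical helper name, not part of my script):

```bash
# Hypothetical helper: print the huge page counters and page size from a
# meminfo-style file (defaults to /proc/meminfo on a Linux host).
show_huge_pages() {
    local meminfo="${1:-/proc/meminfo}"
    grep -E '^(HugePages_(Total|Free|Rsvd)|Hugepagesize):' "$meminfo"
}

# Usage on the host: show_huge_pages
```

My assumption is that a non-zero HugePages_Total with few or no free pages would be consistent with the huge pages explanation, but I have not confirmed that this is the cause.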
What I’ve Tried:
- I suspect the Bus error is due to huge pages. A suggested fix is to create a custom image with huge_pages = off in postgresql.conf.sample or use -e POSTGRES_INITDB_ARGS="--set huge_pages=off" (though this may not work with postgres:13 or 13.22-trixie).
- I’m considering replacing the script with a Dockerfile to bundle PostgreSQL, Python, and the upload logic into one image for better reproducibility.
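For the custom image route, this is roughly the edit I have in mind for the sample config, written as a shell function so I can try it on a copy of the file first. It is only a sketch: set_huge_pages_off is a hypothetical helper, and the path /usr/share/postgresql/postgresql.conf.sample is my assumption about where the official image keeps the sample file.

```bash
# Hypothetical helper: force "huge_pages = off" in a postgresql.conf-style
# file, whether the setting is currently commented out or not.
set_huge_pages_off() {
    local conf="$1"
    local tmp status
    tmp="$(mktemp)"
    # Rewrite the whole huge_pages line; all other lines pass through unchanged.
    sed -E 's/^#?[[:space:]]*huge_pages[[:space:]]*=.*/huge_pages = off/' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Intended use in a derived image's build step (path is an assumption):
# set_huge_pages_off /usr/share/postgresql/postgresql.conf.sample
```

The idea is that initdb would then generate postgresql.conf from the patched sample, so the setting would already be off during bootstrap.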
Questions:
- How can I fix the Bus error in the current script? Would a custom image with huge_pages = off or upgrading to postgres:16 resolve it on my Ubuntu 20.04 LTS (x86-64, 32 CPUs, 64GB RAM) system?
- Should I switch to a Dockerfile for this scenario (running PostgreSQL and uploading Parquet files)? If so, how should I structure the Dockerfile to include PostgreSQL, Python, and the upload script while mounting the data/ folder?
- If I keep the bash script, how can I make it more robust (e.g., handle the Bus error or optimize for my system’s resources)?
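On the robustness question, one thing I was thinking of is replacing the fixed sleeps with a generic retry helper. A minimal sketch of the idea (retry is a hypothetical helper name; the usage line reuses the variables from my script):

```bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between tries. Returns 0 on the first success, 1 if every try fails.
retry() {
    local attempts="$1"
    local delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep when another attempt is still coming.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Possible use in start_postgresql:
# retry 10 3 docker exec "$CONTAINER_NAME" pg_isready -U "$POSTGRES_USER" -d "$DATABASE"
```

This would not fix the Bus error itself, but it should at least make the script fail fast with a clear status instead of relying on fixed waits.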
Any advice, sample Dockerfile, or fixes for the script would be greatly appreciated! I can provide more details (e.g., Docker version, Python version) if needed.