Streamlined MongoDB Integration in Dockerized Python Projects for Small Teams

My team and I are integrating Docker into a Python project hosted on GitHub. The project uses Jupyter Notebook and MongoDB as its database. Since the team is small, we are looking for guidance on managing the MongoDB server within this development environment. We expect the database to remain small-scale and are interested in strategies for handling instances like that. We have a few questions:

  • Efficient MongoDB Management for Small Teams:

    • What’s the best approach to handle MongoDB instances in projects with a limited number of developers?
    • We’re aiming for a solution that’s both straightforward and effective.
  • Backup and Restore Consideration:

    • Should we set up backup and restore procedures to safeguard against data loss and unforeseen problems?
  • Including Backups in Version Control for Compact DB:

    • Is it worth committing backups to version control, given our small, compact database?
    • Are there potential benefits and drawbacks associated with this approach?

Basically, I just need help managing the MongoDB side of our project while following best practices for smaller development teams.

Dockerfile:

# Use the official Python image as the base
FROM python:3.9

# Set the working directory inside the container
WORKDIR /app

# Copy the requirements file into the container and install dependencies
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the notebook files into the container
COPY notebooks notebooks

# Expose the port the Jupyter Notebook will run on
EXPOSE 8888

# Start the Jupyter Notebook when the container starts
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]

docker-compose.yml (with MongoDB integrated):

version: '3.8'
services:
  jupyter:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: test_jupyter_container
    image: test-jupyter-image:latest
    ports:
      - "8888:8888"
    volumes:
      # Mount over the copied notebooks directory only; mounting onto /app
      # would hide everything installed into the image's working directory.
      - ./notebooks:/app/notebooks

  mongo:
    container_name: test_mongo_container
    image: mongo:4.4.10
    ports:
      - "27017:27017"
    volumes:
      # Use the named volume declared below rather than a bind mount,
      # so the declared mongodb_data volume is actually used.
      - mongodb_data:/data/db

volumes:
  mongodb_data:
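Once both services are up, notebooks in the `jupyter` container reach the database through the Compose service name `mongo`, not `localhost`. A minimal sketch, assuming `pymongo` is added to `requirements.txt` (the `MONGO_HOST` variable and the timeout value are illustrative, not part of the setup above):

```python
import os


def mongo_uri(host=None, port=27017):
    """Build the connection URI for the Compose setup above.

    Inside the Compose network the service name "mongo" resolves to the
    database container; from the host, set MONGO_HOST=localhost to use
    the published port instead.
    """
    host = host or os.environ.get("MONGO_HOST", "mongo")
    return f"mongodb://{host}:{port}"


def get_client(uri):
    # pymongo is an assumed dependency; MongoClient does not connect
    # eagerly, so the timeout applies to the first real operation.
    from pymongo import MongoClient
    return MongoClient(uri, serverSelectionTimeoutMS=5000)


print(mongo_uri())  # mongodb://mongo:27017 inside the Compose network
```

In a notebook cell, something like `get_client(mongo_uri())["app_db"]["runs"].insert_one({"ok": True})` would then round-trip a document (database and collection names here are hypothetical).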

Backups? Always a good idea. Even a small database is worth protecting against accidental deletion, bad migrations, and failed upgrades.

Put backups in git? Not so sure. Dumps of a tiny database are small enough to version, but mongodump output is binary, so git cannot diff it usefully and the repository grows with every snapshot; keeping backups outside the repo (object storage or a shared drive) is usually cleaner.

On our side, we run a bash script that iterates over all database clusters and databases and calls mongodump every half hour, writing each dump to a new timestamped file.
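A simplified, single-instance version of that idea can be sketched in Python: it streams a gzipped `mongodump` archive out of the container, so no MongoDB tools are needed on the host. The container name matches the compose file above; the `backups` directory name and any half-hourly scheduling (e.g. via cron) are assumptions:

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_path(backup_dir, now: Optional[datetime] = None) -> Path:
    # One gzipped archive per run, named by timestamp,
    # e.g. backups/dump-20240101-120000.archive.gz
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return Path(backup_dir) / f"dump-{stamp}.archive.gz"


def run_backup(container="test_mongo_container", backup_dir="backups") -> Path:
    target = backup_path(backup_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    # mongodump --archive with no value writes to stdout; --gzip
    # compresses it, and we capture the stream into the target file.
    with target.open("wb") as fh:
        subprocess.run(
            ["docker", "exec", container, "mongodump", "--archive", "--gzip"],
            stdout=fh,
            check=True,
        )
    return target
```

Restoring is the mirror image: pipe an archive back in with `docker exec -i test_mongo_container mongorestore --archive --gzip < backups/dump-<stamp>.archive.gz` (note `-i`, so the archive reaches the container's stdin).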

There are also more advanced tools available, for example Percona Backup for MongoDB (no endorsement).