Sandboxing AI agents with MCP servers

Hi all,

I’m a student building a terminal AI agent that uses MCP servers (filesystem, Python, shell tools) and want to sandbox it properly as a security learning project.

Current understanding:
I need to isolate the agent so it thinks it’s operating on a dummy OS/filesystem. From my reading, this seems to be what Linux namespaces (which Docker uses under the hood) are for—creating isolated “views” of the system. I’m new to actually implementing this.

Questions:

  • For an AI agent that needs to execute code safely, is Docker the right starting point, or should I consider lightweight VMs (like Firecracker) from the beginning?
  • What’s the simplest way to create a realistic dummy filesystem inside the container that the agent can safely modify?
  • Beyond docker diff and logs, are there lightweight monitoring tools to watch what the agent thinks it’s doing vs. what’s happening on the host?
  • Any MCP-specific gotchas when running these servers inside containers?

Repo for context (learning in public): GitHub - Pavithra-Madhan/Terminal

Thanks for any guidance!

Curious about both educational and practical approaches, and eager to learn best practices from real-world experience.

Docker has an experimental feature called Docker Sandboxes. It started with support for Gemini CLI and Claude Code; other agentic coding tools will follow.

If your agentic coding tool is not supported by Docker Sandboxes, you can use DevContainers. You can define your own DevContainer declaratively and include a Compose file and a Dockerfile if you want.

You can use the MCP Inspector to see what the coding agent exchanges with MCP servers: GitHub - modelcontextprotocol/inspector: Visual testing tool for MCP servers
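If you also want a plain-text log next to the Inspector, a minimal sketch like the one below could summarize each message your agent sends or receives. It assumes you capture every message as one JSON string per line (MCP messages are JSON-RPC 2.0). The function name `summarize_mcp_message` is made up for illustration and is not part of any MCP SDK.

```python
import json


def summarize_mcp_message(line: str) -> str:
    """Return a one-line summary of a single JSON-RPC 2.0 message.

    Hypothetical helper for logging only; it does not validate the message.
    """
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        # Servers sometimes print non-protocol text; flag it instead of crashing.
        return "non-JSON output"
    if not isinstance(msg, dict):
        return "unexpected JSON value"

    if "method" in msg:
        # Requests carry an id; notifications do not.
        kind = "request" if "id" in msg else "notification"
        summary = f"{kind} {msg['method']}"
        params = msg.get("params")
        # For tool calls, the tool name is the most useful thing to log.
        if msg["method"] == "tools/call" and isinstance(params, dict):
            summary += f" tool={params.get('name')}"
        return summary

    if "error" in msg:
        return f"error response id={msg.get('id')}"
    return f"result response id={msg.get('id')}"
```

Writing these summaries to a file gives you a record of what the agent asked for, which you can then compare with what docker diff shows actually changed in the container.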

Regarding what’s happening on your host: shouldn’t this be as easy as providing a rule (it might be called something different depending on the coding agent you use) that tells the coding agent what you want it to do?

Thanks! Just to clarify - I’m actually building my own AI agent from scratch (still in development), not using an existing coding tool. So I need to containerize my custom Python code for safe testing. The MCP inspector link is helpful though.
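For anyone following along, here is a rough sketch of what "containerize for safe testing" could look like from the Python side. It creates a throwaway workspace with a few dummy files and builds a locked-down docker run command around it. The flags are standard Docker CLI options, but the image name, the file layout, the resource limits and both function names are placeholders, not something taken from the repo.

```python
import tempfile
from pathlib import Path


def make_scratch_workspace() -> Path:
    """Create a throwaway directory with dummy files the agent may modify."""
    root = Path(tempfile.mkdtemp(prefix="agent-sandbox-"))
    # Hypothetical layout; replace with whatever looks realistic for your tests.
    (root / "projects" / "demo").mkdir(parents=True)
    (root / "projects" / "demo" / "README.md").write_text("# Demo project\n")
    (root / "notes.txt").write_text("scratch notes\n")
    return root


def build_docker_run(image: str, workspace: Path, command: list[str]) -> list[str]:
    """Build an argv list for a restricted `docker run` invocation."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access at all
        "--read-only",                          # root filesystem is read-only
        "--tmpfs", "/tmp",                      # writable scratch space in memory
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--pids-limit", "128",                  # assumed limit, tune as needed
        "--memory", "512m",                     # assumed limit, tune as needed
        "--volume", f"{workspace}:/workspace",  # the only writable host path
        "--workdir", "/workspace",
        image, *command,
    ]
```

You could pass the result to `subprocess.run(build_docker_run("my-agent:dev", make_scratch_workspace(), ["python", "agent.py"]))`. Note that `--network none` also cuts off any MCP server that needs network access, so this only fits servers that run over stdio inside the same container.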

Apologies! My brain somehow skipped the part about you writing your own agent and assumed this was about operating an existing coding agent in a secure sandbox.

Just one note on the lightweight VM part: on Windows, as soon as Hyper-V or WSL2 is installed, other VM solutions will only work if they provide a Hyper-V compatibility mode (= poor performance!).