forkd: a microVM sandbox runtime for AI agent fan-out with millisecond-scale snapshot forking

What it solves

forkd is a microVM sandbox runtime designed for AI agent fan-out. It solves the high latency and resource overhead associated with cold-booting virtual machines or containers when an AI agent needs to spawn many short-lived, isolated environments (e.g., for code interpretation, tool use, or evaluation rollouts).

How it works

Built on Firecracker, forkd uses a "fork-from-warm" approach. A parent VM boots once and loads the necessary runtime (such as Python, ML models, or dependencies like NumPy), and is then paused and saved to disk as a snapshot. Child VMs are spawned by mapping the parent's memory image with mmap and MAP_PRIVATE, so children share the parent's resident pages and copy a page only when they write to it (copy-on-write, CoW). Because no child repeats the boot or the runtime load, spawning is much faster than a cold boot.
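
The copy-on-write behavior of a private mapping can be seen in a few lines of Python on Linux or another Unix. This is only a minimal sketch of the kernel mechanism, not forkd's implementation: the temporary file stands in for a parent memory image, and the helper names (make_parent_image, spawn_child, demo_private_mapping) are hypothetical.

```python
import mmap
import os
import tempfile


def make_parent_image(num_pages: int) -> str:
    """Write a stand-in 'parent memory image' and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"P" * (mmap.PAGESIZE * num_pages))
    return path


def spawn_child(path: str) -> mmap.mmap:
    """Map the image with MAP_PRIVATE: reads are served from shared pages,
    and the first write to a page gives this mapping its own private copy."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. A read-only descriptor is enough,
        # because writes to a private mapping never reach the file.
        return mmap.mmap(
            f.fileno(),
            0,
            flags=mmap.MAP_PRIVATE,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )


def demo_private_mapping():
    """Return what the writing child, a sibling child, and the file on disk see."""
    path = make_parent_image(4)
    try:
        child_a = spawn_child(path)
        child_b = spawn_child(path)
        child_a[0:5] = b"child"  # triggers copy-on-write for the first page only
        child_view = bytes(child_a[0:5])
        sibling_view = bytes(child_b[0:5])
        with open(path, "rb") as f:
            on_disk = f.read(5)
        child_a.close()
        child_b.close()
        return child_view, sibling_view, on_disk
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_private_mapping())
```

The write made through one mapping is visible only to that mapping; the sibling mapping and the file on disk still hold the original bytes, which is the property that lets many children share a single parent image.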

forkd also supports two related features. BRANCH snapshots a running sandbox and resumes it as multiple divergent paths mid-task. Diff-snapshot chains stack layers of dependencies (e.g., separate snapshots for NumPy and Pandas) without duplicating the base memory image.
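
One way to picture a diff-snapshot chain is as a stack of layers in which each layer stores only the pages that differ from its parent, and a read walks down the chain until some layer has the page. The sketch below is a conceptual model under that assumption; the SnapshotLayer class, its fields, and the page contents are hypothetical and do not describe forkd's on-disk format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SnapshotLayer:
    """Hypothetical model of one layer: only the pages that changed versus the parent."""
    name: str
    pages: Dict[int, bytes]
    parent: Optional["SnapshotLayer"] = None

    def read_page(self, index: int) -> bytes:
        """Resolve a page from the topmost layer that contains it."""
        layer: Optional[SnapshotLayer] = self
        while layer is not None:
            if index in layer.pages:
                return layer.pages[index]
            layer = layer.parent
        raise KeyError(f"page {index} not present in any layer")

    def stored_pages(self) -> int:
        """Number of pages this layer itself stores (its diff size)."""
        return len(self.pages)


def build_example_chain() -> SnapshotLayer:
    """Base image, then a NumPy layer, then a Pandas layer on top."""
    base = SnapshotLayer("base", {0: b"kernel", 1: b"python"})
    numpy_layer = SnapshotLayer("numpy", {2: b"numpy"}, parent=base)
    pandas_layer = SnapshotLayer(
        "pandas", {1: b"python+pandas", 3: b"pandas"}, parent=numpy_layer
    )
    return pandas_layer


if __name__ == "__main__":
    top = build_example_chain()
    for i in range(4):
        print(i, top.read_page(i))
```

In this model the Pandas layer stores two pages of its own and inherits the rest, so adding a dependency costs only the pages it touches rather than another full copy of the base image.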

Who it’s for

  • AI Agent Developers: Those building code interpreters or Jupyter-kernel sandboxes where each tool call requires a fresh, isolated environment.
  • ML Evaluation Harnesses: Users running hundreds of parallel test rollouts (e.g., SWE-bench) without the overhead of Docker cold-starts.
  • Security-Conscious Users: Those needing KVM-level hardware isolation for untrusted code execution in CI or multi-tenant environments.
  • Self-Hosters: Developers seeking an open-source, Apache 2.0 alternative to managed sandbox SaaS platforms.

Highlights

  • Extreme Spawn Speed: Capable of forking 100 microVMs in approximately 101 ms (about 1 ms per VM on average).
  • Live BRANCH: Ability to snapshot a running sandbox and resume in as little as 56 ms (p50).
  • Hardware Isolation: Each child is a separate Firecracker microVM backed by KVM, providing stronger isolation than standard containers.
  • Stacked Snapshots: Supports diff-snapshot chains to efficiently manage layered dependencies.
  • Full Linux Environment: Provides a real Linux kernel per child with multi-vCPU and full TCP networking, unlike some function-level runtimes.
  • Developer Tooling: Includes a REST API, Python/TypeScript/MCP SDKs, and Prometheus metrics.
