sparrow: an API-first document intelligence platform for structured data extraction and agentic workflows

What it solves

Sparrow is an API-first platform for enterprise document intelligence. It converts unstructured documents, such as invoices, receipts, bank statements, and financial tables, into clean, validated, structured JSON. It also handles text processing and decision-making tasks through instruction calling and agentic workflows.

How it works

Sparrow uses a pluggable architecture that allows users to mix and match different processing pipelines based on the task:

  • Sparrow Parse: Utilizes Vision LLMs to extract structured JSON from images and multi-page PDFs.
  • Sparrow Instructor: Uses Text LLMs for instruction processing, validation, and decision-making.
  • Sparrow Agents: Orchestrates multi-step workflows with custom agents and visual monitoring via Prefect.

The platform supports multiple inference backends so that it can run on different hardware: MLX for Apple Silicon, vLLM for NVIDIA GPUs, and Ollama, plus Mistral OCR for cloud-based extraction. With the local backends, all processing can run on the user's own infrastructure, avoiding external API dependencies.
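The pluggable design described above can be pictured as a small registry that maps a pipeline name to a handler and passes the chosen backend through. The following is only a minimal sketch of that pattern, not Sparrow's actual API: the names PIPELINES, register, run, and the handler functions are hypothetical, and the handlers return placeholder dictionaries instead of calling a model.

```python
from typing import Callable, Dict

# Hypothetical names throughout; this illustrates the registry pattern only
# and is not taken from Sparrow's code base.
Handler = Callable[[str, str], dict]

PIPELINES: Dict[str, Handler] = {}
LOCAL_BACKENDS = {"mlx", "vllm", "ollama"}  # backends named in the text above


def register(name: str) -> Callable[[Handler], Handler]:
    """Decorator that stores a pipeline handler in the registry under `name`."""
    def wrap(fn: Handler) -> Handler:
        PIPELINES[name] = fn
        return fn
    return wrap


@register("parse")
def parse_pipeline(payload: str, backend: str) -> dict:
    # Placeholder for a vision-LLM call that extracts JSON from a document.
    return {"pipeline": "parse", "backend": backend, "input": payload}


@register("instructor")
def instructor_pipeline(payload: str, backend: str) -> dict:
    # Placeholder for a text-LLM call that processes an instruction.
    return {"pipeline": "instructor", "backend": backend, "input": payload}


def run(pipeline: str, payload: str, backend: str = "mlx") -> dict:
    """Look up the requested pipeline and run it on the requested backend."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline}")
    if backend not in LOCAL_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    return PIPELINES[pipeline](payload, backend)


if __name__ == "__main__":
    print(run("parse", "invoice.pdf", backend="vllm"))
```

Keeping the pipeline choice and the backend choice as two independent lookups is what lets a caller mix and match them: adding a new pipeline or backend means adding one entry, with no change to the dispatch code.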

Who it’s for

Sparrow is built for enterprises and developers who need to automate data extraction from complex documents and feed the results into backend pipelines or data workflows without relying on cloud-based AI services.

Highlights

  • Universal Document Processing: Supports a wide range of formats including PNG, JPG, and multi-page PDFs.
  • Schema Validation: Extraction is driven by a JSON schema, so the output can be checked against the expected structure.
  • Pluggable Backends: Compatible with MLX, vLLM, Ollama, and Hugging Face.
  • Local Execution: Designed to run on private infrastructure for enhanced security; integration requires only RESTful API calls.
  • Visual Interface: Includes a web UI for drag-and-drop uploads and real-time processing results.
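
To make the schema validation highlight concrete, here is a minimal sketch of checking extracted JSON against a schema before it enters a downstream pipeline. It is an illustration under stated assumptions, not Sparrow's implementation: it covers only a small subset of JSON Schema (a flat object with "required" and per-property "type"), and the validate function and the invoice schema are hypothetical.

```python
import json
from typing import List

# Maps JSON Schema type names to Python types (simplified subset).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_type(value, type_name: str) -> bool:
    """Return True if `value` matches the JSON Schema type `type_name`."""
    # bool is a subclass of int in Python, so exclude it for numeric types.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    return isinstance(value, TYPE_MAP[type_name])


def validate(raw_json: str, schema: dict) -> List[str]:
    """Return a list of error messages; an empty list means the data passed."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    for field in schema.get("required", []):
        if field not in data:
            errors.append(f"missing required field: {field}")
    for field, rule in schema.get("properties", {}).items():
        if field in data and "type" in rule:
            if not check_type(data[field], rule["type"]):
                errors.append(f"{field}: expected {rule['type']}")
    return errors


# Hypothetical schema for an invoice extraction result.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "total"],
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
}

if __name__ == "__main__":
    print(validate('{"invoice_number": "A-1", "total": 12.5}', INVOICE_SCHEMA))
    print(validate('{"total": "12.5"}', INVOICE_SCHEMA))
```

Returning a list of errors rather than raising on the first problem lets a caller log every issue with a document at once. A production setup would more likely rely on a full JSON Schema validator than on a hand-rolled check like this one.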