Mistral OCR 4 Release Notes
Mistral OCR 4 is a state-of-the-art document intelligence model designed to extract and structure content from complex documents. It provides a structured representation of documents—including bounding boxes, block classification, and confidence scores—making it a critical ingestion component for Retrieval-Augmented Generation (RAG), enterprise search, and agentic workflows.
Structured Document Representation
Mistral OCR 4 moves beyond simple text extraction to provide a comprehensive structural map of a document. Every extracted block is accompanied by:
- Bounding Boxes: Localize text on the page for in-context highlighting and reliable data pipelines.
- Typed-Block Classification: Identifies elements such as titles, tables, equations, and signatures.
- Inline Confidence Scores: Generated per page and per word to support human-in-the-loop verification and redaction.
This structured output allows downstream systems to understand not only the text content but also the spatial arrangement and the functional role of each element within the document.
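As a rough illustration of how a downstream system might hold this output in memory, the sketch below models one page as a list of typed blocks. The class names, field names, coordinate convention, and the per-block confidence value are all assumptions made for this example; they are not the actual OCR 4 response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Corner coordinates of the block; the (x0, y0, x1, y1) convention is assumed.
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Clamp at zero so a degenerate box never yields a negative area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


@dataclass(frozen=True)
class Block:
    block_type: str    # e.g. "title", "table", "equation", "signature"
    text: str
    bbox: BoundingBox
    confidence: float  # hypothetical per-block score, assumed to lie in [0, 1]


def blocks_of_type(blocks, block_type):
    """Return the blocks whose functional role matches block_type."""
    return [b for b in blocks if b.block_type == block_type]


# A hand-written page standing in for real model output.
page = [
    Block("title", "Quarterly Report", BoundingBox(50, 40, 550, 80), 0.99),
    Block("table", "Revenue | 1,200", BoundingBox(50, 120, 550, 300), 0.93),
    Block("signature", "J. Doe", BoundingBox(400, 700, 550, 740), 0.71),
]

tables = blocks_of_type(page, "table")
title_area = page[0].bbox.area()
```

Keeping the role, position, and score on the same record lets a consumer filter by element type and highlight the source region without a second lookup.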
Performance and Benchmarks
Mistral OCR 4 outperforms leading AI-native and enterprise OCR systems across multiple benchmarks and human evaluations.
Human Preference and Public Benchmarks
Independent annotators preferred OCR 4 over all tested competitors with an average win rate of 72%. In public benchmarks, it achieved a top overall score of 85.20 on OlmOCRBench and 93.07 on OmniDocBench.
Multilingual Capabilities
OCR 4 supports 170 languages across 10 language groups. It shows significant performance gains in rare and low-resource languages (including Georgian, Armenian, and Kannada) where competing systems typically degrade.
Benchmark Limitations
Mistral notes that aggregate scores should be treated as directional. Common scoring artifacts that can penalize correct output include:
- Ground-truth errors: Incorrect reference annotations in the benchmark.
- Equivalent math notation: Different LaTeX strings that render identically.
- Equation segmentation: Variations in how expressions are split into fragments.
- Multi-column reading order: Challenges with words split across column boundaries.
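The equivalent-notation artifact can be shown with a small sketch: two LaTeX strings that render identically fail an exact string comparison but match after a light normalization. The normalize_latex helper is a hypothetical illustration written for this example, not the scoring logic of any benchmark, and it handles only whitespace and single-character script groups.

```python
import re


def normalize_latex(expr: str) -> str:
    """Rough normalization for illustration; not a full equivalence check."""
    # Drop whitespace. This is safe for the example below but would be wrong
    # after control words (e.g. "\alpha x"), so a real scorer needs more care.
    s = re.sub(r"\s+", "", expr)
    # Rewrite x^{2} as x^2 and y_{i} as y_i when the group holds one character.
    s = re.sub(r"([\^_])\{(\w)\}", r"\1\2", s)
    return s


reference = "x^{2} + y_{i}"  # ground-truth annotation
predicted = "x^2+y_i"        # model output that renders identically

exact_match = reference == predicted
normalized_match = normalize_latex(reference) == normalize_latex(predicted)
```

Here exact_match is False while normalized_match is True, which is the kind of gap that makes an aggregate score understate correct output.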
Deployment and Integration Options
OCR 4 is designed for flexibility in deployment, supporting both API-based access and self-hosted environments for data sovereignty.
API and Document AI
Developers can use the OCR 4 API in two primary modes:
- Pure Extraction Mode: Returns raw extracted content, bounding boxes, and block types. This is ideal for high-volume batch ingestion and custom downstream logic.
- Document AI Mode: Layers additional capabilities on top of the OCR engine. By passing a JSON schema or custom prompt, users can return structured JSON (via mistral-small-2603) or annotate images with a vision-language model.
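To make the JSON schema idea concrete, the sketch below defines an example invoice schema and checks a hand-written sample result against it. The schema fields, the sample JSON, and the missing_or_mistyped helper are assumptions made for illustration; the sketch does not call the OCR 4 API and does not show its request format.

```python
import json

# Hypothetical invoice schema in standard JSON Schema style; field names are illustrative.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total"],
}

# Minimal mapping from JSON Schema type names to Python types (only what this example needs).
_JSON_TYPES = {"string": str, "number": (int, float), "object": dict}


def missing_or_mistyped(payload: dict, schema: dict) -> list:
    """List required fields that are absent and present fields with the wrong type."""
    problems = []
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name in payload and not isinstance(payload[name], _JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type: {name}")
    return problems


# Hand-written stand-in for a structured result returned under this schema.
sample_output = json.loads(
    '{"invoice_number": "INV-001", "total": 1250.5, "currency": "EUR"}'
)
problems = missing_or_mistyped(sample_output, INVOICE_SCHEMA)
```

A check like this is deliberately minimal; a production pipeline would more likely rely on a full JSON Schema validator.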
Infrastructure and Pricing
OCR 4 is compact enough to run in a single container, allowing enterprise customers to keep data within their own infrastructure.
Pricing Structure:
- OCR 4 API: $4 per 1,000 pages.
- Batch API: $2 per 1,000 pages (50% discount).
- Document AI: $5 per 1,000 pages.
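The listed rates translate into a simple cost estimate. The sketch below assumes billing is linear in page count (no rounding up to whole blocks of 1,000 pages, no minimums or volume tiers), which the pricing list does not state; the tier keys are names chosen for this example.

```python
from decimal import Decimal

# Listed prices in US dollars per 1,000 pages.
PRICE_PER_1000_PAGES = {
    "ocr_api": Decimal("4"),
    "batch_api": Decimal("2"),
    "document_ai": Decimal("5"),
}


def estimate_cost(pages: int, tier: str) -> Decimal:
    """Estimate cost in dollars, assuming strictly pro-rated per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_1000_PAGES[tier] * pages / 1000


# Example workload: 250,000 pages under each tier.
pages = 250_000
costs = {tier: estimate_cost(pages, tier) for tier in PRICE_PER_1000_PAGES}
```

For 250,000 pages this gives $1,000 through the OCR 4 API, $500 through the Batch API, and $1,250 with Document AI.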
Recommended Use Cases
Mistral OCR 4 is optimized for the following production workflows:
- Semantic Chunking for RAG: Using classified blocks as retrieval units to improve accuracy.
- Agentic Workflows: Enabling agents to perform form filling, invoice processing, and compliance checks.
- Enterprise Search: Serving as a data-source component for custom ingestion and entity extraction.
- Structured Data Pipelines: Utilizing confidence scores to trigger human verification for high-stakes redactions or financial extractions.
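Two of these workflows, semantic chunking and confidence-triggered review, can be sketched together. The example below groups blocks under the most recent title block and flags any chunk whose lowest confidence falls under a threshold. The ScoredBlock and Chunk types, the per-block confidence field, and the 0.85 threshold are assumptions for illustration, not part of the OCR 4 output format or a recommended setting.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredBlock:
    block_type: str    # e.g. "title", "text", "table"
    text: str
    confidence: float  # hypothetical per-block score in [0, 1]


@dataclass
class Chunk:
    heading: str
    blocks: list = field(default_factory=list)

    @property
    def min_confidence(self) -> float:
        # Lowest score among the body blocks; the title's own score is not included.
        return min((b.confidence for b in self.blocks), default=1.0)


def chunk_by_title(blocks):
    """Start a new chunk at every title block; other blocks join the current chunk."""
    chunks = []
    current = Chunk(heading="")
    for block in blocks:
        if block.block_type == "title":
            if current.blocks or current.heading:
                chunks.append(current)
            current = Chunk(heading=block.text)
        else:
            current.blocks.append(block)
    if current.blocks or current.heading:
        chunks.append(current)
    return chunks


def needs_review(chunk, threshold=0.85):
    """Flag a chunk for human verification when any body block scores below threshold."""
    return chunk.min_confidence < threshold


blocks = [
    ScoredBlock("title", "Invoice", 0.99),
    ScoredBlock("text", "Bill to: ACME Corp", 0.97),
    ScoredBlock("table", "Total | 1,250.50", 0.78),
    ScoredBlock("title", "Terms", 0.98),
    ScoredBlock("text", "Payment due in 30 days.", 0.95),
]

chunks = chunk_by_title(blocks)
flagged = [c.heading for c in chunks if needs_review(c)]
```

In this example only the "Invoice" chunk is flagged, because its table block scores 0.78. The threshold would normally be tuned per workflow, with stricter values for redaction or financial extraction.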
OCR 4 is integrated with the Mistral Search Toolkit, an open-source framework for RAG and enterprise search ingestion and evaluation.