Leanstral 1.5: Proof Abundance for All

Mistral AI has released Leanstral 1.5, a specialized model designed for proof engineering in Lean 4. The model features 119B total parameters with only 6B active parameters, is released under an Apache-2.0 license, and provides a significant performance upgrade in formal verification and mathematical reasoning.

Performance Benchmarks

Leanstral 1.5 achieves state-of-the-art results on several key mathematical benchmarks, demonstrating its ability to handle both elementary and advanced reasoning.

  • miniF2F: The model completely saturates this cross-system benchmark for formal mathematics, reaching 100% on both validation and test sets.
  • PutnamBench: Leanstral 1.5 solves 587 of 672 problems (about 87%) drawn from the Putnam Mathematical Competition. It outperforms Seed-Prover 1.5 (high setting) by 7 problems at a much lower cost: approximately $4 per problem compared to an estimated $300+.
  • FATE-H and FATE-X: The model sets a new state-of-the-art on these abstract algebra benchmarks for graduate and PhD-level problems, solving 87% of FATE-H and 34% of FATE-X problems.
  • FLTEval: Leanstral 1.5 improves pass@1 from 21.9 to 28.9 and pass@8 from 31.9 to 43.2, surpassing Opus 4.6's 39.6 at one-seventh the cost.
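The pass@1 and pass@8 figures above are sampling metrics: the chance that at least one of k attempts succeeds. The article does not say how the scores were computed, so the following is only a sketch of the commonly used unbiased estimator, assuming n independent attempts per problem of which c verify. The function name pass_at_k is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which were correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n attempts contains no correct attempt.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark score is then the mean of this per-problem estimate over all problems. With k equal to n, the estimate reduces to whether any attempt succeeded.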

Training Methodology

Leanstral 1.5 was developed using a three-stage process: mid-training, supervised fine-tuning, and reinforcement learning with CISPO. The model was trained in two distinct RL environments:

  1. Multiturn Environment: The model is tasked with proving or disproving a theorem statement. It submits a proof, receives feedback from the Lean compiler, and iteratively refines its approach until the proof compiles or the budget is exhausted.
  2. Code Agent Environment: The model operates as a developer within a raw filesystem, editing files, running bash commands, and utilizing the Lean language server to inspect goals, errors, and type information. This environment allows the model to handle long-horizon tasks, such as completing partial proofs in a repository and building auxiliary lemmas.
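The multiturn environment described above is essentially a propose, check, refine loop. The sketch below is a minimal illustration, not Mistral's training harness: propose stands in for the model, check stands in for a Lean compiler call, and the budget is simplified to a maximum number of turns. All of these names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool        # True when the candidate proof compiled
    feedback: str   # compiler output to show the model on failure


def refine_until_verified(
    statement: str,
    propose: Callable[[str, list[str]], str],
    check: Callable[[str], CheckResult],
    max_turns: int,
) -> Optional[str]:
    """Return the first candidate that checks, or None once the budget is spent."""
    history: list[str] = []  # compiler feedback accumulated across turns
    for _ in range(max_turns):
        candidate = propose(statement, history)
        result = check(candidate)
        if result.ok:
            return candidate
        # Feed the error back so the next attempt can react to it.
        history.append(result.feedback)
    return None
```

Passing the feedback history back into propose is what separates this from independent resampling: each attempt can condition on the errors of the previous ones.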

Test-Time Scaling and Reasoning

Leanstral 1.5 exhibits strong test-time scaling, meaning its performance improves monotonically as the token budget per attempt is increased. On PutnamBench, pass@8 performance climbed from 44 problems solved at 50k tokens to 587 problems solved at 4M tokens. This capability allows the model to reason, edit files, and revise across millions of tokens, as seen in a complex AVL-tree proof that required over 2.7 million tokens and 22 compactions.

Real-World Code Verification Case Studies

Leanstral 1.5 demonstrates practical utility in verifying code properties and discovering bugs in open-source repositories.

AVL Tree Time Complexity

The model successfully proved the O(log n) time complexity guarantees for a real implementation of AVL trees. This required structural induction, handling of monadic time tracking, and exhaustive case analysis for rebalancing paths.
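To give a feel for what a cost proof with time tracking and structural induction looks like, here is a toy Lean 4 sketch. It is not the AVL proof and is far simpler: it pairs a result with a step counter and shows that a list traversal costs exactly one tick per element. The names Timed, timedLength, and timedLength_cost are made up for this illustration.

```lean
/-- A result paired with the number of steps charged to compute it. -/
structure Timed (α : Type) where
  val : α
  cost : Nat

/-- Compute the length of a list, charging one tick per element visited. -/
def timedLength {α : Type} : List α → Timed Nat
  | [] => ⟨0, 0⟩
  | _ :: xs => ⟨(timedLength xs).val + 1, (timedLength xs).cost + 1⟩

/-- The charged cost equals the length of the list, by structural induction. -/
theorem timedLength_cost {α : Type} (xs : List α) :
    (timedLength xs).cost = xs.length := by
  induction xs with
  | nil => simp [timedLength]
  | cons _ xs ih => simp [timedLength, ih]
```

An AVL proof has the same overall shape, but the bound is logarithmic in the tree size and the inductive step must split over every rebalancing path.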

Bug Discovery in Open-Source Software

Using a pipeline involving Aeneas (to translate Rust code to Lean) and Leanstral 1.5 (to infer intent and generate correctness properties), Mistral AI identified 11 genuine bugs across 57 tested repositories, five of which were previously unreported on GitHub. One notable example was an overflow bug in the datrs/varinteger library's sign function for zigzag decoding, which occurs on input Std.U64.MAX.
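The class of bug is easy to reproduce in miniature. The sketch below does not reproduce the datrs/varinteger source. It is a hypothetical zigzag decoder that rounds up by computing n + 1, with 64-bit unsigned arithmetic simulated in Python through a checked add (an assumption chosen to mirror a checked Rust add that fails on overflow). The shift-and-xor form shown next to it is the standard overflow-free alternative.

```python
U64_MAX = 2**64 - 1


def checked_add_u64(a: int, b: int) -> int:
    """Add two u64 values, raising on overflow like a checked add would."""
    total = a + b
    if total > U64_MAX:
        raise OverflowError(f"{a} + {b} does not fit in u64")
    return total


def naive_zigzag_decode(n: int) -> int:
    """Hypothetical decoder: odd inputs compute n + 1, which overflows at U64_MAX."""
    if n & 1:
        return -(checked_add_u64(n, 1) // 2)
    return n // 2


def safe_zigzag_decode(n: int) -> int:
    """Shift-and-xor decoder: never forms n + 1, so it covers the full u64 range."""
    return (n >> 1) ^ -(n & 1)
```

The two decoders agree on every input except the largest odd value, U64_MAX, where the naive version overflows and the safe version returns the minimum signed 64-bit integer. A single failing input at the edge of the domain is the kind of case a generated correctness property surfaces directly.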

Community Feedback and Analysis

While the model's performance is impressive, community members on Hacker News have raised several points of discussion:

"I found the bug finding example to be weird... In what way would this boundary condition case be considered something that 'testing [...] would typically miss'?"

Some users argued that boundary condition bugs, like the one found in datrs/varinteger, are typically caught by fuzzing or careful testing. Others noted that the same bug had been reported as a GitHub issue a week prior to the announcement, questioning the validity of the claimed "discovery."

Additionally, some users questioned the timing of the benchmarks, suggesting that the comparisons were made against older models, and others questioned the choice of Lean 4 over other formal verification tools like Isabelle/HOL or TLA+.

Getting Started with Leanstral 1.5

Leanstral 1.5 is available as an Apache-2.0 licensed model on Hugging Face and as a free API endpoint (leanstral-1-5). Users can deploy it through Mistral Vibe with the following setup:

  1. Install Mistral Vibe: uv tool install mistral-vibe followed by vibe --setup.
  2. Install Leanstral 1.5: /leanstall followed by exit.
  3. Launch Agent: vibe --agent lean.
  4. Optional LSP MCP: Install the lean-lsp-mcp for enhanced functionality by adding it to the ~/.vibe/config.toml file.
