Scaling Past Informal AI: Axiom Math and the Path to Verified Superintelligence

The Thesis: Verification as a Catalyst for Brilliance

Formal verification is not a tool for fixing "lousiness" or eliminating hallucinations; it is the primary mechanism for scaling and compounding superintelligence. While informal AI relies on human preference and stochastic judging, verified AI uses formal languages to provide a ground truth that allows AI to extend its brilliance, much like how rigorous proof writing transformed Ramanujan from an intuitive genius into a more powerful mathematician.

Axiom Math's Approach and the Putnam Success

Axiom Math uses a system called AxiomProver, an ensemble of models post-trained with reinforcement learning (RL) and supervised fine-tuning (SFT) on Lean data. The approach centers on verified generation: the system produces output that is checked as it is written, rather than only verifying outputs after the fact.

The Putnam Benchmark

In December 2025, Axiom's system achieved a perfect score of 120/120 on the Putnam exam, surpassing both the best human performers (who scored 110) and other leading LLMs like DeepSeek (which scored 103). This result demonstrates that a formal math system with significantly less data can outperform informal LLMs on superhuman tasks.

Lean as the Foundation

Lean is a functional programming language and interactive theorem prover in which propositions are types and proofs are programs (the Curry-Howard correspondence). Axiom builds on Lean because its "tactics" automate low-level logical deductions, leaving the system free to concentrate on high-level proof strategy and intuition.
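A minimal Lean 4 sketch of both ideas follows. The theorem names are illustrative and not taken from Axiom's code, and the last example assumes a recent toolchain in which the `omega` tactic ships with core Lean.

```lean
-- Curry-Howard: this proof is literally a function that takes
-- evidence for `p ∧ q` and returns evidence for `q ∧ p`.
theorem swapConj (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩

-- The same statement in tactic mode: each tactic performs one
-- low-level deduction step on the current goal.
theorem swapConjTactic (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor
  · exact h.2
  · exact h.1

-- A decision procedure discharges routine arithmetic in one step,
-- so the proof author never spells out the individual rewrites.
example (a b : Nat) : a + b = b + a := by
  omega
```

In all three cases the Lean kernel checks the resulting proof term, so a wrong step is rejected rather than silently accepted.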

Mathematical Discovery vs. Proof

Axiom distinguishes between mathematical discovery and formal proof. Proof is the final verification, but discovery is the pre-conjecturing step where mathematicians find constructions, sequences, or graphs to form intuitions.

  • Discovery Tools: Axiom is open-sourcing codebases for mathematical discovery to help theorists find counter-examples or constructions (e.g., solving 30-year-old conjectures) before attempting a formal proof.
  • The Workflow: The ideal pipeline involves an informal reasoner proposing a specification or conjecture, and a formal prover (like AxiomProver) executing the proof.
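The discovery step can be illustrated with a toy counter-example search in Python. This is only a sketch of the general idea, not Axiom's open-sourced tooling; the function names and the choice of conjecture (that n^2 + n + 41 is prime for every natural number n) are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_counterexample(
    conjecture: Callable[[int], bool], candidates: Iterable[int]
) -> Optional[int]:
    """Return the first candidate that falsifies the conjecture, or None."""
    for n in candidates:
        if not conjecture(n):
            return n
    return None


def euler_poly_is_prime(n: int) -> bool:
    """Candidate conjecture: n^2 + n + 41 is prime."""
    return is_prime(n * n + n + 41)


witness = first_counterexample(euler_poly_is_prime, range(1000))
print(witness)  # 40, since 40^2 + 40 + 41 = 1681 = 41^2
```

A cheap search like this rules out a false conjecture before any effort is spent on a formal proof; if no counter-example turns up, the conjecture becomes a candidate to hand to the prover.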

The Business Case for Verified AI

With a $200M Series A and a $1.6B valuation, Axiom's market strategy extends beyond niche academic math into any domain where a "right of first refusal" on AI-generated code is required.

Hardcore Verification Markets

Certain industries have zero tolerance for "mostly verified" results:

  • Hardware Verification: There is no partial credit for a GPU; it either works or it doesn't. Currently, the industry standard for design-to-verification in ASIC projects is a ratio of 1:3 to 1:4 in terms of team size and duration.
  • Software Verification: While "vibe coding" a website doesn't require verification, mission-critical distributed systems and regulatory-heavy enterprise agents do.

The Specification Problem

A major bottleneck is the "specification problem": humans are often unable to precisely specify what they want. Axiom views this as an interactive process where AI suggests specifications (conjectures) and the prover verifies them, iteratively refining the goal.
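A small Python sketch shows why a first-draft specification often needs refinement. The sorting example and all names here are hypothetical, chosen only to illustrate the point: a spec that says "the output is sorted" is satisfied by an implementation that throws the input away, and the refined spec closes that gap.

```python
from collections import Counter
from typing import Callable, List

Spec = Callable[[List[int], List[int]], bool]
Impl = Callable[[List[int]], List[int]]


def weak_spec(inp: List[int], out: List[int]) -> bool:
    """First attempt: the output is in non-decreasing order."""
    return all(out[i] <= out[i + 1] for i in range(len(out) - 1))


def refined_spec(inp: List[int], out: List[int]) -> bool:
    """Refinement: sorted AND a permutation of the input."""
    return weak_spec(inp, out) and Counter(inp) == Counter(out)


def bogus_sort(xs: List[int]) -> List[int]:
    """Satisfies the weak spec while doing nothing useful."""
    return []


def real_sort(xs: List[int]) -> List[int]:
    return sorted(xs)


def satisfies(spec: Spec, impl: Impl, cases: List[List[int]]) -> bool:
    """Check the spec on sample inputs (testing, not a proof)."""
    return all(spec(case, impl(list(case))) for case in cases)


cases = [[3, 1, 2], [], [5, 5, 1]]
print(satisfies(weak_spec, bogus_sort, cases))     # True: the spec is too weak
print(satisfies(refined_spec, bogus_sort, cases))  # False: the refinement catches it
print(satisfies(refined_spec, real_sort, cases))   # True
```

Here the checks are sample-based tests; in the interactive process described above, the refined specification would instead be stated formally and discharged by the prover.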

Technical Challenges and Limits

Rice's Theorem and Decidability

Rice's Theorem states that every non-trivial semantic property of programs is undecidable: no single algorithm can decide such a property for all possible programs. This does not rule out proving properties of particular programs, so Axiom focuses on verifying the majority of useful programs. The goal is to decompose complex tasks into components small enough to be provable.
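The decomposition idea can be sketched in Lean 4. The functions and theorem names below are illustrative assumptions, and `omega` is assumed to be available in the toolchain. Each component gets its own small proof, and the proof of the composite reuses the component's theorem instead of reasoning from scratch.

```lean
def double (n : Nat) : Nat := n + n

def quadruple (n : Nat) : Nat := double (double n)

-- Component-level specification, proved directly.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega

-- Composite specification, proved by rewriting with the component's
-- theorem and finishing with linear arithmetic.
theorem quadruple_spec (n : Nat) : quadruple n = 4 * n := by
  unfold quadruple
  rw [double_spec, double_spec]
  omega
```

Nothing here contradicts Rice's Theorem: the proofs establish a property of two specific programs, not a general procedure that decides the property for arbitrary programs.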

Scaling and Context Windows

As proofs grow (sometimes 20 lines of proof for every 1 line of code), context window limits become a concern. Axiom addresses this through:

  • Auto-informalization: Converting formal Lean code back into informal summaries to maintain high-level tracking.
  • Cyclic Consistency: Formalizing and informalizing repeatedly to ensure the logic remains sound.
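The round-trip check behind these two techniques can be sketched in Python. This is a hypothetical harness, not Axiom's implementation: `formalize`, `informalize`, and `same_meaning` are placeholders for model calls or equivalence checks, and the toy stand-ins below exist only so the sketch runs.

```python
from typing import Callable


def round_trip_stable(
    informal: str,
    formalize: Callable[[str], str],
    informalize: Callable[[str], str],
    same_meaning: Callable[[str], str] and Callable[[str, str], bool],
    rounds: int = 3,
) -> bool:
    """Formalize and informalize repeatedly; fail if the meaning drifts."""
    current = informal
    for _ in range(rounds):
        formal = formalize(current)
        back = informalize(formal)
        if not same_meaning(informal, back):
            return False  # the summary no longer matches the original claim
        current = back
    return True


# Toy stand-ins (assumptions): a lookup table instead of real models.
FORMAL = {
    "for all naturals a and b, a + b equals b + a":
        "theorem add_comm' (a b : Nat) : a + b = b + a",
}
INFORMAL = {code: text for text, code in FORMAL.items()}


def toy_formalize(text: str) -> str:
    return FORMAL.get(text.strip().lower(), "sorry")


def toy_informalize(code: str) -> str:
    return INFORMAL.get(code, "unknown statement")


def toy_same_meaning(a: str, b: str) -> bool:
    return a.strip().lower() == b.strip().lower()


claim = "For all naturals a and b, a + b equals b + a"
stable = round_trip_stable(claim, toy_formalize, toy_informalize, toy_same_meaning)
drifted = round_trip_stable(
    claim, toy_formalize, lambda code: "addition works", toy_same_meaning
)
print(stable, drifted)
```

In a real system the equivalence check is the hard part; string comparison is used here only to keep the example self-contained.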

The Path to AGI and Recursive Self-Improvement

Carina Hong asserts that an informal math system alone can never reach mathematical AGI because human expert grading does not scale. To achieve superintelligence, AI must be able to generate its own verified data and improve recursively without relying on a finite pool of human experts.

The AXLE API

To accelerate the ecosystem, Axiom released AXLE (Axiom Lean Engine), a set of meta-programming tools for Lean. This infrastructure lets other developers and frontier labs validate and manipulate proofs at scale, and it could serve as a verification partner for other LLMs.
