Economics of the AI Supercycle: Baseten and the Shift to Custom Inference

The Core Thesis: Inference as the Engine of AI Value

AI inference demand is projected to grow by a factor of a billion, driven by the rise of agentic applications and larger models. Although 95% of current inference spend goes to frontier models (such as those from OpenAI and Anthropic), the path to building financially viable, defensible AI companies lies in moving to custom, post-trained open-source models.

The Economic Case for Custom Models

Companies are moving away from frontier models toward open-source alternatives for two primary reasons: viability and defensibility.

Financial Viability and Gross Margins

Open-source models typically trail frontier models by approximately 90 days in capability but can be 70% to 90% cheaper to run. For scaled businesses, this cost reduction is essential for moving gross margins from zero or negative toward sustainable levels (40% to 70%).
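To see why the cost gap matters for margins, here is a rough gross-margin calculation. It is a sketch under simplifying assumptions: revenue and costs are per unit of usage, the 70% and 90% reductions from the text apply only to inference cost, and the split of $80 inference plus $20 other cost of goods sold (COGS) on $100 of revenue is a hypothetical starting point chosen to give a 0% margin.

```python
def gross_margin(revenue: float, inference_cost: float, other_cogs: float = 0.0) -> float:
    """Gross margin as a fraction of revenue: (revenue - COGS) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - inference_cost - other_cogs) / revenue


def margin_after_cost_cut(
    revenue: float, inference_cost: float, reduction: float, other_cogs: float = 0.0
) -> float:
    """Gross margin after inference cost falls by `reduction` (0.7 means 70% cheaper)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return gross_margin(revenue, inference_cost * (1.0 - reduction), other_cogs)


if __name__ == "__main__":
    # Hypothetical unit economics: $100 of revenue, $80 spent on frontier-model
    # inference, $20 of other COGS, so the business starts at a 0% gross margin.
    revenue, inference, other = 100.0, 80.0, 20.0
    print(f"baseline: {gross_margin(revenue, inference, other):.0%}")
    for cut in (0.70, 0.90):
        margin = margin_after_cost_cut(revenue, inference, cut, other)
        print(f"{cut:.0%} cheaper inference: {margin:.0%} gross margin")
```

Under these assumed numbers the margin moves from 0% to about 56% (70% cheaper) or 72% (90% cheaper). The larger the share of non-inference costs, the less the same cut lifts the margin.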

Strategic Defensibility

Relying exclusively on frontier labs creates a strategic risk: companies effectively hand their proprietary workflows and user signals to the model providers. Baseten CEO Tuhin Srivastava likens frontier labs to the "East India Company," arguing that companies using their APIs supply the very data the labs need to post-train models that eventually compete directly with those companies' specialized workflows.

The Post-Training Workflow

To "own their intelligence," companies are adopting post-training workflows to create specialized models. The process involves:

  1. Defining a Utility Function: The company determines exactly what it wants to optimize (e.g., minimizing transcription errors in a medical speech-to-text model).
  2. Data Provision: The company provides proprietary datasets.
  3. Base Model Selection: An open-source model is chosen as the starting point.
  4. Scaffolding: An infrastructure provider such as Baseten supplies the technical framework that turns the base model and data into a specialized, post-trained model.
  5. Deployment: The resulting model is integrated into the inference stack for production use.
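Step 1 benefits most from being made concrete. For the medical speech-to-text example, one common way to measure transcription errors is word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below assumes WER is the utility function and uses plain whitespace tokenization; real evaluation pipelines usually also normalize casing, punctuation, and numbers. The function names and the evaluation pairs are hypothetical, not part of Baseten's or any library's API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion: reference word dropped
                curr[j - 1] + 1,             # insertion: extra hypothesis word
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: list[tuple[str, str]]) -> float:
    """Average per-utterance WER over (reference, hypothesis) pairs; lower is better."""
    if not pairs:
        raise ValueError("need at least one evaluation pair")
    return sum(word_error_rate(r, h) for r, h in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical held-out examples drawn from a proprietary dataset (step 2).
    eval_set = [
        ("metoprolol 25 mg twice daily", "metoprolol 50 mg twice daily"),
        ("patient denies chest pain", "patient denies chest pain"),
    ]
    print(f"mean WER: {mean_wer(eval_set):.2f}")
```

Comparing `mean_wer` for the base model and the post-trained model on the same held-out set is a simple check that post-training moved the utility in the right direction. Averaging per-utterance WER is a simplification; corpus-level WER (total errors over total reference words) weights long utterances more heavily.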

The Compute Crisis and Vertical Integration

Compute scarcity is a systemic issue that is unlikely to normalize due to compounding demand. This scarcity is driving a shift in how AI infrastructure companies operate.

The "Drug Market" of GPUs

GPU procurement is described as an immature, inefficient market with high slippage and extreme price volatility. For example, Baseten saw a renewal quote for B200 (Blackwell) GPUs that nearly doubled the hourly rate, from $263 to $510.
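As a quick check on "nearly doubled," the relative change in the quoted rate is:

```latex
\frac{510 - 263}{263} = \frac{247}{263} \approx 0.94
```

That is an increase of roughly 94% at a single renewal.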

Transition from Renting to Owning

While Baseten initially focused on the software layer (renting compute from ~20 different clouds and stitching 87 clusters together to make GPUs fungible), the company is moving toward owning its own hardware. This shift is driven by:

  • Access: Guaranteeing the ability to fulfill demand without relying on cloud provider timelines (which can be 12-15 months out).
  • Economics: Owning hardware is approximately 30% cheaper than renting at scale.
  • Scale: Baseten estimates a need for 150,000 B200 equivalents in two years to support its projected growth, representing roughly $7 billion in compute spend.
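The scale figures imply an average spend per B200 equivalent, and the 30% figure gives a rough sense of what owning saves against a rental budget. This is back-of-the-envelope arithmetic only: it assumes the $7 billion is spread evenly across the 150,000 units, and the $1 billion rental budget in the example is a hypothetical number, not one from the source.

```python
TARGET_B200_EQUIVALENTS = 150_000   # from the text: units needed within two years
PROJECTED_COMPUTE_SPEND_USD = 7e9   # from the text: roughly $7 billion
OWNERSHIP_DISCOUNT = 0.30           # from the text: owning ~30% cheaper at scale


def spend_per_unit(total_spend: float, units: int) -> float:
    """Average spend per B200 equivalent, assuming spend is spread evenly."""
    return total_spend / units


def owned_cost(rental_cost: float, discount: float = OWNERSHIP_DISCOUNT) -> float:
    """Cost of the same capacity if owned, given a fractional discount vs renting."""
    return rental_cost * (1.0 - discount)


if __name__ == "__main__":
    per_unit = spend_per_unit(PROJECTED_COMPUTE_SPEND_USD, TARGET_B200_EQUIVALENTS)
    print(f"implied spend per B200 equivalent: ${per_unit:,.0f}")

    rental_budget = 1e9  # hypothetical rental spend, for illustration only
    savings = rental_budget - owned_cost(rental_budget)
    print(f"owning instead of renting ${rental_budget:,.0f} saves ~${savings:,.0f}")
```

The 30% discount is the source's at-scale approximation; estimating when owning breaks even at lower utilization would need a fuller cost model.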

Hardware and Ecosystem Trends

NVIDIA's Dominance

Despite the emergence of TPUs and other "neo chips," NVIDIA remains dominant thanks to its mature supply chain, its relationship with TSMC, and the CUDA ecosystem. Most current high-performance inference work relies on runtimes optimized primarily for NVIDIA GPUs, such as TensorRT-LLM (TRT-LLM), vLLM, and SGLang.

Heterogeneous Architectures

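Future hardware is expected to move toward heterogeneous architectures that split inference into its two phases and run them on different chips, rather than running everything on a single GPU: "prefill," which processes the input prompt and is compute-bound, and "decode," which generates output tokens one at a time and is memory-bound.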
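A rough roofline-style calculation shows why the two phases stress different resources. The sketch assumes a dense transformer whose weights (2 bytes per parameter, as in FP16/BF16) are read once per forward pass and about 2 FLOPs per parameter per token, and it ignores attention math and KV-cache traffic. Under those assumptions, arithmetic intensity (FLOPs per byte of memory traffic) is roughly the number of tokens processed per pass: large for prefill over a long prompt, small for decode. The accelerator figures and the 70B-parameter model are hypothetical round numbers, not the specifications of any particular chip or model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical chip described only by peak compute and memory bandwidth."""
    peak_flops: float     # FLOP/s
    mem_bandwidth: float  # bytes/s

    @property
    def ridge_point(self) -> float:
        """FLOPs per byte needed before compute, not memory, becomes the limit."""
        return self.peak_flops / self.mem_bandwidth


def arithmetic_intensity(num_params: float, tokens_per_pass: int,
                         bytes_per_param: float = 2.0) -> float:
    """FLOPs per byte for one forward pass, counting only weight reads."""
    flops = 2.0 * num_params * tokens_per_pass   # ~2 FLOPs per parameter per token
    bytes_moved = bytes_per_param * num_params   # weights streamed once per pass
    return flops / bytes_moved


def classify(intensity: float, chip: Accelerator) -> str:
    """Below the ridge point, memory bandwidth limits throughput."""
    return "compute-bound" if intensity >= chip.ridge_point else "memory-bound"


if __name__ == "__main__":
    chip = Accelerator(peak_flops=1e15, mem_bandwidth=5e12)  # ridge point: 200 FLOP/byte
    params = 70e9                                            # hypothetical 70B-parameter model

    prefill = arithmetic_intensity(params, tokens_per_pass=2048)  # one 2,048-token prompt
    decode = arithmetic_intensity(params, tokens_per_pass=8)      # 8 sequences, 1 token each
    print(f"prefill: {prefill:.0f} FLOP/byte -> {classify(prefill, chip)}")
    print(f"decode:  {decode:.0f} FLOP/byte -> {classify(decode, chip)}")
```

Because the two phases land on opposite sides of the ridge point, a chip rich in compute suits prefill while one with more memory bandwidth per FLOP suits decode, which is the motivation for splitting them across hardware.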

Future Opportunities in AI Infrastructure

Beyond model training and inference, significant economic opportunities exist in the physical build-out of AI:

  • Energy and Power: Investing in the underlying power infrastructure required for massive compute clusters.
  • Modular Data Centers: Standardizing the unit of compute into modular containers to industrialize the build-out process, effectively creating an "API for compute" at the physical layer.

Sources