The Gap Between Open Weights and Closed Source LLMs

The Performance Gap Between Open and Closed LLMs

Open weights LLMs consistently lag closed source frontier models in general capabilities, though specific domains such as coding are converging rapidly. A single-metric analysis suggests full convergence by late 2026, but a broader look across multiple benchmarks shows that open weights models have generally remained about five months behind the closed source frontier.

Divergent Predictions Based on Benchmarking

Measuring the quality of LLMs is difficult, and different benchmarks yield sharply different predictions about the trajectory of open weights models.

The Case for Rapid Convergence

On the Artificial Analysis Intelligence Index, a headline index designed to assess overall capabilities, the gap between open weights and closed source models began shrinking in summer 2024. A linear projection of this trend puts the gap at zero by December 3, 2026.
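A linear projection of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the analysis behind the December 3, 2026 figure: the function names `fit_line` and `projected_zero_date` are hypothetical, the gap is assumed to be measured in index points, and the sample observations are placeholders rather than real index data.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_zero_date(observations):
    """Return the date at which the fitted gap reaches zero.

    observations: list of (date, gap) pairs, oldest first.
    Returns None when the fitted gap is not shrinking.
    """
    origin = observations[0][0]
    xs = [(day - origin).days for day, _ in observations]
    ys = [gap for _, gap in observations]
    slope, intercept = fit_line(xs, ys)
    if slope >= 0:
        return None
    # The fitted line crosses zero where slope * x + intercept == 0.
    return origin + timedelta(days=round(-intercept / slope))


# Placeholder values for illustration only, not the index's real data.
start = date(2024, 7, 1)
observations = [
    (start, 10.0),
    (start + timedelta(days=100), 8.0),
    (start + timedelta(days=200), 6.0),
]
print(projected_zero_date(observations))
```

The projected date depends heavily on which observations are included and on the assumption that the trend stays linear, which is one reason a single-index extrapolation should be read with caution.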

The Case for Constant Lag

When the analysis is expanded to 18 different benchmarks, the picture changes. In a boxplot of the monthly open frontier lag across these metrics, the line of best fit is almost completely flat, indicating a consistent lag of just under five months over the measured period.
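One plausible way to compute such a lag is sketched below. The source does not spell out its method, so this assumes the lag is the number of months since the closed source frontier first reached the score that the best open weights model holds in a given month. The names `frontier_lag`, `monthly_median_lags`, and `trend_slope` are hypothetical, and the benchmark data is made up for illustration.

```python
from statistics import median


def frontier_lag(open_score, month, closed_frontier):
    """Months since the closed frontier first matched `open_score`.

    closed_frontier: list of (month_index, best_closed_score), sorted by month.
    Returns None if no closed model had reached that score by `month`.
    """
    for closed_month, closed_score in closed_frontier:
        if closed_month > month:
            break
        if closed_score >= open_score:
            return month - closed_month
    return None


def monthly_median_lags(benchmarks):
    """Median lag per month across benchmarks.

    benchmarks: dict of name -> (closed_frontier, open_frontier), where each
    frontier is a list of (month_index, best_score) pairs sorted by month.
    """
    by_month = {}
    for closed_frontier, open_frontier in benchmarks.values():
        for month, open_score in open_frontier:
            lag = frontier_lag(open_score, month, closed_frontier)
            if lag is not None:
                by_month.setdefault(month, []).append(lag)
    return {m: median(lags) for m, lags in sorted(by_month.items())}


def trend_slope(series):
    """Least-squares slope of lag against month (needs two or more months)."""
    xs = list(series)
    ys = list(series.values())
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up scores for two hypothetical benchmarks.
benchmarks = {
    "bench_a": ([(0, 50), (3, 60), (6, 70)], [(5, 50), (8, 60), (11, 70)]),
    "bench_b": ([(0, 10), (3, 20), (6, 30)], [(5, 10), (8, 20), (11, 30)]),
}
lags = monthly_median_lags(benchmarks)
print(lags, trend_slope(lags))
```

A slope near zero corresponds to the flat line of best fit described above: the lag is neither growing nor shrinking over the period.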

Domain-Specific Progress: The Coding Exception

Not all capabilities are evolving at the same rate. The most significant improvement in open weights models has occurred in coding benchmarks, where the gap has shrunk from 15 months to only one or two months. In contrast, most other datasets show a moderate increase in the gap or a stagnant lag, suggesting that coding is an area where open weights models are catching up most effectively.

Technical and Geopolitical Considerations

Community discussion around the open weights vs. closed source divide highlights several systemic factors that influence these performance trajectories:

Data Sourcing and Synthetic Data

Some analysts argue that US-based closed source models maintain their lead by creating high-quality synthetic data with massive "teacher models" that are too large to serve interactive traffic. In this view, open weights models, particularly those from Chinese labs, often advance by optimizing existing models or harvesting data from these frontier models rather than inventing novel data systems.

Systemic Advantages of Closed Models

Closed source "models" are often not just weights but entire backend systems. This allows closed source providers to augment the model with external systems to score higher on benchmarks, whereas open weights models are evaluated based on the weights alone.

Sustainability of Open Weights

There are concerns regarding the long-term viability of open weights models, as they are currently produced by private organizations rather than community-owned infrastructure.

"The spigot can be turned off at any time. Until there's some sort of 'community owned hardware', open weights models are always at risk of being discontinued."

Geopolitical Influence

Releasing open weights models is often viewed as an asymmetric strategy. Some contributors suggest that Chinese labs use open source to compete and to distribute the burden of compute, while US government restrictions on frontier model access for non-Americans may inadvertently accelerate reliance on, and development of, competitive open weights alternatives.
