AutoGrad and the Bayesian Brain: Dr. Jeff Beck on the Future of AI

The Core Thesis: Intelligence is Not Function Approximation

Dr. Jeff Beck argues that true artificial intelligence cannot be achieved simply by scaling transformers or improving function approximation. Instead, intelligence must be grounded in the same domain as human cognition: the physical, macroscopic world. He proposes a shift from massive, monolithic neural networks toward a "lots of little models" approach: modular, object-centered Bayesian models that interact through discovered forces, mirroring how the biological brain processes reality.

AutoGrad vs. Transformers: What Actually Changed AI

While the industry credits the transformer architecture for the current AI boom, Dr. Beck argues that the real catalyst was automatic differentiation (AutoGrad).

AutoGrad transformed AI development from a theoretical mathematics problem into an engineering problem. By automating the calculation of gradients, it allowed researchers to rapidly experiment with different architectures, nonlinearities, and memory structures without manually deriving learning rules. This engineering shift enabled the hyperscaling of models, which in turn led to the perceived success of transformers. Dr. Beck notes that other architectures, such as Mamba (a state-space model), can achieve similar functionality when scaled, suggesting that scaling, which AutoGrad made practical, matters more than the specific transformer architecture.
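
To make "automating the calculation of gradients" concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python. It is an illustration only, not how any production framework is implemented; the `Value` class and the `neuron` function are hypothetical names introduced for this example.

```python
import math


class Value:
    """A scalar that records how it was computed, so gradients follow automatically."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one entry per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Order the computation graph topologically, then apply the chain rule
        # from the output back to the inputs.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad


def neuron(w, x, b, nonlinearity):
    # The model is written once; no learning rule is derived by hand.
    return nonlinearity(w * x + b)


w, x, b = Value(0.5), Value(2.0), Value(-0.3)
y = neuron(w, x, b, Value.tanh)
y.backward()
print(w.grad, b.grad)  # gradients of y with respect to w and b
```

Swapping `Value.tanh` for a different nonlinearity only requires defining that operation and its local derivative once; the gradient of any model built from it then comes for free, which is the engineering shift described above.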

The Bayesian Brain and Active Inference

Dr. Beck posits that the human brain operates as a Bayesian inference engine, constantly testing hypotheses about the world. Supporting evidence comes from behavioral experiments on "optimal cue combination," in which humans combine reliable and unreliable sensory information (e.g., visual and auditory cues) with surprising efficiency, weighting each cue by its reliability.
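
As a rough illustration of what "optimal" means in cue combination, the sketch below fuses two independent Gaussian cues by weighting each with its precision (inverse variance), which is the standard Bayesian result for that case. The function name `combine_cues` and the numbers are hypothetical and are not taken from any experiment discussed by Dr. Beck.

```python
def combine_cues(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian cues about the same quantity."""
    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    combined_var = 1.0 / (precision_a + precision_b)
    combined_mean = combined_var * (precision_a * mean_a + precision_b * mean_b)
    return combined_mean, combined_var


# Hypothetical numbers: a sharp visual cue (10 degrees, variance 1) and a
# noisy auditory cue (16 degrees, variance 4) about the same source location.
mean, var = combine_cues(10.0, 1.0, 16.0, 4.0)
print(mean, var)
```

The fused estimate sits closer to the more reliable cue, and its variance is lower than that of either cue alone.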

Key Principles of the Bayesian Approach:

Hypothesis Testing: The brain uses generative models of the world conditioned on specific hypotheses.
Information Filtering: The vast majority of brain activity is dedicated to deciding what to ignore, which avoids information overload.
Active Inference: Based on the work of Karl Friston, this framework links information theory and statistical physics to describe how agents minimize surprisal to maintain their existence.
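
The hypothesis-testing and surprisal ideas above can be sketched with Bayes' rule over a small set of discrete hypotheses. This is a simplified illustration under assumed numbers: the hypotheses, probabilities, and the function name `posterior` are hypothetical, and full active inference works with variational free energy (an upper bound on surprisal) rather than the exact quantity computed here.

```python
import math


def posterior(prior, likelihood):
    """Bayes' rule over discrete hypotheses.

    prior[h] is P(h); likelihood[h] is P(observation | h).
    Returns the posterior P(h | observation) and the surprisal of the
    observation, -log P(observation), in nats.
    """
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    post = {h: prior[h] * likelihood[h] / evidence for h in prior}
    return post, -math.log(evidence)


# Hypothetical example: the observation is "the grass is wet".
prior = {"rain": 0.3, "sprinkler": 0.7}
likelihood = {"rain": 0.8, "sprinkler": 0.1}
post, surprisal = posterior(prior, likelihood)
print(post, surprisal)
```

An observation that the generative models predict poorly yields a small evidence term and therefore a large surprisal, which is the signal an agent would act to reduce.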

The Grounding Problem: Physics over Language

In Dr. Beck's view, a fundamental flaw in current Large Language Models (LLMs) is that they are grounded in language. He argues that language is a poor model for thought: self-report data in psychology is notoriously unreliable, and people often explain their behavior in ways that are inconsistent with their actual decision-making processes.

An AI that thinks like a human must be grounded in macroscopic physics. This means the AI should perceive the world not as pixels or tokens, but as objects with specific relations and affordances. Intelligence must be embodied because the physical environment provides the "atomic elements of thought," the basic building blocks from which more sophisticated conceptual models are constructed.

The "Lots of Little Models" Architecture

Instead of one massive neural network, Dr. Beck envisions a system structured like a video game engine. In this architecture, the AI maintains a library of thousands of small, modular models, each representing a specific object or object class (e.g., a "book model").

Advantages of Modular Object Models:

Computational Efficiency: The agent only instantiates the small fraction of models relevant to its current environment, maintaining sparsity.
Systems Engineering: An AI that understands objects and the relational forces between them can combine known objects in novel ways to invent new things (e.g., combining an airfoil and a jet engine to create an airplane).
Generalization: Models learned in one domain (e.g., the inside of a house) can be ported to another domain (e.g., a park) because the objects themselves are modular.
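
The sparsity and portability points above can be sketched as a library that stores many small object models but instantiates only those present in the current scene. Everything here is a hypothetical stand-in: `ObjectModel`, `ModelLibrary`, and the single `mass_kg` parameter are placeholders for whatever learned Bayesian object models the real architecture would use.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    name: str
    mass_kg: float  # placeholder for learned physical parameters


@dataclass
class ModelLibrary:
    models: dict = field(default_factory=dict)

    def add(self, model):
        self.models[model.name] = model

    def instantiate(self, detected_names):
        """Return only the models relevant to the current scene."""
        return [self.models[n] for n in detected_names if n in self.models]


library = ModelLibrary()
for name, mass in [("book", 0.5), ("chair", 6.0), ("tree", 900.0), ("bench", 40.0)]:
    library.add(ObjectModel(name, mass))

# Only a fraction of the library is active in each environment.
house_scene = library.instantiate(["book", "chair"])
# The same "book" model is reused unchanged in a different domain.
park_scene = library.instantiate(["tree", "bench", "book"])
print([m.name for m in house_scene], [m.name for m in park_scene])
```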

Solving Generalization: The Cat in the Warehouse

Dr. Beck illustrates the failure of current AI through the "Cat in the Warehouse" problem. A warehouse robot trained only on boxes and forklifts would typically either crash or hallucinate when encountering a cat.

In a Bayesian, object-centered system, the process would be:

Surprisal Detection: The robot detects a high surprisal signal because the cat does not fit any existing model.
Knowledge Acquisition: The robot "phones a friend" (queries a central server) to download potential object models that match the visual data.
Hypothesis Testing: The robot tests several candidate models (e.g., different cat breeds) by observing the cat's behavior until it identifies the correct one.
Integration: The cat model is incorporated into the robot's local library for future use.
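
The four steps above can be sketched as a toy loop. This is a heavily simplified illustration under stated assumptions: each object model is reduced to a table of P(observation | object), the "central server" is a local dictionary named `REMOTE_MODELS`, the threshold is arbitrary, and all names and probabilities are hypothetical.

```python
import math

# Hypothetical local library: each model gives P(observation | object).
local_library = {
    "box":      {"still": 0.98, "rolls": 0.01, "walks": 0.01},
    "forklift": {"still": 0.50, "rolls": 0.49, "walks": 0.01},
}

# Hypothetical stand-in for the models a central server could offer.
REMOTE_MODELS = {
    "cat":   {"still": 0.40, "rolls": 0.01, "walks": 0.59},
    "drone": {"still": 0.30, "rolls": 0.65, "walks": 0.05},
}

SURPRISAL_THRESHOLD = 3.0  # nats; an arbitrary value chosen for this sketch


def surprisal(library, observation):
    """Surprisal of one observation under the best-fitting known model."""
    return min(-math.log(model[observation]) for model in library.values())


def identify(candidates, observations):
    """Accumulate log-likelihoods (uniform prior) and return the best candidate."""
    log_post = {name: 0.0 for name in candidates}
    for obs in observations:
        for name, model in candidates.items():
            log_post[name] += math.log(model[obs])
    return max(log_post, key=log_post.get)


def encounter(library, observations):
    # 1. Surprisal detection: does any existing model explain what is seen?
    if surprisal(library, observations[0]) < SURPRISAL_THRESHOLD:
        return None
    # 2. Knowledge acquisition: "phone a friend" for candidate models.
    candidates = REMOTE_MODELS
    # 3. Hypothesis testing: compare candidates against observed behavior.
    best = identify(candidates, observations)
    # 4. Integration: store the winning model locally for future use.
    library[best] = candidates[best]
    return best


first = encounter(local_library, ["walks", "still", "walks"])
second = encounter(local_library, ["walks", "still", "walks"])
print(first, second)
```

On the first encounter the walking object is surprising and triggers a lookup; on the second, the stored model already explains it, so no lookup is needed.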

Alignment via Belief Exchange

Dr. Beck argues that alignment through traditional Reinforcement Learning (RL), which relies on reward functions, is fundamentally flawed because reward values are often arbitrary and can lead to degenerate behavior (the "malevolent genie" problem).

He proposes an alignment strategy based on belief exchange. Since human actions are a combination of beliefs and values, alignment should involve an explicit exchange of beliefs. By communicating and reconciling their internal models of the world, humans and AI can isolate whether a disagreement is based on a factual misunderstanding (belief) or a difference in values (reward function), allowing for a more transparent and stable form of alignment.
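
One way to picture how exchanging beliefs could isolate the source of a disagreement is the toy sketch below. It is not a protocol described by Dr. Beck; it assumes each agent picks the action with the highest expected utility, and the names `best_action` and `diagnose`, along with the scenario and numbers, are hypothetical.

```python
def best_action(beliefs, values):
    """Pick the action with the highest expected utility under the given beliefs."""
    def expected_utility(action):
        return sum(p * values[action][state] for state, p in beliefs.items())
    return max(values, key=expected_utility)


def diagnose(human, ai):
    """Classify a disagreement by letting the AI act on the human's beliefs."""
    human_choice = best_action(human["beliefs"], human["values"])
    if human_choice == best_action(ai["beliefs"], ai["values"]):
        return "agree"
    # Belief exchange: keep the AI's values but substitute the human's beliefs.
    if best_action(human["beliefs"], ai["values"]) == human_choice:
        return "belief disagreement"
    return "value disagreement"


cautious = {"walk": {"wet": -10.0, "dry": 2.0}, "wait": {"wet": 0.0, "dry": 0.0}}
hurried = {"walk": {"wet": -1.0, "dry": 10.0}, "wait": {"wet": 0.0, "dry": 0.0}}

human = {"beliefs": {"wet": 0.8, "dry": 0.2}, "values": cautious}
ai_wrong_facts = {"beliefs": {"wet": 0.1, "dry": 0.9}, "values": cautious}
ai_other_values = {"beliefs": {"wet": 0.8, "dry": 0.2}, "values": hurried}

print(diagnose(human, ai_wrong_facts))   # same values, different beliefs
print(diagnose(human, ai_other_values))  # same beliefs, different values
```

Once beliefs are shared, any disagreement that remains must come from the values, which is what makes this form of alignment easier to inspect.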
