Categorical Deep Learning: Moving AI from Alchemy to Science

The Fundamental Failure of LLM Reasoning

Large Language Models (LLMs) currently struggle with basic algorithmic tasks, such as adding large numbers or respecting the laws of physics, because they rely on pattern recognition rather than internalizing the underlying logic. When a familiar pattern is slightly altered, for example by changing a single digit in a long addition problem, the model often fails because it lacks the internal machinery to perform a discrete operation like "carrying the one."
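For contrast, here is a minimal sketch of the discrete rule in question: schoolbook addition with an explicit carry. It shows why a one-digit change is not a "small" change for this procedure. Altering the last digit of one operand turns a sum with no carries into one where the carry ripples through every column. The function name `add_digits` is illustrative only.

```python
def add_digits(a: str, b: str) -> str:
    """Add two non-negative decimal strings column by column, right to left,
    propagating an explicit carry."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        total = da + db + carry
        result.append(str(total % 10))  # digit written in this column
        carry = total // 10             # "carry the one" into the next column
        i -= 1
        j -= 1
    return "".join(reversed(result)) or "0"

# Changing a single input digit can change every output digit via a carry chain.
print(add_digits("49999999", "50000000"))  # 99999999
print(add_digits("49999999", "50000001"))  # 100000000
```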

While tool use (e.g., connecting an LLM to a calculator) provides a temporary fix, it does not solve the architectural misalignment. Relying on external tools is inefficient for complex reasoning problems requiring iterative computation and does not improve the model's intrinsic ability to predict the correct inputs for those tools. To achieve true reasoning and scientific capability, AI must internalize these computational rules within its own architecture.

From Geometric Deep Learning to Category Theory

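Geometric Deep Learning (GDL) improved AI by building equivariance to symmetry transformations into network architectures. If an input is transformed in a way that should not matter (e.g., shifting a cat within an image or permuting the nodes of a graph), the output transforms in a correspondingly predictable way, or stays exactly the same when the task is invariant to that transformation. Because the model no longer has to learn every symmetric variant of an input separately, this can sharply reduce the amount of training data needed, in some settings exponentially.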
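The difference between equivariance and invariance can be checked directly on a toy graph layer. The sketch below assumes a simple sum-aggregation message-passing layer, not any specific published architecture. Permuting the nodes before the layer gives the same result as permuting the layer's output afterwards (equivariance). Summing over all nodes gives a number that does not change at all (invariance).

```python
from typing import List

def message_pass(x: List[float], adj: List[List[int]]) -> List[float]:
    """One sum-aggregation layer: each node adds its neighbours' features to its own."""
    n = len(x)
    return [x[i] + sum(x[j] for j in range(n) if adj[i][j]) for i in range(n)]

def permute_features(x: List[float], perm: List[int]) -> List[float]:
    # New node k is old node perm[k].
    return [x[p] for p in perm]

def permute_graph(adj: List[List[int]], perm: List[int]) -> List[List[int]]:
    # Relabel rows and columns of the adjacency matrix consistently.
    return [[adj[p][q] for q in perm] for p in perm]

x = [1.0, 2.0, 3.0]
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]  # a path graph 0 - 1 - 2
perm = [2, 0, 1]

out_then_permute = permute_features(message_pass(x, adj), perm)
permute_then_out = message_pass(permute_features(x, perm), permute_graph(adj, perm))
print(out_then_permute == permute_then_out)                # True: equivariance
print(sum(message_pass(x, adj)) == sum(permute_then_out))  # True: sum readout is invariant
```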

However, GDL has two primary limitations:

  1. Invertibility Requirement: GDL typically assumes symmetries are invertible (e.g., you can permute nodes back to their original order). Real-world computation often destroys information. A pathfinding algorithm like Dijkstra's maps many different graphs to the same shortest path, so the computation cannot be run backwards and is therefore non-invertible.
  2. Scope of Symmetry: Group theory, the basis of GDL, is excellent for spatial regularities but insufficient for describing generic algorithmic computation, where an operation applies only to inputs that satisfy specific preconditions and guarantees specific postconditions on its outputs.
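The information loss in the Dijkstra example can be made concrete. This sketch is a standard heap-based Dijkstra written for illustration. The graphs `g1` and `g2` are made up, and the code assumes the target is reachable. The two graphs have different edges and weights but return the identical shortest path, so the path alone cannot tell you which graph produced it. The map from graphs to paths has no inverse.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbour: weight}.
    Assumes target is reachable from source."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return list(reversed(path))

g1 = {"A": {"B": 1, "C": 5}, "B": {"D": 1}, "C": {"D": 1}}
g2 = {"A": {"B": 1, "D": 10}, "B": {"D": 1}}

print(shortest_path(g1, "A", "D"))  # ['A', 'B', 'D']
print(shortest_path(g2, "A", "D"))  # ['A', 'B', 'D']: different input, same output
```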

Category Theory is proposed as a solution to these limitations. Described informally as "algebra with colors," it allows for partial compositionality: components can be linked only if their "colors" (types) match. Because a category does not require every arrow to have an inverse, it provides a more flexible framework than group theory for modeling non-invertible processes and complex computational pipelines.
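Partial compositionality can be sketched as arrows tagged with a source and a target type. Composition is defined only when one arrow's target matches the next arrow's source. The `Morphism` class below is a hypothetical illustration, not a library API. Note that nothing in it requires an inverse. That is exactly the requirement a group would impose.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Morphism:
    """An arrow source -> target; the type names play the role of 'colors'."""
    source: str
    target: str
    fn: Callable[[Any], Any]

    def then(self, other: "Morphism") -> "Morphism":
        # Composition is partial: only defined when the colors line up.
        if self.target != other.source:
            raise TypeError(
                f"cannot compose {self.source}->{self.target} "
                f"with {other.source}->{other.target}"
            )
        return Morphism(self.source, other.target, lambda x: other.fn(self.fn(x)))

def identity(obj: str) -> Morphism:
    return Morphism(obj, obj, lambda x: x)

parse = Morphism("str", "int", int)
double = Morphism("int", "int", lambda n: 2 * n)
show = Morphism("int", "str", str)

pipeline = parse.then(double).then(show)
print(pipeline.fn("21"))  # 42
# parse.then(parse) raises TypeError: an "int" output cannot feed a "str" input.
```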

The "Alchemy" of Deep Learning and the Need for Theory

Modern deep learning is currently in an "alchemy" phase: practitioners achieve powerful results through ad hoc design choices, knobs, and tweaks, but lack a unifying theory to explain why these choices work or how to derive new architectures formally.

Categorical Deep Learning aims to be the "Periodic Table" for neural networks, providing a systematic guide to move from trial-and-error to principled engineering. This framework seeks to unify several disparate perspectives:

  • Probabilistic perspectives
  • Neuroscience perspectives
  • Gradient-based iterative updating

Synthetic vs. Analytic Mathematics in AI

To build this framework, researchers distinguish between two mathematical approaches:

  • Analytic Mathematics: Focuses on what things are made of (e.g., Descartes' lines as solution sets to equations). It requires a common foundation from which everything is built.
  • Synthetic Mathematics: Focuses on how things behave and relate to one another (e.g., Euclid's lines, characterized by their relationships to points, such as the postulate that a straight line can be drawn between any two points). It ignores inaccessible details (noise) and focuses on the principles of inference.

Categorical Deep Learning adopts a structuralist/synthetic approach. Instead of focusing on the internal "substance" of a neural network, it focuses on the structure-preserving maps between representations.
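As a small illustration of a "structure-preserving map," chosen here for simplicity rather than taken from the source, consider list length. It sends lists under concatenation to integers under addition and respects both the operation and its unit, so it is a monoid homomorphism. Squaring the length is not. The helper `is_monoid_hom` is hypothetical and only checks finitely many samples, so it can refute the property but not prove it.

```python
from itertools import product

def is_monoid_hom(f, op_src, unit_src, op_tgt, unit_tgt, samples):
    """Check f(unit) == unit and f(a . b) == f(a) * f(b) on all sample pairs."""
    if f(unit_src) != unit_tgt:
        return False
    return all(f(op_src(a, b)) == op_tgt(f(a), f(b))
               for a, b in product(samples, repeat=2))

samples = [[], [1], [2, 3], [4, 5, 6]]
concat = lambda a, b: a + b
add = lambda m, n: m + n

print(is_monoid_hom(len, concat, [], add, 0, samples))                        # True
print(is_monoid_hom(lambda xs: len(xs) ** 2, concat, [], add, 0, samples))  # False
```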

Advanced Categorical Concepts in Network Design

Weight Tying and 2-Categories

Weight tying occurs when multiple parts of a computation share identical parameters (e.g., in Recurrent Neural Networks). Category theory provides a formal way to justify this through 2-categories. While a standard category describes relationships between objects (morphisms), a 2-category describes relationships between those morphisms (2-morphisms). In this context, 2-morphisms can model reparameterizations and weight sharing, allowing researchers to prove when weight tying preserves the intended structure.
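Here is a minimal sketch of this idea. It assumes the "parametric map" reading, in which a layer is a function of (parameters, input) and a 2-morphism is a reparameterization map acting on the parameters. Composing two layers yields a map whose parameters live in P × P. Tying their weights then amounts to precomposing with the diagonal map P → P × P, which copies one parameter into both slots. The names `compose`, `reparameterize`, and `diagonal` are illustrative assumptions, not taken from the source.

```python
def layer(w: float, x: float) -> float:
    # A toy parametric map P x A -> B with P = A = B = float (scalar ReLU unit).
    return max(0.0, w * x)

def compose(f, g):
    """Sequential composition of parametric maps: parameters are paired (p_f, p_g)."""
    return lambda params, x: g(params[1], f(params[0], x))

def reparameterize(f, r):
    """Precompose the parameter input of f with r: Q -> P (a reparameterization)."""
    return lambda q, x: f(r(q), x)

def diagonal(w: float):
    # Copy one parameter into both slots: P -> P x P.
    return (w, w)

untied = compose(layer, layer)           # parameters live in P x P
tied = reparameterize(untied, diagonal)  # parameters live in P

print(untied((0.5, 3.0), 2.0))  # 3.0: two independent weights
print(tied(1.5, 2.0))           # 4.5: one weight used twice
```

Under this reading, a tied network is exactly the untied one restricted to the diagonal of its parameter space. That is the kind of statement a 2-morphism makes precise.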

Recursion and Folds

In functional programming, data types like lists are defined recursively. Categorically, the list type is the initial algebra for an endofunctor (roughly, X ↦ 1 + A × X: a list is either empty or an element paired with a smaller list). The process of consuming a list (a "fold") is then the unique homomorphism from this initial algebra to any other algebra for the same endofunctor. By viewing neural network layers as homomorphisms between algebras for the same endofunctor, the framework can naturally express recursion and list-like processing.
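The fold construction fits in a few lines. This sketch assumes the list endofunctor X ↦ 1 + A × X. An algebra for it is then a pair (nil, cons) on some carrier type, and `fold` is the map out of lists that satisfies the two homomorphism equations in its docstring. Choosing the algebra (0, +) gives `sum`. Choosing an RNN-style `cell` gives a recurrent network that reuses the same weights at every step. The `cell` is a hypothetical toy, not a specific architecture.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
C = TypeVar("C")

def fold(nil: C, cons: Callable[[A, C], C], xs: List[A]) -> C:
    """Right fold: the unique h with h([]) == nil and h([x] + rest) == cons(x, h(rest))."""
    acc = nil
    for x in reversed(xs):
        acc = cons(x, acc)
    return acc

# The algebra (0, +) turns fold into sum.
total = fold(0, lambda x, acc: x + acc, [1, 2, 3, 4])

# An RNN-style cell is also an algebra: the hidden state is the carrier.
def cell(x: float, h: float, w: float = 0.5, u: float = 1.0) -> float:
    return max(0.0, w * h + u * x)

h_final = fold(0.0, cell, [1.0, 2.0, 3.0])
print(total, h_final)  # 10 2.75
```

Because this is a right fold, the cell consumes the list starting from its last element.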

The "Carry" Problem and Hopf Fibrations

A basic failure mode of Graph Neural Networks (GNNs) is their difficulty implementing a "carry" operation (e.g., in addition). In discrete math, a carry is a simple trigger. In the continuous math used by neural networks, it is remarkably difficult to implement, because the relevant information often lies in the change of state rather than in the state itself.
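Here is a toy illustration of "information in the change of state." It reflects an assumption about what the passage means, not the researchers' construction. Represent a digit as a position on a 10-notch dial, the discrete counterpart of a point on a circle. The final dial position after 9 + 3 is the same as after 2 + 0, so the carry cannot be read off the state. It exists only at the moment the dial wraps from 9 back to 0. The function `add_on_dial` is hypothetical.

```python
def add_on_dial(d: int, e: int):
    """Add e to digit d by stepping a 10-position dial one notch at a time.
    The dial position is the state; the carry is recorded only when the
    dial wraps around, i.e. it lives in a transition, not in a position."""
    pos, carry = d, 0
    for _ in range(e):
        nxt = (pos + 1) % 10
        if nxt < pos:  # this step crossed the 9 -> 0 boundary
            carry += 1
        pos = nxt
    return pos, carry

print(add_on_dial(9, 3))  # (2, 1)
print(add_on_dial(2, 0))  # (2, 0): same final state, different carry
```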

The researchers suggest that this behavior may be modeled using intricate geometric structures such as the Hopf fibration. It maps the 3-sphere (the unit sphere in four-dimensional space) onto the ordinary 2-sphere, so that each point of the 2-sphere corresponds to a whole circle in the 3-sphere. This geometric subtlety could potentially allow neural networks to implement the "carrying" logic necessary for true algorithmic reasoning, effectively building "CPUs in neural networks."
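For readers unfamiliar with the construction, the Hopf map itself is short to write down. The sketch uses the standard formula, with the 3-sphere viewed as unit vectors (z1, z2) in C^2. It checks two things: images land on the unit 2-sphere, and a whole circle of preimages (the fibre) collapses to a single point. It illustrates only the geometry. How such a map would be wired into a network to implement carrying is not specified here, and this should not be read as the researchers' construction.

```python
import cmath
import math

def hopf(z1: complex, z2: complex):
    """Hopf map S^3 -> S^2, where S^3 = {(z1, z2) in C^2 : |z1|^2 + |z2|^2 = 1}."""
    w = 2 * z1 * z2.conjugate()
    return (w.real, w.imag, abs(z1) ** 2 - abs(z2) ** 2)

# A point on the 3-sphere (a unit vector in C^2, i.e. in R^4).
z1, z2 = complex(0.6, 0.0), complex(0.0, 0.8)
p = hopf(z1, z2)
print(math.isclose(sum(c * c for c in p), 1.0))  # True: image lies on the 2-sphere

# Multiplying both coordinates by the same phase traces a circle in S^3 ...
fiber = [hopf(cmath.exp(1j * t) * z1, cmath.exp(1j * t) * z2) for t in (0.0, 1.0, 2.5)]
# ... and that whole circle maps to the single point p on S^2.
print(all(math.isclose(a, b, abs_tol=1e-12) for q in fiber for a, b in zip(q, p)))  # True
```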
