Un-0: Generating Images with Coupled Oscillators

Un-0 is a generative AI model that replaces conventional deep neural network layers with a simulated system of coupled Kuramoto oscillators. Because its computation is carried by the dynamics of synchronizing oscillators rather than by stacked layers, Un-0 suggests that modern AI workloads can be executed on physical substrates, potentially reducing energy consumption by up to 1,000x compared to GPU-based execution.

Performance and Benchmarks

Un-0 achieves an FID (Fréchet Inception Distance, lower is better) of 6.74 on class-conditional ImageNet 64×64. This is comparable to the quality that several leading conventional image generation methods, such as BigGAN, iDDPM, and WGAN-GP, reported at the time of their initial publication.

Model Scaling and Results

Un-0 was tested across different scales for CIFAR-10 and ImageNet 64×64:

ImageNet 64×64 Results:

Model          Oscillator Count   Total Parameters   FID@50k
Un-0.n6656     6,656              57.17M             8.41
Un-0.n10240    10,240             129.80M            8.01
Un-0.n16384    16,384             322.44M            6.74

CIFAR-10 Results:

Model          Oscillator Count   Total Parameters   FID@50k
Un-0.n1024     1,024              1.29M              11.01
Un-0.n2048     2,048              4.94M              9.32
Un-0.n4096     4,096              19.43M             8.76

While Un-0 extends the Pareto frontier for small models, it currently trails state-of-the-art conventional baselines such as EDM and GDD at larger scales, because its quality improves more slowly with parameter count than theirs does.
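How slowly quality improves with size can be estimated from the reported numbers with a log-log fit. The sketch below uses only the parameter counts and FID values from the two results tables; the helper name `scaling_exponent` is illustrative, and a three-point fit is a rough indication, not a measured scaling law.

```python
import math

# (total parameters in millions, FID@50k), copied from the results tables.
IMAGENET_64 = [(57.17, 8.41), (129.80, 8.01), (322.44, 6.74)]
CIFAR_10 = [(1.29, 11.01), (4.94, 9.32), (19.43, 8.76)]


def scaling_exponent(points):
    """Least-squares slope of log(FID) against log(parameter count)."""
    xs = [math.log(params) for params, _ in points]
    ys = [math.log(fid) for _, fid in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


for name, points in [("ImageNet 64x64", IMAGENET_64), ("CIFAR-10", CIFAR_10)]:
    slope = scaling_exponent(points)
    # Factor by which FID changes when parameters double, under the fitted power law.
    per_doubling = 2.0 ** slope
    print(f"{name}: exponent {slope:.3f}, FID x{per_doubling:.3f} per doubling")
```

Both fitted exponents are negative but close to zero, so under this fit doubling the parameter count lowers FID by less than 10% on either dataset.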

How Un-0 Works: The Physics of Generation

Un-0 uses the Kuramoto model: a population of oscillators, each with its own natural frequency, is coupled through a learnable coupling matrix. The system evolves according to an ordinary differential equation (ODE) in which each oscillator's phase advances at its natural frequency while being nudged by the pull of the oscillators it is coupled to.
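For reference, the textbook Kuramoto model with a general coupling matrix can be written as below. This is the standard form, given here as an assumption about the shape of the dynamics; the article does not state Un-0's exact equation, which may include further terms such as class-conditioning inputs. The symbols $\theta_i$ and $N$ are introduced here for illustration.

$$
\frac{d\theta_i}{dt} = \omega_i + \sum_{j=1}^{N} K_{ij} \sin(\theta_j - \theta_i), \qquad i = 1, \dots, N
$$

Here $\theta_i$ is the phase of oscillator $i$, $\omega_i$ is its natural frequency, and $K_{ij}$ is the strength with which oscillator $j$ pulls on oscillator $i$. The sine term vanishes when two phases agree and, for $K_{ij} > 0$, pushes $\theta_i$ toward $\theta_j$ otherwise.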

The Inference Process

Generating an image follows a five-step pipeline:

  1. Random Initialization: Every oscillator's phase is set to a random angle, serving as the seed (similar to noise in diffusion models).
  2. Class Conditioning: A smaller group of oscillators encodes the requested class and biases the main population toward class-associated arrangements.
  3. Physical Execution: The system evolves over time, with oscillators pulling on one another based on trained coupling strengths.
  4. Snapshot: At a specified time $T$, the phases of all oscillators are recorded as a latent representation.
  5. Rendering: A conventional decoder (comprising less than 13% of total model parameters) converts these latents into final pixels.
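The five steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not Un-0's implementation: the dynamics use the textbook Kuramoto form, the class drivers are modeled as a small group of oscillators held at fixed class-specific phases, integration is explicit Euler, the latent is the cosine and sine of each phase, and the decoder is a stand-in linear map. All names (`generate`, `make_random_params`, `K_drive`, `class_phases`, `W_dec`) are hypothetical.

```python
import numpy as np


def kuramoto_rhs(theta, omega, K, drive_phase, K_drive):
    """Phase velocity: natural frequency + pull from peers + pull from class drivers."""
    # peer[i, j] = sin(theta_j - theta_i)
    peer = np.sin(theta[None, :] - theta[:, None])
    # drive[i, m] = sin(drive_phase_m - theta_i)
    drive = np.sin(drive_phase[None, :] - theta[:, None])
    return omega + np.sum(K * peer, axis=1) + np.sum(K_drive * drive, axis=1)


def generate(class_id, params, T=1.0, n_steps=10, seed=0):
    rng = np.random.default_rng(seed)
    n = params["omega"].shape[0]
    # 1. Random initialization: one random angle per oscillator.
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # 2. Class conditioning: driver phases for the requested class (assumed fixed).
    drive_phase = params["class_phases"][class_id]
    # 3. Physical execution, simulated here with explicit Euler steps.
    dt = T / n_steps
    for _ in range(n_steps):
        theta = theta + dt * kuramoto_rhs(
            theta, params["omega"], params["K"], drive_phase, params["K_drive"]
        )
    # 4. Snapshot at time T, embedded as (cos, sin) so the latent is 2*pi-periodic.
    latent = np.concatenate([np.cos(theta), np.sin(theta)])
    # 5. Rendering: a linear map stands in for the conventional decoder.
    pixels = params["W_dec"] @ latent
    return pixels.reshape(params["image_shape"])


def make_random_params(n=32, n_drive=4, num_classes=3, image_shape=(4, 4, 3), seed=0):
    """Untrained placeholder parameters, only to make the sketch runnable."""
    rng = np.random.default_rng(seed)
    return {
        "omega": rng.normal(size=n),
        "K": rng.normal(scale=1.0 / n, size=(n, n)),
        "K_drive": rng.normal(scale=1.0 / n_drive, size=(n, n_drive)),
        "class_phases": rng.uniform(0.0, 2.0 * np.pi, size=(num_classes, n_drive)),
        "W_dec": rng.normal(scale=0.1, size=(int(np.prod(image_shape)), 2 * n)),
        "image_shape": image_shape,
    }


params = make_random_params()
image = generate(class_id=1, params=params, seed=42)
print(image.shape)
```

As with the noise seed in a diffusion model, the same `seed` reproduces the same output, and a different seed gives a different sample for the same class.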

Learnable Parameters

Training focuses on three primary components:

  • The coupling matrix $K$ (how oscillators interact).
  • The natural frequencies $\omega_i$ of each oscillator.
  • The weights of the conventional decoder.

Ablation Analysis: Attributing Computation

To determine whether the physical dynamics perform actual computation or the decoder does the heavy lifting, Unconventional AI ran several ablations:

  • Decoder Only: Training the decoder without any dynamics. This resulted in the poorest performance, showing the decoder struggles to map raw noise to target images.
  • Reservoir: Fixing dynamical weights to random initialization. This improved performance over the decoder-only baseline, suggesting random dynamics provide a more separable input to the decoder.
  • Time Delta: Varying integration steps. Models with learned dynamics and more integration steps (e.g., 10 steps) significantly outperformed both the reservoir and 1-step learned models.

These results indicate that Un-0 computes using nonlinear dynamics, and the trained dynamics are more robust against decreasing model size than random reservoir dynamics.
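The ablations differ only in what reaches the decoder, which the following sketch makes concrete. It assumes textbook Kuramoto dynamics with explicit Euler integration; the 10-step setting for the reservoir is an assumption, and because no training happens here the "learned" variants reuse the same random weights as placeholders. The names `evolve` and `ABLATIONS` are hypothetical.

```python
import numpy as np


def evolve(theta0, omega, K, n_steps, T=1.0):
    """Euler-integrate Kuramoto phases up to time T; n_steps=0 skips the dynamics."""
    theta = theta0.copy()
    if n_steps == 0:
        return theta
    dt = T / n_steps
    for _ in range(n_steps):
        pull = np.sum(K * np.sin(theta[None, :] - theta[:, None]), axis=1)
        theta = theta + dt * (omega + pull)
    return theta


# What distinguishes each ablation: number of integration steps, and whether
# the dynamical weights (K, omega) would be updated during training.
ABLATIONS = {
    "decoder_only": {"n_steps": 0, "train_dynamics": False},
    "reservoir": {"n_steps": 10, "train_dynamics": False},  # step count assumed
    "learned_1_step": {"n_steps": 1, "train_dynamics": True},
    "learned_10_step": {"n_steps": 10, "train_dynamics": True},
}

rng = np.random.default_rng(0)
n = 16
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=n)
omega = rng.normal(size=n)
K = rng.normal(scale=1.0 / n, size=(n, n))  # random weights, as in a reservoir

# Phases that the decoder would receive under each configuration.
decoder_inputs = {
    name: evolve(theta0, omega, K, cfg["n_steps"]) for name, cfg in ABLATIONS.items()
}
for name, phases in decoder_inputs.items():
    print(name, np.round(phases[:3], 3))
```

In the decoder-only case the decoder sees the raw random phases, while every other variant hands it phases that have already been transformed by the nonlinear dynamics.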

Dynamics Analysis: Diversity vs. Quality

Analysis of the model's behavior reveals a functional split between the physical substrate and the conventional component:

  • Dynamics for Diversity: The Kuramoto system is responsible for preserving image diversity (recall). Trained networks measurably increase diversity over time as they align with the class manifold, preventing the collapse of diversity seen in untrained reservoirs.
  • Decoder for Quality: The conventional decoder acts as the image quality generator (precision).

Low-dimensional PCA projections of the decoder input at $T=1$ show clear visual separation between classes. This indicates that the training objective drives the dynamics to form distinct class clusters in a subspace whose dimension is low relative to the effective dimensionality of the decoder input.
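A projection of this kind can be sketched with a plain SVD-based PCA. The data below is synthetic: two tight clusters of phases stand in for the snapshots of two classes, and embedding each phase as (cos, sin) before projecting is an assumption made for illustration. None of this reproduces Un-0's actual latents.

```python
import numpy as np


def phase_features(theta):
    """Embed phases as (cos, sin) so that 0 and 2*pi map to the same point."""
    return np.concatenate([np.cos(theta), np.sin(theta)], axis=-1)


def pca_project(features, n_components=2):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for phase snapshots of two classes at T = 1.
rng = np.random.default_rng(0)
n_osc, per_class = 64, 50
class_centers = rng.uniform(0.0, 2.0 * np.pi, size=(2, n_osc))
theta = np.concatenate(
    [center + 0.1 * rng.normal(size=(per_class, n_osc)) for center in class_centers]
)
labels = np.repeat([0, 1], per_class)

projected = pca_project(phase_features(theta))
for c in (0, 1):
    print(f"class {c}: mean PC1 = {projected[labels == c, 0].mean():.2f}")
```

When the classes form distinct clusters, as in this toy data, the first principal component alone is enough to separate them.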
