burn: a unified Rust deep learning framework for seamless training and inference across diverse hardware

What it solves

Burn addresses the split between AI model training and production deployment. Traditionally, models are trained in Python and then exported to formats like ONNX or optimized for specific engines (e.g., vLLM, TensorRT), a process that is often brittle and lossy. Burn provides a single Rust codebase for both training and inference, so a trained model can move to production without a separate export or conversion step. The same design supports advanced use cases such as on-device personalization and federated learning.

How it works

Burn is a tensor library and deep learning framework that exposes one API for tensor operations across platforms. It combines PyTorch-style ergonomics (dynamic shapes and dynamic graphs) with the performance of JIT compilation and automatic kernel fusion.
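To make "kernel fusion" concrete, here is a minimal hand-written sketch of what fusion achieves for a chain of element-wise operations. It does not use Burn and does not reflect how Burn implements fusion; the function names are hypothetical and chosen only for illustration. The unfused version makes three passes over the data and allocates two intermediate buffers, while the fused version does the same arithmetic in one pass.

```rust
/// Unfused: each operation is its own pass over memory and
/// materializes an intermediate buffer (hypothetical example, not Burn code).
fn scale_shift_relu_unfused(x: &[f32], scale: f32, shift: f32) -> Vec<f32> {
    let scaled: Vec<f32> = x.iter().map(|v| v * scale).collect();
    let shifted: Vec<f32> = scaled.iter().map(|v| v + shift).collect();
    shifted.iter().map(|v| v.max(0.0)).collect()
}

/// Fused: the same three operations applied per element in a single pass,
/// with no intermediate buffers.
fn scale_shift_relu_fused(x: &[f32], scale: f32, shift: f32) -> Vec<f32> {
    x.iter().map(|v| (v * scale + shift).max(0.0)).collect()
}

fn main() {
    let x = [-1.0_f32, 0.5, 2.0];
    let unfused = scale_shift_relu_unfused(&x, 2.0, -0.5);
    let fused = scale_shift_relu_fused(&x, 2.0, -0.5);
    // Both strategies compute the same values; only the memory traffic differs.
    assert_eq!(unfused, fused);
    println!("{:?}", fused);
}
```

On a GPU the same idea applies at the level of kernel launches: fewer launches and fewer round trips to memory. A fusion system aims to perform this rewrite automatically rather than by hand.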

Its architecture is based on a Backend trait, allowing it to be generic over different compute engines. It uses "backend decorators" to add functionality:

  • Autodiff: Adds backpropagation capabilities to any base backend.
  • Fusion: Enables kernel fusion for accelerated backends to improve performance.
  • Remote: Allows tensor operations to be executed on a remote server for distributed computing.
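
The decorator idea can be sketched in plain Rust without Burn. Everything below is a simplified assumption for illustration: the toy `Backend` trait, the `Cpu` base backend, and the `Recording` wrapper are hypothetical and much smaller than Burn's real `Backend` trait and its Autodiff, Fusion, and Remote decorators. The point is only the shape of the pattern: a wrapper type that is generic over a backend and implements the same trait, so model code written against the trait runs unchanged on the base backend or on any decorated one.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

/// Toy backend trait (hypothetical, far smaller than Burn's real `Backend`).
trait Backend {
    fn name() -> String;
    fn add(lhs: &[f32], rhs: &[f32]) -> Vec<f32>;
    fn mul(lhs: &[f32], rhs: &[f32]) -> Vec<f32>;
}

/// A base backend that computes element-wise on the CPU.
struct Cpu;

impl Backend for Cpu {
    fn name() -> String {
        "cpu".to_string()
    }
    fn add(lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        lhs.iter().zip(rhs).map(|(a, b)| a + b).collect()
    }
    fn mul(lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        lhs.iter().zip(rhs).map(|(a, b)| a * b).collect()
    }
}

thread_local! {
    /// Per-thread log of the operations seen by the decorator.
    static OP_LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Decorator: wraps any backend `B`, records each op, then delegates to `B`.
/// An autodiff decorator would record a graph here instead of op names.
struct Recording<B>(PhantomData<B>);

impl<B: Backend> Backend for Recording<B> {
    fn name() -> String {
        format!("recording<{}>", B::name())
    }
    fn add(lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        OP_LOG.with(|log| log.borrow_mut().push("add"));
        B::add(lhs, rhs)
    }
    fn mul(lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        OP_LOG.with(|log| log.borrow_mut().push("mul"));
        B::mul(lhs, rhs)
    }
}

/// "Model" code written once, generic over the backend.
fn affine<B: Backend>(x: &[f32], w: &[f32], b: &[f32]) -> Vec<f32> {
    B::add(&B::mul(x, w), b)
}

/// Returns a copy of the ops recorded on the current thread.
fn recorded_ops() -> Vec<&'static str> {
    OP_LOG.with(|log| log.borrow().clone())
}

fn main() {
    let (x, w, b) = ([1.0_f32, 2.0], [3.0_f32, 4.0], [0.5_f32, 0.5]);
    let plain = affine::<Cpu>(&x, &w, &b);
    let traced = affine::<Recording<Cpu>>(&x, &w, &b);
    // The decorator adds behavior without changing the numerical result.
    assert_eq!(plain, traced);
    println!("{} -> {:?}, ops: {:?}", Recording::<Cpu>::name(), traced, recorded_ops());
}
```

Because the wrapper is itself a backend, decorators compose by nesting type parameters, which is how a single generic model definition can be instantiated for plain inference or for training by changing only the backend type.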

Who it’s for

  • AI Researchers: those who want a Python-like feedback loop (fast incremental compilation) with the safety and speed of Rust.
  • ML Engineers: those who need to deploy models across diverse hardware (from embedded no_std devices to large GPU clusters) without rewriting code.
  • Developers: those looking for a fully open-source, Rust-based AI ecosystem.

Highlights

  • Unified Workflow: Use the exact same code for training and production inference.
  • Broad Hardware Support: Runs on CUDA, ROCm, Metal, Vulkan, WebGPU, and CPU (including no_std for bare metal).
  • Interoperability: Import models from ONNX, PyTorch, or Safetensors.
  • Developer Experience: Includes a built-in terminal UI dashboard for real-time training monitoring.
  • Web-Ready: Capable of running inference directly in the browser via WebAssembly and WebGPU.
