ao: a PyTorch-native architecture optimization library for training-to-serving model quantization and sparsity

What it solves

TorchAO is a PyTorch-native library for making AI models faster and more memory-efficient. It addresses the common trade-off between model size and accuracy, letting users shrink the memory footprint of large models (such as LLMs and diffusion models) and speed up both training and inference without significant quality loss.

How it works

TorchAO implements several architecture optimization techniques:

Quantization: It converts model weights and activations to lower-precision formats (such as int4, int8, and float8), reducing memory usage and increasing throughput.
Quantization-Aware Training (QAT): To limit the accuracy lost to quantization, it lets models be trained with the lower precision simulated, so the weights adapt to it before deployment.
Sparsity: It uses semi-structured 2:4 sparsity (two of every four consecutive weights are set to zero) to remove redundant weights, further increasing speed.
PyTorch-native integration: It works with torch.compile() and FSDP2 for high-performance execution across a range of hardware (CUDA, XPU, CPU, and ARM).
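
The first and third techniques above can be illustrated with a small, library-free sketch. This is not TorchAO's implementation or API: the function names (quantize_int8, dequantize, prune_2_4) and the sample row are hypothetical, and the code assumes the simplest variants, namely symmetric per-row int8 quantization and magnitude-based 2:4 pruning, applied to a plain Python list rather than a tensor.

```python
# Conceptual sketch only: plain Python, no TorchAO calls.
# All function names here are hypothetical, chosen for illustration.

def quantize_int8(weights):
    """Symmetric int8 quantization of one row: returns (int values, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero row.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from int8 values and their scale."""
    return [v * scale for v in q]


def prune_2_4(weights):
    """Keep the 2 largest-magnitude values in each group of 4; zero the rest."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two entries with the largest absolute value.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return out


row = [0.5, -1.27, 0.03, 0.9, -0.2, 0.0, 0.64, -0.01]

q, scale = quantize_int8(row)      # 8-bit integers plus one float scale
restored = dequantize(q, scale)    # close to `row`, within half a scale step
sparse = prune_2_4(row)            # exactly two non-zeros per group of four
```

Storing q and a single scale instead of full-precision floats is where the memory saving comes from, and the fixed two-of-four zero pattern is what lets hardware skip work. Real kernels operate on tensors with packed layouts, which this sketch deliberately leaves out.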

Who it’s for

This library is designed for AI researchers and engineers who need to deploy large-scale models on limited hardware, accelerate pre-training of massive models, or optimize models for edge devices via ExecuTorch.

Highlights

Training Speedups: Pre-training Llama-3.1-70B up to 1.5x faster using float8 training.
Inference Gains: Quantizing Llama-3-8B to int4 can result in 1.89x faster inference and 58% less memory usage.
Broad Integration: Built-in support for Hugging Face Transformers, Diffusers, vLLM, and SGLang.
Memory Efficiency: Includes quantized optimizers (AdamW 4/8-bit) and CPU offloading to reduce VRAM requirements by up to 60%.
