XNNPACK: a low-level acceleration library providing optimized neural network primitives for cross-platform inference

What it solves

XNNPACK addresses the need for high-performance neural network inference on a wide variety of hardware platforms, particularly mobile and edge devices. It provides the low-level mathematical primitives required to run AI models efficiently without requiring the user to write architecture-specific assembly code.

How it works

XNNPACK implements a broad set of neural network operators, such as 2D convolutions, pooling, and activation functions. Each operator is optimized for specific CPU targets, including ARM, x86, RISC-V, and WebAssembly. It is not intended to be called directly by deep learning practitioners and researchers; instead, high-level frameworks such as TensorFlow Lite and PyTorch integrate it as a backend to speed up their execution.
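To make "operator" concrete, here is a minimal reference sketch of what a 2D convolution followed by a ReLU activation computes. It is plain, unoptimized Python and does not use XNNPACK's API. The function names, the batch size of 1, the stride of 1, the lack of padding, and the kernel layout are all assumptions chosen for illustration. An optimized library produces the same kind of result using architecture-specific vector code.

```python
def conv2d_nhwc(inp, kernel, bias, in_h, in_w, in_c, k_h, k_w, out_c):
    """Naive 2D convolution (stride 1, no padding) over one flat NHWC image.

    inp    : flat list of length in_h * in_w * in_c
    kernel : flat list laid out as [out_c][k_h][k_w][in_c] (assumed layout)
    bias   : list of length out_c
    """
    out_h = in_h - k_h + 1
    out_w = in_w - k_w + 1
    out = [0.0] * (out_h * out_w * out_c)
    for oy in range(out_h):
        for ox in range(out_w):
            for oc in range(out_c):
                acc = bias[oc]
                for ky in range(k_h):
                    for kx in range(k_w):
                        for ic in range(in_c):
                            x = inp[((oy + ky) * in_w + (ox + kx)) * in_c + ic]
                            w = kernel[((oc * k_h + ky) * k_w + kx) * in_c + ic]
                            acc += x * w
                out[(oy * out_w + ox) * out_c + oc] = acc
    return out


def relu(values):
    """Element-wise ReLU activation: negative values become zero."""
    return [max(0.0, v) for v in values]


# A 3x3 single-channel image holding 1..9, convolved with a 2x2 box kernel.
image = [float(v) for v in range(1, 10)]
box_kernel = [1.0, 1.0, 1.0, 1.0]
summed = conv2d_nhwc(image, box_kernel, [0.0], 3, 3, 1, 2, 2, 1)
print(summed)  # [12.0, 16.0, 24.0, 28.0]

# The same convolution with a bias of -20, then ReLU.
activated = relu(conv2d_nhwc(image, box_kernel, [-20.0], 3, 3, 1, 2, 2, 1))
print(activated)  # [0.0, 0.0, 4.0, 8.0]
```

The six nested loops are the part an acceleration library replaces with tuned kernels; the arithmetic itself stays the same.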

Who it’s for

It is designed for developers of machine learning frameworks and runtime engines who need to optimize inference performance across diverse hardware targets.

Highlights

  • Broad Hardware Support: Optimized for ARM64, ARMv7, ARMv6, x86/x86-64 (up to AVX512), RISC-V, WebAssembly, and Hexagon.
  • Extensive Operator Library: Supports many operators, including grouped and depthwise convolutions, bilinear resize, and quantization/dequantization conversions.
  • Flexible Memory Layout: Supports NHWC layout with custom strides along the channel dimension for zero-cost channel splitting and concatenation.
  • Framework Integration: Powers inference in major tools such as TensorFlow Lite, PyTorch Mobile, ONNX Runtime, and MediaPipe.
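The memory-layout highlight can be illustrated with a small sketch of why a custom channel stride makes splitting and concatenation free. This is plain Python, not XNNPACK code; the `ChannelSlice` class and its parameters are hypothetical names introduced only for this example. The idea is that each pixel in an NHWC buffer occupies `channel_stride` slots, and a tensor may use only some of them, so several tensors can share one buffer without any copying.

```python
class ChannelSlice:
    """Window onto a run of channels inside a shared, flat NHWC buffer."""

    def __init__(self, buffer, height, width, channels, channel_stride,
                 first_channel=0):
        if first_channel + channels > channel_stride:
            raise ValueError("slice does not fit inside one pixel")
        self.buffer = buffer
        self.height = height
        self.width = width
        self.channels = channels
        self.channel_stride = channel_stride  # slots per pixel in the buffer
        self.first_channel = first_channel    # where this slice starts

    def _index(self, n, y, x, c):
        if not 0 <= c < self.channels:
            raise IndexError("channel out of range")
        pixel = (n * self.height + y) * self.width + x
        return pixel * self.channel_stride + self.first_channel + c

    def read(self, n, y, x, c):
        return self.buffer[self._index(n, y, x, c)]

    def write(self, n, y, x, c, value):
        self.buffer[self._index(n, y, x, c)] = value

    def fill(self, value):
        """Write `value` to every element of this slice (batch 0 only)."""
        for y in range(self.height):
            for x in range(self.width):
                for c in range(self.channels):
                    self.write(0, y, x, c, value)


HEIGHT, WIDTH, TOTAL_CHANNELS = 2, 2, 4
shared = [0.0] * (HEIGHT * WIDTH * TOTAL_CHANNELS)

# Two 2-channel tensors that live side by side in one 4-channel buffer.
left = ChannelSlice(shared, HEIGHT, WIDTH, 2, TOTAL_CHANNELS, first_channel=0)
right = ChannelSlice(shared, HEIGHT, WIDTH, 2, TOTAL_CHANNELS, first_channel=2)

# Writing each slice separately leaves the buffer already concatenated.
left.fill(1.0)
right.fill(2.0)
print(shared)
```

Here two producers write directly into their halves of the shared buffer, so the concatenated 4-channel tensor exists without a separate copy step. Reading through `left` or `right` afterwards is the reverse case, a channel split that also moves no data.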
