ComputeLibrary: a collection of low-level machine learning functions optimized for Arm hardware

What it solves

Compute Library is a collection of low-level machine learning functions optimized specifically for Arm hardware. It aims to outperform other open-source alternatives through micro-architecture-specific optimizations and immediate support for new Arm technologies such as SVE2.

How it works

The library implements over 100 ML primitives, including multiple convolution algorithms (GeMM, Winograd, FFT, and Direct), and supports a wide range of data types (FP32, FP16, INT8, UINT8, BFLOAT16). It applies optimization techniques such as kernel fusion, fast math, and texture utilization, and supports device-specific tuning through an OpenCL tuner and GeMM-optimized heuristics.
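To make the difference between the Direct and GeMM convolution strategies concrete, here is a minimal pure-Python sketch. It is not Compute Library code and does not use its API: the function names (direct_conv2d, im2col, gemm_conv2d) and the toy data are hypothetical, and it assumes a single channel, stride 1, and no padding. It shows only the general idea that a convolution can be rewritten as a matrix product over unrolled image patches.

```python
def direct_conv2d(image, kernel):
    """Direct approach: slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def im2col(image, kh, kw):
    """Unroll every kh x kw patch into one row of a (oh*ow) x (kh*kw) matrix."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [image[y + dy][x + dx] for dy in range(kh) for dx in range(kw)]
        for y in range(oh)
        for x in range(ow)
    ]


def gemm_conv2d(image, kernel):
    """GEMM-style approach: patch matrix times flattened kernel weights.

    With one filter this is a matrix-vector product; with several filters
    the weight vector becomes a matrix and the step is a full matrix multiply.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    patches = im2col(image, kh, kw)
    weights = [w for row in kernel for w in row]
    flat = [sum(p * w for p, w in zip(patch, weights)) for patch in patches]
    # Reshape the flat result back into an oh x ow output map.
    return [flat[r * ow:(r + 1) * ow] for r in range(oh)]


# Toy input (assumed values, for illustration only).
image = [
    [1.0, 2.0, 3.0, 0.0],
    [4.0, 5.0, 6.0, 1.0],
    [7.0, 8.0, 9.0, 2.0],
]
kernel = [
    [1.0, 0.0],
    [0.0, -1.0],
]

# Both strategies produce the same output map.
assert gemm_conv2d(image, kernel) == direct_conv2d(image, kernel)
```

The unrolled form trades extra memory for the patch matrix against the ability to reuse a heavily tuned matrix-multiply routine, which is one reason a library may offer several convolution algorithms and choose among them per workload.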

Who it’s for

Developers building AI/ML applications for Arm-based systems, including Cortex-A and Neoverse CPUs and Mali GPUs, on operating systems such as Android, Linux, and macOS.

Highlights

  • Open source software under the MIT license.
  • Optimized for Arm Cortex-A and Neoverse CPUs and Mali GPUs.
  • Support for multiple precision data types including BFLOAT16 and INT8.
  • Highly configurable build options for lightweight binaries.

Sources