whisper.cpp: a high-performance C/C++ implementation of OpenAI's Whisper for offline on-device speech recognition

What it solves

whisper.cpp delivers a high-performance, lightweight implementation of OpenAI's Whisper automatic speech recognition (ASR) model. It allows users to run speech-to-text transcription fully offline and on-device, removing the need for heavy dependencies or cloud-based APIs.

How it works

The project is written in plain C/C++ and uses the ggml machine learning library for inference. To maximize speed it supports a wide range of hardware acceleration backends, including Apple Silicon (via Metal and Core ML), NVIDIA GPUs (via CUDA), AMD GPUs (via ROCm), Vulkan, and OpenVINO, as well as CPU SIMD intrinsics (AVX on x86, NEON on ARM, VSX on POWER).
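
As a rough illustration of what using CPU intrinsics looks like, the sketch below implements a dot product, a typical inner loop in neural-network inference, with AVX and NEON paths selected at compile time and a scalar fallback. It is not taken from ggml or whisper.cpp; the function name dot_f32 and the overall structure are assumptions made for this example.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>   /* x86 AVX intrinsics */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* ARM NEON intrinsics */
#endif

/* Hypothetical helper: dot product of two float arrays of length n. */
float dot_f32(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float sum = 0.0f;

#if defined(__AVX__)
    /* Process 8 floats per iteration in a 256-bit register. */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; k++) sum += lanes[k];
#elif defined(__ARM_NEON)
    /* Process 4 floats per iteration in a 128-bit register. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    float lanes[4];
    vst1q_f32(lanes, acc);
    for (int k = 0; k < 4; k++) sum += lanes[k];
#endif

    /* Scalar path: handles the tail, or everything when no SIMD is available. */
    for (; i < n; i++) sum += a[i] * b[i];
    return sum;
}
```

The same source compiles on x86 and ARM. The compiler flags (for example, whether AVX is enabled) decide which path is built, and the scalar loop keeps the result correct on any other target.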

Who it’s for

Developers and users who want to integrate high-quality speech recognition into applications across diverse platforms—including iOS, Android, Windows, Linux, macOS, and WebAssembly—without relying on external servers.

Highlights

  • Zero runtime memory allocations: Optimized for efficiency and speed.
  • Broad hardware support: Acceleration on Apple Silicon, NVIDIA and AMD GPUs, Vulkan-capable devices, and OpenVINO targets.
  • Integer quantization: Reduces memory and disk footprint for smaller devices.
  • Cross-platform: Runs on everything from high-end GPUs to Raspberry Pi and mobile phones.
  • Real-time capabilities: Includes examples for continuous microphone input transcription.
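
To make the integer quantization highlight concrete, here is a minimal sketch of block-wise 8-bit quantization: each group of 32 floats is stored as one shared scale plus 32 signed bytes. It illustrates the general technique only, not ggml's actual code or on-disk formats; QBLOCK, qblock_t, quantize_blocks, and dequantize_blocks are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QBLOCK 32  /* values per block (illustrative choice) */

/* One quantized block: a shared scale plus QBLOCK signed 8-bit values. */
typedef struct {
    float  scale;
    int8_t q[QBLOCK];
} qblock_t;

/* Quantize n floats into n / QBLOCK blocks; n must be a multiple of QBLOCK. */
void quantize_blocks(const float *x, qblock_t *out, size_t n) {
    assert(n % QBLOCK == 0);
    for (size_t b = 0; b < n / QBLOCK; b++) {
        const float *src = x + b * QBLOCK;

        /* Find the largest magnitude in the block. */
        float amax = 0.0f;
        for (int i = 0; i < QBLOCK; i++) {
            float v = src[i] < 0.0f ? -src[i] : src[i];
            if (v > amax) amax = v;
        }

        /* Map [-amax, amax] onto [-127, 127]. */
        float scale = amax / 127.0f;
        float inv   = scale > 0.0f ? 1.0f / scale : 0.0f;
        out[b].scale = scale;
        for (int i = 0; i < QBLOCK; i++) {
            float v = src[i] * inv;
            /* Round half away from zero, then truncate to int8. */
            out[b].q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}

/* Reconstruct approximate floats from quantized blocks. */
void dequantize_blocks(const qblock_t *in, float *y, size_t n) {
    assert(n % QBLOCK == 0);
    for (size_t b = 0; b < n / QBLOCK; b++) {
        for (int i = 0; i < QBLOCK; i++) {
            y[b * QBLOCK + i] = in[b].scale * (float)in[b].q[i];
        }
    }
}
```

In this sketch a block of 32 values takes 36 bytes (a 4-byte scale plus 32 one-byte values) instead of 128 bytes as 32-bit floats, and each reconstructed value differs from the original by roughly half a scale step at most.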
