sherpa-onnx: a highly portable local audio AI framework supporting speech recognition and synthesis across diverse hardware and languages
What it solves
sherpa-onnx provides a portable and efficient way to run speech and audio processing tasks locally. It removes the dependency on cloud services by letting users deploy AI-driven audio functions, such as speech-to-text and text-to-speech, across a wide range of hardware platforms and programming languages.
How it works
The project uses ONNX Runtime to execute pre-trained models directly on the device. It exposes a unified set of APIs across multiple programming languages, so developers can integrate speech functions into their applications without managing the low-level details of machine learning frameworks.
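As a rough illustration of that workflow, here is a minimal Python sketch of offline speech recognition. It assumes the `sherpa_onnx` and `numpy` packages are installed and that a transducer model (encoder, decoder, joiner, and tokens files) has already been downloaded; the file paths are placeholders, and the helper `read_wave_mono16` is a hypothetical name introduced here, not part of the library. The recognizer calls follow the pattern used in the project's Python examples, but argument names may differ between versions, so check them against the installed release.

```python
import array
import sys
import wave


def read_wave_mono16(path):
    """Read a 16-bit mono PCM WAV file; return (sample_rate, samples in [-1, 1))."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        sample_rate = f.getframerate()
        pcm = array.array("h")
        pcm.frombytes(f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Scale int16 samples to floats, the input format the recognizer expects.
    return sample_rate, [s / 32768.0 for s in pcm]


def transcribe(wav_path, encoder, decoder, joiner, tokens):
    """Decode one WAV file with a local transducer model (paths are placeholders)."""
    # Imported lazily so the WAV helper above works without these packages.
    import numpy as np
    import sherpa_onnx

    # Assumption: factory name and keyword arguments as in the project's
    # Python examples; verify against the installed version.
    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=encoder,
        decoder=decoder,
        joiner=joiner,
        tokens=tokens,
        num_threads=1,
    )
    sample_rate, samples = read_wave_mono16(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text
```

Calling `transcribe("speech.wav", "encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt")` would return the recognized text, with all inference running on the local machine. The same create-stream, feed-audio, decode sequence is what the other language bindings are meant to mirror.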
Who it’s for
Developers building audio-enabled applications for mobile (Android, iOS), desktop (Windows, macOS, Linux), embedded systems (Raspberry Pi, Jetson, RISC-V), and web browsers (WebAssembly), who require local, offline processing for privacy or performance.
Highlights
- Comprehensive Audio Suite: Supports speech recognition (ASR), speech synthesis (TTS), speaker diarization, speaker identification and verification, audio tagging, voice activity detection (VAD), speech enhancement, and source separation.
- Extreme Portability: Compatible with a wide range of architectures (x64, x86, ARM, RISC-V) and operating systems, including HarmonyOS and openKylin.
- Broad Language Support: Provides APIs for C++, C, Python, Go, C#, Java, Kotlin, JavaScript, Swift, Rust, Dart, and Object Pascal.
- Hardware Acceleration: Supports various NPUs (Rockchip, Qualcomm, Ascend, Axera) and NVIDIA Jetson GPUs for optimized performance.
Sources
- k2-fsa/sherpa-onnx