FluidAudio: a Swift SDK for local audio AI on Apple devices that offloads inference to the Apple Neural Engine

What it solves

FluidAudio is a Swift SDK that lets developers integrate high-performance, fully local audio AI into macOS and iOS applications. Because audio is processed on the device rather than in the cloud, user data stays local and network round-trip latency is avoided. Inference is offloaded to the Apple Neural Engine (ANE).

How it works

The SDK provides a set of optimized CoreML models for various audio tasks. By running inference directly on the ANE, it minimizes CPU and GPU usage, which makes it well suited to background processing and always-on workloads. It supports a variety of open-source models (MIT/Apache 2.0) and provides official wrappers for React Native, Expo, and Rust/Tauri.
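
As a rough illustration of what running a CoreML model on the ANE involves, the sketch below uses Apple's Core ML framework directly rather than FluidAudio's own API, which is not shown here. The function names `makeANEConfiguration` and `loadModel(at:)` and the `modelURL` parameter are hypothetical, chosen only for this example. Setting `computeUnits` to `.cpuAndNeuralEngine` expresses a preference: Core ML still decides at runtime which layers actually run on the Neural Engine and which fall back to the CPU.

```swift
import CoreML
import Foundation

/// Builds a Core ML configuration that restricts execution to the CPU and
/// the Neural Engine, leaving the GPU unused.
/// (Hypothetical helper for illustration; not part of FluidAudio.)
@available(macOS 13.0, iOS 16.0, *)
func makeANEConfiguration() -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndNeuralEngine
    return configuration
}

/// Loads a compiled Core ML model (.mlmodelc) using the configuration above.
/// `modelURL` is a placeholder for wherever the app stores its model.
@available(macOS 13.0, iOS 16.0, *)
func loadModel(at modelURL: URL) throws -> MLModel {
    try MLModel(contentsOf: modelURL, configuration: makeANEConfiguration())
}
```

An app would call `loadModel(at:)` once and reuse the returned model for repeated predictions, since loading and compiling for the Neural Engine is typically the expensive step.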

Who it’s for

It is designed for Apple platform developers building apps that require transcription, text-to-speech, or speaker identification without relying on external servers.

Highlights

Automatic Speech Recognition (ASR): Supports batch and streaming transcription across multiple languages including European languages, Japanese, and Mandarin Chinese.
Text-to-Speech (TTS): Includes parallel synthesis with SSML and pronunciation control, as well as streaming TTS with voice cloning.
Speaker Diarization: Offers both online (real-time) and offline (batch) pipelines for speaker separation and identification.
Voice Activity Detection (VAD): Integrates Silero models for efficient voice detection.
Apple Neural Engine Optimization: Tuned specifically for the ANE to maximize performance and minimize power consumption.
Open-Source Models: Uses publicly available models from HuggingFace, optimized for on-device use.
