OmniVoice-Studio: a local open-source voice studio for zero-shot cloning, video dubbing, and multi-engine speech synthesis
What it solves
OmniVoice-Studio is a fully local, open-source alternative to cloud-based voice AI services like ElevenLabs. Because all processing runs on the user's own hardware, it removes the need for monthly subscriptions and API keys, and no audio or text leaves the machine. It provides professional-grade voice cloning, text-to-speech (TTS), and audio manipulation tools.
How it works
The project uses a React frontend and a FastAPI backend to orchestrate a wide array of local AI models. It features a multi-engine architecture that allows users to switch between 11 different TTS engines (such as OmniVoice, CosyVoice 3, and GPT-SoVITS) and 9 ASR (automatic speech recognition) engines (including WhisperX and Faster-Whisper). The system automatically detects hardware acceleration (CUDA, MPS, ROCm) and manages VRAM offloading to ensure compatibility across different GPU and CPU configurations.
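A minimal sketch of the two ideas above, device selection with a CPU fallback and a registry of swappable engines. This is an illustration under stated assumptions, not the project's actual code: the names `pick_device`, `DEVICE_PRIORITY`, `Engine`, and `EngineRegistry` are hypothetical, and the availability flags are passed in as plain booleans instead of being probed from a real framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical preference order; the real project may rank backends differently.
DEVICE_PRIORITY = ("cuda", "rocm", "mps", "cpu")


def pick_device(available: Dict[str, bool]) -> str:
    """Return the first available backend in priority order, else "cpu"."""
    for name in DEVICE_PRIORITY:
        if available.get(name, False):
            return name
    return "cpu"


@dataclass
class Engine:
    """A named engine; synthesize takes (text, device) and returns a result."""
    name: str
    synthesize: Callable[[str, str], str]


class EngineRegistry:
    """Keeps engines by name so callers can switch between them at runtime."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def register(self, engine: Engine) -> None:
        self._engines[engine.name] = engine

    def get(self, name: str) -> Engine:
        if name not in self._engines:
            raise KeyError(f"unknown engine: {name}")
        return self._engines[name]

    def names(self) -> List[str]:
        return sorted(self._engines)
```

In a real backend the availability flags would come from the ML framework's own checks, and each registered engine would wrap an actual model instead of a stub function.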
Who it’s for
It is designed for content creators, audiobook producers, and developers who need high-quality voice synthesis and transcription tools that run locally. It is particularly useful for those who want to avoid recurring costs, keep their data fully private, and work across a large number of languages (up to 646).
Highlights
- Zero-Shot Voice Cloning: Mirror any voice using a 3-second audio clip.
- Cinematic Video Dubbing: A full pipeline that transcribes, translates, and re-voices videos from files or YouTube URLs.
- Audiobook & Story Editor: Tools to import EPUB/PDFs, assign multiple voices per line, and export as .m4b files.
- Real-time Dictation: A system-wide widget that transcribes speech to text and auto-pastes it into any application.
- Vocal Isolation: Powered by Demucs to split speech from background music.
- Speaker Diarization: Automatically identifies different speakers in an audio track using Pyannote and WhisperX.
- MCP Server: Integration allowing the tool to be controlled via Claude, Cursor, or other MCP clients.
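The transcribe, translate, and re-voice flow described for video dubbing can be sketched as a chain of three pluggable stages. This is only an illustration: `Segment`, `DubbedClip`, and `dub` are hypothetical names, and the three stages are passed in as plain callables standing in for an ASR engine, a translator, and a TTS engine.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    """One transcribed span of speech, with timestamps in seconds."""
    start: float
    end: float
    text: str


@dataclass
class DubbedClip:
    """Synthesized audio that should be placed at the original timestamps."""
    start: float
    end: float
    audio: bytes


def dub(
    media: str,
    transcribe: Callable[[str], Iterable[Segment]],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> List[DubbedClip]:
    """Run each transcribed segment through translation and synthesis."""
    clips: List[DubbedClip] = []
    for seg in transcribe(media):
        translated = translate(seg.text)
        # Keep the source timing so the new audio lines up with the video.
        clips.append(DubbedClip(seg.start, seg.end, synthesize(translated)))
    return clips
```

Keeping the stages as separate callables mirrors the multi-engine idea: any ASR or TTS engine that matches the expected signature could be dropped in without changing the pipeline itself.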
Sources
- debpalash/OmniVoice-Studio