diff-svc: a diffusion-based singing voice conversion system for timbre transfer and pitch correction

What it solves

diff-svc converts a singing voice from one singer's timbre to another's while preserving the original melody, and it supports basic pitch correction.
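
As a rough illustration of what "basic pitch correction" can mean, the sketch below snaps a fundamental-frequency (f0) contour to the nearest equal-tempered semitone. This is an assumption made for illustration: the summary does not describe how diff-svc corrects pitch, and `snap_to_semitone` and `A4_HZ` are hypothetical names, not part of the project.

```python
import math

A4_HZ = 440.0  # reference tuning; an assumption, not taken from diff-svc


def snap_to_semitone(f0_hz):
    """Return the equal-tempered pitch closest to f0_hz.

    A value of 0 (or below) is treated as an unvoiced frame and passed through as 0.
    """
    if f0_hz <= 0:
        return 0.0
    # Convert Hz to a fractional MIDI note number, round, and convert back.
    midi = 69 + 12 * math.log2(f0_hz / A4_HZ)
    return A4_HZ * 2 ** ((round(midi) - 69) / 12)


# Hypothetical f0 contour in Hz, one value per analysis frame.
contour = [438.0, 452.0, 0.0, 265.0]
corrected = [snap_to_semitone(f) for f in contour]
```

Hard snapping like this flattens vibrato, so a practical corrector would more likely blend the original and snapped contours; the sketch only shows the underlying frequency arithmetic.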

How it works

The project uses a diffusion model to transform the input singing voice into the target timbre. It relies on HuBERT and ContentVec for audio feature extraction and supports a range of audio formats, with sampling rates up to 44.1 kHz.
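
To make the diffusion idea concrete, here is a minimal sketch of the noising and denoising algebra, assuming a DDPM-style formulation with a linear noise schedule. The summary does not state which formulation, schedule, or network diff-svc uses, so every name and value below is hypothetical, and the "network" is replaced by the true noise to show the algebra only.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, alpha_bar_t):
    """Forward process: blend clean values with Gaussian noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * n for x, n in zip(x0, noise)]


def estimate_clean(x_t, predicted_noise, alpha_bar_t):
    """Invert the forward process given a prediction of the added noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * n) / a for x, n in zip(x_t, predicted_noise)]


rng = random.Random(0)
mel_frame = [0.2, -0.5, 0.9, 0.0]  # stand-in for one mel-spectrogram frame
alpha_bars = cumulative_alphas(linear_beta_schedule(100))
noise = [rng.gauss(0.0, 1.0) for _ in mel_frame]
noisy = add_noise(mel_frame, noise, alpha_bars[50])

# A trained model would predict the noise from the noisy input, the step index,
# and conditioning features; passing the true noise recovers the clean frame.
recovered = estimate_clean(noisy, noise, alpha_bars[50])
```

In a voice conversion setting, the conditioning would carry the content features and pitch of the source performance, which is how the melody can survive while the timbre changes.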

Who it’s for

Musicians, audio engineers, and researchers interested in AI-driven singing voice conversion and timbre transfer.

Highlights

  • Supports high-fidelity audio at 44.1 kHz.
  • Includes basic pitch correction capabilities.
  • Supports a wide range of input and output audio formats.
  • Features automatic slicing for long audio files during inference.
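
The automatic slicing mentioned in the last highlight could work along the following lines. This is a minimal sketch assuming an energy-threshold approach that cuts at sufficiently long quiet stretches; the summary does not say how diff-svc chooses cut points, and `slice_on_silence` with its parameters is a hypothetical helper.

```python
def slice_on_silence(samples, frame_len=4, threshold=0.05, min_silence_frames=2):
    """Split samples into (start, end) index ranges separated by quiet stretches.

    A frame counts as quiet when its RMS amplitude falls below threshold. A
    segment is closed only after min_silence_frames consecutive quiet frames,
    so short pauses stay inside a segment.
    """
    def is_quiet(start):
        frame = samples[start:start + frame_len]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        return rms < threshold

    segments, seg_start, quiet_run = [], None, 0
    for start in range(0, len(samples), frame_len):
        if is_quiet(start):
            quiet_run += 1
            if seg_start is not None and quiet_run >= min_silence_frames:
                # End the segment where the current run of silence began.
                segments.append((seg_start, start - (quiet_run - 1) * frame_len))
                seg_start = None
        else:
            if seg_start is None:
                seg_start = start
            quiet_run = 0
    if seg_start is not None:
        # Audio ended while a segment was still open.
        segments.append((seg_start, len(samples)))
    return segments


# Toy signal: sound, a long pause, sound, then a short trailing pause.
signal = [0.5] * 8 + [0.0] * 8 + [0.4] * 4 + [0.0] * 4
segments = slice_on_silence(signal)
```

Real audio would use frames of a few thousand samples rather than four; the tiny values here only keep the example readable. Each resulting segment could then be converted independently and the outputs concatenated.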
