audiomentations: a fast and easy-to-use audio data augmentation library for deep learning
What it solves
It applies audio data augmentation so that deep learning models for audio perform well in real-world conditions, not just in controlled laboratory settings.
How it works
It is a Python library for building a pipeline of audio transforms, chained together with a Compose object, that randomly perturb the input audio. It runs on the CPU, supports both mono and multichannel audio, and integrates with common training pipelines such as PyTorch and TensorFlow/Keras.
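To make the Compose idea concrete, here is a minimal NumPy-only sketch of the pattern: each transform fires with its own probability p, and the pipeline applies the transforms in order to an array shaped (num_samples,) for mono or, assuming a channels-first layout, (num_channels, num_samples) for multichannel audio. The class names SimpleCompose, RandomGain and RandomPolarityFlip are hypothetical stand-ins, not the audiomentations API; the real library ships its own Compose and transform classes with their own parameters.

```python
import numpy as np

# Hypothetical toy classes that mimic the Compose pattern.
# They are NOT part of the audiomentations API.


class RandomGain:
    """With probability p, scale the audio by a random gain in [min_db, max_db] dB."""

    def __init__(self, min_db=-6.0, max_db=6.0, p=0.5):
        self.min_db, self.max_db, self.p = min_db, max_db, p

    def __call__(self, samples, sample_rate, rng):
        # sample_rate is unused here; it is kept so all transforms share one interface.
        if rng.random() >= self.p:  # skip this transform with probability 1 - p
            return samples
        gain_db = rng.uniform(self.min_db, self.max_db)
        gain = np.float32(10.0 ** (gain_db / 20.0))  # dB -> linear amplitude factor
        return samples * gain


class RandomPolarityFlip:
    """With probability p, invert the sign of every sample."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, samples, sample_rate, rng):
        if rng.random() >= self.p:
            return samples
        return -samples


class SimpleCompose:
    """Apply transforms in order. Both toy transforms are element-wise, so any
    array shape works, e.g. (num_samples,) or (num_channels, num_samples)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, samples, sample_rate, seed=None):
        rng = np.random.default_rng(seed)  # one RNG drives every random choice
        out = np.asarray(samples, dtype=np.float32).copy()  # never mutate the input
        for transform in self.transforms:
            out = transform(out, sample_rate, rng)
        return out


mono = np.linspace(-0.5, 0.5, 16000, dtype=np.float32)
stereo = np.stack([mono, 0.5 * mono])  # channels-first, shape (2, 16000)
augment = SimpleCompose([RandomGain(p=1.0), RandomPolarityFlip(p=1.0)])
print(augment(mono, 16000, seed=0).shape)    # (16000,)
print(augment(stereo, 16000, seed=0).shape)  # (2, 16000)
```

Passing a seed makes the random choices reproducible, which is convenient for testing; during training you would normally leave it unset so that every call yields a different variation of the same clip.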
Who it’s for
Developers and researchers building audio-based AI models who need to increase the diversity and robustness of their training data.
Highlights
- Extensive list of transforms, including noise addition (Gaussian, colored, background), pitch shifting, time stretching, and room simulation.
- API inspired by albumentations for ease of use.
- Supports mono and multichannel audio.
- Compatible with PyTorch and TensorFlow/Keras training pipelines.
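Gaussian noise addition is commonly parameterized by a signal-to-noise ratio (SNR) rather than a raw noise amplitude. The sketch below uses a hypothetical helper, add_gaussian_noise_at_snr, which is not taken from audiomentations: it draws an SNR in decibels and scales white noise so that 20 * log10(signal RMS / noise RMS) equals that value. The library's own noise transforms may choose the SNR or measure loudness differently.

```python
import numpy as np


def add_gaussian_noise_at_snr(samples, min_snr_db=5.0, max_snr_db=40.0, rng=None):
    """Add white Gaussian noise at an SNR drawn uniformly from [min_snr_db, max_snr_db].

    Hypothetical helper for illustration, not the audiomentations implementation.
    RMS is taken over the whole array, so every channel of a
    (num_channels, num_samples) input receives the same noise level.
    """
    rng = np.random.default_rng() if rng is None else rng
    snr_db = rng.uniform(min_snr_db, max_snr_db)
    signal_rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    # SNR_dB = 20 * log10(signal_rms / noise_rms)  =>  noise_rms = signal_rms / 10^(SNR_dB / 20)
    noise_rms = signal_rms / (10.0 ** (snr_db / 20.0))
    noise = rng.normal(0.0, noise_rms, size=samples.shape)
    return (samples + noise).astype(samples.dtype)  # keep the caller's dtype


sr = 16000
t = np.arange(10 * sr) / sr  # 10 seconds of timestamps
clean = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)
noisy = add_gaussian_noise_at_snr(
    clean, min_snr_db=20.0, max_snr_db=20.0, rng=np.random.default_rng(0)
)
```

Because the noise level is tied to each clip's RMS, quiet and loud recordings are corrupted to the same relative degree; a fixed noise amplitude would instead swamp quiet clips and barely affect loud ones.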
Sources
- iver56/audiomentations