audiomentations: a fast and easy-to-use audio data augmentation library for deep learning

What it solves

It augments audio training data so that deep learning models for audio perform better in real-world environments, not only in controlled laboratory settings.

How it works

It is a Python library in which users chain audio transforms into a pipeline (a Compose object) that perturbs or transforms audio data. It runs on the CPU, supports both mono and multichannel audio, and fits into common training pipelines such as PyTorch and TensorFlow/Keras.

Who it’s for

Developers and researchers building audio-based AI models who need to increase the diversity and robustness of their training data.

Highlights

Extensive list of transforms including noise addition (Gaussian, color, background), pitch shifting, time stretching, and room simulation.
API inspired by albumentations for ease of use.
Supports mono and multichannel audio.
Compatible with PyTorch and TensorFlow/Keras training pipelines.
