audiomentations: a fast and easy-to-use audio data augmentation library for deep learning

What it solves

It augments audio training data so that deep learning models for audio hold up in real-world recording conditions, not just in controlled laboratory settings.

How it works

It is a Python library in which users chain audio transforms into a pipeline (a Compose object) that perturbs or transforms audio data. It runs on the CPU, supports both mono and multichannel audio, and integrates with common training pipelines such as PyTorch and TensorFlow/Keras.

Who it’s for

Developers and researchers building audio-based AI models who need to increase the diversity and robustness of their training data.

Highlights

  • Extensive list of transforms including noise addition (Gaussian, color, background), pitch shifting, time stretching, and room simulation.
  • API inspired by albumentations for ease of use.
  • Supports mono and multichannel audio.
  • Compatible with PyTorch and TensorFlow/Keras training pipelines.
