bagustris@/home: Basic Audio Manipulation With Torchaudio

Friday, April 22, 2022

Basic Audio Manipulation With Torchaudio

Recently, I moved my audio processing toolkit from librosa (and others) to Torchaudio. This short writing documented the very basics of torchaudio for audio manipulation: read, resample, and write an audiofile.

Load audio file (read)

The process of loading (reading) an audio file is straightforward, just pass the audio path to `torchaudio.load`. We need to import the needed modules first. Most audio files can be loaded by torchaudio (WAV, OGG, MP3, etc.).

import torchaudio
import torchaudio.transforms as T
wav0, sr0 = torchaudio.load("old_file_48k.wav", normalize=True)

where wav0 is the output tensor (array) and sr0 is the original sampling rate. Argument `normalize=True` is optional to normalize the waveform. Note that one of my colleagues (a student) found that using `librosa.util.normalize()` resulted in better normalization (peak to peak waveform is -1 to 1) than this torchaudio normalization.

Resample

Resample a sampling rate to another sampling rate is done by a Class; the output is a function. Hence, we need to pass the old tensor to the resampler function. Here is an example to convert 48k tensor to 16k tensor.

sr1 = 16000
resampler = T.Resample(sr0, sr1)
wav1 = resampler(wav0)

Save as a new audio file (write)

The process of saving files is also straightforward, just pass the file name, tensor, and sampling rate in order.

torchaudio.save('new_file_16k.wav', wav1, sr1)

Then the new audio file appeared in the current directory. Just set the path and file name if you want to save it in another directory.

Reference:

[1] https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html