Friday, April 22, 2022

Basic Audio Manipulation With Torchaudio

Recently, I moved my audio processing toolkit from librosa (and others) to Torchaudio. This short writing documented the very basics of torchaudio for audio manipulation: read, resample, and write an audiofile.

Load audio file (read)

The process of loading (reading) an audio file is straightforward, just pass the audio path to `torchaudio.load`. We need to import the needed modules first. Most audio files can be loaded by torchaudio (WAV, OGG, MP3, etc.).
import torchaudio
import torchaudio.transforms as T
wav0, sr0 = torchaudio.load("old_file_48k.wav", normalize=True) 
where wav0 is the output tensor (array) and sr0 is the original sampling rate. Argument `normalize=True` is optional to normalize the waveform. Note that one of my colleagues (a student) found that using `librosa.util.normalize()` resulted in better normalization (peak to peak waveform is -1 to 1) than this torchaudio normalization.


Resample a sampling rate to another sampling rate is done by a Class; the output is a function. Hence, we need to pass the old tensor to the resampler function. Here is an example to convert 48k tensor to 16k tensor.
sr1 = 16000
resampler = T.Resample(sr0, sr1)
wav1 = resampler(wav0)

Save as a new audio file (write)

The process of saving files is also straightforward, just pass the file name, tensor, and sampling rate in order.'new_file_16k.wav', wav1, sr1)
Then the new audio file appeared in the current directory. Just set the path and file name if you want to save it in another directory.


Related Posts Plugin for WordPress, Blogger...