Recently, I moved my audio processing toolkit from librosa (and others) to torchaudio. This short write-up documents the very basics of torchaudio for audio manipulation: reading, resampling, and writing an audio file.
Load audio file (read)
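Loading (reading) an audio file is straightforward: just pass the audio path to `torchaudio.load`. We need to import the required modules first. Most common audio formats (WAV, OGG, MP3, etc.) can be loaded by torchaudio, depending on the installed backend.

```python
import torchaudio
import torchaudio.transforms as T

# wav0: float32 tensor shaped (channels, frames); sr0: original sampling rate
wav0, sr0 = torchaudio.load("old_file_48k.wav", normalize=True)
```

Here `wav0` is the output tensor (array) and `sr0` is the original sampling rate. The argument `normalize=True` is already the default, so it can be omitted. Despite its name, it does not rescale the waveform's amplitude: it converts integer PCM samples (e.g. 16-bit WAV) to float32 in the range [-1, 1), and it has no effect on formats other than integer WAV. This explains why one of my colleagues (a student) found that `librosa.util.normalize()` gave better normalization (the waveform's peak reaches ±1): librosa rescales the signal so that its largest absolute sample becomes 1.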
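The difference between the two kinds of "normalization" can be sketched in NumPy (this is an illustration, not torchaudio's or librosa's code). It assumes a 16-bit integer WAV, and that loading it with `normalize=True` simply divides each sample by 2**15; `int16_to_float` and `peak_normalize` are hypothetical helper names.

```python
import numpy as np

def int16_to_float(samples: np.ndarray) -> np.ndarray:
    """Dtype conversion only: map int16 PCM to float32 in [-1, 1).

    Assumption: mirrors the scaling applied when a 16-bit WAV is loaded
    with normalize=True (divide by 2**15). Loudness is left unchanged.
    """
    return samples.astype(np.float32) / 2**15

def peak_normalize(audio: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Peak normalization: rescale so the largest absolute sample is 1."""
    peak = np.max(np.abs(audio))
    # eps guards against division by zero for an all-silent signal
    return audio / max(peak, eps)

# A quiet 16-bit recording whose loudest sample is 25% of full scale.
quiet = np.array([0, 4096, -8192, 2048], dtype=np.int16)

converted = int16_to_float(quiet)
print(converted.max(), converted.min())  # 0.125 -0.25
normalized = peak_normalize(converted)
print(np.abs(normalized).max())          # 1.0
```

The converted signal keeps the original recording level, so a quiet file stays quiet after loading, whereas the peak-normalized signal always touches ±1 at its loudest sample.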
Resample
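Resampling from one sampling rate to another is done with a class, `T.Resample`: instantiating it with the old and new rates gives a resampler object that can be called like a function. We then pass the old tensor to this resampler. Here is an example converting a 48 kHz tensor to a 16 kHz tensor.

```python
sr1 = 16000                        # target sampling rate (16 kHz)
resampler = T.Resample(sr0, sr1)   # build the resampler: sr0 -> sr1
wav1 = resampler(wav0)             # apply it to the loaded waveform
```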
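As a sanity check on the output size, resampling scales the number of samples by `sr1 / sr0`, so one second of 48 kHz audio (48,000 samples) should come out as 16,000 samples. The pure-Python sketch below computes the reduced rate ratio and the expected length; rounding a fractional length up is an assumption about the resampler, so treat the result as an estimate and compare it with `wav1.shape[-1]`. The helper names are hypothetical.

```python
import math

def resample_ratio(orig_sr: int, new_sr: int) -> tuple[int, int]:
    """Reduce orig_sr:new_sr to lowest terms, e.g. 48000:16000 -> 3:1."""
    g = math.gcd(orig_sr, new_sr)
    return orig_sr // g, new_sr // g

def expected_length(num_samples: int, orig_sr: int, new_sr: int) -> int:
    """Estimated output length, rounding a fractional result up (assumption)."""
    # Integer ceiling division avoids floating-point error on long files.
    return -((-num_samples * new_sr) // orig_sr)

print(resample_ratio(48000, 16000))          # (3, 1)
print(expected_length(48000, 48000, 16000))  # 16000 (one second)
print(expected_length(48001, 48000, 16000))  # 16001
```

Because `T.Resample` precomputes its filter kernel, it is worth creating the resampler once and reusing it for every file that shares the same pair of rates.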
Save as a new audio file (write)
Saving a file is also straightforward: just pass the file name, tensor, and sampling rate, in that order.

```python
torchaudio.save("new_file_16k.wav", wav1, sr1)
```

The new audio file then appears in the current working directory. To save it to another directory, include that directory in the path.
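When saving to another directory, that directory generally has to exist before writing. A minimal sketch using the standard library's `pathlib`; the helper `prepare_output_path` and the folder `outputs/16k` are hypothetical, and the `torchaudio.save` call is left as a comment because it needs `wav1` and `sr1` from the steps above.

```python
from pathlib import Path

def prepare_output_path(out_dir: str, file_name: str) -> Path:
    """Build out_dir/file_name and create out_dir (and parents) if missing."""
    path = Path(out_dir) / file_name
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

out_path = prepare_output_path("outputs/16k", "new_file_16k.wav")
print(out_path.parent.is_dir())  # True

# With wav1 and sr1 from the resampling step:
# torchaudio.save(str(out_path), wav1, sr1)
```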