Case study
Suppose we want to store the metadata of the following dataset (RAVDESS Speech, a speech emotion recognition dataset) in JSON format, with each entry holding a file path and its label. For this purpose we split the data into a training set ('train_meta_data.json') and a test set ('test_meta_data.json'). The following script does this.
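import os
import glob
import json

data_dir = '/data/Audio_Speech_Actors_01-24/'

# Collect every WAV file inside the Actor_01 ... Actor_24 folders
files = glob.glob(os.path.join(data_dir, 'Actor_??', '*.wav'))
files.sort()

data_train = []
data_test = []

for file in files:
    # The third field of the file name is the emotion label
    lab = os.path.basename(file).split('-')[2]
    # The two digits before '.wav' are the speaker (actor) number
    if int(file[-6:-4]) < 20:  # speakers 1-19 for training
        data_train.append({
            'path': file,
            'label': lab
        })
    else:  # speakers 20-24 for test
        data_test.append({
            'path': file,
            'label': lab
        })

# Write the metadata into data_dir, where the loading snippet below expects it
with open(os.path.join(data_dir, 'train_meta_data.json'), 'w') as f:
    json.dump(data_train, f)

with open(os.path.join(data_dir, 'test_meta_data.json'), 'w') as f:
    json.dump(data_test, f)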
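The script relies on the RAVDESS naming convention, in which a file name consists of seven two-digit fields separated by hyphens: the third field is the emotion code and the seventh is the actor number. The sketch below isolates that parsing step. The helper name `parse_ravdess_name` and the example path are assumptions made for illustration; they are not part of the script above.

```python
import os


def parse_ravdess_name(path):
    """Return (emotion_code, actor_number) parsed from a RAVDESS file name."""
    # Drop the directory and the '.wav' extension, then split the seven fields
    stem = os.path.splitext(os.path.basename(path))[0]
    fields = stem.split('-')
    if len(fields) != 7:
        raise ValueError(f"unexpected file name: {path}")
    # Field 3 is the emotion code (kept as a string), field 7 is the actor
    return fields[2], int(fields[6])


# Hypothetical example path, used only to show the parsing
emotion, actor = parse_ravdess_name('Actor_12/03-01-06-01-02-01-12.wav')

# Same rule as in the script: actors 1-19 for training, 20-24 for test
subset = 'train' if actor < 20 else 'test'
print(emotion, actor, subset)
```

Parsing the fields by name rather than by character position makes the split rule easier to check, and the length check catches files that do not follow the convention.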
Opening the JSON file
import json

filepath = '/data/Audio_Speech_Actors_01-24/train_meta_data.json'

# Load the list of {'path': ..., 'label': ...} entries
with open(filepath, 'r') as f:
    data_train = json.load(f)
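Once loaded, the metadata is an ordinary list of dictionaries, so a quick sanity check is to count how many files each label has. The following is a minimal sketch that uses a small hypothetical sample instead of the real file, so the records shown are assumptions and not actual dataset contents.

```python
import json
from collections import Counter

# Hypothetical records in the same shape as the metadata entries
sample = [
    {'path': 'Actor_01/03-01-01-01-01-01-01.wav', 'label': '01'},
    {'path': 'Actor_01/03-01-01-01-02-01-01.wav', 'label': '01'},
    {'path': 'Actor_01/03-01-03-01-01-01-01.wav', 'label': '03'},
]

# Round-trip through JSON to mimic writing and loading the metadata file
data_train = json.loads(json.dumps(sample))

# Count the number of entries per emotion label
label_counts = Counter(item['label'] for item in data_train)
for label, count in sorted(label_counts.items()):
    print(label, count)
```

With the real 'train_meta_data.json', replacing the round-trip line with the `json.load` call above should give the per-label counts for the training split.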