Case study
Suppose we want to store the metadata of the RAVDESS Speech dataset (a speech emotion recognition dataset) in JSON format, with one entry per audio file holding its path and its label. We also want to keep the training data ('train_meta_data.json') separate from the test data ('test_meta_data.json'). The following script does this.
import os
import glob
import json

data_dir = '/data/Audio_Speech_Actors_01-24/'
files = glob.glob(os.path.join(data_dir, 'Actor_??', '*.wav'))
files.sort()

data_train = []
data_test = []

for file in files:
    # The third field of the file name is the emotion code (the label)
    lab = os.path.basename(file).split('-')[2]
    # The last two digits before '.wav' are the actor (speaker) number
    if int(file[-6:-4]) < 20:  # speaker 1-19 for training
        data_train.append({
            'path': file,
            'label': lab
        })
    else:  # speaker 20-24 for test
        data_test.append({
            'path': file,
            'label': lab
        })

# Both files are written to the current working directory
with open("train_meta_data.json", 'w') as f:
    json.dump(data_train, f)

with open("test_meta_data.json", 'w') as f:
    json.dump(data_test, f)

Opening the JSON file
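import json

# Assumes the JSON file was saved in (or moved to) the dataset directory;
# the script above writes it to the current working directory.
filepath = '/data/Audio_Speech_Actors_01-24/train_meta_data.json'

with open(filepath, 'r') as f:
    data_train = json.load(f)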
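
Once the metadata is loaded, it is an ordinary list of dictionaries with the keys 'path' and 'label'. The sketch below is one possible way to inspect it: it writes a few hypothetical records to a temporary file, reads them back, and counts how many files fall under each emotion. The helper names `load_metadata` and `count_emotions` and the sample records are assumptions made for illustration. The code-to-name mapping follows the RAVDESS file naming convention, where the third field of the file name is the emotion code.

```python
import json
import os
import tempfile
from collections import Counter

# Emotion codes used in the third field of RAVDESS file names.
EMOTIONS = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}


def load_metadata(json_path):
    """Read a metadata list previously written with json.dump (hypothetical helper)."""
    with open(json_path, 'r') as f:
        return json.load(f)


def count_emotions(records):
    """Count records per emotion name; unknown codes are kept unchanged."""
    return Counter(EMOTIONS.get(r['label'], r['label']) for r in records)


# Hypothetical records standing in for the real train_meta_data.json.
sample = [
    {'path': 'Actor_01/03-01-01-01-01-01-01.wav', 'label': '01'},
    {'path': 'Actor_01/03-01-05-01-01-01-01.wav', 'label': '05'},
    {'path': 'Actor_02/03-01-05-01-01-01-02.wav', 'label': '05'},
]

# Round trip through a temporary file so the sketch runs without the dataset.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, 'train_meta_data.json')
    with open(json_path, 'w') as f:
        json.dump(sample, f)
    loaded = load_metadata(json_path)

counts = count_emotions(loaded)
print(counts)
```

With the real dataset, passing the path of 'train_meta_data.json' to `load_metadata` should give the same kind of summary, which is a quick check that the train/test split contains every emotion class.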