Thursday, December 23, 2021

New Paper: Speech Naturalness Recognition


This study proposes automatic naturalness recognition from acted dialogue. The problem can be stated as follows: given speech utterances with naturalness labels, is it possible to recognize these labels automatically? By what methods? And how should these methods be evaluated? To investigate whether naturalness can be recognized automatically in acted speech, we evaluated two supervised classifiers: a long short-term memory (LSTM) network and a multilayer perceptron (MLP) network. These classifiers take acoustic features extracted from a speech dataset as input. Two kinds of acoustic features were evaluated: low-level and high-level features. In this initial study, the assessed systems achieved moderate performance, measured by concordance correlation coefficient, Pearson correlation coefficient, and root mean square error. The study opens up a potential application of speech processing techniques for measuring naturalness in acted dialogue, which could benefit drama and movie production in the future.
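To make the distinction between the two feature types concrete, here is a minimal sketch of how utterance-level (high-level) features could be derived from frame-level (low-level) ones. It assumes the high-level features are simply the per-dimension mean and standard deviation over frames; that choice, the function name `hsf_from_lld`, and the toy input are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def hsf_from_lld(lld):
    """Collapse frame-level features into one utterance-level vector.

    lld: array-like of shape (n_frames, n_features), one row per frame.
    Returns a vector of length 2 * n_features: per-feature means followed
    by per-feature standard deviations (an assumed choice of statistics).
    """
    lld = np.asarray(lld, dtype=float)
    if lld.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_frames, n_features)")
    return np.concatenate([lld.mean(axis=0), lld.std(axis=0)])


# Toy example: 2 frames, 2 low-level features per frame (made-up numbers).
frames = [[1.0, 2.0], [3.0, 4.0]]
utterance_vector = hsf_from_lld(frames)
```

A fixed-length vector like this can be fed directly to an MLP, whereas the variable-length frame sequence is the kind of input a recurrent model such as an LSTM can consume.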

Illustration (of potential application):

(best) Result

Metrics: concordance correlation coefficient (CCC), Pearson correlation coefficient (PCC), and [root] mean square error ([R]MSE)
Method: Multilayer perceptron (MLP) with high-level statistical functions (HSF)
Interpretation: moderate performance (in terms of CCC)
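For readers unfamiliar with these metrics, the sketch below computes CCC, PCC, and RMSE with NumPy using their standard textbook definitions. The function names and the toy labels are my own for illustration; this is not the evaluation code from the paper.

```python
import numpy as np


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (population statistics)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return float(2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2))


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy example: predictions are perfectly correlated but shifted by 1.
labels = [1.0, 2.0, 3.0, 4.0, 5.0]
preds = [2.0, 3.0, 4.0, 5.0, 6.0]
scores = {"CCC": ccc(labels, preds), "PCC": pcc(labels, preds), "RMSE": rmse(labels, preds)}
```

In the toy example PCC is 1 because the predictions follow the labels exactly in shape, while CCC drops to 0.8 because it also penalizes the constant offset. That sensitivity to bias is one reason CCC is often preferred over PCC for judging agreement with continuous labels.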

Full paper + Code
