Two of my papers were accepted at TENCON 2024. Here are the titles:
- Multi-label Emotion Share Regression From Speech Using Pre-Trained Self-Supervised Learning Models
- Evaluating Hyperparameter Optimization for Machinery Anomalous Sound Detection
The first paper is about emotion share recognition, i.e., how to predict more than a single emotion from one utterance. It differs from general speech emotion recognition (SER), even though we could select the n highest probabilities from an SER model. In emotion share recognition, the shares must sum to 1 (or 100%). In SER, the probability of each emotion category is independent, e.g., two categories could score 0.85 and 0.75 at the same time. Usually, only the category with the highest probability is selected.
Here is a more detailed example.
Emotion (share) recognition:
- Angry: 0.54
- Fear: 0.43
- Other: 0.03

Speech emotion recognition:
- Angry: 0.64
- Fear: 0.53
- Sad: 0.23
In the first case, the values sum to 1 (0.54 + 0.43 + 0.03). This is not the case for the second approach (SER), where they sum to 1.40.
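To make the difference concrete, here is a minimal sketch in Python. It assumes that shares come from a softmax over the model outputs and that independent SER scores come from a per-class sigmoid. Both choices are my illustration here, not necessarily what the paper uses, and the logits are made-up numbers.

```python
import math

def softmax(logits):
    """Turn raw scores into shares that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Turn one raw score into an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))

labels = ["angry", "fear", "sad"]
logits = [1.2, 0.9, -1.0]  # hypothetical model outputs, for illustration only

shares = softmax(logits)                    # emotion share: coupled, sums to 1
independent = [sigmoid(x) for x in logits]  # independent scores: no sum constraint

for name, s, p in zip(labels, shares, independent):
    print(f"{name}: share={s:.2f}, independent={p:.2f}")
print(f"sum of shares = {sum(shares):.2f}")
print(f"sum of independent scores = {sum(independent):.2f}")
```

With the same raw scores, the shares always add up to 1, while the independent scores can add up to more (or less) than 1.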
In the second paper, I optimized the hyperparameters for machinery anomalous sound detection with Optuna. The optimal values differ between the two datasets I tested; however, the top three hyperparameters to optimize are the same on both (learning rate, patience, and type of loss function).
See you in Singapore, inshallah!