Two of my papers were accepted at TENCON 2024. Here are the titles:
- Multi-label Emotion Share Regression From Speech Using Pre-Trained Self-Supervised Learning Models
- Evaluating Hyperparameter Optimization for Machinery Anomalous Sound Detection
The first paper is about emotion share recognition, that is, predicting more than one emotion from a single utterance. It differs from general speech emotion recognition (SER), although we could select the n highest probabilities from the SER output. In emotion share recognition, the shares must sum to 1 (or 100%). In SER, the probability of each emotion category is independent, e.g., two categories could have probabilities of 0.85 and 0.75 at the same time. Usually, only the emotion with the highest probability is selected.
Here is a more detailed example.
Emotion (share) recognition
Angry: 0.54
Fear: 0.43
Other: 0.03
Speech emotion recognition
Angry: 0.64
Fear: 0.53
Sad: 0.23
In the first case, the probabilities sum to 1 (0.54 + 0.43 + 0.03); this is not the case for the second approach (SER), where they sum to 1.40.
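The difference can be sketched in a few lines of Python. This is only an illustration, not the method used in the paper: it assumes that shares come from a softmax over the model outputs and that independent SER probabilities come from a per-class sigmoid. The logit values and label names are made up for the example.

```python
import math

def softmax(logits):
    """Normalize scores so that they sum to 1 (emotion shares)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Score each class on its own; the results need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Hypothetical model outputs for one utterance (not from the paper)
labels = ["angry", "fear", "sad"]
logits = [1.2, 0.9, -1.5]

shares = softmax(logits)        # emotion share style output
independent = sigmoid(logits)   # independent per-class probabilities

for name, s, p in zip(labels, shares, independent):
    print(f"{name}: share={s:.2f}, independent={p:.2f}")

print("sum of shares:", round(sum(shares), 6))
print("sum of independent probabilities:", round(sum(independent), 6))

# In SER, usually only the top class is kept
top_label = labels[independent.index(max(independent))]
print("SER prediction:", top_label)
```

With the share output, every value is meaningful as a fraction of the whole; with the independent output, only the ranking is usually used.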
In the second paper, I optimized the hyperparameters for anomalous machine sound detection using Optuna. The results on two different datasets show different optimal parameter values; however, the top three parameters to optimize remain the same (learning rate, patience, and type of loss function).