Conference papers

Multi-corpus Experiment on Continuous Speech Emotion Recognition: Convolution or Recurrence?

Abstract : Extracting semantic information such as emotions from real-life speech is a challenging task that has grown in popularity over the last few years. Recently, emotion processing in speech has moved from discrete emotional categories to continuous affective dimensions. This trend helps in the design of systems that predict the dynamic evolution of affect in speech. However, no standard annotation guidelines exist for these dimensions, which makes cross-corpus studies hard to carry out. Deep neural networks are now predominant in emotion recognition. Almost all systems use recurrent architectures, but convolutional networks have recently been reassessed because they are faster to train and have fewer parameters than recurrent ones. This paper investigates the pros and cons of these two architectures using cross-corpus experiments that highlight the issue of corpus variability. We also explore the most suitable acoustic representation for continuous emotion, together with loss functions. We conclude that recurrent networks are robust to corpus variability, and we confirm the power of cepstral features for continuous Speech Emotion Recognition (SER), especially for satisfaction prediction. A final post-processing step applied to the predictions yields a strong result (CCC = 0.719) on AlloSat and sets a new state of the art.
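
For readers unfamiliar with the score quoted in the abstract, the following is a minimal sketch of how a concordance correlation coefficient (CCC) can be computed between a reference annotation and a predicted trace. It assumes that the reported figure is the standard CCC commonly used in continuous emotion recognition. The function name `concordance_cc` and the sample values are hypothetical and do not come from the paper.

```python
def concordance_cc(reference, prediction):
    """Concordance correlation coefficient between two equal-length sequences.

    Illustrative sketch only: the name and the handling of constant inputs
    are assumptions, not the authors' evaluation code.
    """
    if len(reference) != len(prediction) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(prediction) / n

    # Population (biased) variances and covariance.
    var_r = sum((r - mean_r) ** 2 for r in reference) / n
    var_p = sum((p - mean_p) ** 2 for p in prediction) / n
    cov = sum((r - mean_r) * (p - mean_p)
              for r, p in zip(reference, prediction)) / n

    denom = var_r + var_p + (mean_r - mean_p) ** 2
    if denom == 0:
        # Both sequences are constant and equal; treated here as perfect
        # agreement (a convention chosen for this sketch).
        return 1.0
    return 2 * cov / denom


# Hypothetical frame-level values, for illustration only.
reference = [0.1, 0.3, 0.5, 0.4, 0.2]
prediction = [0.2, 0.3, 0.4, 0.4, 0.3]
score = concordance_cc(reference, prediction)
```

Unlike the Pearson correlation, this measure penalises differences in mean and scale between the two traces, so a prediction that follows the right shape but is shifted by a constant offset scores below 1.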

https://hal.archives-ouvertes.fr/hal-02945644
Contributor : Marie Tahon
Submitted on : Monday, December 7, 2020 - 10:06:59 AM
Last modification on : Monday, December 14, 2020 - 7:46:27 AM
Long-term archiving on : Monday, March 8, 2021 - 6:25:37 PM

File

SPECOM(1).pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02945644, version 1

Citation

Manon Macary, Martin Lebourdais, Marie Tahon, Yannick Estève, Anthony Rousseau. Multi-corpus Experiment on Continuous Speech Emotion Recognition: Convolution or Recurrence?. 22nd International Conference on Speech and Computer (SPECOM 2020), Oct 2020, St Petersburg, Russia. ⟨hal-02945644⟩
