Conversational Emotion Recognition: Joint Speaker and Emotion Diarization in Conversations

Olorundamilola Kazeem, Johns Hopkins University

Conversational emotion recognition (CER) is a subfield of automatic speech emotion recognition (SER) and a highly active area of research aimed at endowing machines with the ability to comprehend and communicate with emotion. This area of research has extensive affective computing applications across sectors and industries, from cybersecurity to healthcare to computational storytelling for education and entertainment. For all these applications, it is important to understand not just the speech content channel (i.e. “what is being said”) but also the emotional context channel (i.e. “how it is being said”). This research aims to develop novel transformer-based neural network models to determine and diarize “what was felt when” for a given speaker and “who felt what and when” among two or more speakers in spontaneous conversational speech.

Abstract Author(s): Olorundamilola "Dami" Kazeem, Raghavendra Pappagari, Jesus Villalba Lopez, Laureano Moro-Velazquez, Najim Dehak