Dual-branch Graph Domain Adaptation for Cross-scenario Multi-modal Emotion Recognition
arXiv:2603.26840v1 (cross-listed)
Abstract
Multimodal Emotion Recognition in Conversations (MERC) aims to predict speakers’ emotional states in multi-turn dialogues through text, audio, and visual cues. In real-world settings, conversation scenarios differ significantly in speakers, topics, styles, and noise levels. Existing MERC methods generally neglect these cross-scenario variations, limiting their ability to transfer models trained on a source domain to unseen target domains.
Introduction
To address the challenges posed by varying conversation scenarios, we propose a Dual-branch Graph Domain Adaptation framework (DGDA) for multimodal emotion recognition under cross-scenario conditions. DGDA is designed to generalize across conversation scenarios, so that a model trained on a source domain remains accurate on unseen target domains.
Methodology
The DGDA framework comprises several key components:
- Emotion Interaction Graph: We first construct an emotion interaction graph that characterizes the complex emotional dependencies among the utterances of a conversation.
- Dual-branch Encoder: Our dual-branch encoder consists of a Hypergraph Neural Network (HGNN) and a Path Neural Network (PathNN). The HGNN explicitly models multivariate relationships among emotions, while the PathNN captures global dependencies in the dialogue.
- Domain Adversarial Discriminator: To enable out-of-domain generalization, we introduce a domain adversarial discriminator. Training the encoder against this discriminator encourages representations that are invariant across domains, which is intended to keep the model robust to variations in conversation context.
- Regularization Loss: To mitigate the impact of noisy labels, we incorporate a regularization loss that suppresses the negative influence of mislabeled samples during training, allowing for more accurate emotion predictions.
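As one illustration of the components above, the hypergraph branch can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the DGDA implementation: the hyperedge construction (one hyperedge per speaker plus sliding context windows), the helper names `build_hyperedges` and `hypergraph_propagate`, and the `window` parameter are all hypothetical, and the propagation step is the parameter-free part of a standard HGNN layer (degree-normalized gather and scatter) with unit hyperedge weights and no learnable projection.

```python
from math import sqrt


def build_hyperedges(speakers, window=2):
    """Group utterance indices into hyperedges (hypothetical construction).

    One hyperedge per speaker (all of that speaker's utterances), plus one
    hyperedge per sliding window of `window` consecutive utterances.
    """
    by_speaker = {}
    for idx, speaker in enumerate(speakers):
        by_speaker.setdefault(speaker, set()).add(idx)
    edges = list(by_speaker.values())
    for start in range(len(speakers) - window + 1):
        edges.append(set(range(start, start + window)))
    return edges


def hypergraph_propagate(features, hyperedges):
    """One smoothing step: D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} X.

    features:   list of per-utterance feature vectors (lists of floats)
    hyperedges: list of sets of utterance indices
    Assumes unit hyperedge weights and omits the learnable projection.
    """
    n = len(features)
    dim = len(features[0])
    # Vertex degree: number of hyperedges that contain each utterance.
    d_v = [sum(1 for e in hyperedges if i in e) for i in range(n)]
    out = [[0.0] * dim for _ in range(n)]
    for edge in hyperedges:
        if not edge:
            continue
        # Gather: degree-normalized mean of the member features.
        edge_msg = [0.0] * dim
        for j in edge:
            scale = 1.0 / sqrt(d_v[j])
            for k in range(dim):
                edge_msg[k] += scale * features[j][k]
        edge_msg = [m / len(edge) for m in edge_msg]
        # Scatter: send the hyperedge message back to every member.
        for i in edge:
            scale = 1.0 / sqrt(d_v[i])
            for k in range(dim):
                out[i][k] += scale * edge_msg[k]
    return out


# Usage: a three-turn dialogue between speakers A and B, 1-d toy features.
edges = build_hyperedges(["A", "B", "A"], window=2)
smoothed = hypergraph_propagate([[0.2], [0.9], [0.4]], edges)
```

Because each hyperedge can join more than two utterances, a single step mixes information across a whole speaker history or context window at once, which is the sense in which a hypergraph models multivariate rather than pairwise relationships.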
Results
To the best of our knowledge, DGDA is the first MERC framework that jointly addresses domain shift and label noise. We also provide a theoretical analysis showing that the framework yields a tighter generalization bound. Extensive experiments on two benchmark datasets, IEMOCAP and MELD, show that DGDA consistently outperforms strong baselines in cross-scenario settings.
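For context on what a generalization bound looks like in domain adaptation, the classical result of Ben-David et al. (2010) is shown below. It is given only as background and is not the bound derived for DGDA, which this summary does not state. Here $\epsilon_S$ and $\epsilon_T$ are the source and target errors of a hypothesis $h$ from a class $\mathcal{H}$, and $\mathcal{D}_S$, $\mathcal{D}_T$ are the source and target distributions.

```latex
\epsilon_T(h) \;\le\; \epsilon_S(h)
  + \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
  + \lambda,
\qquad
\lambda = \min_{h' \in \mathcal{H}} \bigl[\epsilon_S(h') + \epsilon_T(h')\bigr]
```

In this classical picture, domain adversarial training targets the divergence term $d_{\mathcal{H}\Delta\mathcal{H}}$ by making source and target representations hard to tell apart, while the source error term depends on how well the classifier fits the (possibly noisy) source labels.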
Conclusion
The findings highlight the importance of addressing both domain shift and label noise in multimodal emotion recognition. DGDA improves recognition performance in cross-scenario settings and offers a starting point for future work on adapting models to varied real-world scenarios. Our code is available at https://github.com/Xudmm1239439/DGDA-Net.
