Cognition-Inspired Dual-Stream Semantic Enhancement for Vision-Based Dynamic Emotion Modeling
arXiv:2604.12777v1 (announce type: cross)
Abstract
The human brain constructs emotional percepts not by processing facial expressions in isolation, but through a dynamic, hierarchical integration of sensory input with semantic and contextual knowledge. However, existing vision-based dynamic emotion modeling approaches often neglect theories of emotion perception and cognition. To bridge this gap between machine and human emotion perception, we propose cognition-inspired Dual-stream Semantic Enhancement (DuSE).
Introduction
Recent advances in artificial intelligence have driven significant progress in dynamic emotion modeling. However, many of these approaches do not account for the cognitive processes that underlie human emotion perception. DuSE aims to fill this gap by building cognitive theories into the architecture of the emotion recognition system.
Model Architecture
DuSE is built around a dual-stream architecture in which each stream models a distinct cognitive mechanism involved in emotion recognition. The two streams work in tandem:
- Hierarchical Temporal Prompt Cluster (HTPC): The first stream operationalizes the cognitive priming effect. It simulates how linguistic cues pre-sensitize neural pathways, modulating the processing of incoming visual stimuli. This stream aligns textual semantics with fine-grained temporal features of facial dynamics.
- Latent Semantic Emotion Aggregator (LSEA): The second stream models the knowledge integration process, akin to the Conceptual Act Theory. It aggregates sensory inputs and synthesizes them with learned conceptual knowledge, mimicking the role of the hippocampus and default mode network in constructing coherent emotional experiences.
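The summary gives no equations or layer definitions for either stream, so the following is only a minimal sketch of how the two ideas could be realized, not the authors' implementation. It assumes both streams can be approximated with scaled dot-product attention: per-frame visual features attend to text prompt embeddings (an HTPC-like priming step), and a small set of learned concept tokens attends over the primed frames (an LSEA-like aggregation step). All function names, dimensions, and values below are hypothetical, and the code uses plain Python so that it runs without dependencies.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys, values):
    """Scaled dot-product attention: one weighted mix of `values` per query."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def prompt_primed_frames(frame_feats, prompt_embs):
    """HTPC-like step (assumption): each frame attends to the text prompts,
    and the resulting text context is added back to the frame feature."""
    context = attend(frame_feats, prompt_embs, prompt_embs)
    return [[f + c for f, c in zip(frame, ctx)]
            for frame, ctx in zip(frame_feats, context)]

def aggregate_with_concepts(frame_feats, concept_tokens):
    """LSEA-like step (assumption): learned concept tokens query the frames;
    the per-concept summaries are mean-pooled into one clip-level vector."""
    summaries = attend(concept_tokens, frame_feats, frame_feats)
    n = len(summaries)
    return [sum(s[d] for s in summaries) / n for d in range(len(summaries[0]))]

def classify(clip_vec, class_embs):
    """Score the clip vector against one text embedding per emotion class."""
    return softmax([dot(clip_vec, c) for c in class_embs])

# Toy inputs, invented for illustration: 3 frames, 2 prompts, 2 concept tokens, dim 4.
frame_feats = [[0.1, 0.0, 0.3, 0.2], [0.4, 0.1, 0.2, 0.0], [0.9, 0.2, 0.1, 0.1]]
prompt_embs = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
concept_tokens = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

primed = prompt_primed_frames(frame_feats, prompt_embs)
clip_vec = aggregate_with_concepts(primed, concept_tokens)
probs = classify(clip_vec, prompt_embs)
print(probs)
```

In this sketch the text prompts act twice: once to bias the frame features before aggregation, and once as class embeddings at the end. A real model would learn the prompt and concept vectors and would likely operate at several temporal scales, which the sketch does not attempt.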
Neuro-Cognitive Mechanisms
By explicitly modeling these neuro-cognitive mechanisms, DuSE provides a more neurally plausible and robust framework for dynamic facial expression recognition (DFER). This approach not only enhances the accuracy of emotion detection but also improves the interpretability of the model.
Experimental Validation
Extensive experiments conducted on challenging in-the-wild benchmarks validate our cognition-centric approach. The results demonstrate that emulating the brain’s strategies for emotion processing yields state-of-the-art performance in DFER tasks.
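The summary does not state which metrics were reported. DFER benchmarks commonly report weighted average recall (WAR, equivalent to overall accuracy) and unweighted average recall (UAR, the mean of per-class recalls), so the sketch below assumes those two. The labels are toy values invented for illustration and are not results from the paper.

```python
from collections import defaultdict

def recall_metrics(y_true, y_pred):
    """Return (WAR, UAR) for two equal-length label sequences."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    war = sum(correct.values()) / len(y_true)                    # overall accuracy
    uar = sum(correct[c] / total[c] for c in total) / len(total)  # mean class recall
    return war, uar

# Toy, imbalanced labels (hypothetical).
y_true = ["happy", "happy", "happy", "sad", "angry", "angry"]
y_pred = ["happy", "happy", "sad", "sad", "angry", "happy"]

war, uar = recall_metrics(y_true, y_pred)
print(f"WAR={war:.3f} UAR={uar:.3f}")
```

UAR weights every class equally, so it is less flattering than WAR when a model does well only on frequent emotions, which matters for imbalanced in-the-wild data.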
Conclusion
DuSE advances emotion modeling by integrating cognitive theories directly into the design of the recognition system. Beyond improving recognition, the approach ties each model component to a specific account of emotional perception and paves the way for more human-like emotion recognition technologies.
