A-MBER: Affective Memory Benchmark for Emotion Recognition
As AI assistants increasingly interact with the same users over time, their ability to interpret a user’s current emotional state, and to respond appropriately and personally, has become a critical part of those interactions. Yet existing models and benchmarks rarely evaluate this capability.
Current emotion datasets focus primarily on local or instantaneous affect, while long-term memory benchmarks emphasize factual recall, temporal consistency, or knowledge updating. As a result, few resources test whether a model can use remembered interaction history to interpret a user’s present affective state.
To address this gap, researchers have introduced A-MBER, an Affective Memory Benchmark for Emotion Recognition. A-MBER is designed to evaluate a model’s capacity for present affective interpretation, grounded in remembered multi-session interaction history.
Key Features of A-MBER
- Interaction Trajectory Analysis: Given a specific interaction trajectory and a designated anchor turn, the model must infer the user’s current affective state.
- Historical Relevance Identification: The model is tasked with identifying historically relevant evidence that supports its interpretation of the user’s emotions.
- Justification Mechanism: The model must justify its emotional interpretation in a grounded way, that is, by reference to the interaction history rather than by unsupported assertion.
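The three requirements above can be pictured as a single benchmark item plus a scoring routine. The following is a minimal sketch, not A-MBER's actual schema: the class names, field names, turn-id format, and the choice of exact match and precision/recall as metrics are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Turn:
    turn_id: str  # hypothetical id format, e.g. "s1-t3" for session 1, turn 3
    speaker: str  # "user" or "assistant"
    text: str


@dataclass
class BenchmarkItem:
    trajectory: list[Turn]   # multi-session history, oldest first
    anchor_turn_id: str      # the turn whose affect must be interpreted
    gold_affect: str         # annotated affective state at the anchor
    gold_evidence_ids: set[str] = field(default_factory=set)


@dataclass
class Prediction:
    affect: str              # the model's affect judgment
    evidence_ids: set[str]   # turns the model cites as supporting history
    justification: str       # free-text explanation (not scored in this sketch)


def score(item: BenchmarkItem, pred: Prediction) -> dict[str, float]:
    """Score judgment by exact match and evidence by precision/recall over turn ids."""
    hits = len(pred.evidence_ids & item.gold_evidence_ids)
    precision = hits / len(pred.evidence_ids) if pred.evidence_ids else 0.0
    recall = hits / len(item.gold_evidence_ids) if item.gold_evidence_ids else 0.0
    return {
        "judgment": float(pred.affect == item.gold_affect),
        "evidence_precision": precision,
        "evidence_recall": recall,
    }
```

Scoring the justification itself would need a separate rubric or judge, which this sketch deliberately leaves out.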
The benchmark is built through a staged pipeline with explicit intermediate representations: long-horizon planning, conversation generation, annotation, question construction, and final packaging. A-MBER supports judgment, retrieval, and explanation tasks, along with robustness settings such as modality degradation and insufficient-evidence conditions.
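One way to picture a staged pipeline with explicit intermediate representations is a runner that stores each stage's output under its own key, so later stages (and human reviewers) can inspect it. The sketch below is illustrative only: the stage keys mirror the five stages named above, but `run_pipeline`, the artifact layout, and the toy stage bodies are hypothetical placeholders, not the benchmark's actual construction code.

```python
from typing import Any, Callable

Artifacts = dict[str, Any]
Stage = Callable[[Artifacts], Any]

# Order follows the stages listed in the text; the key names are illustrative.
STAGE_ORDER = ("planning", "conversation", "annotation", "questions", "package")


def run_pipeline(stages: dict[str, Stage], seed: Artifacts) -> Artifacts:
    """Run stages in order, keeping every intermediate output under its stage name."""
    artifacts: Artifacts = dict(seed)
    for name in STAGE_ORDER:
        if name not in stages:
            raise KeyError(f"missing stage: {name}")
        artifacts[name] = stages[name](artifacts)
    return artifacts


# Toy stages with placeholder logic, only to show how data flows between stages.
toy_stages: dict[str, Stage] = {
    "planning": lambda a: {"sessions": 2, "persona": a["persona"]},
    "conversation": lambda a: [f"session {i + 1}" for i in range(a["planning"]["sessions"])],
    "annotation": lambda a: {s: "unlabeled" for s in a["conversation"]},
    "questions": lambda a: [f"affect at end of {s}?" for s in a["annotation"]],
    "package": lambda a: {"n_questions": len(a["questions"])},
}

result = run_pipeline(toy_stages, {"persona": "hypothetical user"})
```

Keeping each intermediate artifact, rather than passing only the final output forward, is what makes it possible to audit a single stage without rerunning the others.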
Experimental Insights
Experiments on A-MBER compare local-context, long-context, retrieved-memory, structured-memory, and gold-evidence conditions within a unified framework. The results indicate that A-MBER is especially discriminative on the subsets it was designed to stress: long-range implicit affect, high-dependency memory levels, trajectory-based reasoning, and adversarial settings.
These findings suggest that memory supports affective interpretation not simply by supplying more history, but by enabling more selective, grounded, and context-sensitive use of past interactions. Benchmarks like A-MBER can therefore help guide the development of assistants that interpret user emotion in context.
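The five conditions differ mainly in what history the model sees before the anchor turn. The sketch below shows one plausible way to assemble that context. It is an assumption-laden illustration: `build_context`, the condition strings, the window size `k`, and the token-overlap retriever are stand-ins chosen for simplicity, not the configuration used in the reported experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    turn_id: str
    text: str


def _overlap(a: str, b: str) -> int:
    """Count distinct lowercase whitespace tokens shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def build_context(history, anchor, condition, *, gold_ids=(), memory_notes=(), k=2):
    """Return the texts a model would see before the anchor under one condition.

    history: prior turns, oldest first. memory_notes: precomputed structured
    memory entries (hypothetical). gold_ids: annotated evidence turn ids.
    """
    if condition == "local":       # only the last k turns before the anchor
        return [t.text for t in history[-k:]]
    if condition == "long":        # the full history, unfiltered
        return [t.text for t in history]
    if condition == "retrieved":   # top-k turns by lexical overlap with the anchor
        ranked = sorted(history, key=lambda t: _overlap(t.text, anchor.text), reverse=True)
        position = {t.turn_id: i for i, t in enumerate(history)}
        top = sorted(ranked[:k], key=lambda t: position[t.turn_id])  # chronological
        return [t.text for t in top]
    if condition == "structured":  # distilled memory notes instead of raw turns
        return list(memory_notes)
    if condition == "gold":        # exactly the annotated evidence turns
        return [t.text for t in history if t.turn_id in gold_ids]
    raise ValueError(f"unknown condition: {condition}")
```

Under this framing, the gold-evidence condition acts as an upper reference point for how well a model can do when the relevant history is handed to it directly, while the local-context condition shows what is lost when that history is unavailable.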
Conclusion
A-MBER targets an ability that earlier emotion datasets and memory benchmarks leave largely untested: using long-term memory to interpret a user’s present affective state. By evaluating judgment, evidence retrieval, and explanation together, it offers a way to measure progress toward more responsive and empathetic AI assistants that better understand and respond to user emotions.
