VE-MD: Privacy-Focused Group Emotion Recognition Model

Variational Encoder–Multi-Decoder (VE-MD) for Privacy-by-functional-design (Group) Emotion Recognition

arXiv:2604.02397v1

Abstract

Group Emotion Recognition (GER) aims to infer collective affect in social environments such as classrooms, crowds, and public events. Many existing approaches rely on explicit individual-level processing, including cropped faces, person tracking, or per-person feature extraction, which makes the analysis pipeline person-centric and raises privacy concerns in deployment scenarios where only group-level understanding is needed.

The paper proposes VE-MD, a Variational Encoder–Multi-Decoder framework for group emotion recognition under a privacy-aware functional design. VE-MD does not provide formal anonymization or cryptographic privacy guarantees. Instead, it is designed to avoid explicit individual monitoring by constraining the model to predict only aggregate group-level affect, with no identity recognition and no per-person emotion outputs.

Key Features of VE-MD

Joint optimization of emotion classification and internal prediction of body and facial structural representations.
Two structural decoding strategies:
- Transformer-based PersonQuery decoder
- Dense Heatmap decoder accommodating variable group sizes
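
To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the structural target is a single Gaussian keypoint heatmap, that the joint objective is a weighted sum of a classification term, a structural term, and a KL term for the variational encoder, and that the weights `lambda_struct` and `beta` exist as tunable hyperparameters. All function names and default values are hypothetical.

```python
import math


def render_heatmap(keypoints, height, width, sigma=1.0):
    """Render any number of (row, col) keypoints into one fixed-size heatmap.

    The output shape depends only on (height, width), not on how many
    people or keypoints are present, which is one way a dense target
    can accommodate variable group sizes.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for kr, kc in keypoints:
        for r in range(height):
            for c in range(width):
                dist_sq = (r - kr) ** 2 + (c - kc) ** 2
                value = math.exp(-dist_sq / (2 * sigma ** 2))
                heatmap[r][c] = max(heatmap[r][c], value)
    return heatmap


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def mse(pred, target):
    """Mean squared error between two equally sized 2D grids."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    return sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)


def joint_loss(logits, label, pred_heatmap, target_heatmap, mu, log_var,
               lambda_struct=1.0, beta=0.01):
    """Assumed joint objective: emotion + structure + variational terms."""
    return (
        cross_entropy(logits, label)
        + lambda_struct * mse(pred_heatmap, target_heatmap)
        + beta * kl_to_standard_normal(mu, log_var)
    )


if __name__ == "__main__":
    # Two scenes with different group sizes yield targets of the same shape.
    small_group = render_heatmap([(1, 1), (2, 5)], height=8, width=8)
    large_group = render_heatmap([(0, 0), (3, 3), (4, 6), (7, 2), (6, 6)], 8, 8)
    print(len(small_group), len(small_group[0]))
    print(len(large_group), len(large_group[0]))

    # Hypothetical 3-class group emotion logits and a 2-D latent code.
    loss = joint_loss(
        logits=[0.2, 1.5, -0.3], label=1,
        pred_heatmap=small_group, target_heatmap=small_group,
        mu=[0.1, -0.2], log_var=[0.0, 0.0],
    )
    print(loss)
```

Note that the only prediction exposed to the caller is the group-level class distribution. The heatmap appears solely as a training target, which matches the spirit of the functional design described in the abstract: structure is supervised, but no per-person emotion or identity is ever output.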

Research Findings

Experiments conducted on six in-the-wild datasets, including two GER and four Individual Emotion Recognition (IER) benchmarks, indicate that structural supervision significantly enhances representation learning. The results highlight a crucial distinction between GER and IER:

- Optimizing the latent space alone is often inadequate for GER, as it may diminish interaction-related cues.
- Maintaining explicit structural outputs proves beneficial for collective affect inference.
- In contrast, projected structural representations effectively serve as a denoising bottleneck for IER.

Performance Metrics

VE-MD is reported to achieve state-of-the-art performance, with the following scores on the six benchmarks:

GAF-3.0: Up to 90.06%
VGAF: 82.25% with multimodal fusion including audio
SamSemo: 77.9% (adding text modality)
MER-MULTI: 63.8%
DFEW: 70.7%
EngageNet: 69.0%

Conclusion

The findings underscore the importance of preserving interaction-related structural information for effective group-level affect modeling while minimizing reliance on prior individual feature extraction. VE-MD offers a privacy-aware design that avoids explicit individual monitoring, though without formal privacy guarantees, and still reaches competitive accuracy on the reported benchmarks.
