Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
arXiv:2405.00181v3
Abstract: Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks concentrate primarily on anomaly detection and localization, we focus on practicality, which prompts three crucial questions: “what anomaly occurred?”, “why did it happen?”, and “how severe is this abnormal event?”. In pursuit of these answers, we present a comprehensive benchmark for Causation Understanding of Video Anomaly (CUVA).
Introduction to Causation Understanding of Video Anomaly (CUVA)
Video anomaly understanding has emerged as a vital area of research aimed at automatically interpreting unusual events within video sequences. This capability is particularly important in fields like traffic surveillance, security monitoring, and quality control in industrial settings. However, conventional benchmarks have typically emphasized the detection and localization of anomalies without delving into the underlying causes and consequences of these events.
Key Questions Addressed by CUVA
Our benchmark seeks to address three fundamental questions that are crucial for a deeper understanding of video anomalies:
- What anomaly occurred? – Identifying the type of anomaly, its start and end times, as well as descriptive event characteristics.
- Why did it happen? – Providing natural language explanations that elucidate the causal factors behind the anomaly.
- How severe is this abnormal event? – Offering qualitative assessments of the impact or consequences stemming from the anomaly.
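As a rough illustration, the three questions above could be kept as fixed prompt templates and paired with a textual description of a video. This is a minimal sketch, not the authors' prompt-based method: the names `CUVA_QUESTIONS` and `build_prompt`, the prompt layout, and the idea of passing a text description are all assumptions made for this example. Only the wording of the three questions comes from the text.

```python
# Hypothetical prompt templates. Only the question wording comes from the
# article; the dictionary keys and prompt layout are illustrative assumptions.
CUVA_QUESTIONS = {
    "what": "What anomaly occurred?",
    "why": "Why did it happen?",
    "how": "How severe is this abnormal event?",
}


def build_prompt(task: str, video_description: str) -> str:
    """Compose a plain-text prompt for one of the three CUVA questions."""
    if task not in CUVA_QUESTIONS:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(CUVA_QUESTIONS)}"
        )
    # Strip stray whitespace so the prompt layout stays predictable.
    return (
        f"Video description: {video_description.strip()}\n"
        f"Question: {CUVA_QUESTIONS[task]}\n"
        "Answer:"
    )
```

Calling `build_prompt("why", "a car runs a red light")` returns a three-line string ending in `Answer:`, which could then be passed to whichever model is being evaluated.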
Benchmark Structure and Annotations
The CUVA benchmark comprises three distinct sets of human annotations for each video anomaly instance:
- Anomaly type, start and end times, along with event descriptions.
- Natural language explanations detailing the cause of the anomaly.
- Free text reflecting the effects or ramifications of the abnormality.
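The three annotation sets listed above could be held in a single record per anomaly instance. The sketch below is one possible in-memory layout, not the dataset's actual file schema: the class name `AnomalyAnnotation`, every field name, and the choice of seconds as the time unit are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnomalyAnnotation:
    """Hypothetical container for one annotated anomaly instance."""

    # First annotation set: type, time span, and event description.
    anomaly_type: str
    start_sec: float  # assumed unit: seconds from the start of the video
    end_sec: float
    description: str
    # Second annotation set: natural language explanation of the cause.
    cause: str
    # Third annotation set: free text describing the effect.
    effect: str

    def __post_init__(self) -> None:
        # Reject time spans that are negative or run backwards.
        if self.start_sec < 0 or self.end_sec < self.start_sec:
            raise ValueError("expected 0 <= start_sec <= end_sec")

    @property
    def duration_sec(self) -> float:
        """Length of the annotated anomaly span."""
        return self.end_sec - self.start_sec


# Invented example values, not taken from the CUVA dataset.
example = AnomalyAnnotation(
    anomaly_type="traffic accident",
    start_sec=7.5,
    end_sec=12.5,
    description="A car collides with a motorcycle at an intersection.",
    cause="The car entered the intersection against a red light.",
    effect="The motorcycle rider fell and traffic was blocked.",
)
```

Keeping the cause and effect as separate free-text fields mirrors the split between the "why" and "how severe" questions, so each answer can be compared against its own reference text.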
Introduction of MMEval: A Novel Evaluation Metric
To assess how well models understand these anomalies, we introduce MMEval, a novel evaluation metric designed to align closely with human preferences for CUVA. The metric lets researchers measure how well existing large language models (LLMs) comprehend the underlying cause and the corresponding effect of a video anomaly.
Proposed Methodology and Experiments
In addition to the benchmark and evaluation metric, we propose a prompt-based method that serves as a baseline approach for the CUVA tasks. Our extensive experiments show the superiority of both the evaluation metric and the prompt-based approach.
Access to Resources
For researchers and practitioners interested in exploring our work further, the code and dataset associated with CUVA are publicly available at https://github.com/fesvhtr/CUVA.
Conclusion
The Causation Understanding of Video Anomaly (CUVA) benchmark extends video anomaly understanding beyond detection and localization. By addressing not only the occurrence of anomalies but also their causes and effects, it supports more practical applications and better interpretability in automated systems.
