SenBen: Sensitive Scene Graphs for Explainable Content Moderation
As the volume of digital content grows, platforms rely on content moderation systems to classify images as safe or unsafe. Existing systems, however, often lack spatial grounding and interpretability: they do not explain what sensitive behavior was detected, who is involved, or where in the image it occurs. To address these gaps, researchers have introduced the Sensitive Benchmark (SenBen), described as the first large-scale scene graph benchmark designed specifically for sensitive content.
The SenBen benchmark comprises 13,999 frames extracted from 157 movies, each annotated with a Visual Genome-style scene graph. The annotations cover 25 object classes and 28 attributes, including affective states such as pain, fear, aggression, and distress. The benchmark also defines 16 sensitivity tags organized into five categories.
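To make the annotation structure concrete, here is a minimal sketch of how one annotated frame could be represented. It is an illustration only, not the benchmark's actual file format: the class and field names, the frame identifier, the `threatening` predicate, and the `violence` tag are all hypothetical, while the `(x, y, w, h)` box layout follows the Visual Genome convention and the attributes `aggression` and `fear` are taken from the examples above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    obj_id: int
    label: str                               # an object class (hypothetical name)
    bbox: Tuple[float, float, float, float]  # (x, y, w, h), Visual Genome style
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    subject_id: int   # obj_id of the subject
    predicate: str    # hypothetical predicate name
    object_id: int    # obj_id of the object


@dataclass
class FrameAnnotation:
    frame_id: str
    objects: List[SceneObject]
    relationships: List[Relationship]
    tags: List[str] = field(default_factory=list)  # frame-level sensitivity tags

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return (subject label, predicate, object label) for each relationship."""
        by_id = {o.obj_id: o for o in self.objects}
        return [
            (by_id[r.subject_id].label, r.predicate, by_id[r.object_id].label)
            for r in self.relationships
        ]


# Hypothetical frame: two people, one relationship, one sensitivity tag.
example = FrameAnnotation(
    frame_id="movie_001_frame_0042",
    objects=[
        SceneObject(0, "person", (40, 60, 120, 300), ["aggression"]),
        SceneObject(1, "person", (210, 80, 110, 280), ["fear"]),
    ],
    relationships=[Relationship(0, "threatening", 1)],
    tags=["violence"],
)

print(example.triplets())
```

Keeping boxes and attributes on each object is what makes spatial grounding possible: an explanation can point to who is involved and where, rather than returning only a frame-level label.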
Innovative Model Development
The researchers distilled a frontier Vision-Language Model (VLM) into a compact student model with 241 million parameters. The student is trained with a multi-task recipe that addresses vocabulary imbalance in autoregressive scene graph generation. Key innovations include:
- Suffix-based Object Identity: This technique enhances the model’s ability to recognize and categorize objects within complex scenes.
- Vocabulary-Aware Recall (VAR) Loss: VAR Loss improves the model’s recall rates by accounting for vocabulary discrepancies between training data and real-world applications.
- Decoupled Query2Label Tag Head: This head is trained with an asymmetric loss to refine the tagging process, with the aim of assigning sensitivity tags more accurately.
Together, these innovations yield a +6.4 percentage point improvement in SenBen Recall over standard cross-entropy training.
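The asymmetric loss mentioned for the tag head can be illustrated with a short sketch. This is the generic asymmetric loss formulation from the multi-label classification literature, not the authors' implementation: the function name, the default values for `gamma_pos`, `gamma_neg`, and `margin`, and the example probabilities are assumptions chosen for illustration, and the Query2Label architecture itself is not modeled.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-tag asymmetric losses for one image.

    probs   -- predicted probability per tag, each in [0, 1]
    targets -- 1 if the tag applies, 0 otherwise
    Hyperparameter defaults are illustrative assumptions.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive tag: focal-style term with a small (or zero) exponent.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative tag: shift the probability down by the margin so that
            # easy negatives contribute nothing, then down-weight the rest.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


# Hypothetical predictions for four sensitivity tags on one frame.
probs = [0.90, 0.03, 0.40, 0.70]
targets = [1, 0, 0, 1]
print(asymmetric_loss(probs, targets))
```

The design choice is that most tags are absent from any given frame, so negatives far outnumber positives. Treating the two cases differently keeps the many easy negatives from dominating the gradient.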
Performance and Results
On grounded scene graph metrics, the student model outperforms all evaluated VLMs except the Gemini models, and it surpasses all commercial safety APIs. It also achieves the highest object detection and captioning scores among all models tested. In addition, it provides $7.6\times$ faster inference while using $16\times$ less GPU memory than its predecessors.
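To show what a grounded scene graph metric measures, the sketch below computes a simple triplet recall: a ground-truth triplet counts as recovered only if a prediction has the same subject label, predicate, and object label, and both boxes overlap the ground truth with IoU of at least 0.5. This is a generic, simplified variant with greedy one-to-one matching. The exact definition of SenBen Recall is not given here, so the function names, the threshold, and the example labels (`holding`, `near`, `knife`) are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def triplet_recall(gt, pred, iou_thresh=0.5):
    """Fraction of ground-truth triplets matched by some prediction.

    Each triplet is (subj_label, subj_box, predicate, obj_label, obj_box).
    Each prediction may match at most one ground-truth triplet.
    """
    if not gt:
        return 0.0
    used = set()
    hits = 0
    for g in gt:
        for i, p in enumerate(pred):
            if i in used:
                continue
            same_labels = (g[0], g[2], g[3]) == (p[0], p[2], p[3])
            if (same_labels
                    and iou(g[1], p[1]) >= iou_thresh
                    and iou(g[4], p[4]) >= iou_thresh):
                used.add(i)
                hits += 1
                break
    return hits / len(gt)


# Hypothetical ground truth and predictions for one frame.
gt = [
    ("person", (0, 0, 10, 10), "holding", "knife", (5, 5, 4, 4)),
    ("person", (20, 0, 10, 10), "near", "person", (0, 0, 10, 10)),
]
pred = [
    ("person", (1, 0, 10, 10), "holding", "knife", (5, 5, 4, 4)),     # well localized
    ("person", (20, 0, 10, 10), "near", "person", (50, 50, 10, 10)),  # object box misplaced
]
recall = triplet_recall(gt, pred)
print(recall)  # 0.5: only the first ground-truth triplet is matched
```

The second prediction has the right labels but fails the box check. A metric of this kind rewards correct localization as well as correct classification.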
These results suggest that the SenBen benchmark and its associated model could make content moderation more interpretable. By adding spatial grounding to its predictions, the SenBen framework improves the technical capabilities of moderation systems and supports trust and transparency in automated content evaluation.
Conclusion
SenBen is a notable step forward for explainable AI in content moderation. As online platforms continue to grapple with sensitive content, tools like SenBen can help make moderation systems both effective and understandable. Ongoing research in this area is expected to yield more refined models and benchmarks, contributing to safer digital environments.
