TSHA Benchmark: Enhancing Visual Models for Safety Hazards

TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios

Source: arXiv:2603.29759v1 (cross-listed announcement)

In recent years, vision-language models (VLMs) have gained traction in safety hazard assessment, particularly in indoor environments. However, existing VLM benchmarks for this task have several significant limitations that undermine their practical value. This article summarizes TSHA, a new benchmark designed to address these deficiencies and improve the reliability of VLMs in assessing safety hazards.

Limitations of Existing Benchmarks

Current benchmarks face three primary challenges:

Reliance on Synthetic Datasets: Many benchmarks depend heavily on synthetic data generated with simulation software. This creates a substantial domain gap between simulated environments and real-world scenes, so performance measured on these benchmarks may not carry over to real deployments.
Oversimplified Safety Tasks: Existing benchmarks often pose overly simplified safety tasks that impose artificial constraints on hazard types and scene configurations. This limits how well models generalize, making them less effective in diverse real-world situations.
Lack of Rigorous Evaluation Protocols: Few benchmarks offer evaluation protocols stringent enough to thoroughly assess VLMs in complex home safety contexts. As a result, it is difficult to gauge how effective these models actually are in practice.

Introducing TSHA

To overcome these challenges, the authors present TSHA (Trustworthy Safety Hazard Assessment), a benchmark whose training set contains 81,809 curated samples drawn from four complementary sources:

Existing indoor datasets, which provide established coverage of indoor scenes.
Internet images, which capture a wide variety of real-world scenarios.
AIGC (AI-generated content) images, which depict complex safety scenarios.
Newly captured images of real environments and the hazards they contain.

In addition to the training set, TSHA includes a test set of 1,707 samples. It combines a selected subset drawn from the training distribution with newly added videos and panoramic images, each showing multiple safety hazards. This design is intended to test how robustly models handle complex, multi-hazard scenes.

Experimental Validation

Experiments on 23 popular VLMs show that current models perform inadequately at safety hazard assessment. Models trained on the TSHA training set, however, improve notably, gaining up to 18.3 points on the TSHA test set.

The trained models also generalize better to other benchmarks, which suggests that the benefits of the TSHA training data extend beyond the TSHA test set itself.

Conclusion

TSHA is a step toward more robust vision-language models for safety hazard assessment. By addressing the limitations of existing benchmarks and providing a more rigorous evaluation framework, it aims to improve the reliability and effectiveness of VLMs in real-world safety applications.

RichlyAI Blog: AI Guide, Tutorials, Industrial Insights, & more!