PaveBench: A Versatile Benchmark for Pavement Distress Perception and Interactive Vision-Language Analysis
Pavement condition assessment is vital for road safety and effective maintenance. Existing research has advanced the field considerably, but most studies concentrate on conventional computer vision tasks such as classification, detection, and segmentation. Real-world pavement inspection demands more than visual recognition: it also requires quantitative analysis, explanation, and interactive decision support. Current datasets fall short on three counts. They focus on unimodal perception, they lack support for multi-turn interaction and fact-grounded reasoning, and they do not connect perception with vision-language analysis.
To address these shortcomings, researchers have introduced PaveBench, a benchmark for pavement distress perception and interactive vision-language analysis built on real-world highway inspection images. PaveBench is structured around four core tasks:
- Classification: Identifying various types of pavement distress.
- Object Detection: Locating and identifying specific distress instances within images.
- Semantic Segmentation: Labeling each pixel of an image with the pavement condition it belongs to.
- Vision-Language Question Answering: Answering natural-language questions about an image, including in interactive dialogue.
PaveBench offers unified task definitions and evaluation protocols, so that methods can be assessed systematically across these tasks. On the visual side, it provides extensive annotations over a large collection of real-world pavement images and includes a curated hard-distractor subset for robustness evaluation.
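The text above does not say which metrics the PaveBench evaluation protocols use. As an illustration of how a semantic segmentation prediction is commonly scored, the following is a minimal sketch of per-class and mean intersection-over-union (IoU). The function names and the tiny example masks are assumptions made for illustration, not part of PaveBench.

```python
from typing import List, Optional, Sequence

Mask = Sequence[Sequence[int]]  # 2D grid of integer class labels


def per_class_iou(pred: Mask, target: Mask, num_classes: int) -> List[Optional[float]]:
    """Return IoU for each class, or None if the class appears in neither mask."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts once toward intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Average IoU over the classes that appear in at least one mask."""
    scores = [s for s in per_class_iou(pred, target, num_classes) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative 2x3 masks: 0 and 1 are arbitrary class ids, class 2 is unused.
example_pred = [[0, 0, 1], [0, 1, 1]]
example_target = [[0, 0, 1], [0, 0, 1]]
print(per_class_iou(example_pred, example_target, num_classes=3))
print(mean_iou(example_pred, example_target, num_classes=3))
```

Classes that are absent from both masks are skipped rather than counted as zero, which is one common convention; a benchmark may define this case differently.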
In addition to visual tasks, PaveBench introduces PaveVQA, a novel real-image question answering (QA) dataset. PaveVQA supports single-turn, multi-turn, and expert-corrected interactions, and its questions cover recognition, localization, quantitative estimation, and maintenance reasoning.
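To make the three interaction styles concrete, here is a hypothetical sketch of how one such dialogue could be represented in code. The class names, field names, and the rule for deriving the style are all assumptions for illustration; the actual PaveVQA schema is not described here and may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    """One question-answer exchange (hypothetical schema)."""
    question: str
    answer: str
    expert_correction: Optional[str] = None  # set when an expert revised the answer

    @property
    def final_answer(self) -> str:
        # Prefer the expert's correction when one exists.
        return self.expert_correction if self.expert_correction is not None else self.answer


@dataclass
class Dialogue:
    """A QA session about one image (hypothetical schema)."""
    image_id: str
    task: str  # e.g. "recognition", "localization", "quantitative estimation"
    turns: List[Turn] = field(default_factory=list)

    @property
    def interaction_style(self) -> str:
        # Assumption: any corrected turn marks the whole dialogue as expert-corrected.
        if any(t.expert_correction is not None for t in self.turns):
            return "expert-corrected"
        return "multi-turn" if len(self.turns) > 1 else "single-turn"


# Placeholder content only; not taken from the dataset.
dialogue = Dialogue(
    image_id="example_image",
    task="recognition",
    turns=[
        Turn("What distress is visible?", "placeholder answer"),
        Turn("Is it severe?", "placeholder answer", expert_correction="placeholder correction"),
    ],
)
print(dialogue.interaction_style)
```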
The research team evaluates several state-of-the-art methods on the benchmark and provides a detailed analysis of the results, highlighting both strengths and areas for improvement. They also present a simple yet effective agent-augmented visual question answering framework in which domain-specific models serve as tools alongside vision-language models.
The dataset is publicly available at https://huggingface.co/datasets/MML-Group/PaveBench.
In conclusion, PaveBench connects pavement distress perception with interactive vision-language analysis in a single benchmark. By addressing the unimodal focus of existing datasets and providing unified tasks and evaluation protocols, it supports research aimed at better road safety and maintenance strategies.
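The general idea of using domain-specific models as tools for a vision-language model can be sketched as follows. This is not the authors' framework: the keyword-based routing, the tool names, the prompt format, and the stub models are all assumptions chosen to keep the example self-contained.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]      # image path -> textual fact from a domain model
VLM = Callable[[str, str], str]  # (image path, prompt) -> answer


def select_tools(question: str, keywords: Dict[str, List[str]]) -> List[str]:
    """Pick the tools whose trigger words occur in the question (naive routing)."""
    q = question.lower()
    return [name for name, words in keywords.items() if any(w in q for w in words)]


def answer_with_tools(image_path: str, question: str, tools: Dict[str, Tool],
                      keywords: Dict[str, List[str]], vlm: VLM) -> str:
    """Run the selected tools, then ask the VLM with their outputs in the prompt."""
    facts = [f"[{name}] {tools[name](image_path)}"
             for name in select_tools(question, keywords)]
    if facts:
        prompt = "Tool outputs:\n" + "\n".join(facts) + "\nQuestion: " + question
    else:
        prompt = question
    return vlm(image_path, prompt)


# Stubs standing in for real models; their outputs are placeholders.
TOOLS: Dict[str, Tool] = {
    "detector": lambda path: "placeholder detection summary",
    "segmenter": lambda path: "placeholder measurement summary",
}
KEYWORDS = {
    "detector": ["where", "how many", "locate"],
    "segmenter": ["wide", "width", "area", "length"],
}


def echo_vlm(image_path: str, prompt: str) -> str:
    # A real VLM would read the image; this stub just returns the prompt.
    return prompt


print(answer_with_tools("image.jpg", "How wide is the crack?", TOOLS, KEYWORDS, echo_vlm))
```

Passing tool outputs as text in the prompt lets the language model ground quantitative answers in measurements from specialized models rather than estimating them from pixels alone. A practical system would likely let the model choose tools itself instead of relying on keyword matching.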
