TaCarla: A Comprehensive Benchmarking Dataset for End-to-End Autonomous Driving
High-quality datasets are essential to progress in autonomous driving.
They must be built carefully, because overlooked details can leave a dataset unusable and
slow progress in this rapidly evolving field. Autonomous driving still poses significant
research challenges, particularly in improving the perception and planning capabilities of vehicles.
Unfortunately, many existing datasets are incomplete, which limits what the research community can do with them.
For example, datasets that include perception information often lack planning data, while those focused on planning
typically consist of long driving sequences in which the ego vehicle primarily drives forward. The result is a
lack of behavioral diversity, which is crucial for training robust models. In addition, many real-world datasets
are poorly suited to evaluating models, particularly for planning tasks, because they lack a proper
closed-loop evaluation setup.
The CARLA Leaderboard 2.0 challenge has emerged as a crucial platform, offering a diverse set of scenarios that address
the long-tail problem in autonomous driving. It is a valuable alternative to real-world data for developing models in both
open-loop and closed-loop evaluation settings. However, existing datasets collected on this platform have their own
limitations. Many appear to be designed primarily for specific sensor configurations, which restricts their generalizability.
Introducing TaCarla
In response to these challenges, we have developed a new dataset, TaCarla, which comprises over 2.85 million frames
collected using the CARLA simulation environment. This dataset is specifically designed to support end-to-end
autonomous driving research and is aligned with the diverse scenarios provided by the Leaderboard 2.0 challenge.
- Support for Various Tasks: TaCarla is not limited to planning tasks; it also facilitates dynamic
object detection, lane divider detection, centerline detection, traffic light recognition, and prediction tasks.
- Visual Language Action Models: The dataset is versatile enough to support visual language action models,
broadening the scope of research possibilities.
- Numerical Rarity Scores: We provide numerical rarity scores that help researchers understand how
frequently certain states occur within the dataset, offering insights into the dataset’s diversity and completeness.
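
How the rarity scores are computed is not described here. As a minimal sketch, assuming each frame has already been assigned a discrete state label, one common choice is the negative log of the state's relative frequency, so that states seen less often receive higher scores. The function name `rarity_scores` and the labels in the example are hypothetical and are not taken from the TaCarla release.

```python
import math
from collections import Counter


def rarity_scores(states):
    """Return one score per frame: -log(relative frequency of that frame's state).

    Assumption: `states` is a list of hashable, discrete state labels,
    one per frame. This is an illustrative scoring rule, not TaCarla's.
    """
    counts = Counter(states)
    total = len(states)
    # A state that covers every frame scores 0; rarer states score higher.
    return [-math.log(counts[s] / total) for s in states]


# Hypothetical per-frame state labels for a short driving sequence.
frames = [
    "straight", "straight", "straight", "lane_change",
    "straight", "junction_turn", "straight", "lane_change",
]
scores = rarity_scores(frames)

for label, score in zip(frames, scores):
    print(f"{label:14s} {score:.3f}")
```

Under this rule, frames that share a state share a score, which makes it straightforward to filter or reweight a training set toward underrepresented situations.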
Furthermore, we demonstrate the versatility of TaCarla by training various models using our dataset. The results indicate
that TaCarla not only enhances the performance of perception and planning models but also contributes to a better
understanding of the nuances involved in autonomous driving scenarios.
In conclusion, TaCarla is a significant step forward for autonomous driving datasets.
By addressing the limitations of existing datasets and covering both perception and
planning tasks, TaCarla supports further exploration and development in autonomous vehicle technology.
We believe that this dataset will serve as a critical resource for researchers and developers aiming to push the
boundaries of autonomous driving.
