CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control
arXiv:2604.05663v1
Abstract: Traffic signal control (TSC) is a core component of intelligent transportation systems (ITS), aiming to reduce congestion, emissions, and travel time. Recent approaches based on reinforcement learning (RL) and large language models (LLMs) have improved adaptivity, but still suffer from limited interpretability, insufficient interaction data, and weak generalization to heterogeneous intersections.
Introduction
As urban areas continue to grow, efficient traffic signal control has become increasingly critical. Traditional methods often fail to adapt to real-time traffic conditions, which leads to more congestion and longer delays. Reinforcement learning (RL) and large language models (LLMs) have opened new avenues for adaptive signal management. However, limited interpretability, insufficient interaction data, and weak generalization to heterogeneous intersections remain open problems.
The CuraLight Framework
The paper introduces CuraLight, a framework that combines RL with LLMs for traffic signal control. It is built from three components:
- RL-Assisted Exploration: An RL agent actively explores various traffic environments, generating high-quality interaction trajectories.
- Imitation Fine-Tuning: These trajectories are converted into prompt-response pairs, facilitating imitation learning to fine-tune the LLM-based traffic signal controller.
- Multi-LLM Ensemble Deliberation: A deliberation system evaluates potential signal timing actions through structured debate, providing preference-aware supervision signals for training.
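To make the imitation fine-tuning step more concrete, here is a minimal sketch of how one RL trajectory might be turned into prompt-response pairs. The `Step` schema, the prompt wording, and the function names are assumptions made for illustration; the source does not describe the actual state encoding or prompt template.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One RL interaction step at an intersection (hypothetical schema)."""
    queue_lengths: Dict[str, int]  # queued vehicles per signal phase, e.g. {"NS": 4}
    action: str                    # phase the RL agent chose to serve next


def step_to_pair(step: Step) -> Dict[str, str]:
    """Render one step as a prompt-response pair (assumed prompt wording)."""
    lines = [
        f"- phase {phase}: {queued} queued vehicles"
        for phase, queued in sorted(step.queue_lengths.items())
    ]
    prompt = (
        "Current intersection state:\n"
        + "\n".join(lines)
        + "\nWhich phase should receive green next?"
    )
    # The RL agent's action becomes the target response to imitate.
    return {"prompt": prompt, "response": step.action}


def trajectory_to_pairs(trajectory: List[Step]) -> List[Dict[str, str]]:
    """Convert a whole trajectory into fine-tuning examples."""
    return [step_to_pair(step) for step in trajectory]
```

Each resulting dictionary could then be written out as one line of a JSONL fine-tuning file, for example.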
Methodology
The three components form a data pipeline. The RL agent explores diverse traffic scenarios and records its interactions with the environment. These trajectories are converted into prompt-response pairs that serve as training material for the LLM. The ensemble deliberation system then has multiple LLMs assess and debate the merits of candidate signal timing actions, and the outcome of that debate supplies the preference-aware supervision signal used in training.
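The source does not spell out the debate protocol, so the following is only a minimal sketch of how deliberation could yield a preference-aware training record, assuming simple majority voting stands in for structured debate. The `Judge` callables (placeholders for LLM calls), the `deliberate` function, and the `chosen`/`rejected` record layout are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an LLM call: (prompt, candidates) -> preferred candidate.
Judge = Callable[[str, List[str]], str]


def deliberate(
    prompt: str, candidates: List[str], judges: List[Judge]
) -> Optional[Dict[str, str]]:
    """Return a preference record, or None when the judges reach no clear winner."""
    raw_votes = (judge(prompt, candidates) for judge in judges)
    # Ignore votes for actions that were never offered.
    votes = Counter(v for v in raw_votes if v in candidates)
    ranked = votes.most_common()
    if not ranked:
        return None  # no valid votes at all
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie for first place: skip this sample
    chosen = ranked[0][0]
    others = [c for c in candidates if c != chosen]
    if not others:
        return None  # nothing to contrast the winner against
    # Use the least-supported alternative as the rejected action.
    rejected = min(others, key=lambda c: votes[c])
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Dropping tied samples is one possible curation choice under these assumptions: it keeps only examples on which the ensemble expresses a clear preference.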
Results
Extensive experiments were conducted in the Simulation of Urban MObility (SUMO) simulator on real-world road networks, including those of Jinan, Hangzhou, and Yizhuang. The reported improvements were:
- Average travel time was reduced by 5.34 percent.
- Average queue length decreased by 5.14 percent.
- Average waiting time saw a reduction of 7.02 percent.
These findings underscore the potential of combining RL-assisted exploration with deliberation-based data curation, offering a scalable and interpretable solution for traffic signal control.
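For readers who want to compute comparable figures from their own simulation runs, here is a small sketch of the arithmetic, assuming the percentages above are relative reductions against a reference controller. The function name and the numbers in the usage line are illustrative only and are not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# Illustrative values only: a hypothetical 200 s baseline travel time
# and a hypothetical 190 s travel time under the new controller.
reduction = relative_reduction(200.0, 190.0)
print(f"{reduction:.2f} percent")
```

The same function applies unchanged to queue length and waiting time, since each metric is lower-is-better.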
Conclusion
CuraLight combines the strengths of reinforcement learning and large language models to address key limitations of current traffic signal control methods: limited interpretability, insufficient interaction data, and weak generalization to heterogeneous intersections. The results, obtained in simulation on real-world road networks, suggest that CuraLight could play a useful role in future urban traffic management.
