Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Summary: arXiv:2603.29232v1
Large language models (LLMs) are widely used for data analytics over documents. However, direct reasoning over long, noisy documents remains brittle and error-prone. To address this, the paper studies document question answering (QA) that first consolidates dispersed evidence into a structured output, such as a table, a graph, or organized chunks, and then answers from that structure. The goal is QA that is more reliable and easier to verify.
The LiteCoST Framework
The proposed solution, known as the LiteCoST framework, is built upon two key pillars designed to achieve both high accuracy and low latency while utilizing small language models (SLMs).
Pillar 1: Chain-of-Structured-Thought (CoST)
The first pillar introduces a Chain-of-Structured-Thought (CoST) template. This schema-aware instruction guides a strong LLM to generate both a step-wise CoST trace and the corresponding structured output. The process consists of the following steps:
- Inducing a minimal structure over the data
- Normalizing entities and units for consistency
- Aligning records to ensure accuracy
- Serializing the output for systematic representation
- Verifying and refining the output to yield auditable supervision
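The steps listed above can be pictured with a toy pipeline. This is a minimal sketch, not the LiteCoST implementation: the schema, the alias table, the unit factors, and every function name are hypothetical. In the framework itself these steps are carried out by an LLM following the CoST template rather than by hand-written rules.

```python
import json
import re

# Hypothetical schema, alias table, and unit factors (illustration only).
SCHEMA = ["company", "year", "revenue_usd"]
ALIASES = {"acme corp.": "Acme", "acme": "Acme",
           "globex inc": "Globex", "globex": "Globex"}
UNIT_FACTORS = {"k": 1e3, "m": 1e6, "million": 1e6, "b": 1e9, "billion": 1e9}


def normalize_entity(name):
    """Map different surface forms of an entity to one canonical name."""
    return ALIASES.get(name.strip().lower(), name.strip())


def normalize_amount(text):
    """Convert strings such as '$2 billion' or '950M' to a float in USD."""
    match = re.fullmatch(r"\$?\s*([\d.]+)\s*([a-zA-Z]*)", text.strip())
    if match is None:
        raise ValueError(f"unparseable amount: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_FACTORS.get(unit.lower(), 1.0)


def build_table(evidence):
    """Run the toy pipeline and return (records, trace)."""
    # Step 1: induce a minimal structure (here, a fixed schema).
    trace = [f"induce: schema={SCHEMA}"]
    aligned = {}
    for item in evidence:
        # Step 2: normalize entities and units.
        company = normalize_entity(item["company"])
        key = (company, int(item["year"]))
        # Step 3: align mentions of the same (company, year) into one record.
        record = aligned.setdefault(key, {"company": company, "year": key[1]})
        if "revenue" in item:
            record["revenue_usd"] = normalize_amount(item["revenue"])
        trace.append(f"align: evidence -> {key}")
    # Step 4: serialize in a stable order.
    records = [aligned[key] for key in sorted(aligned)]
    # Step 5: verify that every record fills every schema field.
    incomplete = [r for r in records if any(f not in r for f in SCHEMA)]
    trace.append(f"verify: {len(incomplete)} incomplete record(s)")
    return records, trace


evidence = [
    {"company": "ACME Corp.", "year": "2023", "revenue": "$2 billion"},
    {"company": "acme", "year": 2023},  # repeated mention, no new facts
    {"company": "Globex Inc", "year": 2023, "revenue": "950M"},
]
records, trace = build_table(evidence)
print(json.dumps(records, indent=2))
print("\n".join(trace))
```

The returned trace plays the role of the step-wise CoST trace: each line records what was done, so the resulting table can be audited against the evidence it came from.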
Pillar 2: SLM Fine-Tuning
The second pillar focuses on the fine-tuning of compact models. This process involves training the SLMs on LLM-generated CoST data through two distinct stages:
- Supervised Fine-Tuning: This stage focuses on achieving structural alignment.
- Group Relative Policy Optimization (GRPO): This reinforcement learning stage uses three rewards covering answer quality, output format, and process consistency.
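To make the second stage concrete, the sketch below combines three reward terms and turns them into group-relative advantages, which is the normalization GRPO applies within each group of sampled responses. It rests on assumptions: the weights, the individual reward definitions, the step names, and the use of the population standard deviation are all hypothetical, since this summary does not state how LiteCoST defines or combines its rewards.

```python
import json
import statistics

# Hypothetical weights and step names; not taken from the paper.
WEIGHTS = {"answer": 0.6, "format": 0.2, "process": 0.2}
REQUIRED_STEPS = ["induce", "normalize", "align", "serialize", "verify"]


def answer_reward(predicted, gold):
    """1.0 for a case-insensitive exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


def format_reward(structured_output):
    """1.0 if the output parses as a JSON list of objects, else 0.0."""
    try:
        parsed = json.loads(structured_output)
    except json.JSONDecodeError:
        return 0.0
    is_table = isinstance(parsed, list) and all(isinstance(r, dict) for r in parsed)
    return 1.0 if is_table else 0.0


def process_reward(trace_steps):
    """Fraction of the required steps that appear in the trace."""
    present = sum(1 for step in REQUIRED_STEPS if step in trace_steps)
    return present / len(REQUIRED_STEPS)


def total_reward(sample, gold):
    """Weighted sum of the three reward terms for one sampled response."""
    parts = {
        "answer": answer_reward(sample["answer"], gold),
        "format": format_reward(sample["structured"]),
        "process": process_reward(sample["trace"]),
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Three hypothetical responses sampled for the same question.
group = [
    {"answer": "Acme", "structured": '[{"company": "Acme"}]',
     "trace": ["induce", "normalize", "align", "serialize", "verify"]},
    {"answer": "Globex", "structured": '[{"company": "Globex"}]',
     "trace": ["induce", "serialize"]},
    {"answer": "Globex", "structured": "not json", "trace": []},
]
rewards = [total_reward(sample, gold="Acme") for sample in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Because advantages are computed relative to the group mean, a response is reinforced only when it beats the other samples for the same question, so no separate value model is needed.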
Performance and Efficiency
By distilling a structure-first behavior into SLMs, the LiteCoST approach achieves quality comparable to that of LLMs on multi-domain long-document QA tasks, using models with only 3B and 7B parameters. It also delivers 2 to 4 times lower latency than GPT-4o and DeepSeek-R1 (671B parameters).
Conclusion
The LiteCoST framework combines structured reasoning with the efficiency of small language models for document question answering. As demand for reliable and efficient document analytics grows, its results suggest that structure-first distillation into compact models is a practical way to meet it.
The code for LiteCoST is available on GitHub at HKUSTDial/LiteCoST.
