Efficient Long-Document QA with Chain-of-Structured-Thought

Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

Summary of arXiv:2603.29232v1

Large language models (LLMs) are widely used for data analytics over documents. However, direct reasoning over long and often noisy documents remains brittle and error-prone. To address this, the paper studies document question answering (QA) that first consolidates dispersed evidence into a structured output, such as a table, a graph, or organized chunks, and then answers from that structure. The goal is QA that is more reliable and easier to verify.

The LiteCoST Framework

The proposed solution, the LiteCoST framework, rests on two pillars that together target high accuracy and low latency using small language models (SLMs).

Pillar 1: Chain-of-Structured-Thought (CoST)

The first pillar introduces a Chain-of-Structured-Thought (CoST) template. This schema-aware instruction guides a strong LLM to generate both a step-wise CoST trace and the corresponding structured output. The process consists of the following steps:

Inducing a minimal structure over the data
Normalizing entities and units for consistency
Aligning records to ensure accuracy
Serializing the output for systematic representation
Verifying and refining the output to yield auditable supervision
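
The steps above can be illustrated with a small Python sketch. It is not the LiteCoST implementation: the schema, the unit table, the record fields, and every function name here are hypothetical and chosen only to make each step concrete on toy data.

```python
import json

# Hypothetical minimal schema induced for a question about yearly revenue.
SCHEMA = ("entity", "year", "revenue_usd")

# Hypothetical unit table; the real normalization rules are not given in the summary.
UNIT_FACTORS = {"usd": 1.0, "k usd": 1_000.0, "m usd": 1_000_000.0}


def normalize(record):
    """Normalize the entity name and convert the value to a common unit (USD)."""
    entity = " ".join(record["entity"].split()).title()
    value = record["value"] * UNIT_FACTORS[record["unit"].lower()]
    return {"entity": entity, "year": record["year"], "revenue_usd": value}


def align(records):
    """Keep one row per (entity, year) key so duplicate mentions collapse."""
    table = {}
    for rec in records:
        table.setdefault((rec["entity"], rec["year"]), rec)
    return [table[key] for key in sorted(table)]


def verify(rows, schema):
    """Check that every row carries exactly the fields named in the schema."""
    return all(set(row) == set(schema) for row in rows)


def build_structured_output(raw_records, schema=SCHEMA):
    """Normalize, align, verify, then serialize the rows as JSON."""
    rows = align([normalize(r) for r in raw_records])
    if not verify(rows, schema):
        raise ValueError("rows do not match the schema")
    return json.dumps(rows, sort_keys=True)


# Toy evidence scattered across a document, with inconsistent names and units.
raw = [
    {"entity": "acme  corp", "value": 2.5, "unit": "M USD", "year": 2023},
    {"entity": "Acme Corp", "value": 2500, "unit": "K USD", "year": 2023},
    {"entity": "beta inc", "value": 900.0, "unit": "usd", "year": 2022},
]
structured = build_structured_output(raw)
```

In this toy run the two Acme mentions collapse into a single row once names and units agree, which is the kind of consolidation a structured output is meant to make checkable.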

Pillar 2: SLM Fine-Tuning

The second pillar focuses on the fine-tuning of compact models. This process involves training the SLMs on LLM-generated CoST data through two distinct stages:

Supervised Fine-Tuning: This stage focuses on achieving structural alignment.
Group Relative Policy Optimization (GRPO): This stage incorporates triple rewards aimed at enhancing answer quality, output format, and process consistency.
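
As a rough illustration of the second stage, the sketch below combines three reward signals into one scalar and then computes group-relative advantages, the normalization commonly associated with GRPO (each sampled output's reward minus the group mean, divided by the group standard deviation). The reward weights, score ranges, and function names are assumptions for illustration; the summary does not specify how LiteCoST weights or computes its rewards.

```python
from statistics import mean, pstdev


def triple_reward(answer_score, format_ok, process_score, weights=(1.0, 0.5, 0.5)):
    """Combine answer, format, and process signals into one scalar.

    The weights are hypothetical; scores are assumed to lie in [0, 1].
    """
    w_ans, w_fmt, w_proc = weights
    return w_ans * answer_score + w_fmt * float(format_ok) + w_proc * process_score


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled outputs for one question: correct and well formed,
# wrong but well formed, and wrong with a malformed structure.
rewards = [
    triple_reward(1.0, True, 1.0),
    triple_reward(0.0, True, 0.5),
    triple_reward(0.0, False, 0.0),
]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group, an output is reinforced only to the extent that it beats the other samples for the same question, so no separate value model is needed for the baseline.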

Performance and Efficiency

By distilling a structure-first behavior into SLMs, the LiteCoST approach achieves quality comparable to that of LLMs on multi-domain long-document QA tasks, using models with only 3B and 7B parameters. It also delivers 2 to 4 times lower latency than GPT-4o and DeepSeek-R1 (671B parameters).

Conclusion

The LiteCoST framework combines structured reasoning with the efficiency of small language models for document question answering. Its results suggest that compact, fine-tuned models can handle long-document analytics reliably and with lower latency than much larger LLMs.

The code for LiteCoST is available on GitHub at HKUSTDial/LiteCoST.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
