KLong: Advanced LLM Agent for Long-Horizon Tasks

KLong: Training LLM Agent for Extremely Long-horizon Tasks

KLong is an open-source large language model (LLM) agent built to tackle extremely long-horizon tasks. It is introduced in the paper “KLong: Training LLM Agent for Extremely Long-horizon Tasks” (arXiv:2602.17547v3).

Overview of KLong

KLong targets tasks that require extended reasoning and planning over long periods, where existing models fall short. Its training proceeds in two steps:

Cold-start with Trajectory-splitting SFT: KLong first activates the base model’s basic agentic abilities with a comprehensive Supervised Fine-Tuning (SFT) recipe. This cold start gives the model a foundation before it moves on to longer and harder tasks.
Progressive Reinforcement Learning (RL): After the cold start, KLong is trained with progressive RL to scale its capabilities. Training is scheduled in multiple stages, each with a longer timeout than the one before, so the model refines its decision-making over progressively longer horizons.
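
The staged timeouts can be pictured with a small sketch. The article only says that each stage has a longer timeout than the last, so everything specific here is an assumption for illustration: the geometric growth rule, the names `Stage`, `progressive_schedule`, and `run_rollout`, and the numbers. It is not the authors' training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One RL training stage with its own per-rollout time budget (hypothetical)."""
    name: str
    timeout_s: float   # assumed wall-clock limit for a single rollout
    num_updates: int   # assumed number of policy updates in this stage


def progressive_schedule(base_timeout_s, growth, num_stages, updates_per_stage):
    """Build stages whose timeouts grow geometrically (an assumed growth rule)."""
    if growth <= 1.0:
        raise ValueError("growth must exceed 1 so that timeouts get longer")
    return [
        Stage(f"stage-{i + 1}", base_timeout_s * growth**i, updates_per_stage)
        for i in range(num_stages)
    ]


def run_rollout(episode_duration_s, timeout_s):
    """Toy rollout: the episode counts as finished only if it fits in the timeout."""
    finished = episode_duration_s <= timeout_s
    return {"finished": finished, "time_used_s": min(episode_duration_s, timeout_s)}


schedule = progressive_schedule(
    base_timeout_s=1800, growth=2.0, num_stages=3, updates_per_stage=100
)
results = [run_rollout(5000, stage.timeout_s)["finished"] for stage in schedule]
for stage, finished in zip(schedule, results):
    print(stage.name, stage.timeout_s, finished)
```

In this toy run, a task that needs 5000 seconds is cut off in the first two stages and only completes once the timeout reaches 7200 seconds, which is the intuition behind lengthening the budget stage by stage.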

Research-Factory: Automated Data Generation

Central to KLong’s training is Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, the authors build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet (Thinking).
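
The article does not say what a Research-Factory rubric looks like. As a rough mental model only, a rubric can be thought of as a weighted checklist whose score is the weighted fraction of satisfied items. The `Criterion` type, the weights, and the `rubric_score` function below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure)."""
    description: str
    weight: float
    satisfied: bool


def rubric_score(criteria):
    """Return the weighted fraction of satisfied criteria, between 0 and 1."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("rubric needs a positive total weight")
    earned = sum(c.weight for c in criteria if c.satisfied)
    return earned / total


rubric = [
    Criterion("core method is implemented", weight=2.0, satisfied=True),
    Criterion("main experiment runs end to end", weight=1.0, satisfied=False),
    Criterion("results are reported", weight=1.0, satisfied=True),
]
score = rubric_score(rubric)
print(score)  # 0.75
```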

For training, these trajectories are processed to preserve early context while progressively truncating later context, which is the idea behind trajectory-splitting SFT. The model keeps the essential information from the beginning of the task as it works through the later steps.
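
One way to read "preserve early context, truncate later context" is sketched below. This is an interpretation under stated assumptions, not the paper's algorithm: the function name `split_trajectory`, the parameters `head_len`, `window_len`, and `overlap`, and the fixed-size sliding window are all hypothetical.

```python
def split_trajectory(steps, head_len, window_len, overlap):
    """Split a long trajectory into shorter training pieces (illustrative only).

    Each piece is the first `head_len` steps (early context, always kept)
    followed by one window of later steps. Later steps outside the window
    are dropped from that piece.
    """
    if not 0 <= overlap < window_len:
        raise ValueError("overlap must be in [0, window_len)")
    head = list(steps[:head_len])
    body = list(steps[head_len:])
    if not body:
        return [head]
    stride = window_len - overlap  # how far the window advances each time
    pieces = []
    start = 0
    while True:
        pieces.append(head + body[start:start + window_len])
        if start + window_len >= len(body):
            break  # the window has reached the end of the trajectory
        start += stride
    return pieces


steps = ["task", "paper"] + [f"action-{i}" for i in range(1, 9)]
pieces = split_trajectory(steps, head_len=2, window_len=4, overlap=1)
for piece in pieces:
    print(piece)
```

With these toy settings every piece begins with the same two early steps, and consecutive pieces share one later step, so no piece loses the task description even though each sees only part of the later history.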

Performance and Results

Experiments show that KLong performs strongly and generalizes well. Notably, KLong (106B) surpasses the much larger Kimi K2 Thinking (1T) by 11.28% on the PaperBench benchmark. The improvement also carries over to other coding benchmarks, including SWE-bench Verified and MLE-bench.

Conclusion

KLong is a significant step forward in training LLM agents for long-horizon tasks. By combining trajectory-splitting SFT with progressive RL, it improves performance on long-horizon benchmarks and carries those gains over to other coding tasks, which broadens its applicability to real-world problems that demand sustained reasoning and planning.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
