SkillGraph: Graph Foundation Priors for LLM Agent Tool Sequence Recommendation
arXiv:2604.19793v1 (new submission)
Abstract: LLM agents must select tools from large API libraries and order them correctly. Existing methods use semantic similarity for both retrieval and ordering, but ordering depends on inter-tool data dependencies that are absent from tool descriptions. As a result, semantic-only methods can produce negative Kendall-τ in structured workflow domains.
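Kendall-τ compares the relative order of every pair of tools in a predicted sequence against a reference sequence. A value of +1 means every pair agrees, -1 means the order is fully reversed, and a negative value means more pairs are inverted than preserved. The Python sketch below illustrates this. It assumes the tau-a variant computed only over tools that appear in both sequences. The paper's exact variant and its handling of missing tools may differ, and the tool names are hypothetical.

```python
from itertools import combinations

def kendall_tau(predicted, reference):
    """Kendall tau-a between a predicted and a reference tool ordering.

    Assumption: tau-a over tools present in both sequences; tools missing
    from either sequence are ignored. This may not match the paper's protocol.
    """
    pred_pos = {tool: i for i, tool in enumerate(predicted)}
    # Unique tools from the reference, in reference order, that were also predicted
    shared = [t for t in dict.fromkeys(reference) if t in pred_pos]
    if len(shared) < 2:
        return 0.0  # undefined for fewer than two items; 0.0 is a convention here
    concordant = discordant = 0
    # combinations() yields pairs (a, b) with a before b in the reference
    for a, b in combinations(shared, 2):
        if pred_pos[a] < pred_pos[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

reference = ["search_flights", "select_flight", "book_flight", "send_confirmation"]
print(kendall_tau(reference, reference))                  # 1.0
print(kendall_tau(list(reversed(reference)), reference))  # -1.0
# Swapping only the first two tools inverts 1 of 6 pairs: (5 - 1) / 6
print(kendall_tau(["select_flight", "search_flights", "book_flight",
                   "send_confirmation"], reference))      # about 0.667
```

A method whose orderings score near -1 is systematically placing downstream tools before the tools they depend on.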
We introduce SkillGraph, a directed weighted execution-transition graph mined from 49,831 successful LLM agent trajectories, which encodes workflow-precedence regularities as a reusable graph foundation prior. Building on this graph foundation prior, we propose a two-stage decoupled framework: GS-Hybrid retrieval for candidate selection and a learned pairwise reranker for ordering.
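The summary does not specify how edge weights are computed. As a minimal sketch, assume an edge a → b is weighted by how often b is called immediately after a, normalized over all successors of a. Under that assumption, a transition graph could be mined as follows. The input is assumed to be pre-filtered to successful trajectories, and the function and tool names are hypothetical.

```python
from collections import Counter, defaultdict

def build_transition_graph(trajectories):
    """Return {src: {dst: P(dst | src)}} from consecutive tool calls.

    Assumption for illustration: edge weight = normalized count of direct
    src -> dst transitions. The paper's actual weighting may differ.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):  # adjacent calls within one run
            if src != dst:  # ignore repeated calls to the same tool (an assumption)
                counts[src][dst] += 1
    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        graph[src] = {dst: c / total for dst, c in dsts.items()}
    return graph

successful_trajectories = [
    ["search_flights", "select_flight", "book_flight"],
    ["search_flights", "select_flight", "book_flight", "send_confirmation"],
    ["search_flights", "get_weather"],
]
graph = build_transition_graph(successful_trajectories)
print(graph["search_flights"])  # select_flight about 0.667, get_weather about 0.333
```

Because the weights come from execution order rather than from tool descriptions, they capture precedence (search before select before book) that semantic similarity cannot see.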
Methodology Overview
SkillGraph decouples tool selection from tool ordering and handles each in a separate stage:
- GS-Hybrid Retrieval (Stage 1): selects candidate tools from a large API library, building on the SkillGraph graph foundation prior.
- Learned Pairwise Reranker (Stage 2): orders the selected candidates with a pairwise ranking model, so that sequence order is learned rather than inferred from semantic similarity alone.
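The summary does not give the scoring functions for either stage, so the following is only a hypothetical sketch of the decoupled structure. Stage 1 is assumed to be a linear blend of a semantic score and a graph-derived score. Stage 2 replaces the learned pairwise model with a smoothed "a precedes b" frequency counted from trajectories and aggregates the pairwise probabilities into an order by summing each tool's wins. All function names, the blend weight `alpha`, and the scores are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch of a decoupled two-stage pipeline; not the paper's code.

def tool_frequency(trajectories):
    """Fraction of trajectories that use each tool (a crude node-level prior)."""
    counts = Counter(tool for traj in trajectories for tool in set(traj))
    return {tool: c / len(trajectories) for tool, c in counts.items()}

def precedence_counts(trajectories):
    """n[(a, b)] = number of trajectories in which a's first call precedes b's."""
    n = Counter()
    for traj in trajectories:
        first_pos = {}
        for i, tool in enumerate(traj):
            first_pos.setdefault(tool, i)
        # dict keys keep insertion order, so each pair (a, b) has a before b
        for a, b in combinations(first_pos, 2):
            n[(a, b)] += 1
    return n

def stage1_retrieve(semantic, graph_prior, k, alpha=0.5):
    """Stage 1 (assumed form): linear blend of semantic and graph scores, top-k."""
    tools = set(semantic) | set(graph_prior)
    score = {t: alpha * semantic.get(t, 0.0) + (1 - alpha) * graph_prior.get(t, 0.0)
             for t in tools}
    return sorted(tools, key=lambda t: (-score[t], t))[:k]

def p_before(a, b, n):
    """Laplace-smoothed P(a precedes b); stands in for a learned pairwise model."""
    ab, ba = n[(a, b)], n[(b, a)]
    return (ab + 1) / (ab + ba + 2)

def stage2_order(candidates, n):
    """Stage 2: rank each candidate by its summed pairwise win probability."""
    wins = {a: sum(p_before(a, b, n) for b in candidates if b != a) for a in candidates}
    return sorted(candidates, key=lambda t: (-wins[t], t))

trajectories = [
    ["search_flights", "select_flight", "book_flight", "send_confirmation"],
    ["search_flights", "select_flight", "book_flight"],
    ["search_flights", "book_flight", "send_confirmation"],
]
semantic = {"book_flight": 0.9, "send_confirmation": 0.7, "get_weather": 0.65,
            "search_flights": 0.6, "select_flight": 0.5}

candidates = stage1_retrieve(semantic, tool_frequency(trajectories), k=4)
print(stage2_order(candidates, precedence_counts(trajectories)))
# ['search_flights', 'select_flight', 'book_flight', 'send_confirmation']
```

Sorting the same four candidates by semantic score alone would give book_flight, send_confirmation, search_flights, select_flight, which inverts most of the workflow. Keeping ordering in a separate stage lets it use precedence evidence that the retrieval scores do not encode.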
Performance Metrics
SkillGraph was evaluated on two benchmarks:
- ToolBench: This dataset comprises 9,965 test instances and approximately 16,000 tools. The GS-Hybrid method achieved a Set-F1 score of 0.271 and a Kendall-τ score of 0.096.
- API-Bank: Kendall-τ improved from -0.433 to +0.613, moving from orderings that tend to reverse the reference sequence to orderings that largely agree with it.
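Set-F1 scores only which tools were selected, not the order in which they are called, which is why it is reported alongside Kendall-τ. A minimal sketch follows. It assumes F1 is computed per instance over tool sets and then averaged across test instances. The paper's exact averaging is not stated in this summary, and the function and tool names are hypothetical.

```python
def set_f1(predicted, reference):
    """F1 between predicted and reference tool sets (order is ignored)."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0  # also covers empty predictions
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def mean_set_f1(pairs):
    """Macro-average over (predicted, reference) pairs; an assumed aggregation."""
    scores = [set_f1(p, r) for p, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0

reference = ["search_flights", "select_flight", "book_flight", "send_confirmation"]
predicted = ["search_flights", "book_flight", "get_weather"]
# 2 correct tools: precision 2/3, recall 2/4, F1 = 4/7
print(set_f1(predicted, reference))  # about 0.571
```

Note that `set_f1(reversed(reference), reference)` is 1.0 even though the order is fully inverted; only an order-sensitive metric such as Kendall-τ exposes that failure.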
Comparative Analysis
Under identical Stage-1 inputs, SkillGraph's learned reranker outperformed LLaMA-3.1-8B Stage-2 rerankers. Because the retrieved candidates are held fixed, this comparison isolates the ordering stage, which suggests that the gains come from how tools are ordered rather than from which tools are retrieved.
Conclusion
SkillGraph separates tool selection from tool ordering and supplies, through an execution-transition graph mined from 49,831 successful trajectories, the inter-tool dependency information that tool descriptions lack. This targets the main weakness of semantic-only methods, whose orderings can anti-correlate with the reference sequence (negative Kendall-τ) in structured workflow domains. Future research may build on this foundation by exploring additional dimensions of API interactions and further improving the efficiency of LLM agents in real-world applications.
