Evaluate the reasoning of large language models (LLMs) on real-world site selection with LocationReasoner, a benchmark for spatial and logistical decision-making tasks.
Explore how LLMs perform on repository-level question answering with StackRepoQA, highlighting challenges and advancements in multi-file code comprehension...