Discover how the Reward Hacking Benchmark evaluates exploit risks in RL-trained LLM agents using tools, revealing vulnerabilities and mitigation strategies...
Discover how OpenSeeker-v2 uses informative, high-difficulty trajectories to train state-of-the-art search agents with simpler methods and top benchmark re...