Discover CoVUBench, the first benchmark for evaluating copyright unlearning in large vision-language models, designed to balance legal compliance with model utility.
Discover how the Reward Hacking Benchmark evaluates exploit risks in tool-using, RL-trained LLM agents, revealing vulnerabilities and mitigation strategies...