Discover AnyPoC, a framework that uses large language models to generate executable PoC tests for scalable, automated bug detection, improving software reliabil...
Explore GBQA, a benchmark evaluating large language models' ability to detect software bugs in games, highlighting current AI challenges in QA engineering.