Discover TurnGate, a novel defense that detects hidden malicious intent in multi-turn dialogues and improves AI safety through precise turn-level intervention.
Discover WARDEN, a dynamic adversarial training framework that improves the robustness of large language models with information-theoretic methods and efficient optimizatio...