Explore how persistent homology reveals adversarial impacts on LLM latent spaces, enhancing AI interpretability and security across model architectures.
Discover how integrating sparse autoencoders into large language models boosts their robustness, reducing jailbreak success by up to 5x without altering model weights.
Explore how LLM tutors resist answer leakage under adversarial student attacks and discover effective defense strategies to protect educational integrity.