Explore how persistent homology reveals adversarial impacts on LLM latent spaces, enhancing AI interpretability and security across model architectures.
Discover how integrating Sparse Autoencoders boosts Large Language Models' robustness, reducing jailbreak attacks by up to 5x without altering weights.