Tag: Language Model Safety

ProjLens: Enhancing Safety in Multimodal Models

Discover how ProjLens improves multimodal model safety by exposing backdoor vulnerabilities and boosting robustness in large language models.

Spectral Dynamics of Hallucination in Whisper ASR Models

Explore how spectral dynamics affect hallucinations in Whisper ASR models, revealing safety risks and phase transitions across model scales.

Statute-Centric Legal QA: Structure-Aware Retrieval & Safety

Explore SearchFireSafety, a benchmark enhancing statute-centric legal QA with structure-aware retrieval and safety to reduce hallucination in legal AI mode...

MetaSAEs: Enhancing Atomic Sparse Autoencoder Latents

Discover how MetaSAEs use joint training with decomposability penalties to create more atomic, interpretable sparse autoencoder latents for safer AI models...

GUARD-SLM: Defense Against Jailbreaks in Small Language Models

Discover GUARD-SLM, a token activation-based method protecting small language models from jailbreak attacks while preserving legitimate inputs.
