Tag: agentic AI safety

Jailbreak Attacks on Large Reasoning Models Using Semantic Triggers

Explore novel jailbreak attacks on large reasoning models via semantic triggers and psychological framing, revealing key vulnerabilities and defense needs.

Symbolic Guardrails for Safer Domain-Specific AI Agents

Discover how symbolic guardrails improve safety and security in domain-specific AI agents without compromising their utility or performance.

HarmfulSkillBench: Detecting Dangerous Skills in AI Agents

Discover how HarmfulSkillBench identifies and measures harmful skills in AI agents, enhancing safety in large language model ecosystems.

Subliminal Transfer of Unsafe Behaviors in AI Distillation

Explore how unsafe behaviors subliminally transfer in AI agent distillation, revealing risks beyond explicit data sanitization in model training.

IatroBench: Evidence of AI Safety Risks in Medical Advice

IatroBench shows that AI safety measures can cause harm by withholding critical medical information, highlighting risks in AI-generated healthcare guidance.
