Tag: LLM evaluation

Contrastive Decoding Reduces Score Bias in LLM Evaluations

Learn how contrastive decoding improves Large Language Models' scoring accuracy by reducing score range bias, boosting reliability in LLM evaluations by 11...

ATBench: Realistic Agent Trajectory Benchmark for AI Safety

Discover ATBench, a diverse benchmark for evaluating AI agent safety through realistic multi-step trajectories and risk diagnosis.

Overcoming Self-Preference Bias in LLM Rubric Evaluations

Explore how self-preference bias impacts rubric-based evaluation of large language models and strategies to ensure fair, accurate AI assessments.

Detecting Hallucinations in Mental Health Chatbots Using Human-LLM Hybrid

Enhance mental health chatbot safety by combining human expertise with LLMs to accurately detect hallucinations and omissions in responses.

DOVE: Evaluating LLM Cultural Value Alignment Open-Endedly

Discover DOVE, a novel framework for distributional open-ended evaluation of LLMs' cultural value alignment using a value codebook and optimal transport.
