The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models
arXiv:2604.19139v1 (cross-listed)
Abstract: As Large Language Models (LLMs) continue to evolve through alignment techniques such as Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI, a growing and increasingly conspicuous phenomenon has emerged: the proliferation of verbal tics — repetitive, formulaic linguistic patterns that pervade model outputs. These range from sycophantic openers (“That’s a great question!”, “Awesome!”) to pseudo-empathetic affirmations (“I completely understand your concern”, “I’m right here to catch you”) and overused vocabulary (“delve”, “tapestry”, “nuanced”).
In this paper, we present a systematic analysis of the verbal tic phenomenon across eight state-of-the-art LLMs: GPT-5.4, Claude Opus 4.7, Gemini 3.1 Pro, Grok 4.2, Doubao-Seed-2.0-pro, Kimi K2.5, DeepSeek V3.2, and MiMo-V2-Pro. Using a custom framework for standardized API-based evaluation, we assess 10,000 prompts across 10 task categories in both English and Chinese, yielding 160,000 model responses (consistent with 8 models × 10,000 prompts × 2 languages).
We introduce the Verbal Tic Index (VTI), a composite metric quantifying tic prevalence, and analyze its correlation with sycophancy, lexical diversity, and human-perceived naturalness.
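The summary does not give the formula behind the VTI, so the sketch below is not the paper's metric. It only illustrates one simple ingredient such an index might plausibly include: the fraction of responses that contain at least one known tic. The lexicon (`TIC_PHRASES`, `TIC_WORDS`) is a hypothetical list seeded with the examples quoted in the abstract, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical tic lexicon, seeded with examples quoted in the abstract.
# A real study would need a far larger, validated list per language.
TIC_PHRASES = [
    "that's a great question",
    "awesome!",
    "i completely understand your concern",
]
TIC_WORDS = ["delve", "tapestry", "nuanced"]


def normalize(text):
    """Lowercase the text and fold curly apostrophes to straight ones."""
    return text.lower().replace("\u2019", "'")


def count_tics(response):
    """Count tic phrases and whole-word tic vocabulary in one response."""
    text = normalize(response)
    phrase_hits = sum(text.count(phrase) for phrase in TIC_PHRASES)
    word_hits = sum(
        len(re.findall(r"\b" + re.escape(word) + r"\b", text))
        for word in TIC_WORDS
    )
    return phrase_hits + word_hits


def tic_prevalence(responses):
    """Fraction of responses containing at least one tic (0.0 if empty)."""
    if not responses:
        return 0.0
    flagged = sum(1 for r in responses if count_tics(r) > 0)
    return flagged / len(responses)


# Toy responses written for this example, not model outputs from the study.
sample = [
    "That\u2019s a great question! Let us delve in.",
    "The answer is 4.",
    "A rich tapestry of nuanced ideas.",
]
print(tic_prevalence(sample))
```

A prevalence rate like this ignores tic density within a response and any weighting between tic types, both of which a composite index would presumably need to address.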
Key Findings
- Inter-Model Variation: VTI differs substantially across models: Gemini 3.1 Pro exhibits the highest VTI (0.590), while DeepSeek V3.2 achieves the lowest (0.295), half that value.
- Accumulation Over Conversations: Verbal tics accumulate over multi-turn conversations and are amplified in subjective tasks.
- Cross-Lingual Patterns: Verbal tics take distinct forms in English and Chinese, suggesting that tic behavior depends on the language as well as the model.
- Human Evaluation: In a human evaluation (N = 120), we confirmed a strong inverse relationship between sycophancy and perceived naturalness (r = -0.87, p < 0.001).
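For readers unfamiliar with the statistic reported in the human evaluation, the following is a minimal sketch of how a Pearson correlation coefficient r is computed from paired scores. The function `pearson_r` and the toy score lists are assumptions made for illustration. They are not the paper's data or analysis code, and the sketch does not compute the p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for illustration only: higher sycophancy paired with
# lower perceived naturalness yields a negative r.
sycophancy = [0.1, 0.3, 0.5, 0.7, 0.9]
naturalness = [4.6, 4.1, 3.9, 3.0, 2.4]
print(round(pearson_r(sycophancy, naturalness), 3))
```

A value near -1, such as the reported r = -0.87, means that responses rated as more sycophantic tend to be rated as less natural. On its own it does not establish that one causes the other.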
Implications
These results underscore the “alignment tax” of current training paradigms: alignment techniques improve LLM performance, but they may also produce repetitive verbal patterns that detract from the authenticity of human-AI interactions. The study highlights the need for more robust frameworks that foster genuine communication between humans and AI systems.
Conclusion
As the field of artificial intelligence advances, it becomes increasingly important to understand how LLMs phrase their responses. The systematic analysis of verbal tics presents both challenges and opportunities for researchers and developers aiming to create more natural and engaging AI interactions. Future research should focus on refining alignment methods to reduce these tics and thereby improve the user experience in human-AI collaboration.
