LLM Adaptation Challenges in Non-Stationary Reversal Learning

Comparative Reversal Learning Reveals Rigid Adaptation in LLMs under Non-Stationary Uncertainty

Source: arXiv:2604.04182v1

Abstract: Non-stationary environments require agents to revise previously learned action values when contingencies change. We treat large language models (LLMs) as sequential decision policies in a two-option probabilistic reversal-learning task with three latent states and switch events triggered by either a performance criterion or timeout.

Introduction

Understanding how large language models (LLMs) adapt to changing environments matters whenever they are used as sequential decision-makers. This paper examines the adaptability of several LLMs in non-stationary settings using a probabilistic reversal-learning task, which tests how effectively a model revises its learned action values when contingencies change.

Methodology

The study compares three LLMs (DeepSeek-V3.2, Gemini-3, and GPT-5.2), using human data as a behavioral benchmark. Each model was tested under two transition schedules:

Deterministic Fixed Transition Cycle: The latent states follow one another in a fixed, predictable order.
Stochastic Random Schedule: The next latent state is chosen at random, which increases volatility and changes the learning dynamics.
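
To make the task structure concrete, here is a minimal Python sketch of a two-option reversal task with three latent states and both schedules. It is an illustration under stated assumptions, not the paper's implementation: the reward probabilities in `STATE_PROBS`, the `criterion` and `timeout` defaults, and the choice to define the performance criterion as a streak of best-option choices are all hypothetical.

```python
import random

# Hypothetical reward probabilities (option 0, option 1) for three latent states.
# Illustrative assumptions only; the paper's actual values are not given here.
STATE_PROBS = [(0.8, 0.2), (0.2, 0.8), (0.5, 0.5)]


class ReversalTask:
    """Two-option probabilistic reversal task with three latent states."""

    def __init__(self, schedule="fixed", criterion=8, timeout=30, seed=0):
        if schedule not in ("fixed", "random"):
            raise ValueError("schedule must be 'fixed' or 'random'")
        self.schedule = schedule
        self.criterion = criterion  # assumed: consecutive best-option choices that trigger a switch
        self.timeout = timeout      # assumed: trials allowed in one state before a forced switch
        self.rng = random.Random(seed)
        self.state = 0
        self.streak = 0
        self.trials_in_state = 0

    def _next_state(self):
        if self.schedule == "fixed":
            # Deterministic cycle: 0 -> 1 -> 2 -> 0 -> ...
            return (self.state + 1) % len(STATE_PROBS)
        # Random schedule: jump to any other state with equal probability.
        others = [s for s in range(len(STATE_PROBS)) if s != self.state]
        return self.rng.choice(others)

    def step(self, choice):
        """Play option 0 or 1; return (reward, switched)."""
        probs = STATE_PROBS[self.state]
        reward = int(self.rng.random() < probs[choice])
        # A choice counts toward the criterion if it has the highest reward
        # probability (in a tied state, either option counts).
        self.streak = self.streak + 1 if probs[choice] == max(probs) else 0
        self.trials_in_state += 1
        switched = (self.streak >= self.criterion
                    or self.trials_in_state >= self.timeout)
        if switched:
            self.state = self._next_state()
            self.streak = 0
            self.trials_in_state = 0
        return reward, switched
```

An agent that keeps choosing the better option triggers the criterion switch, while one that keeps choosing the worse option only moves on at the timeout. The agent never observes `state` directly, which is what makes the switches latent.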

Key Findings

The results show how differently, and how rigidly, these models adapt:

Across all models, win-stay rates were near ceiling while lose-shift rates were markedly lower, indicating an asymmetric reliance on positive versus negative outcomes.
DeepSeek-V3.2 exhibited extreme perseveration following reversals, along with weak acquisition.
Both Gemini-3 and GPT-5.2 adapted more quickly than DeepSeek-V3.2 but were still less sensitive to losses than human participants.
Increased randomness in transitions amplified the models’ tendency for reversal-specific persistence, suggesting that high total payoffs can coexist with rigid adaptation behaviors.
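
The win-stay and lose-shift measures mentioned above can be computed from a sequence of choices and outcomes. The sketch below uses the standard definitions (repeat the previous choice after a reward, change it after a non-reward); the function name and input format are assumptions for illustration, and the paper may compute these rates differently, for example per block or per participant.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win_stay_rate, lose_shift_rate) for aligned choice/reward lists.

    choices: sequence of option indices (e.g. 0 or 1), one per trial
    rewards: sequence of 0/1 outcomes, one per trial
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    wins = stays_after_win = 0
    losses = shifts_after_loss = 0
    # Each trial t is scored by what the agent does on trial t + 1.
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            stays_after_win += stayed
        else:
            losses += 1
            shifts_after_loss += not stayed
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


# Toy data: 3 scored wins (2 followed by a stay), 2 scored losses (1 followed by a shift).
ws, ls = win_stay_lose_shift([0, 0, 0, 1, 1, 0], [1, 0, 0, 1, 1, 0])
print(f"win-stay={ws:.2f}, lose-shift={ls:.2f}")
```

A profile with win-stay close to 1 and lose-shift well below it is the asymmetry described in the first finding.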

Discussion

The findings indicate that the rigidity observed in LLMs can stem from distinct mechanisms: weak loss learning, inflated policy determinism, or value polarization caused by counterfactual suppression. These results highlight the need for reversal-sensitive diagnostics and volatility-aware models when evaluating LLMs in non-stationary environments.
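
One way to see how these three mechanisms differ is a simple two-option value-learning model. The sketch below is a generic illustration, not the model fitted in the paper: the parameter names (`alpha_win`, `alpha_loss`, `beta`, `kappa`) and the exact form of the counterfactual update are assumptions. Weak loss learning corresponds to a small `alpha_loss`, inflated policy determinism to a large `beta`, and the counterfactual step pushes the unchosen option's value toward the opposite outcome, which drives the two values apart.

```python
import math


def choice_prob(q, beta):
    """Softmax probability of choosing option 0 given values q = (q0, q1)."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))


def update(q, choice, reward, alpha_win, alpha_loss, kappa):
    """One value update with outcome-dependent learning rates.

    kappa scales a counterfactual update of the unchosen option
    (assumed form: move its value toward the opposite outcome).
    """
    q = list(q)
    alpha = alpha_win if reward else alpha_loss
    q[choice] += alpha * (reward - q[choice])
    other = 1 - choice
    q[other] += kappa * alpha * ((1 - reward) - q[other])
    return tuple(q)


def losses_to_flip(q, alpha_win, alpha_loss, kappa, max_trials=1000):
    """Count consecutive losses on option 0 before option 1 is valued higher."""
    n = 0
    while q[0] >= q[1] and n < max_trials:
        q = update(q, 0, 0, alpha_win, alpha_loss, kappa)
        n += 1
    return n


# Same starting values, different loss learning rates.
slow = losses_to_flip((0.9, 0.1), alpha_win=0.5, alpha_loss=0.05, kappa=0.0)
fast = losses_to_flip((0.9, 0.1), alpha_win=0.5, alpha_loss=0.5, kappa=0.0)
print(slow, fast)
```

In this toy setup, an agent with a small `alpha_loss` needs many more consecutive losses before its preferred option changes, which is one route to perseveration after a reversal. A large `beta` produces similar behavior for a different reason: even a small value gap makes the choice nearly deterministic.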

Conclusion

This comparative reversal learning framework sheds light on the limitations of current LLMs in adapting to changing contingencies. Understanding these constraints is vital for advancing AI technologies that can operate more flexibly and effectively in dynamic settings.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
