User Turn Generation Reveals Interaction Awareness in LLMs

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

Summary: arXiv:2604.02315v2 Announce Type: replace

Abstract

Standard LLM benchmarks evaluate the assistant turn: the model generates a response to an input, a verifier scores correctness, and the analysis ends. This paradigm leaves unmeasured whether the LLM encodes any awareness of what follows the assistant response. We propose user-turn generation as a probe of this gap: given a conversation context of user query and assistant response, we let a model generate under the user role. If the model’s weights encode interaction awareness, the generated user turn will be a grounded follow-up that reacts to the preceding context.

Introduction

Recent advancements in large language models (LLMs) have significantly changed the landscape of human-computer interaction. However, the evaluation metrics traditionally used do not fully capture the model’s understanding of ongoing conversations. This article discusses a novel approach that shifts focus from assistant responses to user turn generation, aiming to measure interaction awareness in LLMs.

Proposed Methodology

We introduce a method called user-turn generation, which assesses how well a model can generate responses that reflect an understanding of the conversational context. The process involves:

Providing a model with a user query followed by an assistant response.
Allowing the model to generate a user turn that serves as a follow-up to the assistant’s response.
Analyzing the generated user turn to determine if it demonstrates awareness of the prior interaction.

Experimental Setup

Our experiments were conducted across 11 open-weight LLMs, including Qwen3.5, gpt-oss, and GLM, and utilized 5 diverse datasets focusing on:

Mathematical reasoning
Instruction following
Conversational dynamics

Findings

Our findings reveal several critical insights into the interaction awareness of language models:

Interaction awareness is distinct from task accuracy, highlighting a gap between a model’s ability to perform tasks and its understanding of conversational context.
Within the Qwen3.5 family, as model size increased from 0.8B to 397B parameters, the accuracy on GSM8K tasks improved from 41% to 96.8%. However, the genuine follow-up rate remained close to zero under deterministic generation.
Higher temperature sampling yielded a latent interaction awareness, with follow-up rates reaching up to 22%.

Controlled Perturbations

To ensure the robustness of our findings, we conducted controlled perturbations. These experiments validated that user-turn generation effectively measures a real property of the model concerning interaction awareness.

Post-Training Enhancements

Further, we explored collaboration-oriented post-training on the Qwen3.5-2B model. The results indicated a notable increase in follow-up rates, suggesting that targeted training can enhance interaction awareness.

Conclusion

In summary, user-turn generation serves as a vital probe to uncover interaction awareness in LLMs, a dimension often overlooked by conventional assistant-only benchmarks. Our results encourage further exploration into this area, suggesting that enhancing interaction awareness can lead to more engaging and contextually aware AI systems.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!

Company

User Turn Generation Reveals Interaction Awareness in LLMs

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

Abstract

Introduction

Proposed Methodology

Experimental Setup

Findings

Controlled Perturbations

Post-Training Enhancements

Conclusion

Related AI Insights

Subscribe

More like thisRelated

About us

Company

The latest

Subscribe

RichlyAI Blog
AI Guide, Tutorials, Industrial Insights, & more!

More like this
Related