Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting
Summary: arXiv:2604.02512v1
Abstract
Large language models (LLMs) increasingly exhibit human-like patterns of pragmatic and social reasoning. This paper addresses two related questions: do LLMs approximate human social meaning not only qualitatively but also quantitatively, and can prompting strategies informed by pragmatic theory improve this approximation? To address the first, we introduce two calibration-focused metrics distinguishing structural fidelity from magnitude calibration: the Effect Size Ratio (ESR) and the Calibration Deviation Score (CDS). To address the second, we derive prompting conditions from two pragmatic assumptions: that social meaning arises from reasoning over linguistic alternatives, and that listeners infer speaker knowledge states and communicative motives.
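To make the distinction between structural fidelity and magnitude calibration concrete, here is a minimal sketch. This summary does not give the paper's formulas for ESR and CDS, so the definitions below are assumptions for illustration only. ESR is taken to be the model's effect size divided by the human effect size, and CDS is taken to be the mean absolute gap between model and human condition means. The function names and the toy ratings are hypothetical and are not results from the study.

```python
from statistics import mean

def effect_size(ratings_a, ratings_b):
    """Raw mean difference between two conditions (an assumed effect measure)."""
    return mean(ratings_a) - mean(ratings_b)

def effect_size_ratio(model_effect, human_effect):
    """Assumed ESR: model effect divided by human effect.

    Under this assumption, a positive value means the model's effect has the
    same direction as the human one (structural fidelity), 1.0 means matching
    magnitude, and values above 1.0 mean the model exaggerates the effect.
    """
    if human_effect == 0:
        raise ValueError("human effect is zero; ratio is undefined")
    return model_effect / human_effect

def calibration_deviation(model_means, human_means):
    """Assumed CDS: mean absolute gap between model and human condition means."""
    if len(model_means) != len(human_means):
        raise ValueError("need one model mean per human mean")
    return mean([abs(m - h) for m, h in zip(model_means, human_means)])

# Toy ratings (NOT from the paper): perceived speaker competence on a 1-7
# scale after a precise numeral versus a round numeral.
human_precise, human_round = [5, 6, 5, 6], [4, 4, 5, 3]
model_precise, model_round = [7, 7, 6, 6], [3, 4, 3, 4]

human_effect = effect_size(human_precise, human_round)
model_effect = effect_size(model_precise, model_round)
esr = effect_size_ratio(model_effect, human_effect)
cds = calibration_deviation(
    [mean(model_precise), mean(model_round)],
    [mean(human_precise), mean(human_round)],
)
print(f"ESR = {esr:.2f}, CDS = {cds:.2f}")
```

In this toy case the model gets the direction of the effect right but doubles its size, which is the kind of pattern the two metrics are meant to separate: correct structure, miscalibrated magnitude.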
Key Findings
Applied to a case study on numerical (im)precision across three frontier LLMs, we find:
- All models reliably reproduce the qualitative structure of human social inferences.
- Magnitude calibration differs substantially across models.
- Prompting models to reason about speaker knowledge and motives consistently reduces magnitude deviation.
- Prompting for alternative-awareness tends to amplify exaggeration.
- Combining both components is the only intervention that improves all calibration-sensitive metrics across all models.
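The prompting findings above can be read as a two-by-two design that crosses alternative-awareness with reasoning about speaker knowledge and motives. The sketch below shows one way such conditions might be assembled. The instruction wording, the `build_prompt` helper, and the condition names are all hypothetical; the summary does not reproduce the paper's actual prompts.

```python
# Hypothetical instruction texts; the study's real prompt wording is not
# given in this summary.
ALTERNATIVES_HINT = (
    "Consider what else the speaker could have said, for example a round "
    "number instead of an exact one, and why they chose this wording."
)
SPEAKER_HINT = (
    "Consider what the speaker is likely to know and what they are trying "
    "to achieve by saying this."
)

# Each condition toggles the two pragmatic components independently.
CONDITIONS = {
    "baseline": {"alternatives": False, "speaker": False},
    "alternatives_only": {"alternatives": True, "speaker": False},
    "speaker_only": {"alternatives": False, "speaker": True},
    "combined": {"alternatives": True, "speaker": True},
}

def build_prompt(utterance, question, alternatives=False, speaker=False):
    """Assemble a prompt for one condition of the assumed 2x2 design."""
    parts = [f'A speaker says: "{utterance}"']
    if alternatives:
        parts.append(ALTERNATIVES_HINT)
    if speaker:
        parts.append(SPEAKER_HINT)
    parts.append(question)
    return "\n".join(parts)

prompts = {
    name: build_prompt(
        "The meeting starts at 9:03.",
        "On a scale from 1 to 7, how competent does the speaker seem?",
        **flags,
    )
    for name, flags in CONDITIONS.items()
}
print(prompts["combined"])
```

Keeping the two components as independent switches makes it straightforward to compare each one alone against their combination, which is the comparison the findings report.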
Implications of the Study
The results indicate that LLMs capture the inferential structure of human social reasoning but often distort inferential strength. This distortion raises questions about how reliable LLMs are in settings where accurate social reasoning matters, such as legal, medical, and educational contexts. The study suggests that pragmatic theory can ground improved prompting strategies, while noting that prompting alone does not achieve full magnitude calibration.
Future Directions
Further research is needed to refine the prompting conditions and explore additional factors influencing the calibration of LLMs. Some potential areas for exploration include:
- Investigating the impact of different linguistic structures on social meaning interpretation.
- Examining the role of context in shaping the responses of LLMs.
- Developing more sophisticated metrics for evaluating social reasoning in LLMs.
- Testing prompting strategies across diverse LLM architectures to assess generalizability.
Conclusion
This study contributes to the understanding of how LLMs approximate human social meaning and of how pragmatic theory can inform their development. Although the prompting strategies tested show promise, full magnitude calibration remains a challenge, and further research is needed to make LLMs more reliable in social reasoning tasks.
