Optimizing Temperature and Prompting in Large Language Models

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

Summary: arXiv:2604.08563v1 (cross-listed)

Abstract

Extended reasoning models represent a transformative shift in Large Language Model (LLM) capabilities by enabling explicit test-time computation for complex problem solving. However, the optimal configuration of sampling temperature and prompting strategy for these systems remains largely underexplored.

Research Overview

In this study, we systematically evaluate chain-of-thought and zero-shot prompting across four temperature settings (0.0, 0.4, 0.7, and 1.0) using Grok-4.1 with extended reasoning on 39 mathematical problems from AMO-Bench, a challenging International Mathematical Olympiad-level benchmark. The results show that the best temperature depends on the prompting strategy, so the two settings should be tuned together rather than separately.
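The evaluation design described above is a grid of two prompting strategies by four temperatures. The following is a minimal sketch of such a sweep, not the authors' code. The prompt templates, the `ask` callable (which stands in for whatever model client is used), and the `is_correct` grader are all hypothetical names introduced for illustration; only the temperature values and the two strategy names come from the study.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

TEMPERATURES = [0.0, 0.4, 0.7, 1.0]  # the four settings named in the study
STRATEGIES = ["zero_shot", "chain_of_thought"]

# Hypothetical prompt templates; the study's exact wording is not given here.
TEMPLATES = {
    "zero_shot": "{problem}",
    "chain_of_thought": "{problem}\n\nLet's think step by step.",
}


def run_sweep(
    problems: List[Dict[str, str]],
    ask: Callable[[str, float], str],
    is_correct: Callable[[str, str], bool],
) -> Dict[Tuple[str, float], float]:
    """Return the accuracy for every (strategy, temperature) cell.

    `ask(prompt, temperature)` is an assumed interface to a model client;
    `is_correct(answer, expected)` is an assumed grading function.
    """
    results: Dict[Tuple[str, float], float] = {}
    for strategy, temperature in product(STRATEGIES, TEMPERATURES):
        correct = 0
        for problem in problems:
            prompt = TEMPLATES[strategy].format(problem=problem["question"])
            answer = ask(prompt, temperature)
            if is_correct(answer, problem["answer"]):
                correct += 1
        results[(strategy, temperature)] = correct / len(problems)
    return results
```

Passing the model call and the grader in as arguments keeps the sweep independent of any particular API, so the same loop can be reused with a stub for testing or with a real client.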

Key Findings

Zero-shot Prompting: Achieves peak performance at moderate temperatures, specifically at T=0.4 and T=0.7, with an accuracy of 59%.
Chain-of-Thought Prompting: Performs best at the extremes of the tested range (T=0.0 and T=1.0), suggesting that prompting strategy and temperature interact.
Extended Reasoning Benefit: The advantage of employing extended reasoning grows with temperature, from 6x at T=0.0 to 14.3x at T=1.0.
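Because the benchmark has only 39 problems, each reported accuracy maps to a small whole number of solved problems. The sketch below converts between the two, assuming the 59% figure is computed over all 39 problems with a single attempt each; the summary does not state this, and the helper names are illustrative.

```python
N_PROBLEMS = 39  # benchmark size stated in the study


def correct_count(accuracy: float, n: int = N_PROBLEMS) -> int:
    """Nearest whole number of solved problems for a reported accuracy."""
    return round(accuracy * n)


def step_size(n: int = N_PROBLEMS) -> float:
    """Percentage points gained or lost per additional problem solved."""
    return 100.0 / n


solved = correct_count(0.59)   # 23 problems under the assumption above
per_problem = step_size()      # about 2.56 percentage points per problem
```

Under that assumption, 59% corresponds to 23 of 39 problems, and a single problem moves accuracy by roughly 2.6 percentage points, which is worth keeping in mind when comparing neighboring temperature settings.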

Implications for Future Research

The results of this study challenge the common practice of using T=0 for reasoning tasks. Instead, the research advocates optimizing temperature together with the chosen prompting strategy to maximize the performance of extended reasoning models. It also raises the question of how other configurations affect the efficiency and accuracy of problem solving in LLMs.

Conclusion

In conclusion, this research highlights the importance of systematically investigating the interplay between prompting strategies and temperature settings in large language models. Because the best temperature differed between zero-shot and chain-of-thought prompting, no single default setting can be assumed to suit every configuration.

The work contributes to the understanding of extended reasoning models and gives future studies a concrete starting point: tune sampling temperature and prompting strategy jointly when evaluating complex reasoning tasks.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
