Can Large Language Models Accurately Perceive Time?


Can LLMs Perceive Time? An Empirical Investigation

Source: arXiv:2604.00010v1

Abstract

Large language models (LLMs) are adept at a wide range of language tasks, but they show a critical limitation in perceiving time. This article examines that limitation through an empirical investigation of four experiments spanning 68 tasks and four model families. Pre-task estimates consistently overshoot actual durations by a factor of 4 to 7 (p < 0.001), with models often predicting human-scale minutes for tasks that complete in seconds. Relative ordering of task durations fares no better, particularly on task pairs designed to expose the models' reliance on heuristics: GPT-5 scored only 18% on counter-intuitive pairs (p = 0.033), indicating a systematic failure when confronted with misleading complexity labels. Post-hoc recall of task durations is also disconnected from reality, with recalled durations diverging from actual ones by an order of magnitude in either direction. These failures persist in multi-step agentic settings, where errors range from 5 to 10 times the actual duration. The models possess propositional knowledge about duration from their training data but lack experiential grounding in their own inference time. This shortcoming has practical implications for agent scheduling, planning, and time-critical scenarios.

Introduction

The ability of AI systems, particularly large language models, to understand and estimate time remains a topic of significant interest and concern. As these models are increasingly integrated into various applications, from automated customer service to sophisticated planning systems, their inability to accurately perceive time could lead to inefficiencies and errors. This article aims to explore the depth of this limitation through a series of empirical experiments.

Methodology

To assess the temporal perception of LLMs, we designed four experiments involving 68 tasks across four different model families. The experiments covered pre-task duration estimates, relative ordering of task pairs, post-hoc recall of durations, and multi-step agentic settings, so that predicted or recalled durations could be compared against actual ones.
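As a rough illustration of the comparison described above, the sketch below times a task with a wall-clock timer and computes the ratio of a predicted duration to the measured one. It is not the study's harness: the helper names `timed` and `overshoot_ratio`, the 60-second estimate, and the toy task are all assumptions made here for illustration.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(task: Callable[[], T]) -> Tuple[T, float]:
    """Run a task and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = task()
    return result, time.perf_counter() - start


def overshoot_ratio(predicted_s: float, actual_s: float) -> float:
    """Ratio of predicted to actual duration; values above 1 mean overestimation."""
    if actual_s <= 0:
        raise ValueError("actual duration must be positive")
    return predicted_s / actual_s


if __name__ == "__main__":
    # Hypothetical estimate standing in for a model's pre-task answer.
    predicted_seconds = 60.0
    _, actual_seconds = timed(lambda: sum(range(1_000_000)))
    print(f"overshoot: {overshoot_ratio(predicted_seconds, actual_seconds):.1f}x")
```

Under this definition, a prediction of 60 seconds for a task that actually takes 12 seconds gives a ratio of 5, which would fall inside the 4 to 7 times range reported for pre-task estimates.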

Key Findings

  • Pre-task Estimates: The models consistently overestimated task durations by a factor of 4 to 7 (p < 0.001), often predicting minutes for tasks that complete in seconds.
  • Relative Ordering: On task pairs designed to expose heuristic reliance, GPT-5 scored only 18% on counter-intuitive pairs (p = 0.033), a systematic failure when complexity labels are misleading.
  • Post-hoc Recall: Recalled task durations diverged from actual durations by an order of magnitude in either direction.
  • Multi-step Settings: Errors persisted in multi-step agentic tasks, with inaccuracies of 5 to 10 times the actual duration.
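The ordering and recall findings can be made concrete with two simple metrics. The sketch below is an assumption about how such scores might be computed, not the paper's scoring code: `ordering_accuracy` and `log10_error` are hypothetical helper names, and ties are treated as "not shorter" on both sides.

```python
import math
from typing import Sequence, Tuple

# Each pair is (predicted_a, predicted_b, actual_a, actual_b), all in seconds.
Pair = Tuple[float, float, float, float]


def ordering_accuracy(pairs: Sequence[Pair]) -> float:
    """Fraction of task pairs whose predicted order matches the actual order."""
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for pa, pb, aa, ab in pairs if (pa < pb) == (aa < ab))
    return correct / len(pairs)


def log10_error(recalled_s: float, actual_s: float) -> float:
    """Signed error in orders of magnitude: +1 is 10x too long, -1 is 10x too short."""
    if recalled_s <= 0 or actual_s <= 0:
        raise ValueError("durations must be positive")
    return math.log10(recalled_s / actual_s)
```

Working in log space treats a tenfold overestimate and a tenfold underestimate symmetrically, which matches the "order of magnitude in either direction" framing of the recall result.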

Implications

These findings underscore the importance of developing LLMs that possess not only propositional knowledge about duration but also experiential grounding in their own inference time. As applications of AI become more complex and time-sensitive, addressing this limitation is crucial for the reliability and efficiency of AI systems in tasks involving scheduling and planning.

Conclusion

The empirical investigation into LLMs’ perception of time reveals a critical gap that must be addressed. As we continue to integrate these models into real-world applications, understanding and improving their temporal awareness will be essential for ensuring optimal performance and reliability.



Lazarus Omolua
https://richlyai.com/blog
