PushupBench Reveals VLMs Fail to Count Pushups Accurately

PushupBench: Your VLM is Not Good at Counting Pushups

A study released on arXiv identifies a significant limitation of large vision-language models (VLMs) in video analysis. While these models are good at recognizing the content and context of visual data, they struggle to quantify actions, particularly when counting repetitions in exercise videos. The study introduces PushupBench, a dataset designed to evaluate how accurately VLMs count repetitions in video clips.

PushupBench comprises 446 long-form video clips with an average duration of 36.7 seconds. The clips are curated to cover various pushup techniques, giving a focused testbed for the counting capabilities of VLMs. The results expose the inadequacy of current models: even the best-performing frontier model achieves only 42.1% exact accuracy in counting pushups. By contrast, open-source models with 4 billion parameters score around 6%, on par with supervised baselines.

Key Findings from the Research

The study presents several critical findings that shed light on the performance of VLMs in action counting:

  • Accuracy Is Misleading: The researchers emphasize that exact-match counting accuracy alone is a misleading metric. Weaker models appear to exploit the modal count (simply guessing the most common number) rather than performing the temporal reasoning that accurate counting requires.
  • Fine-tuning Benefits: Fine-tuning VLMs on counting tasks using a subset of 1,000 samples improved performance on several general video understanding benchmarks:
    • MVBench: Increased by 2.15 points
    • PerceptionTest: Increased by 1.88 points
    • TVBench: Increased by 4.54 points
  • Counting as a Proxy: The findings suggest that counting capabilities may serve as a proxy for broader temporal reasoning skills in VLMs, indicating that addressing this limitation could enhance overall model performance in video understanding.
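
The point about the modal count can be made concrete with a small sketch. The labels and predictions below are invented for illustration and are not taken from PushupBench, and the helper names are hypothetical. The sketch compares a weak "model" that almost always answers 10 with a baseline that always guesses the most common count:

```python
from collections import Counter


def exact_accuracy(preds, truths):
    """Fraction of clips where the predicted count equals the true count."""
    return sum(p == t for p, t in zip(preds, truths)) / len(truths)


def mean_absolute_error(preds, truths):
    """Average absolute gap between predicted and true counts."""
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(truths)


def modal_baseline(truths):
    """Predict the single most common ground-truth count for every clip."""
    mode, _ = Counter(truths).most_common(1)[0]
    return [mode] * len(truths)


# Invented toy data: true repetition counts for ten clips.
truths = [10, 10, 10, 5, 8, 12, 10, 15, 20, 7]

# Hypothetical weak model that answers 10 for nearly every clip.
model_preds = [10, 10, 10, 10, 10, 10, 10, 10, 10, 9]

baseline_preds = modal_baseline(truths)

print("model accuracy:   ", exact_accuracy(model_preds, truths))     # 0.4
print("baseline accuracy:", exact_accuracy(baseline_preds, truths))  # 0.4
print("model MAE:        ", mean_absolute_error(model_preds, truths))
print("baseline MAE:     ", mean_absolute_error(baseline_preds, truths))
print("distinct answers: ", len(set(model_preds)))
```

On this toy data both score 40% exact accuracy even though neither is really counting, which is why reporting a modal-count baseline (and the spread of predicted values) alongside accuracy gives a fairer picture.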

Implications for the Future of VLMs

PushupBench opens a line of research aimed at improving VLMs' temporal reasoning. Understanding where current models fail is a prerequisite for building systems that can accurately interpret and analyze dynamic content.

PushupBench has been incorporated into the lmms-eval framework and is now hosted on pushupbench.com. This resource provides researchers and developers with the tools necessary to evaluate and enhance the counting abilities of their VLMs, ultimately pushing the boundaries of what these models can achieve.

Going forward, AI researchers should focus not only on raising accuracy scores but also on modeling the temporal dynamics of video content. The insights from PushupBench could help guide that work toward more capable video understanding systems.

Lazarus Omolua (https://richlyai.com/blog)