CreativityBench: Benchmarking AI Creative Reasoning Skills

CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

Recent advancements in artificial intelligence, particularly in large language models (LLMs), have demonstrated impressive capabilities in reasoning and environment-interaction tasks. However, their creative problem-solving abilities have not been examined as thoroughly. A new study introduces CreativityBench, a benchmark that evaluates creative tool use in these models through an affordance-based approach.

Understanding CreativityBench

CreativityBench is designed to assess how well LLMs can repurpose available objects by reasoning about the objects' affordances and attributes instead of relying on their conventional uses. The benchmark is built on a large affordance knowledge base (KB) that includes:

4,000 entities
Over 150,000 affordance annotations
Explicit links between objects, parts, attributes, and actionable uses

The knowledge base serves as a foundational element for generating a diverse array of grounded tasks. In total, 14,000 tasks have been created that challenge models to identify non-obvious yet physically plausible solutions while adhering to specific constraints.
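To make the idea of linking objects, parts, attributes, and actionable uses more concrete, here is a minimal sketch of how such a knowledge base could be represented and queried. It is not the schema used by CreativityBench: the class names `Affordance` and `Entity`, the `conventional` flag, the helper `repurposing_candidates`, and the toy entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Affordance:
    """One hypothetical annotation: a part, an attribute of that part, and a use it enables."""
    part: str
    attribute: str
    use: str
    conventional: bool  # True if this is the object's typical use (assumed field)


@dataclass
class Entity:
    """A hypothetical KB entry: an object and its affordance annotations."""
    name: str
    affordances: list[Affordance] = field(default_factory=list)


def repurposing_candidates(kb: list[Entity], needed_use: str) -> list[tuple[str, str, str]]:
    """Return (entity, part, attribute) triples whose non-conventional affordances match the needed use."""
    return [
        (entity.name, aff.part, aff.attribute)
        for entity in kb
        for aff in entity.affordances
        if aff.use == needed_use and not aff.conventional
    ]


# Toy entries invented for illustration; they are not taken from the benchmark.
kb = [
    Entity("coin", [
        Affordance("edge", "thin and rigid", "turn a slotted screw", conventional=False),
        Affordance("body", "monetary value", "pay for goods", conventional=True),
    ]),
    Entity("screwdriver", [
        Affordance("tip", "flat and rigid", "turn a slotted screw", conventional=True),
    ]),
]

candidates = repurposing_candidates(kb, "turn a slotted screw")
print(candidates)
```

In this toy setup the screwdriver is filtered out because turning a screw is its conventional use, which leaves the coin's edge as the only repurposing candidate. A task generator could plausibly start from such triples and add constraints, for example by stating that no screwdriver is available.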

Key Findings from Evaluations

Ten state-of-the-art LLMs, including both closed-source and open-source models, were evaluated on CreativityBench. The results revealed several key findings:

Plausibility vs. Accuracy: While models generally succeeded in selecting plausible objects for tasks, they frequently struggled to identify the correct parts, their affordances, and the underlying physical mechanisms required for effective problem-solving.
Performance Drop: Errors in identifying the right parts, affordances, and physical mechanisms led to a significant decline in overall performance, highlighting a gap in creative reasoning capabilities.
Saturation of Improvements: As models were scaled, performance improvements quickly plateaued, indicating that merely increasing model size does not inherently enhance creative affordance discovery.
Limited Effect of Inference Strategies: Common strategies employed during inference, such as Chain-of-Thought reasoning, provided only marginal gains in performance, further emphasizing the complexity of creative tool use.
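The gap between choosing a plausible object and getting the part, affordance, and mechanism right can be illustrated with a staged scoring sketch. This is not the benchmark's actual metric: the level names in `LEVELS`, the function `staged_accuracy`, the exact-match comparison, and the toy answers are assumptions, and a real evaluation would likely need a more tolerant matching procedure than string equality.

```python
# Hypothetical levels of an answer, ordered from coarse to fine.
LEVELS = ("object", "part", "affordance", "mechanism")


def staged_accuracy(predictions, references):
    """For each level, return the fraction of tasks where that level and all earlier levels match."""
    if not references:
        raise ValueError("references must not be empty")
    counts = {level: 0 for level in LEVELS}
    for pred, ref in zip(predictions, references):
        for level in LEVELS:
            # Stop at the first mismatch so later levels are not credited.
            if pred.get(level) != ref[level]:
                break
            counts[level] += 1
    total = len(references)
    return {level: counts[level] / total for level in LEVELS}


# Toy reference answer, repeated for four tasks (invented for illustration).
reference = {"object": "coin", "part": "edge", "affordance": "turn", "mechanism": "torque"}
references = [reference] * 4

predictions = [
    dict(reference),                                   # fully correct
    {**reference, "affordance": "pry"},                # wrong affordance
    {**reference, "part": "face"},                     # wrong part
    {**reference, "object": "key"},                    # wrong object
]

scores = staged_accuracy(predictions, references)
print(scores)
```

With these toy answers the score falls from 0.75 at the object level to 0.25 at the mechanism level, which mirrors the qualitative pattern described above: object selection looks strong while the finer-grained reasoning steps fail more often.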

Implications for Future AI Development

The findings indicate that creative tool use remains a substantial challenge for existing AI models. The limitations observed suggest that improving the creative reasoning capabilities of LLMs is an important area for future research and development. CreativityBench serves as an assessment tool and also opens avenues for refining planning and reasoning modules within AI agents.

As AI continues to evolve, understanding the creative dimensions of intelligence will be crucial for developing more sophisticated and versatile agents. The insights gained from CreativityBench may lead to significant advancements in how AI interacts with the world, ultimately enriching its problem-solving abilities and expanding its potential applications across various fields.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
