Market-Bench: Benchmarking LLMs in Economic Trade

Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition

Summary: arXiv:2604.05523v1

Abstract: It remains unclear how well large language models (LLMs) can manage and acquire economic resources. In this paper, we introduce Market-Bench, a comprehensive benchmark that evaluates LLMs on economically relevant tasks through economic and trade competition.

Market-Bench constructs a configurable multi-agent supply chain economic model where LLMs act as retailer agents responsible for procuring and retailing merchandise. The benchmark consists of two pivotal stages:

Procurement Stage: LLMs engage in budget-constrained auctions to bid for limited inventory.
Retail Stage: LLMs set retail prices and generate marketing slogans, which are presented to buyers through a role-based attention mechanism when buyers decide what to purchase.
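
To make the procurement stage concrete, here is a minimal sketch of a budget-constrained auction. The summary above does not specify the auction format, so everything here is an assumption for illustration: a sealed-bid, pay-your-bid rule, one bid per retailer, and the hypothetical names `Bid` and `allocate`. It is not Market-Bench's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent: str         # hypothetical retailer identifier
    unit_price: float  # price offered per unit
    quantity: int      # units requested


def allocate(bids, budgets, supply):
    """Serve the highest unit price first; cap each retailer by its budget.

    Assumes one bid per agent and a pay-your-bid rule (an illustrative
    choice, not taken from the paper). Returns {agent: (units, cost)}.
    """
    allocations = {}
    remaining = supply
    for bid in sorted(bids, key=lambda b: b.unit_price, reverse=True):
        if remaining <= 0 or bid.unit_price <= 0:
            break
        # A retailer can never buy more than its budget allows.
        affordable = int(budgets[bid.agent] // bid.unit_price)
        units = min(bid.quantity, affordable, remaining)
        if units > 0:
            allocations[bid.agent] = (units, units * bid.unit_price)
            remaining -= units
    return allocations


bids = [Bid("a", 12.0, 10), Bid("b", 10.0, 10), Bid("c", 8.0, 10)]
budgets = {"a": 60.0, "b": 200.0, "c": 200.0}
result = allocate(bids, budgets, supply=12)
print(result)
```

In this toy run, retailer "a" bids highest but can only afford 5 units, "b" takes the remaining 7, and "c" gets nothing. This shows how a budget limit and scarce inventory interact even under a very simple rule.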

Market-Bench logs complete trajectories of bids, prices, slogans, sales, and balance-sheet states. These logs enable automatic evaluation with three families of metrics:

Economic metrics
Operational metrics
Semantic metrics
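
As a rough illustration of how logged trajectories can feed automatic metrics, the sketch below defines a per-round record and two simple scores. The field names, the `RoundLog` type, and both metric definitions (`capital_return`, `sell_through`) are assumptions made for this example. The summary above names the metric families but does not give their formulas.

```python
from dataclasses import dataclass


@dataclass
class RoundLog:
    # Hypothetical per-round record; field names are illustrative only.
    bid: float
    units_bought: int
    retail_price: float
    slogan: str
    units_sold: int
    balance: float  # cash held after the round


def capital_return(initial_balance, logs):
    """Economic-style metric: relative change in balance over the run."""
    if not logs:
        return 0.0
    return logs[-1].balance / initial_balance - 1.0


def sell_through(logs):
    """Operational-style metric: share of procured units that were sold."""
    bought = sum(r.units_bought for r in logs)
    sold = sum(r.units_sold for r in logs)
    return sold / bought if bought else 0.0


logs = [
    RoundLog(5.0, 10, 8.0, "Fresh picks, fair prices", 6, 98.0),
    RoundLog(5.0, 10, 8.0, "Stock up before it's gone", 9, 120.0),
]
print(round(capital_return(100.0, logs), 3), sell_through(logs))
```

A semantic metric would need a model that scores how well a slogan matches the product or buyer, so it is left out of this sketch.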

Benchmarking results on 20 open- and closed-source LLM agents reveal significant performance disparities. A notable finding is a "winner-take-most" phenomenon: only a small subset of LLM retailers consistently achieves capital appreciation, while many others hover around the break-even point despite having similar semantic matching scores. This gap raises the question of which factors, beyond slogan quality, drive the economic success of some LLMs over others.
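
The "winner-take-most" pattern can be checked with simple arithmetic on final balances. The sketch below is only an illustration: the function name `summarize_outcomes`, the 2% break-even band, and the toy balances are assumptions, not values or definitions from the paper.

```python
def summarize_outcomes(initial_balance, final_balances, tolerance=0.02):
    """Split agents into clear winners and near-break-even agents.

    `tolerance` is an assumed band around zero return that counts as
    "break-even"; agents that lose more than this fall in neither list.
    """
    returns = {
        agent: balance / initial_balance - 1.0
        for agent, balance in final_balances.items()
    }
    winners = sorted(a for a, r in returns.items() if r > tolerance)
    break_even = sorted(a for a, r in returns.items() if abs(r) <= tolerance)
    return winners, break_even


# Toy balances for four hypothetical agents, each starting from 100.
final_balances = {"m1": 150.0, "m2": 101.0, "m3": 99.0, "m4": 90.0}
winners, break_even = summarize_outcomes(100.0, final_balances)
print(winners, break_even)
```

With these toy numbers, one agent out of four grows its capital while two stay within the break-even band, which is the shape of result the paragraph above describes.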

Market-Bench offers a reproducible testbed for studying how LLMs interact in competitive markets and how they behave and perform economically. Its structured environment paves the way for future work on the economic capabilities of AI and on the potential for LLMs to contribute to real-world economic scenarios.

In conclusion, Market-Bench stands as a significant advancement in the field of AI benchmarking, specifically targeting the economic and trade competition dimensions of large language models. As the landscape of AI continues to evolve, understanding the economic implications of these technologies becomes increasingly essential.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
