Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
Summary: arXiv:2604.05523v1 Announce Type: new
Abstract: It remains unclear how well large language models (LLMs) can manage and acquire economic resources. In this paper, we introduce Market-Bench, a comprehensive benchmark that evaluates LLMs on economically relevant tasks through economic and trade competition.
Market-Bench constructs a configurable multi-agent supply chain economic model where LLMs act as retailer agents responsible for procuring and retailing merchandise. The benchmark consists of two pivotal stages:
- Procurement Stage: LLMs engage in budget-constrained auctions to bid for limited inventory.
- Retail Stage: LLMs set retail prices, generate marketing slogans, and present them to buyers through a role-based attention mechanism for purchase.
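As a rough illustration of the procurement stage only, the sketch below runs one budget-constrained auction round. The abstract does not specify the auction format, so the sealed-bid, first-price, one-unit-per-retailer rule used here is an assumption, and `Retailer` and `run_auction` are hypothetical names rather than part of Market-Bench.

```python
from dataclasses import dataclass


@dataclass
class Retailer:
    """Hypothetical retailer agent state: available cash and units in stock."""
    name: str
    cash: float
    stock: int = 0


def run_auction(retailers, bids, units):
    """Allocate up to `units` items, one per retailer, to the highest bids.

    Assumed rule (not taken from the paper): sealed-bid, first-price.
    A bid above the bidder's cash is clipped to that cash, which is how
    this sketch enforces the budget constraint.
    """
    # Clip each bid to the bidder's budget, then rank highest first.
    ranked = sorted(
        ((min(bids[r.name], r.cash), r) for r in retailers),
        key=lambda pair: pair[0],
        reverse=True,
    )
    winners = []
    for price, retailer in ranked[:units]:
        if price <= 0:
            continue  # a zero or negative bid cannot win inventory
        retailer.cash -= price  # first-price: the winner pays its own bid
        retailer.stock += 1
        winners.append((retailer.name, price))
    return winners
```

In this sketch a retailer that overbids is not disqualified; its bid is simply capped at its remaining cash. A stricter variant could reject such bids instead.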
Market-Bench logs complete trajectories of bids, prices, slogans, sales, and balance-sheet states. These logs enable automatic evaluation with several families of metrics:
- Economic metrics
- Operational metrics
- Semantic metrics
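To make the logging idea concrete, here is a minimal sketch of a per-agent trajectory record and one economic metric derived from it. The field names mirror the quantities listed above, but the record layout, the `Trajectory` class, and the `capital_return` formula are assumptions for illustration; the abstract does not define the benchmark's actual schema or metrics.

```python
from dataclasses import dataclass, field


@dataclass
class RoundRecord:
    """One logged round for one agent (hypothetical layout)."""
    round_id: int
    bid: float
    price: float
    slogan: str
    units_sold: int
    balance: float  # balance-sheet state after the round


@dataclass
class Trajectory:
    """Full log for one agent, from which metrics can be computed offline."""
    agent: str
    initial_balance: float
    rounds: list = field(default_factory=list)

    def log(self, record):
        self.rounds.append(record)

    def capital_return(self):
        """Assumed economic metric: relative change in balance over the run."""
        if not self.rounds:
            return 0.0
        final = self.rounds[-1].balance
        return (final - self.initial_balance) / self.initial_balance

    def total_units_sold(self):
        """Assumed operational metric: units sold across all logged rounds."""
        return sum(r.units_sold for r in self.rounds)
```

Keeping the raw per-round records and computing metrics afterwards means new metrics can be added without rerunning the agents.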
Benchmarking results on 20 open- and closed-source LLM agents reveal significant performance disparities. A notable finding is a “winner-take-most” phenomenon: only a small subset of LLM retailers consistently achieve capital appreciation, while many others hover around the break-even point despite having similar semantic matching scores. This raises the question of which underlying factors drive the economic success of some LLMs over others.
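One simple way to inspect such a pattern is to bucket agents by their final return on capital. The sketch below assumes each agent's return is already available as a fraction of starting capital; the function name, the bucket labels, and the 2% break-even tolerance are all hypothetical choices, not values from the paper.

```python
def bucket_by_outcome(returns, tolerance=0.02):
    """Group agents into appreciated / break-even / declined buckets.

    `returns` maps agent name to relative change in capital (0.10 means +10%).
    `tolerance` is an assumed band around zero that counts as break-even.
    """
    buckets = {"appreciated": [], "break_even": [], "declined": []}
    for agent, r in sorted(returns.items()):  # sorted for stable output order
        if r > tolerance:
            buckets["appreciated"].append(agent)
        elif r < -tolerance:
            buckets["declined"].append(agent)
        else:
            buckets["break_even"].append(agent)
    return buckets
```

A winner-take-most outcome would show up as a short `appreciated` list next to a long `break_even` list.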
Market-Bench offers a reproducible testbed for studying how LLMs interact in competitive markets. Its structured environment also supports future work on the economic capabilities of LLMs and on their potential to contribute to real-world economic scenarios.
In conclusion, Market-Bench extends AI benchmarking to the economic and trade competition dimensions of large language models. As AI systems continue to evolve, understanding their economic implications becomes increasingly important.
