Olfactory Perception Benchmark for Large Language Models

Benchmark for Assessing Olfactory Perception of Large Language Models

Source: arXiv:2604.00002v1 (cross-listed announcement)

The paper introduces the Olfactory Perception (OP) benchmark, a framework for evaluating how well large language models (LLMs) reason about smell. Its tasks range from describing the odor of a single molecule to judging mixtures, predicting olfactory receptor activation, and identifying real-world odor sources.

What is the Olfactory Perception Benchmark?

The OP benchmark assesses LLMs with structured questions spanning eight task categories:

  • Odor classification
  • Odor primary descriptor identification
  • Intensity judgments
  • Pleasantness judgments
  • Multi-descriptor prediction
  • Mixture similarity
  • Olfactory receptor activation
  • Smell identification from real-world odor sources

In total, the benchmark contains 1,010 questions, each posed in two prompt formats: one that names the compound and one that gives its isomeric SMILES string. Comparing the two formats shows how the choice of molecular representation affects model performance.
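To make the two formats concrete, here is a minimal Python sketch of how a single item might be rendered both ways. The `OdorItem` class, its fields, and the question template are hypothetical: the summary does not describe the benchmark's data schema or prompt wording. Vanillin is used only as an illustration; it has no stereocenters, so its isomeric SMILES is the same as its canonical SMILES.

```python
from dataclasses import dataclass


# Hypothetical item layout; the OP benchmark's real schema is not given in the summary.
@dataclass(frozen=True)
class OdorItem:
    category: str
    compound_name: str
    isomeric_smiles: str
    question_template: str  # assumed to contain a "{molecule}" placeholder

    def prompts(self) -> dict[str, str]:
        """Render the same question once per molecular representation."""
        return {
            "compound_name": self.question_template.format(molecule=self.compound_name),
            "isomeric_smiles": self.question_template.format(molecule=self.isomeric_smiles),
        }


item = OdorItem(
    category="Odor primary descriptor identification",
    compound_name="vanillin",
    isomeric_smiles="COC1=C(C=CC(=C1)C=O)O",  # vanillin; no stereocenters
    question_template="Which single word best describes the smell of {molecule}?",
)

for fmt, prompt in item.prompts().items():
    print(f"{fmt}: {prompt}")
```

Keeping the question text fixed and swapping only the molecule string is what lets any accuracy difference be attributed to the representation rather than to the wording.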

Evaluation of Model Configurations

The study evaluates 21 model configurations drawn from major model families. The key results on olfactory reasoning:

  • Compound-name prompts consistently outperform isomeric SMILES prompts.
  • The compound-name advantage ranges from +2.4 to +18.9 percentage points, with a mean of approximately +7 points.
  • The best-performing model achieved an overall accuracy of 64.4%.

These findings suggest that current LLMs access olfactory knowledge mainly through lexical associations with compound names, rather than by reasoning about molecular structure.
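The percentage-point gaps are differences between two accuracies measured for the same configuration. A minimal sketch of that bookkeeping, assuming exact-match scoring of one answer per question (the summary does not state the scoring rule). The configurations and predictions are toy placeholders, not data from the paper.

```python
from statistics import mean


def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Exact-match accuracy in percent (assumed scoring rule)."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("need equal-length, non-empty prediction and answer lists")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def format_gaps(results: dict[str, dict[str, float]]) -> dict[str, float]:
    """Compound-name minus isomeric-SMILES accuracy, in percentage points, per configuration."""
    return {cfg: acc["compound_name"] - acc["isomeric_smiles"] for cfg, acc in results.items()}


# Toy placeholder data (NOT from the paper): two hypothetical configurations, four questions.
answers = ["sweet", "fruity", "woody", "floral"]
runs = {
    "config_a": {
        "compound_name": ["sweet", "fruity", "woody", "musky"],
        "isomeric_smiles": ["sweet", "green", "woody", "musky"],
    },
    "config_b": {
        "compound_name": ["sweet", "fruity", "woody", "floral"],
        "isomeric_smiles": ["sweet", "green", "spicy", "musky"],
    },
}

results = {
    cfg: {fmt: accuracy(preds, answers) for fmt, preds in by_format.items()}
    for cfg, by_format in runs.items()
}
gaps = format_gaps(results)
print(gaps)  # {'config_a': 25.0, 'config_b': 75.0}
print(min(gaps.values()), max(gaps.values()), mean(gaps.values()))  # 25.0 75.0 50.0
```

The last line reports the minimum, maximum, and mean of the per-configuration gaps, which is the same form as the +2.4 to +18.9 point range and roughly +7 point mean quoted above.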

Cross-Language Evaluation

The authors also evaluate a subset of the OP benchmark in 21 languages and find that aggregating predictions across languages improves olfactory prediction:

  • The best-performing language ensemble achieved an area under the receiver operating characteristic curve (AUROC) of 0.86.
  • This suggests that LLMs can draw on linguistic diversity to improve their olfactory predictions.
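The summary does not say how per-language predictions are combined or which label the AUROC is computed against. As a hedged sketch, assume each language yields a probability per item for a binary label and the ensemble simply averages those probabilities; AUROC is then computed with the Mann-Whitney formulation (the probability that a random positive item outscores a random negative one, with ties counted as one half). The language codes, scores, and labels are toy values.

```python
from statistics import mean


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble(per_language: dict[str, list[float]]) -> list[float]:
    """Average each item's probability across languages (assumed aggregation rule)."""
    lengths = {len(scores) for scores in per_language.values()}
    if len(lengths) != 1:
        raise ValueError("every language must score the same items")
    n_items = lengths.pop()
    return [mean(scores[i] for scores in per_language.values()) for i in range(n_items)]


# Toy values (NOT from the paper): 4 items, binary label (1 = descriptor applies).
labels = [1, 0, 1, 0]
per_language = {
    "en": [0.6, 0.7, 0.8, 0.3],
    "de": [0.9, 0.4, 0.5, 0.6],
}

for lang, scores in per_language.items():
    print(lang, auroc(scores, labels))  # en 0.75, de 0.75
print("ensemble", auroc(ensemble(per_language), labels))  # ensemble 1.0
```

In this toy case each language misorders one positive/negative pair, but on different items, so averaging corrects both. Averaging is only one plausible reading of "aggregating predictions"; majority voting or rank averaging would be equally reasonable assumptions.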

Conclusion

The Olfactory Perception benchmark gives researchers a structured way to measure how LLMs reason about smell, a sense that has received far less attention in AI evaluation than vision and audio. The results show emerging capabilities alongside substantial gaps: the best model reaches 64.4% overall accuracy, and every configuration performs worse when molecules are given as isomeric SMILES rather than by name.

Lazarus Omolua (https://richlyai.com/blog)
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.
