Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics
Source: arXiv:2604.00443v1
Abstract: If the same neuron activates for both “lender” and “riverside,” standard metrics attribute the overlap to superposition: the neuron must be compressing two unrelated concepts. This work explores how much of the overlap is due to a lexical confound: neurons fire for a shared word form (such as “bank”) rather than for two compressed concepts. A 2×2 factorial decomposition reveals that the lexical-only condition (same word, different meaning) consistently exceeds the semantic-only condition (different word, same meaning) across models spanning 110M to 70B parameters. The confound carries into sparse autoencoders (18-36% of features blend senses).
Introduction
Polysemy and polysemanticity are easy to conflate. Polysemy is a property of language: a single word form, such as “bank,” carries several meanings. Polysemanticity is a property of neural networks: a single neuron activates for several seemingly unrelated concepts. Both complicate the interpretation of language models, and this paper asks how often the first is mistaken for the second.
Understanding Superposition Metrics
Standard superposition metrics measure how much a neuron's activations overlap across distinct concepts. When a single neuron activates for both “lender” and “riverside,” the overlap is typically read as superposition: the neuron is taken to compress two unrelated concepts into a single representation. That reading overlooks the possibility that both activations trace back to the same word form.
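The overlap reading described above can be made concrete with a small sketch. The source does not state which metric the paper uses, so the Jaccard overlap, the 0.5 threshold, the function names, and the toy activation values here are all assumptions chosen for illustration.

```python
def active_set(activations, threshold=0.5):
    """Indices of neurons whose activation exceeds the threshold."""
    return {i for i, a in enumerate(activations) if a > threshold}


def jaccard_overlap(acts_a, acts_b, threshold=0.5):
    """Jaccard overlap between the active-neuron sets of two inputs.

    This is an assumed stand-in for a superposition metric, not the
    paper's definition.
    """
    a = active_set(acts_a, threshold)
    b = active_set(acts_b, threshold)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy activations over four neurons (made-up numbers, not from the paper).
lender = [0.9, 0.1, 0.8, 0.0]      # neurons 0 and 2 are active
riverside = [0.7, 0.6, 0.0, 0.0]   # neurons 0 and 1 are active

overlap = jaccard_overlap(lender, riverside)
print(f"overlap = {overlap:.3f}")
```

A nonzero overlap here says only that some neuron is active in both contexts. It does not say whether that neuron responds to two concepts or to one shared word form, which is the gap the paper targets.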
The Lexical Confound
The paper tests whether part of this overlap is a lexical confound: the neuron fires for a shared word form, such as “bank,” regardless of which sense is intended, rather than compressing two separate concepts. A 2×2 factorial design separates the two effects by crossing lexical identity (same or different word) with semantic identity (same or different meaning).
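One way to picture a 2×2 factorial decomposition is to average an overlap score within each of the four cells and compare each single-factor cell against the baseline where neither word nor meaning is shared. The sketch below does that. The estimator, the names `cell_means` and `decompose`, and the toy overlap values are assumptions for illustration; the source does not describe the paper's exact procedure.

```python
from statistics import mean


def cell_means(pairs):
    """Average overlap per (same_word, same_meaning) cell.

    pairs: iterable of (same_word: bool, same_meaning: bool, overlap: float).
    """
    cells = {}
    for same_word, same_meaning, overlap in pairs:
        cells.setdefault((same_word, same_meaning), []).append(overlap)
    return {key: mean(values) for key, values in cells.items()}


def decompose(cells):
    """Simple effects relative to the baseline cell, plus the interaction."""
    both = cells[(True, True)]            # same word, same meaning
    lexical_only = cells[(True, False)]   # same word, different meaning
    semantic_only = cells[(False, True)]  # different word, same meaning
    neither = cells[(False, False)]       # baseline
    return {
        "lexical_effect": lexical_only - neither,
        "semantic_effect": semantic_only - neither,
        "interaction": both - lexical_only - semantic_only + neither,
    }


# Toy overlap scores (made-up numbers, not results from the paper).
pairs = [
    (True, True, 0.5),
    (True, True, 1.0),
    (True, False, 0.5),
    (False, True, 0.25),
    (False, False, 0.125),
]

effects = decompose(cell_means(pairs))
print(effects)
```

In this toy data the lexical effect is larger than the semantic effect, which mirrors the direction of the reported finding but is not evidence for it.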
Key Findings
- The lexical-only condition (same word, different meaning) consistently exceeds the semantic-only condition (different word, same meaning).
- The pattern holds across models ranging from 110 million to 70 billion parameters, which suggests it is not specific to one model scale.
- The confound carries into sparse autoencoders, where 18% to 36% of features blend senses.
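The sparse autoencoder figure can be read as the fraction of features that fire on more than one sense of a word. The sketch below shows one possible way to compute such a fraction from sense-labeled firings. The blending criterion, the 20% `min_share` threshold, the function names, and the toy labels are assumptions for illustration; the source does not say how the paper decides that a feature blends senses.

```python
from collections import Counter


def blends_senses(sense_labels, min_share=0.2):
    """True if at least two senses each account for >= min_share of firings.

    sense_labels: one sense label per example on which the feature fired.
    """
    counts = Counter(sense_labels)
    total = sum(counts.values())
    if total == 0:
        return False
    substantial = sum(1 for c in counts.values() if c / total >= min_share)
    return substantial >= 2


def blended_fraction(feature_to_labels, min_share=0.2):
    """Fraction of features that blend senses under the assumed criterion."""
    flags = [blends_senses(labels, min_share)
             for labels in feature_to_labels.values()]
    return sum(flags) / len(flags) if flags else 0.0


# Toy firings for four hypothetical features on the word "bank".
features = {
    "f0": ["finance"] * 9 + ["river"],      # 10% river: below the threshold
    "f1": ["finance"] * 5 + ["river"] * 5,  # evenly split: blended
    "f2": ["river"] * 4,                    # single sense
    "f3": ["finance"] * 3 + ["river"],      # 25% river: blended
}

fraction = blended_fraction(features)
print(f"blended fraction = {fraction:.2f}")
```

The result depends heavily on `min_share`, so any reported percentage is only meaningful alongside the criterion used to produce it.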
Implications for AI Models
These findings bear directly on how superposition is measured. A metric that does not control for lexical identity can count a neuron's response to a shared word form as evidence of compressed concepts, and so overstate superposition. Separating lexical overlap from semantic overlap should make such metrics, and the interpretability conclusions drawn from them, more reliable.
Conclusion
The study challenges researchers to reconsider how they measure and interpret neuron activations. Overlap that looks like polysemanticity may in part reflect polysemy, that is, a response to the word form itself, and the same confound appears in sparse autoencoder features. Controlling for lexical identity is a reasonable first step toward superposition metrics that measure what they claim to measure.
