Human-Centric Topic Modeling with Goal-Prompted Learning

Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport

Source: arXiv:2604.12663v1 (announce type: new)

Abstract: Existing topic modeling methods, from LDA to recent neural and LLM-based approaches, focus mainly on statistical coherence and often produce redundant or off-target topics that miss the user’s underlying intent. We introduce Human-centric Topic Modeling (Human-TM), a novel task formulation that integrates a human-provided goal directly into the topic modeling process to produce interpretable, diverse, and goal-oriented topics.

Introduction

Topic modeling has become an essential technique in natural language processing, allowing researchers and practitioners to extract themes from large text corpora. Traditional methods such as Latent Dirichlet Allocation (LDA) paved the way for more advanced models, including those based on neural networks and large language models (LLMs). However, these methods often prioritize statistical coherence over user intent, producing topics that can be redundant, irrelevant, or misaligned with the user’s goals.

Human-Centric Topic Modeling

To address these limitations, we propose Human-centric Topic Modeling (Human-TM), a task formulation in which a human-provided goal is integrated directly into the topic modeling process. The aim is to produce topics that are interpretable, diverse, and relevant to that goal, rather than topics selected by statistical coherence alone. Human-TM thus shifts topic discovery toward the user’s intent.

Proposed Method: GCTM-OT

At the core of our approach is the Goal-prompted Contrastive Topic Model with Optimal Transport (GCTM-OT), which has three key components:

Goal Extraction: The process begins with LLM-based prompting to extract potential goal candidates from the input documents.
Semantic-Aware Contrastive Learning: The extracted goals are then incorporated into a contrastive learning framework that accounts for the underlying semantics of the data.
Optimal Transport: Optimal transport is used to align the discovered topics with the extracted goals, improving topic relevance and coherence.
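
The summary does not spell out GCTM-OT's contrastive objective. A minimal sketch, assuming an InfoNCE-style loss in which each document embedding is paired with one goal-candidate embedding (its positive) and the remaining goals act as negatives, could look like this. The function names, the document-to-goal pairing, and the temperature `tau` are illustrative assumptions rather than the paper's formulation, and the "semantic-aware" part of the method is not modeled here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def goal_contrastive_loss(doc_emb, goal_emb, goal_ids, tau=0.1):
    """Hypothetical InfoNCE-style loss: pull each document toward its paired
    goal embedding and push it away from the other goals.

    doc_emb:  (n_docs, dim) document representations (assumed input)
    goal_emb: (n_goals, dim) embeddings of LLM-extracted goal candidates
    goal_ids: (n_docs,) index of the goal paired with each document
    tau:      temperature (illustrative default)
    """
    d = l2_normalize(doc_emb)
    g = l2_normalize(goal_emb)
    logits = d @ g.T / tau                        # (n_docs, n_goals)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each document's paired goal, averaged.
    return -log_probs[np.arange(len(goal_ids)), goal_ids].mean()

# Toy usage: documents are noisy copies of their paired goal embeddings.
rng = np.random.default_rng(0)
goals = rng.normal(size=(3, 8))                       # three toy goal embeddings
pairs = np.array([0, 1, 2, 0])                        # goal paired with each doc
docs = goals[pairs] + 0.05 * rng.normal(size=(4, 8))  # docs lie near their goals
aligned = goal_contrastive_loss(docs, goals, pairs)
mismatched = goal_contrastive_loss(docs, goals, (pairs + 1) % 3)
```

With correct pairings the loss is lower than with shuffled pairings, which is the signal that pushes document representations toward their goals during training.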
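
The goal-alignment step can be illustrated with entropy-regularized optimal transport solved by Sinkhorn iterations, a standard OT solver swapped in here because the summary does not name one. This sketch assumes topics and goals are embedded in a shared vector space, uses cosine distance as the transport cost, and places uniform mass on both topics and goals; `topic_goal_ot_loss` and its parameters are hypothetical.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp iterations.

    cost: (n, m) cost matrix; a: (n,) and b: (m,) marginals summing to 1.
    Returns the transport plan P. After the final update its column sums
    equal b exactly and its row sums approximate a.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns to marginal b
    return u[:, None] * K * v[None, :]

def topic_goal_ot_loss(topic_emb, goal_emb, eps=0.1):
    """Hypothetical alignment loss: OT cost between topics and goals under
    cosine distance, with uniform mass on both sides."""
    t = topic_emb / np.linalg.norm(topic_emb, axis=1, keepdims=True)
    g = goal_emb / np.linalg.norm(goal_emb, axis=1, keepdims=True)
    cost = 1.0 - t @ g.T                       # cosine distance in [0, 2]
    a = np.full(len(t), 1.0 / len(t))
    b = np.full(len(g), 1.0 / len(g))
    plan = sinkhorn(cost, a, b, eps)
    return float((plan * cost).sum()), plan

# Toy usage: four topic embeddings, two near each of two goal embeddings.
rng = np.random.default_rng(0)
goals = rng.normal(size=(2, 16))
topics = goals[[0, 0, 1, 1]] + 0.05 * rng.normal(size=(4, 16))
loss, plan = topic_goal_ot_loss(topics, goals)
assignment = plan.argmax(axis=1)   # rough topic-to-goal assignment
```

Each row of `plan` shows how one topic's mass is spread over the goals. The transport cost is low when every goal is covered by nearby topics, so it could serve as a training penalty if rewritten in an autodiff framework.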

Experimental Results

To evaluate the effectiveness of GCTM-OT, we conducted extensive experiments on three public subreddit datasets. The results demonstrate that GCTM-OT significantly outperforms state-of-the-art baselines in terms of both topic coherence and diversity. More importantly, our approach shows a marked improvement in aligning the generated topics with human-provided goals, highlighting its potential as a more human-centric topic discovery system.
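The summary reports gains in topic coherence and diversity without naming the exact metrics. As a hedged illustration of what such scores usually measure, the sketch below implements two common definitions: topic diversity as the fraction of unique words among the top-k words of all topics, and NPMI coherence averaged over word pairs and estimated from document-level co-occurrence. The toy corpus and topics are invented, and the paper may compute these metrics differently (for example, with a sliding window or an external reference corpus).

```python
import math
from itertools import combinations

def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top-k words of all topics
    (1.0 means no word is shared between topics)."""
    words = [w for topic in topics for w in topic[:top_k]]
    return len(set(words)) / len(words)

def npmi_coherence(topics, documents, top_k=10):
    """Average NPMI over word pairs in each topic's top-k words, using
    document-level co-occurrence. Assumes every topic has at least two words."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for wi, wj in combinations(topic[:top_k], 2):
            p_ij = prob(wi, wj)
            if p_ij == 0:
                pair_scores.append(-1.0)   # never co-occur: minimum NPMI
            elif p_ij == 1:
                pair_scores.append(1.0)    # co-occur in every document
            else:
                pmi = math.log(p_ij / (prob(wi) * prob(wj)))
                pair_scores.append(pmi / -math.log(p_ij))
        topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores)

# Toy corpus: two housing posts and two GPU posts.
docs = [["rent", "lease", "landlord"], ["rent", "landlord"],
        ["gpu", "driver"], ["gpu", "driver", "crash"]]
coherent = [["rent", "landlord"], ["gpu", "driver"]]
mixed = [["rent", "gpu"], ["landlord", "driver"]]
```

Here `coherent` scores the maximum NPMI of 1.0 because each pair always co-occurs, while `mixed` scores -1.0 because its pairs never appear together; a topic set that repeats words would lower `topic_diversity`.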

Conclusion

Human-centric Topic Modeling and the GCTM-OT framework move topic modeling beyond statistical coherence alone and toward the user’s intent. By integrating a human-provided goal directly into the modeling process, GCTM-OT produces topics that are more meaningful and relevant to users. This research opens the door to more intuitive and user-friendly topic discovery systems.

For more information, please refer to the full paper available on arXiv.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
