AudioRole Dataset for Audio Role-Playing in LLMs

AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models

High-quality multimodal datasets are essential for advancing the role-playing capabilities of large language models (LLMs). A new dataset, AudioRole, addresses the shortage of such data by providing a curated collection of audio and text designed specifically for Audio Role-Playing (ARP).

Research on role-playing has so far focused mainly on text-based persona simulation. ARP is harder because a response must align semantic content with vocal characteristics: what a character says and how the character sounds have to match. To support this, the creators of AudioRole assembled over 1,000 hours of audio from 13 popular TV series, comprising more than 1 million character-grounded dialogues.

Key Features of the AudioRole Dataset

AudioRole offers several features that make it useful to researchers and developers:

Synchronized Audio-Text Pairs: Each dialogue is paired with corresponding audio, ensuring that users can study the nuances of character interactions effectively.
Speaker Identity Annotations: The dataset includes detailed annotations of speaker identities, allowing for precise role-playing simulations.
Contextual Metadata: Contextual information accompanying dialogues enhances the understanding of character dynamics and situational contexts.
Diverse Character Representation: With dialogues from over 115 main characters, the dataset captures a wide array of personalities and voices.
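
To make the features above concrete, here is a minimal sketch of how a single AudioRole-style record might be represented and grouped by speaker. The class name, field names, file paths, and sample values are all hypothetical and chosen for illustration; the article does not describe the dataset's actual schema or file layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DialogueTurn:
    """One character-grounded utterance (hypothetical schema, not the real one)."""
    series: str                     # source TV series
    speaker: str                    # speaker identity annotation
    transcript: str                 # text paired with the audio clip
    audio_path: str                 # location of the synchronized audio clip
    context: Tuple[str, ...] = ()   # preceding lines or scene notes


def group_by_speaker(turns: List[DialogueTurn]) -> Dict[str, List[DialogueTurn]]:
    """Collect turns per character, e.g. to build a per-role training split."""
    grouped: Dict[str, List[DialogueTurn]] = defaultdict(list)
    for turn in turns:
        grouped[turn.speaker].append(turn)
    return dict(grouped)


# Placeholder data; none of these values come from the dataset itself.
turns = [
    DialogueTurn("Example Series", "Character A", "Where were you?", "clips/0001.wav"),
    DialogueTurn("Example Series", "Character B", "Stuck in traffic.", "clips/0002.wav",
                 context=("Where were you?",)),
    DialogueTurn("Example Series", "Character A", "Again?", "clips/0003.wav",
                 context=("Where were you?", "Stuck in traffic.")),
]
by_speaker = group_by_speaker(turns)
```

Keeping the speaker and context fields on every record is what lets a role-playing model be conditioned on both who is talking and what was said before.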

Introducing ARP-Eval: A Dual-Aspect Evaluation Framework

Alongside the dataset, the creators introduced ARP-Eval, a dual-aspect evaluation framework that assesses:

Response Quality: Evaluating how well the generated responses align with the character’s persona.
Role Fidelity: Measuring the accuracy of the character portrayal in role-playing scenarios.
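
A dual-aspect evaluation can be pictured as two scores that are tracked and reported separately rather than merged into one number. The sketch below assumes each evaluated response already has a score for each aspect; the class name, function name, and the choice of a plain mean are assumptions for illustration, since the article does not say how ARP-Eval computes or aggregates its scores.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AspectScores:
    """Per-response scores for the two aspects (hypothetical container)."""
    character: str
    response_quality: float
    role_fidelity: float


def summarize(scores: List[AspectScores]) -> Dict[str, float]:
    """Average each aspect independently so neither hides the other."""
    if not scores:
        raise ValueError("at least one scored response is required")
    return {
        "response_quality": mean(s.response_quality for s in scores),
        "role_fidelity": mean(s.role_fidelity for s in scores),
    }


# Placeholder numbers, not results from the article.
example_scores = [
    AspectScores("Character A", response_quality=0.5, role_fidelity=0.25),
    AspectScores("Character B", response_quality=0.75, role_fidelity=0.75),
]
summary = summarize(example_scores)
```

Reporting the aspects side by side makes it visible when a model produces fluent responses that drift out of character, or the reverse.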

Performance Validation of ARP-Model

The dataset was validated empirically with ARP-Model, a model trained on AudioRole. ARP-Model achieved an average Acoustic Personalization score of 0.31, significantly outperforming both the original GLM-4-Voice and the more powerful MiniCPM-O-2.6, which is tailored for one-shot role-playing scenarios.

ARP-Model also attained a Content Personalization score of 0.36, approximately 38% higher than the original model without AudioRole training and comparable to MiniCPM-O-2.6. These results underscore the potential of AudioRole for advancing audio-grounded role-playing research.
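
The article gives the trained score (0.36) and the approximate gain (38%) but not the baseline it was measured against. Assuming the 38% is a relative improvement, which is the only sensible reading since 0.38 absolute points would exceed the score itself, the implied baseline can be recovered with a short calculation. The function names below are illustrative, and the result is an estimate derived from rounded figures, not a reported number.

```python
def relative_gain(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score, as a fraction."""
    return (new_score - old_score) / old_score


def implied_baseline(new_score: float, gain: float) -> float:
    """Invert relative_gain: the old score that would yield the given gain."""
    return new_score / (1.0 + gain)


# Figures quoted in the article; the baseline itself is inferred, not quoted.
content_score = 0.36
reported_gain = 0.38

baseline = implied_baseline(content_score, reported_gain)
print(f"Implied baseline Content Personalization: {baseline:.2f}")
print(f"Check: relative gain = {relative_gain(content_score, baseline):.0%}")
```

Under that assumption the original model's Content Personalization score comes out at roughly 0.26.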

Conclusion

In summary, AudioRole provides large-scale, character-grounded audio-text data for advancing audio-grounded role-playing and character simulation in large language models. Together with the ARP-Eval framework, it gives researchers and developers a shared resource for building and evaluating AI-driven role-playing systems.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
