TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba
Summary: arXiv:2603.27314v1 Announce Type: new
Abstract
Music-to-dance generation has broad applications in virtual reality, dance education, and digital character animation. However, the limited coverage of existing 3D dance datasets confines current models to a narrow subset of music styles and choreographic patterns, resulting in poor generalization to real-world music. Consequently, generated dances often become overly simplistic and repetitive, substantially degrading expressiveness and realism.
To tackle this problem, we present TokenDance, a two-stage music-to-dance generation framework that explicitly addresses this limitation through dual-modality tokenization and efficient token-level generation.
Framework Overview
TokenDance consists of two main stages:
-
Stage One: Discretization
In the first stage, we discretize both dance and music using Finite Scalar Quantization. Dance motions are factorized into upper and lower-body components with kinematic-dynamic constraints, while music is decomposed into semantic and acoustic features. Dedicated codebooks are employed to capture choreography-specific structures.
-
Stage Two: Token Generation
In the second stage, we introduce a Local-Global-Local token-to-token generator built on a Bidirectional Mamba backbone. This architecture enables coherent motion synthesis, strong music-dance alignment, and efficient non-autoregressive inference.
Performance and Applications
Extensive experiments demonstrate that TokenDance achieves overall state-of-the-art (SOTA) performance in both generation quality and inference speed. The effectiveness of this framework highlights its practical value for real-world music-to-dance applications, including:
- Virtual Reality Experiences: Enhancing user interaction and immersion in virtual environments.
- Dance Education: Providing personalized learning experiences through AI-generated choreography.
- Digital Character Animation: Enabling realistic movement in character animations for games and films.
Conclusion
TokenDance represents a significant advancement in the field of music-to-dance generation. By overcoming the limitations of existing datasets and models, it opens new avenues for creativity and innovation in various applications. The combination of dual-modality tokenization and the Bidirectional Mamba backbone sets a new standard for generating expressive and realistic dance movements in sync with diverse musical styles.
