Anchored Cyclic Generation: A Novel Paradigm for Long-Sequence Symbolic Music Generation
Abstract: Generating long sequences with structural coherence remains a fundamental challenge for autoregressive models across sequential generation tasks. In symbolic music generation, this challenge is particularly pronounced: existing methods are constrained by the severe error accumulation inherent to autoregressive models, which degrades both music quality and structural integrity. In this paper, we propose the Anchored Cyclic Generation (ACG) paradigm, which relies on anchor features from already identified music to guide subsequent generation during the autoregressive process, thereby mitigating error accumulation.
Based on the ACG paradigm, we further propose the Hierarchical Anchored Cyclic Generation (Hi-ACG) framework, which employs a systematic global-to-local generation strategy and is highly compatible with our purpose-built piano token, an efficient musical representation. Experimental results show that, compared to traditional autoregressive models, the ACG paradigm reduces the cosine distance between predicted feature vectors and ground-truth semantic vectors by an average of 34.7%.
Introduction
Symbolic music generation has advanced significantly in recent years, yet generating long sequences with structural coherence remains a formidable challenge. Autoregressive models, while powerful, suffer from error accumulation: each step conditions on the model's own earlier outputs, so small mistakes compound and the quality of the generated music deteriorates as the sequence grows. This issue is particularly critical in symbolic music, where maintaining musical structure is essential for creating coherent pieces.
The Anchored Cyclic Generation Paradigm
The Anchored Cyclic Generation (ACG) paradigm addresses these challenges by introducing anchor features derived from music that has already been generated and identified. These anchors guide each subsequent generation step, limiting the error accumulation that is typical of autoregressive setups and yielding more stable and coherent long musical sequences.
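The effect of anchoring can be illustrated with a deliberately simple toy. This is not the authors' model: the predictor `step`, its constant `drift` error, and the use of rounding as the "identification" of a discrete output are all assumptions made for illustration. The sketch only shows why conditioning on a feature re-derived from an identified output, rather than on the raw prediction, can stop small errors from compounding.

```python
def step(x, drift=0.1):
    """Hypothetical one-step predictor: the true rule is x + 1,
    but every prediction carries a small systematic error (drift)."""
    return x + 1.0 + drift


def plain_autoregressive(x0, n):
    """Feed each raw prediction straight back in; errors compound."""
    outputs, x = [], x0
    for _ in range(n):
        x = step(x)
        outputs.append(x)
    return outputs


def anchored_cyclic(x0, n):
    """Toy anchoring: identify a discrete output from each prediction,
    then condition the next step on an anchor derived from that output."""
    outputs, anchor = [], x0
    for _ in range(n):
        prediction = step(anchor)
        symbol = round(prediction)   # assumed "identification" step
        anchor = float(symbol)       # anchor feature for the next cycle
        outputs.append(anchor)
    return outputs


if __name__ == "__main__":
    n = 10
    truth = float(n)  # starting from 0.0, the true value after n steps
    print("plain error:   ", abs(plain_autoregressive(0.0, n)[-1] - truth))
    print("anchored error:", abs(anchored_cyclic(0.0, n)[-1] - truth))
```

In this toy the plain loop's error grows linearly with the number of steps, while the anchored loop discards the per-step error each cycle. Real anchor features would be learned representations rather than rounded scalars.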
Hi-ACG Framework
Building on the ACG paradigm, we introduce the Hierarchical Anchored Cyclic Generation (Hi-ACG) framework. This framework employs a global-to-local generation strategy, establishing the overall structure of a piece before generating its local detail. Hi-ACG is designed to work with a specialized piano token, which serves as an efficient representation of musical elements.
Experimental Results
Our experimental evaluations demonstrate the efficacy of the ACG paradigm. Compared to traditional autoregressive models, the ACG approach reduced the cosine distance between predicted feature vectors and ground-truth semantic vectors by an average of 34.7%. Furthermore, in long-sequence symbolic music generation, the Hi-ACG framework significantly outperformed existing mainstream methods in both subjective and objective evaluations.
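For readers unfamiliar with the metric, the following is a minimal sketch of how a cosine distance and a relative reduction between two methods could be computed. The function names and the example vectors are hypothetical and are not taken from the paper; the exact evaluation protocol (feature extractor, averaging scheme) is not specified in this summary.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    if baseline == 0.0:
        raise ValueError("baseline distance must be non-zero")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative vectors only; not data from the paper.
    ground_truth = [1.0, 0.0, 1.0]
    baseline_pred = [0.0, 1.0, 1.0]
    anchored_pred = [1.0, 0.5, 1.0]
    d_base = cosine_distance(baseline_pred, ground_truth)
    d_anch = cosine_distance(anchored_pred, ground_truth)
    print(f"relative reduction: {relative_reduction(d_base, d_anch):.1%}")
```

A reported average reduction would then be the mean of such per-sample reductions, or the reduction of the mean distances; which of the two the authors use is not stated here.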
Generalization Capabilities
In addition to its performance in music generation tasks, the Hi-ACG framework exhibits strong generalization capabilities. It has been shown to perform well in related tasks such as music completion, demonstrating its adaptability across music generation challenges.
Conclusion
The Anchored Cyclic Generation paradigm represents a significant step forward in symbolic music generation. By mitigating the error accumulation of autoregressive models through anchor-guided generation, and by extending this idea hierarchically in Hi-ACG, the two approaches enable longer, richer, and more coherent musical compositions.
