JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency
Source: arXiv:2604.03044v1 (cross-listed)
Abstract: We introduce JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model designed to redefine the trade-off between strong performance and token efficiency in the sub-50B parameter regime. JoyAI-LLM Flash is pretrained on a massive corpus of 20 trillion tokens and further optimized through a rigorous post-training pipeline, including supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and large-scale reinforcement learning (RL) across diverse environments.
Key Features of JoyAI-LLM Flash
JoyAI-LLM Flash combines four mechanisms aimed at token efficiency and overall model performance:
- Balanced Cognitive Modes: The model strategically balances thinking and non-thinking cognitive modes as part of its approach to token efficiency.
- FiberPO Algorithm: This novel reinforcement learning (RL) algorithm, inspired by fibration theory, decomposes trust-region maintenance into global and local components, allowing for unified multi-scale stability control during policy optimization.
- Architectural Sparsity: JoyAI-LLM Flash comprises 48B total parameters while activating only 2.7B parameters per forward pass (about 5.6% of the total), a significantly higher sparsity ratio than other industry-leading models of similar scale.
- Joint Training-Inference Co-Design: To improve inference throughput, the model adopts a co-design approach that integrates dense Multi-Token Prediction (MTP) and Quantization-Aware Training (QAT).
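The sparsity figures in the list above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the two parameter counts quoted in the abstract (48B total, 2.7B activated); the function names are hypothetical and chosen for illustration, and nothing here reflects the model's actual routing or expert configuration.

```python
# Parameter counts quoted in the abstract, in billions.
TOTAL_PARAMS_B = 48.0
ACTIVE_PARAMS_B = 2.7


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters used in a single forward pass."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


def total_to_active_ratio(total_b: float, active_b: float) -> float:
    """How many total parameters exist per activated parameter."""
    if active_b <= 0 or active_b > total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return total_b / active_b


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    ratio = total_to_active_ratio(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    # Roughly 5.6% of the weights are active, i.e. about 17.8 total
    # parameters for every activated one.
    print(f"active fraction: {frac:.1%}")
    print(f"total-to-active ratio: {ratio:.1f}")
```

Since per-token compute in an MoE model scales with the activated parameters rather than the total, this ratio is a rough proxy for how much capacity the model holds relative to what it spends on each forward pass.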
Performance and Efficiency
The authors report that JoyAI-LLM Flash is efficient in both training and inference. Its distinguishing property is that it maintains strong performance while activating only 2.7B of its 48B parameters per forward pass, and the MTP and QAT co-design is aimed specifically at inference throughput.
Community Contribution
To support the open-source community, we release checkpoints for both JoyAI-LLM-48B-A3B Base and its post-trained variants on Hugging Face, so that researchers and developers can explore and build on JoyAI-LLM Flash.
Conclusion
JoyAI-LLM Flash targets the sub-50B parameter space by combining high architectural sparsity, a balance of thinking and non-thinking modes, the FiberPO RL algorithm, and joint training-inference co-design. With the checkpoints openly available, the community can evaluate these design choices and build on them in new applications and research.
