TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds
Recommender systems have evolved along two distinct paradigms: feature interaction models, which focus on multi-field categorical features, and sequential models, which analyze user behavior through historical interaction sequences. Recent architectures attempt to bridge the two, but combining them has exposed new failure modes. A new research paper, arXiv:2604.13737v1, presents TokenFormer, an architecture designed to unify the two approaches.
Understanding the Challenge
The two paradigms have largely developed independently, so models built in one tradition make limited use of the signals the other captures. The paper highlights a failure mode that appears when they are combined, which it calls Sequential Collapse Propagation (SCP): when sequence features interact with dimensionally ill non-sequence fields, the collapse of those fields propagates into the sequence features. The result is degraded model performance and less effective recommendations.
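To make "dimensionally ill" more concrete, the sketch below computes one common proxy for how many dimensions a set of embeddings actually uses: the participation ratio of the covariance matrix, (tr C)^2 / tr(C^2). This is an illustrative assumption, not necessarily the diagnostic used in the paper; the function name and the synthetic data are hypothetical.

```python
import random


def participation_ratio(embeddings):
    """Effective dimensionality (tr C)^2 / tr(C^2) of the covariance C.

    Ranges from 1 (all variance on one direction) up to the embedding size.
    Illustrative proxy only; the paper may measure collapse differently.
    """
    n = len(embeddings)
    d = len(embeddings[0])
    means = [sum(row[j] for row in embeddings) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in embeddings]
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    trace = sum(cov[i][i] for i in range(d))
    # C is symmetric, so tr(C^2) equals the sum of its squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace * trace / trace_sq if trace_sq > 0 else 0.0


rng = random.Random(0)

# Synthetic "healthy" field: variance spread over all 8 dimensions.
healthy = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]

# Synthetic "collapsed" field: every embedding lies along one direction.
direction = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 1.5, 0.25]
collapsed = [[t * v for v in direction] for t in (rng.gauss(0, 1) for _ in range(500))]

print(round(participation_ratio(collapsed), 3))  # 1.0: a single dominant direction
print(round(participation_ratio(healthy), 3))    # several effective dimensions, at most 8
```

Under this proxy, SCP would show up as the score of the sequence features drifting toward that of the collapsed fields after the two are mixed.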
Introducing TokenFormer
To address SCP, the authors propose TokenFormer, a unified recommendation architecture built on two techniques:
- Bottom-Full-Top-Sliding (BFTS) Attention Scheme: This attention mechanism applies full self-attention in the lower layers, allowing for comprehensive interaction modeling, and shrinking-window sliding attention in the upper layers, where each token attends only to a progressively narrower neighborhood so that those layers concentrate on sequential dynamics.
- Non-Linear Interaction Representation (NLIR): TokenFormer introduces one-sided non-linear multiplicative transformations to the hidden states. This enhances the model’s capacity to capture complex interactions between features, thereby improving its overall predictive power.
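The two techniques can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the symmetric window, the halving schedule, the sigmoid gate, and every function and parameter name are hypothetical choices made for the example.

```python
import math


def full_mask(n):
    """Every token may attend to every other token."""
    return [[True] * n for _ in range(n)]


def sliding_mask(n, window):
    """mask[i][j] is True when |i - j| < window (symmetric band, an assumption)."""
    return [[abs(i - j) < window for j in range(n)] for i in range(n)]


def bfts_masks(n, num_layers, num_full_layers, first_window):
    """One attention mask per layer: full at the bottom, sliding at the top."""
    masks = []
    window = first_window
    for layer in range(num_layers):
        if layer < num_full_layers:
            masks.append(full_mask(n))
        else:
            masks.append(sliding_mask(n, window))
            # Hypothetical shrink schedule: halve the window at each upper layer.
            window = max(1, window // 2)
    return masks


def one_sided_gate(hidden, gate_weights):
    """Multiplicative interaction with a non-linearity on one factor only.

    Computes hidden[i] * sigmoid(gate_weights[i] . hidden), a GLU-style gate
    used here as a stand-in for the paper's NLIR transformation.
    """
    gated = []
    for i, h_i in enumerate(hidden):
        pre = sum(w * h for w, h in zip(gate_weights[i], hidden))
        gated.append(h_i * (1.0 / (1.0 + math.exp(-pre))))
    return gated


masks = bfts_masks(n=6, num_layers=4, num_full_layers=2, first_window=4)
allowed = [sum(sum(row) for row in m) for m in masks]
print(allowed)  # [36, 36, 30, 16]
```

The printed counts show the intended shape: the two bottom layers allow all 36 token pairs, while the upper layers allow 30 and then 16 as the window shrinks from 4 to 2.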
Empirical Validation
The authors evaluated TokenFormer on public benchmarks and on Tencent’s advertising platform. They report that TokenFormer outperforms existing models and improves dimensional robustness and representation discriminability, two weaknesses of earlier unified models.
Conclusion
TokenFormer bridges the gap between the multi-field and sequential recommendation paradigms. Its attention scheme and interaction representation are designed to keep a unified model from inheriting the dimensional collapse of non-sequence fields, and the reported experiments indicate improved predictive accuracy. As demand for personalized recommendations continues to grow, unified architectures of this kind are likely to play an increasing role in recommender systems.
