Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning
arXiv:2505.15467v2
Abstract: Large language models have achieved remarkable success in various tasks. However, they struggle to learn new tasks incrementally because of catastrophic forgetting. Existing approaches rely on experience replay, optimization constraints, or task differentiation, all of which face strict limitations in real-world scenarios. To address these issues, we propose Joint Flashback Adaptation. We first introduce flashbacks (a limited number of prompts from old tasks) when adapting to new tasks, and we constrain how far the model's outputs deviate from those of the original model. We then interpolate latent tasks between flashbacks and new tasks, so that the model jointly learns relevant latent tasks, new tasks, and flashbacks. This alleviates data sparsity in the flashbacks and facilitates knowledge sharing for smooth adaptation. Our method requires only a limited number of flashbacks, needs no access to replay data, and is task-agnostic. We conduct extensive experiments on state-of-the-art large language models across 1000+ instruction-following tasks, arithmetic reasoning tasks, and general reasoning tasks. The results demonstrate the superior performance of our method in improving generalization on new tasks and reducing forgetting in old tasks.
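The abstract does not say how the deviation from the original model is measured. The sketch below assumes one common choice: the KL divergence between the frozen original model's and the adapted model's output distributions on each flashback prompt. This penalty is added to the new-task loss with a weight. Models are represented by hypothetical functions that return logits over a tiny vocabulary. `flashback_penalty`, `joint_loss`, and the `weight` parameter are illustrative names, not the authors' code.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in: a "model" maps a prompt to next-token logits
# over a tiny fixed vocabulary.
Model = Callable[[str], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); entries where p is zero contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def flashback_penalty(flashbacks: Sequence[str], original: Model, adapted: Model) -> float:
    """Mean KL(original || adapted) over flashback prompts (an assumed deviation measure)."""
    total = 0.0
    for prompt in flashbacks:
        total += kl_divergence(softmax(original(prompt)), softmax(adapted(prompt)))
    return total / len(flashbacks)


def joint_loss(new_task_loss: float, penalty: float, weight: float = 1.0) -> float:
    # Assumed combination: new-task loss plus a weighted deviation penalty.
    return new_task_loss + weight * penalty


if __name__ == "__main__":
    original = lambda prompt: [2.0, 0.5, -1.0]  # frozen original model (toy)
    drifted = lambda prompt: [0.0, 2.0, -1.0]   # adapted model that has drifted (toy)
    prompts = ["old task prompt 1", "old task prompt 2"]
    print(flashback_penalty(prompts, original, original))  # 0.0: no deviation
    print(joint_loss(1.2, flashback_penalty(prompts, original, drifted), weight=0.5))
```

A zero penalty means the adapted model still matches the original on every flashback. In an actual fine-tuning setup, the penalty would presumably be computed over token-level distributions and backpropagated only through the adapted model.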
Introduction
Large language models have transformed artificial intelligence, performing a wide range of tasks with high accuracy. They remain vulnerable, however, to catastrophic forgetting: when a model is trained on new data, it loses information it learned previously. Existing mitigation strategies rely on experience replay, optimization constraints, or task differentiation, and each has limitations in practice. Replay, for example, assumes continued access to data from earlier tasks, and task differentiation typically assumes that task identities are known. Neither assumption reliably holds in real-world deployments.
Proposed Solution: Joint Flashback Adaptation
To address catastrophic forgetting, we propose Joint Flashback Adaptation, which combines two components:
- Flashbacks: A limited set of prompts from prior tasks is used while adapting to new tasks, and the model's outputs on these prompts are constrained to stay close to those of the original model. This lets the model retain essential knowledge from old tasks while it learns new ones.
- Latent Task Interpolation: Latent tasks are interpolated between flashbacks and new tasks, and the model learns these latent tasks jointly with the new tasks and the flashbacks. The interpolated tasks alleviate the data sparsity caused by having only a few flashbacks and encourage knowledge sharing across tasks, which smooths adaptation.
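The source does not detail how latent tasks are constructed. As a stand-in, the sketch below uses mixup-style linear interpolation, which pairs a random flashback with a random new-task example. It then draws a mixing coefficient from a Beta distribution and takes the convex combination of the two embeddings. The embedding vectors, the Beta parameter `alpha`, and the names `interpolate` and `sample_latent_tasks` are hypothetical. The paper's actual interpolation may differ.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def interpolate(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    # Element-wise convex combination: lam * a + (1 - lam) * b.
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def sample_latent_tasks(
    flashback_embs: Sequence[Sequence[float]],
    new_task_embs: Sequence[Sequence[float]],
    num_samples: int,
    alpha: float = 0.4,
    rng: Optional[random.Random] = None,
) -> List[Tuple[float, Vector]]:
    """Draw (lam, embedding) pairs between a random flashback and a random new-task example."""
    if rng is None:
        rng = random.Random()
    samples = []
    for _ in range(num_samples):
        flashback = rng.choice(flashback_embs)
        new_task = rng.choice(new_task_embs)
        # Beta(alpha, alpha) mixing coefficient, as in mixup (an assumed choice).
        lam = rng.betavariate(alpha, alpha)
        samples.append((lam, interpolate(flashback, new_task, lam)))
    return samples


if __name__ == "__main__":
    flashbacks = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]]  # toy flashback prompt embeddings
    new_tasks = [[1.0, 0.0, 0.5]]                    # toy new-task prompt embedding
    for lam, emb in sample_latent_tasks(flashbacks, new_tasks, 3, rng=random.Random(0)):
        print(f"lam={lam:.2f} -> {[round(v, 2) for v in emb]}")
```

Each sample keeps its coefficient alongside the embedding. A training loop could use it to weight the losses of the two endpoint tasks, as mixup does for labels, and then train on these latent examples together with the new-task data and the flashbacks.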
Key Advantages
Joint Flashback Adaptation offers three practical advantages:
- Requires only a limited number of flashback prompts and no access to replay data from earlier tasks.
- Is task-agnostic: it does not depend on knowing task identities or on task-specific components.
- Improves generalization on new tasks while reducing forgetting of previously learned tasks.
Experimental Validation
We evaluated the approach on state-of-the-art large language models across a diverse set of tasks: more than 1,000 instruction-following tasks, arithmetic reasoning tasks, and general reasoning tasks. In these experiments, Joint Flashback Adaptation outperformed the compared methods, both improving generalization to new tasks and reducing forgetting of old ones.
Conclusion
Joint Flashback Adaptation mitigates catastrophic forgetting in large language models using only a small number of flashback prompts. It constrains the model's outputs on flashbacks and jointly learns latent tasks interpolated between flashbacks and new tasks. Together, these let it preserve knowledge from old tasks while adapting smoothly to new ones, without requiring replay data or task identities. This makes it a practical option for incremental instruction tuning in real-world settings, where stored training data and task labels are often unavailable.
