A Layer-wise Analysis of Supervised Fine-Tuning
Summary: arXiv:2604.11838v1 Announce Type: cross
Abstract: While critical for alignment, Supervised Fine-Tuning (SFT) incurs the risk of catastrophic forgetting, yet the layer-wise emergence of instruction-following capabilities remains elusive. We investigate this mechanism via a comprehensive analysis utilizing information-theoretic, geometric, and optimization metrics across model scales (1B-32B).
Our experiments reveal a distinct depth-dependent pattern: middle layers (20%-80%) are stable, whereas final layers exhibit high sensitivity. Leveraging this insight, we propose Mid-Block Efficient Tuning, which selectively updates these critical intermediate layers.
Empirically, our method outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B) with reduced parameter overhead, demonstrating that effective alignment is architecturally localized rather than distributed. The code is publicly available at https://anonymous.4open.science/r/base_sft.
Introduction
Supervised Fine-Tuning (SFT) is a key step in training language models to follow instructions. However, it carries the risk of catastrophic forgetting, in which previously learned knowledge is lost as the model adapts to the fine-tuning data. How instruction-following capabilities emerge layer by layer is still poorly understood, and understanding it is important for improving SFT.
Methodology
We combine information-theoretic, geometric, and optimization metrics to analyze models ranging from 1 billion to 32 billion parameters. Measuring each layer with all three families of metrics shows how layers at different depths respond to fine-tuning.
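To make the idea of a layer-wise measurement concrete, here is a minimal sketch of one simple geometric quantity: the relative L2 change of each layer's weights between a base and a fine-tuned checkpoint. This is an illustrative assumption, not necessarily one of the metrics used in the paper, and the function names and toy weights are hypothetical.

```python
import math

def relative_drift(base, tuned):
    """Relative L2 change ||tuned - base|| / ||base|| for one layer's flattened weights."""
    if len(base) != len(tuned):
        raise ValueError("weight vectors must have the same length")
    diff_norm = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
    base_norm = math.sqrt(sum(b * b for b in base))
    # An all-zero base layer has no meaningful relative change.
    return diff_norm / base_norm if base_norm > 0 else float("inf")

def layerwise_drift(base_layers, tuned_layers):
    """Apply relative_drift to each pair of corresponding layers, in depth order."""
    return [relative_drift(b, t) for b, t in zip(base_layers, tuned_layers)]

# Toy data (hypothetical): three "layers" of two weights each.
# Only the last layer differs between the two checkpoints.
base_layers = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
tuned_layers = [[1.0, 0.0], [0.0, 2.0], [3.0, 9.0]]
drift = layerwise_drift(base_layers, tuned_layers)
print(drift)
```

Plotting such a value against relative depth is one way a depth-dependent pattern could be made visible.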
Key Findings
- Layer Stability: The middle layers of the model, those between 20% and 80% of its depth, remain stable during fine-tuning.
- Layer Sensitivity: The final layers, in contrast, are highly sensitive to fine-tuning, which suggests they are the layers most exposed to catastrophic forgetting.
- Mid-Block Efficient Tuning: Based on these findings, we propose Mid-Block Efficient Tuning, which selectively updates the intermediate layers rather than the sensitive final layers.
- Performance Improvement: In our evaluations, this method outperforms standard Low-Rank Adaptation (LoRA) by up to 10.2% on GSM8K with OLMo2-7B, with reduced parameter overhead.
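The layer selection behind Mid-Block Efficient Tuning can be sketched as follows. This is a minimal sketch under stated assumptions, not the released implementation: the 20%-80% bounds come from the findings above, while the function names, the half-open interval convention, and the `layers.<index>.` parameter naming (common in decoder-only model implementations) are assumptions made for illustration.

```python
import re

def mid_block_indices(num_layers, start_frac=0.2, end_frac=0.8):
    """Indices of layers whose relative depth falls in [start_frac, end_frac)."""
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not 0.0 <= start_frac < end_frac <= 1.0:
        raise ValueError("need 0 <= start_frac < end_frac <= 1")
    start = int(num_layers * start_frac)
    end = int(num_layers * end_frac)
    return list(range(start, end))

# Assumed naming scheme: parameter names contain "layers.<index>.".
LAYER_PATTERN = re.compile(r"(?:^|\.)layers\.(\d+)\.")

def is_trainable(param_name, trainable_layers):
    """True only for parameters that belong to one of the selected layers.

    Parameters outside any numbered layer (embeddings, final norm,
    output head) are treated as frozen.
    """
    match = LAYER_PATTERN.search(param_name)
    return match is not None and int(match.group(1)) in trainable_layers

# Example: a hypothetical 32-layer model.
selected = set(mid_block_indices(32))
print(sorted(selected))
print(is_trainable("model.layers.10.mlp.up_proj.weight", selected))
print(is_trainable("model.layers.31.mlp.up_proj.weight", selected))
```

With a PyTorch model, the same predicate could be applied while iterating over `model.named_parameters()`, setting `requires_grad` to its result for each parameter.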
Conclusion
Our results indicate that effective alignment in supervised fine-tuning is not distributed uniformly across the layers of a model. Instead, it is architecturally localized in the middle layers. This suggests that fine-tuning can be restricted to a subset of layers, reducing parameter overhead while maintaining or improving performance.
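To give a rough sense of what reduced parameter overhead can mean, the sketch below counts adapter parameters when low-rank adapters are attached to every layer versus only the 20%-80% block. It assumes LoRA-style adapters, where a rank-r adapter on a d_out x d_in weight matrix adds r * (d_in + d_out) parameters. Whether Mid-Block Efficient Tuning uses adapters of this form is not stated here, and the model dimensions are hypothetical.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by one rank-`rank` adapter on a d_out x d_in matrix."""
    return rank * (d_in + d_out)

def adapter_budget(num_adapted_layers, matrices_per_layer, d_in, d_out, rank):
    """Total adapter parameters across all adapted layers."""
    return num_adapted_layers * matrices_per_layer * lora_params(d_in, d_out, rank)

# Hypothetical setup: 32 layers, hidden size 4096, rank 16,
# adapters on two square projection matrices per layer.
full = adapter_budget(32, 2, 4096, 4096, 16)   # adapters on every layer
mid = adapter_budget(19, 2, 4096, 4096, 16)    # layers 6..24 (20%-80% of depth)
print(full, mid, mid / full)
```

Under these assumptions the mid-block configuration trains about 59% of the adapter parameters of the all-layer configuration.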
Our code is publicly available at https://anonymous.4open.science/r/base_sft, so researchers and practitioners can implement and test Mid-Block Efficient Tuning in their own work.
