SimDiff: Depth Pruning via Similarity and Difference
arXiv:2604.19520v1
Abstract: Depth pruning improves the deployment efficiency of large language models (LLMs) by identifying and removing redundant layers. A widely accepted standard for this identification process is to measure the similarity between layers using cosine distance. However, we find that methods relying solely on this one-dimensional heuristic can exhibit unpredictable performance and even catastrophic collapse across different architectures. To address this issue, we propose SimDiff, a novel layer importance criterion that jointly evaluates layers from two orthogonal perspectives: representational similarity and transformation difference.
Introduction to SimDiff
Depth pruning, which removes redundant layers outright, has become an important technique as demand for efficient deployment of large language models (LLMs) grows. Existing methods typically rank layers by the cosine similarity of their representations, but relying on this single heuristic can give unpredictable performance, and even catastrophic collapse, across different architectures. SimDiff instead scores each layer from two perspectives: representational similarity and transformation difference.
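The similarity-only baseline described above can be sketched as follows. This is a minimal illustration of ranking layers by input-output cosine similarity, not SimDiff's criterion; the function names, the layout of `hidden_states`, and the toy values are all assumptions made for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def layer_similarity(inputs, outputs):
    """Mean cosine similarity between a layer's input and output token
    vectors. Values near 1 suggest the layer barely changes the direction
    of the representation."""
    sims = [cosine_similarity(x, y) for x, y in zip(inputs, outputs)]
    return sum(sims) / len(sims)

def most_redundant_layers(hidden_states, k):
    """hidden_states[i] holds the token vectors entering layer i (assumed
    layout); hidden_states[i + 1] holds the vectors leaving it.
    Returns the indices of the k layers with the highest similarity."""
    scores = [
        layer_similarity(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy data: 3 layers, 2 tokens, 2-dimensional hidden states.
hidden_states = [
    [[1.0, 0.0], [0.0, 1.0]],  # input to layer 0
    [[1.0, 0.1], [0.1, 1.0]],  # output of layer 0 (small change)
    [[0.0, 1.0], [1.0, 0.0]],  # output of layer 1 (large change)
    [[0.0, 1.0], [1.0, 0.0]],  # output of layer 2 (no change)
]
print(most_redundant_layers(hidden_states, k=2))  # [0, 2]
```

A criterion like this looks only at how similar the representations are, which is the one-dimensional view that the abstract reports as unreliable across architectures.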
Key Metrics in SimDiff
SimDiff uses two complementary metrics to assess layer importance:
- MSSD (Mean Squared Standard Deviation): This metric is sensitive to outliers and is effective in identifying layers that make decisive corrections, ensuring that critical transformations within the model are preserved.
- MASD (Mean Average Standard Deviation): In contrast to MSSD, this metric robustly measures a layer’s average contribution, which gives a more stable evaluation across different architectures.
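The text above does not give formulas for MSSD or MASD, so the following does not reproduce them. It is only a sketch of the general contrast the two bullets describe: an aggregate built on squared changes reacts strongly to a few large corrections, while one built on absolute changes tracks the average contribution. The helper names and toy data are hypothetical.

```python
def flat_changes(inputs, outputs):
    """Elementwise change a layer applies, flattened over tokens and dims."""
    return [
        y - x
        for token_in, token_out in zip(inputs, outputs)
        for x, y in zip(token_in, token_out)
    ]

def mean_squared_change(inputs, outputs):
    """Outlier-sensitive aggregate (stand-in, not the paper's MSSD)."""
    deltas = flat_changes(inputs, outputs)
    return sum(d * d for d in deltas) / len(deltas)

def mean_absolute_change(inputs, outputs):
    """Robust average aggregate (stand-in, not the paper's MASD)."""
    deltas = flat_changes(inputs, outputs)
    return sum(abs(d) for d in deltas) / len(deltas)

# Two tokens with 2-dimensional hidden states, all zeros on input.
inputs = [[0.0, 0.0], [0.0, 0.0]]
uniform_layer = [[0.5, 0.5], [0.5, 0.5]]  # small change everywhere
spiky_layer = [[2.0, 0.0], [0.0, 0.0]]    # one large correction

for name, outputs in [("uniform", uniform_layer), ("spiky", spiky_layer)]:
    print(
        name,
        mean_squared_change(inputs, outputs),
        mean_absolute_change(inputs, outputs),
    )
# uniform 0.25 0.5
# spiky 1.0 0.5
```

Both toy layers have the same mean absolute change, but the squared aggregate is four times larger for the layer that makes one large correction, which is the kind of layer an outlier-sensitive metric is meant to flag.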
Experimental Results
Experiments on models ranging from 0.5B to 13B parameters show that:
- SimDiff significantly outperforms state-of-the-art baselines across various pruning ratios.
- At a 25% pruning ratio, SimDiff retains over 91% of LLaMA2-7B’s performance.
- The method achieves up to a 1.49x inference speedup when pruning 12 layers on LLaMA3.1-8B.
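To put the reported speedup in context, the arithmetic below compares it with a simple upper bound. It assumes that LLaMA3.1-8B has 32 transformer layers (taken from the model's public configuration, not from the text above) and that inference time is spent entirely in identical layers, which ignores embeddings, the output head, and other fixed costs.

```python
def ideal_depth_speedup(total_layers, pruned_layers):
    """Upper bound on speedup if runtime were spent entirely in
    identical transformer layers."""
    return total_layers / (total_layers - pruned_layers)

TOTAL_LAYERS = 32        # assumption: layer count of LLaMA3.1-8B
PRUNED_LAYERS = 12       # from the reported result
REPORTED_SPEEDUP = 1.49  # from the reported result

bound = ideal_depth_speedup(TOTAL_LAYERS, PRUNED_LAYERS)
print(f"layers removed: {PRUNED_LAYERS / TOTAL_LAYERS:.1%}")  # 37.5%
print(f"ideal speedup: {bound:.2f}x")                         # 1.60x
print(f"reported / ideal: {REPORTED_SPEEDUP / bound:.0%}")    # 93%
```

Under these assumptions, the reported 1.49x is close to the 1.60x bound for removing 12 of 32 layers, with the gap plausibly explained by costs that depth pruning does not remove.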
Recovery of Pruned Models
Models pruned with SimDiff can also be effectively recovered with minimal fine-tuning. This matters for practitioners who want lower computational overhead without giving up model quality.
Conclusion
SimDiff moves depth pruning for large language models beyond similarity-only criteria. By also accounting for transformation difference, it offers a more reliable layer importance criterion across architectures and pruning ratios, and the pruned models remain recoverable with minimal fine-tuning.
