SimDiff: Advanced Depth Pruning for Efficient LLMs

SimDiff: Depth Pruning via Similarity and Difference

Source: arXiv:2604.19520v1

Abstract: Depth pruning improves the deployment efficiency of large language models (LLMs) by identifying and removing redundant layers. A widely accepted standard for this identification process is to measure the similarity between layers using cosine distance. However, we find that methods relying solely on this one-dimensional heuristic can exhibit unpredictable performance and even catastrophic collapse across different architectures. To address this issue, we propose SimDiff, a novel layer importance criterion that jointly evaluates layers from two orthogonal perspectives: representational similarity and transformation difference.

Introduction to SimDiff

As demand grows for efficient deployment of large language models (LLMs), depth pruning, which removes entire redundant layers, has become an important technique. Existing methods typically rely on cosine distance alone to gauge how similar layers are, but this single heuristic can give unpredictable results and, on some architectures, cause catastrophic collapse. SimDiff instead evaluates each layer from two perspectives: representational similarity and transformation difference.
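To make the similarity heuristic concrete, here is a minimal sketch of a cosine-based layer score. It is not taken from the paper: the function names, the per-token averaging, and the toy hidden states are all assumptions chosen for illustration. It compares a layer's input and output hidden states token by token.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def layer_similarity(hidden_in, hidden_out):
    """Mean per-token cosine similarity between a layer's input and output.

    Hypothetical helper: a value near 1.0 would mark the layer as
    'redundant' under a similarity-only criterion.
    """
    sims = [cosine_similarity(x, y) for x, y in zip(hidden_in, hidden_out)]
    return sum(sims) / len(sims)

# Toy hidden states: 2 tokens, 3 dimensions (made-up values).
h_in = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
h_rescaled = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]  # same directions, larger norms
h_rotated = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]   # orthogonal directions

print(layer_similarity(h_in, h_rescaled))  # 1.0
print(layer_similarity(h_in, h_rotated))   # 0.0
```

Note that the rescaled output scores a perfect 1.0 even though the layer changed every token's magnitude. Cosine similarity ignores that kind of change, which is one plausible reason a similarity-only criterion can misjudge a layer.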

Key Metrics in SimDiff

SimDiff uses two distinct metrics, each capturing a different aspect of layer importance:

MSSD (Mean Squared Standard Deviation): This metric is sensitive to outliers, which makes it effective at identifying layers that make decisive corrections, so that critical transformations within the model are preserved.
MASD (Mean Average Standard Deviation): In contrast to MSSD, MASD gives a robust measure of a layer’s average contribution, allowing a more stable evaluation across different architectures.
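The post does not give formulas for either metric, so the sketch below does not implement MSSD or MASD. It only illustrates the general contrast described above, using two hypothetical stand-ins: a squared aggregate, which is dominated by a few large changes, and an absolute aggregate, which reflects the average change.

```python
def _elementwise_diffs(hidden_in, hidden_out):
    """Flatten the element-wise change a layer applies to its input."""
    return [b - a for x, y in zip(hidden_in, hidden_out) for a, b in zip(x, y)]

def mean_squared_difference(hidden_in, hidden_out):
    """Hypothetical stand-in: squaring lets large changes dominate."""
    diffs = _elementwise_diffs(hidden_in, hidden_out)
    return sum(d * d for d in diffs) / len(diffs)

def mean_absolute_difference(hidden_in, hidden_out):
    """Hypothetical stand-in: absolute values weight all changes linearly."""
    diffs = _elementwise_diffs(hidden_in, hidden_out)
    return sum(abs(d) for d in diffs) / len(diffs)

# Toy example: 2 tokens, 2 dimensions (made-up values).
h_in = [[0.0, 0.0], [0.0, 0.0]]
h_uniform = [[1.0, 1.0], [1.0, 1.0]]  # small change everywhere
h_spike = [[4.0, 0.0], [0.0, 0.0]]    # one large, isolated correction

print(mean_absolute_difference(h_in, h_uniform))  # 1.0
print(mean_absolute_difference(h_in, h_spike))    # 1.0
print(mean_squared_difference(h_in, h_uniform))   # 1.0
print(mean_squared_difference(h_in, h_spike))     # 4.0
```

The absolute aggregate rates both toy layers equally, while the squared aggregate singles out the layer that makes one large correction. This is the sense in which an outlier-sensitive score and a robust average can complement each other.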

Experimental Results

Experiments on models ranging from 0.5B to 13B parameters support the effectiveness of SimDiff. The reported results are:

SimDiff significantly outperforms state-of-the-art baselines across various pruning ratios.
At a 25% pruning ratio, SimDiff retains over 91% of LLaMA2-7B’s performance.
The method achieves up to a 1.49x inference speedup when pruning 12 layers on LLaMA3.1-8B.
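To put these numbers in terms of layer counts, the sketch below converts a pruning ratio into a number of layers and drops the lowest-scoring ones. It assumes the publicly documented depth of 32 decoder layers for both LLaMA2-7B and LLaMA3.1-8B. The helper names and importance scores are hypothetical, and the selection rule (prune the lowest scores) is a generic one, not necessarily the paper's exact procedure.

```python
def num_layers_to_prune(total_layers, ratio):
    """Number of layers removed at a given pruning ratio."""
    return round(total_layers * ratio)

def select_layers_to_keep(importance, num_prune):
    """Drop the num_prune least important layers, preserving layer order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    pruned = set(ranked[:num_prune])
    return [i for i in range(len(importance)) if i not in pruned]

# Assumption: 32 decoder layers for both models mentioned above.
print(num_layers_to_prune(32, 0.25))  # 8 layers at a 25% ratio
print(12 / 32)                        # 0.375, i.e. 12 layers is a 37.5% ratio

# Made-up importance scores for an 8-layer toy model.
scores = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3, 0.6, 0.95]
print(select_layers_to_keep(scores, 2))  # [0, 2, 4, 5, 6, 7]
```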

Recovery of Pruned Models

A further advantage of SimDiff is that pruned models can be recovered effectively with minimal fine-tuning. This matters in practice: the reduced computational cost of a shallower model does not have to come with a large, permanent loss in quality.

Conclusion

SimDiff advances depth pruning for large language models by moving beyond similarity-only criteria and also accounting for transformation difference. The result is a more reliable layer importance criterion across architectures, with strong retained performance, measurable inference speedups, and pruned models that recover with minimal fine-tuning.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
