Reliable Multi-Teacher Distillation for Low-Resource Summarization

Reliability Gated Multi-Teacher Distillation for Low Resource Abstractive Summarization

Published on: arXiv:2604.03192v1

Type: Cross

Abstract

In recent years, demand for effective summarization has grown, particularly for low-resource languages. This article explores multi-teacher knowledge distillation for abstractive summarization from a reliability-aware perspective. We propose two mechanisms that govern how a student model learns from multiple teachers: EWAD (Entropy Weighted Agreement Aware Distillation) and CPDP (Capacity Proportional Divergence Preservation).
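As background for the mechanisms below, here is a minimal sketch of plain logit-level multi-teacher distillation for a single token position. It assumes the common Hinton-style recipe rather than the paper's exact objective. Each teacher's logits are softened with a temperature, the resulting distributions are averaged (uniform weights unless others are given), and the student is penalized by its KL divergence from that average, scaled by T². The function names and the uniform averaging are illustrative assumptions.

```python
import math
from typing import List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over one token's vocabulary logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def multi_teacher_kd_loss(
    student_logits: List[float],
    teacher_logits_list: List[List[float]],
    temperature: float = 2.0,
    weights: Optional[List[float]] = None,
) -> float:
    """Hypothetical logit-level KD loss for one position: T^2 * KL(teacher mix || student)."""
    n = len(teacher_logits_list)
    weights = weights or [1.0 / n] * n  # assumed uniform mixing; custom weights should sum to 1
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    vocab = len(student_logits)
    target = [sum(w * tp[i] for w, tp in zip(weights, teacher_probs)) for i in range(vocab)]
    student_probs = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(target, student_probs)


if __name__ == "__main__":
    teachers = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2]]  # made-up logits over a 3-token vocabulary
    print(multi_teacher_kd_loss([1.0, 1.0, 1.0], teachers))
```

A real training loop would compute this over whole sequences with a tensor library and usually mix it with a gold-label cross-entropy term. The pure-Python version only makes the arithmetic visible.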

Key Mechanisms

EWAD: A token-level mechanism that uses inter-teacher agreement to route supervision at each token between the teachers' distillation signal and gold (reference) supervision.
CPDP: A mechanism that imposes a geometric constraint on where the student model sits relative to its heterogeneous teachers, keeping it aligned with them.
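This summary does not give EWAD's equations, so the following is only a hypothetical sketch of the general idea it describes. Each token receives a weight that leans on the teachers when they agree and are confident, and falls back to the gold reference token otherwise. The specific gate is our assumption, not the paper's definition: agreement is measured as one minus a normalized generalized Jensen-Shannon divergence, and confidence as one minus the normalized entropy of the averaged teacher distribution. The names `agreement_gate` and `routed_token_loss` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def agreement_gate(teacher_probs: List[List[float]]) -> float:
    """Hypothetical per-token gate in [0, 1]; assumes a vocabulary of at least 2 tokens."""
    n, vocab = len(teacher_probs), len(teacher_probs[0])
    mean = [sum(p[i] for p in teacher_probs) / n for i in range(vocab)]
    # Generalized Jensen-Shannon divergence: H(mean) minus the mean individual entropy, at most log(n).
    jsd = entropy(mean) - sum(entropy(p) for p in teacher_probs) / n
    agreement = 1.0 - jsd / math.log(n) if n > 1 else 1.0
    # Confidence: 1 minus the normalized entropy of the averaged teacher distribution.
    confidence = 1.0 - entropy(mean) / math.log(vocab)
    return max(0.0, min(1.0, agreement * confidence))


def routed_token_loss(
    student_logits: List[float], teacher_logits_list: List[List[float]], gold_index: int
) -> Tuple[float, float]:
    """Blend KD and gold cross-entropy for one token using the hypothetical gate."""
    student = softmax(student_logits)
    teachers = [softmax(t) for t in teacher_logits_list]
    gate = agreement_gate(teachers)
    mean = [sum(p[i] for p in teachers) / len(teachers) for i in range(len(student))]
    # KL(teacher mean || student); clamp student probabilities to avoid division by zero.
    kd = sum(m * math.log(m / max(s, 1e-12)) for m, s in zip(mean, student) if m > 0)
    ce = -math.log(max(student[gold_index], 1e-12))  # gold-label cross-entropy
    return gate * kd + (1.0 - gate) * ce, gate


if __name__ == "__main__":
    agree = [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]]     # both teachers favor token 0
    disagree = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # teachers split between tokens 0 and 1
    print(routed_token_loss([1.0, 0.5, 0.0], agree, gold_index=0))
    print(routed_token_loss([1.0, 0.5, 0.0], disagree, gold_index=0))
```

With agreeing teachers the gate is high and the loss is dominated by distillation. With split teachers it drops close to zero and the gold token takes over. That is the routing behavior EWAD is described as providing, though its actual formula may differ.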

Research Findings

Our experiments used two Bangla datasets and comprised 13 ablations with BanglaT5 and eight experiments with Qwen2.5. The findings reveal several critical insights:

Logit-level knowledge distillation (KD) yields the most reliable performance improvements.
More sophisticated distillation approaches improve semantic similarity for short summaries but tend to degrade longer outputs.
Cross-lingual pseudo-labeling KD, applied across ten languages, retained 71–122% of the teacher's ROUGE-L score at 3.2× compression.
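The retention and compression figures above are simple ratios, and the helpers below make them concrete. Retention above 100% just means the student scored higher than its teacher on that language. We assume that compression is teacher size divided by student size (for example, parameter counts), because the summary does not specify the measure. All numbers in the example are made up.

```python
def rouge_l_retention(student_score: float, teacher_score: float) -> float:
    """Student ROUGE-L expressed as a percentage of the teacher's ROUGE-L."""
    if teacher_score <= 0:
        raise ValueError("teacher ROUGE-L must be positive")
    return 100.0 * student_score / teacher_score


def compression_ratio(teacher_size: float, student_size: float) -> float:
    """How many times smaller the student is (assumed: teacher size / student size)."""
    return teacher_size / student_size


if __name__ == "__main__":
    # Hypothetical per-language scores: one student below its teacher, one above.
    print(rouge_l_retention(0.30, 0.40))  # below 100: the student trails the teacher
    print(rouge_l_retention(0.44, 0.40))  # above 100: the student beats the teacher
    print(compression_ratio(1.6e9, 0.5e9))  # hypothetical parameter counts
```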

Evaluation Insights

To check the robustness of our findings, we conducted a human-validated, multi-judge evaluation of large language model (LLM) outputs. It revealed a notable calibration bias in single-judge evaluation pipelines, suggesting that combining several judges yields more reliable assessments.
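The snippet below sketches how a calibration bias can surface when judge scores are compared against human ratings. It measures each judge's mean signed offset from the human scores, then does the same for the item-by-item average of all judges. Mean signed offset is only one simple proxy for calibration bias and is our assumption; the paper's actual metric is not described here. The judge names and scores are invented for illustration.

```python
from statistics import mean
from typing import Dict, List


def judge_bias(judge_scores: List[float], human_scores: List[float]) -> float:
    """Mean signed offset from human ratings (positive = more lenient than humans)."""
    return mean(j - h for j, h in zip(judge_scores, human_scores))


def aggregate_judges(scores_by_judge: Dict[str, List[float]]) -> List[float]:
    """Average all judges' scores item by item."""
    return [mean(item_scores) for item_scores in zip(*scores_by_judge.values())]


if __name__ == "__main__":
    human = [3, 4, 2, 5]  # hypothetical human ratings for four summaries
    judges = {
        "judge_a": [4, 5, 3, 5],  # tends to be lenient
        "judge_b": [2, 4, 1, 5],  # tends to be harsh
        "judge_c": [3, 4, 3, 4],
    }
    for name, scores in judges.items():
        print(name, judge_bias(scores, human))
    print("combined", judge_bias(aggregate_judges(judges), human))
```

Averaging only cancels offsets that point in different directions. If every judge leans the same way, the combined score inherits that lean, which is one reason to validate against human ratings.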

Conclusion

The results of our study underscore the importance of reliability-aware distillation for low-resource abstractive summarization. By characterizing when multi-teacher supervision improves summary quality, we provide guidance for future research. Our findings also indicate that, in some settings, scaling up data may matter more than loss engineering.

Future Directions

As natural language processing continues to evolve, integrating reliability-aware mechanisms into summarization could lead to more effective models, particularly for low-resource languages. Future work may refine EWAD and CPDP further and test how well they carry over to other languages and summarization settings.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!