GRACE: Efficient Dynamic Coreset Selection for LLMs

GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

Summary of arXiv:2604.11810v1 (announce type: cross)

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, their immense number of parameters and complex transformer-based architectures result in significant resource demands and computational complexity during training, making it challenging to optimize them efficiently on large datasets.

To reduce training costs while preserving performance, researchers have investigated coreset selection techniques, which aim to identify small, representative subsets of the entire training dataset to accelerate LLM training. However, existing coreset selection methods fail to adapt to the dynamic nature of LLM training and often struggle with scalability for models of this size.

Introducing GRACE

To address these limitations, we propose GRACE, a graph-guided adaptive and dynamic coreset selection framework for LLMs. GRACE dynamically constructs and updates coresets by combining representation diversity with gradient-based importance metrics, ensuring both informativeness and efficiency.
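
As a rough illustration of what combining representation diversity with gradient-based importance can look like, the sketch below greedily picks examples using a weighted sum of two normalized terms. It is not the selection rule from the paper, which this summary does not spell out. The function name `select_coreset`, the weight `alpha`, the use of per-example gradient norms as the importance signal, and the use of distance to the nearest already-selected example as the diversity signal are all assumptions made for the example.

```python
import numpy as np

def select_coreset(embeddings, grad_norms, budget, alpha=0.5):
    """Greedy selection mixing importance and diversity (illustrative only).

    embeddings: (n, d) array of example representations (assumed given).
    grad_norms: (n,) array of per-example gradient norms (assumed given).
    alpha:      hypothetical weight; 1.0 = importance only, 0.0 = diversity only.
    """
    n = embeddings.shape[0]
    budget = min(budget, n)

    # Min-max normalize importance to [0, 1] so it is comparable to diversity.
    span = grad_norms.max() - grad_norms.min()
    importance = (grad_norms - grad_norms.min()) / span if span > 0 else np.zeros(n)

    # Seed with the most important example.
    selected = [int(np.argmax(importance))]
    # Distance from every example to its nearest selected example.
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)

    while len(selected) < budget:
        d_max = min_dist.max()
        diversity = min_dist / d_max if d_max > 0 else np.zeros(n)
        score = alpha * importance + (1 - alpha) * diversity
        score[selected] = -np.inf  # never pick the same example twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        # Update nearest-selected distances with the newly added example.
        new_dist = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected
```

With `alpha=1.0` this reduces to picking the examples with the largest gradient norms; with `alpha=0.0` it behaves like a farthest-point (k-center style) selection. Intermediate values trade the two off.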

Key Features of GRACE

  • Dynamic Coreset Construction: GRACE builds and updates the coreset as training progresses, so the selected subset stays representative of the dataset as training conditions change.
  • Combination of Metrics: Selection combines representation diversity with gradient-based importance metrics, so the coreset is both informative and efficient.
  • $k$-NN Graph-Based Propagation: To reduce the computational cost of frequent updates, GRACE uses a $k$-nearest-neighbor ($k$-NN) graph to propagate changes, updating scores and embeddings selectively.
  • Adaptation to Training Dynamics: The framework adapts as training dynamics evolve, which lets it remain effective for large-scale models.
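
To make the $k$-NN propagation idea more concrete, here is a minimal sketch under stated assumptions. It recomputes scores for only a few examples and pushes those fresh values to their graph neighbors, leaving everything else untouched. The summary does not describe the paper's actual update rule, so the helper names `build_knn_graph` and `propagate_scores`, the brute-force neighbor search, and the blending weight `mix` are all hypothetical.

```python
import numpy as np

def build_knn_graph(embeddings, k):
    """Brute-force k-NN indices of shape (n, k); suitable for small n only."""
    sq = np.sum(embeddings ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    dist2 = sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T
    np.fill_diagonal(dist2, np.inf)  # an example is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]

def propagate_scores(scores, neighbors, refreshed_idx, fresh_scores, mix=0.5):
    """Overwrite refreshed examples, then blend their neighbors toward them.

    scores:        (n,) current, possibly stale, importance scores.
    neighbors:     (n, k) output of build_knn_graph.
    refreshed_idx: indices whose scores were recomputed exactly.
    fresh_scores:  the recomputed values, aligned with refreshed_idx.
    mix:           hypothetical blending weight in [0, 1].
    """
    updated = scores.astype(float).copy()
    updated[refreshed_idx] = fresh_scores
    refreshed = {int(i) for i in refreshed_idx}
    for i, s in zip(refreshed_idx, fresh_scores):
        for j in neighbors[i]:
            if int(j) not in refreshed:
                # Move the stale neighbor score part of the way to the fresh one.
                updated[j] = (1 - mix) * updated[j] + mix * s
    return updated

# Usage: two tight clusters; refreshing example 0 only affects its neighbor.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
nbrs = build_knn_graph(emb, k=1)
new_scores = propagate_scores(np.ones(4), nbrs, np.array([0]), np.array([3.0]))
```

In this toy setup only example 0 is rescored, its single neighbor is pulled halfway toward the new value, and the far cluster keeps its old scores. A production system would likely replace the brute-force search with an approximate nearest-neighbor index.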

Experimental Validation

Extensive experiments conducted across three benchmarks demonstrate that GRACE significantly improves training efficiency and downstream performance across diverse LLMs and tasks. The results indicate that GRACE not only reduces resource demands but also maintains or enhances the performance of the models trained with this framework.

Conclusion

GRACE targets two weaknesses of existing coreset selection methods: their failure to adapt to the dynamic nature of LLM training and their difficulty scaling to models of this size. By addressing both, it offers a practical way to make LLM training more efficient. Refining the framework and applying it to other areas of natural language processing are natural directions for future work.


Lazarus Omolua (https://richlyai.com/blog)