Why Random Sampling Beats Active Selection in Modern LLMs

Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

Source: arXiv:2604.02766v1

Abstract

Modern LLMs inherit strong priors from web-scale pretraining, which can limit the headroom of post-training data-selection strategies. While Active Preference Learning (APL) seeks to optimize query efficiency in online Direct Preference Optimization (DPO), the inherent richness of on-policy candidate pools often renders simple Random sampling a surprisingly formidable baseline.

Key Findings

This article evaluates uncertainty-based APL against Random across various settings, including harmlessness, helpfulness, and instruction-following. The evaluation employs both reward models and LLM-as-a-judge proxies to measure the effectiveness of these strategies.

Methodology

The study involves the following key components:

Active Preference Learning (APL): A method intended to enhance query efficiency in the context of online Direct Preference Optimization.
Random Sampling: A baseline method that utilizes random selection from a rich pool of on-policy candidates.
Evaluation Criteria: Both methods are compared across three settings (harmlessness, helpfulness, and instruction-following), with outcomes scored by reward-model and LLM-as-a-judge proxies.
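
The two selection strategies above can be contrasted in a few lines of Python. This is a minimal sketch, not the paper's implementation: the uncertainty score used here (a small absolute DPO implicit reward margin between two candidate responses) is an assumed acquisition rule, and the pool fields, the `beta` value, and the function names are hypothetical.

```python
import random


def implicit_margin(pair, beta=0.1):
    """DPO implicit reward margin between candidate responses a and b.

    Each implicit reward is beta * (policy log-prob - reference log-prob).
    """
    reward_a = beta * (pair["logp_a"] - pair["ref_logp_a"])
    reward_b = beta * (pair["logp_b"] - pair["ref_logp_b"])
    return reward_a - reward_b


def select_random(pool, k, rng):
    """Baseline: draw k candidate pairs uniformly, without replacement."""
    return rng.sample(pool, k)


def select_uncertain(pool, k, beta=0.1):
    """Assumed APL rule: keep the k pairs whose margin is closest to zero,
    i.e. where the policy is least sure which response is preferred."""
    return sorted(pool, key=lambda p: abs(implicit_margin(p, beta)))[:k]


# Toy on-policy pool with made-up log-probabilities (illustration only).
rng = random.Random(0)
pool = [
    {
        "id": i,
        "logp_a": rng.uniform(-50, -10),
        "ref_logp_a": rng.uniform(-50, -10),
        "logp_b": rng.uniform(-50, -10),
        "ref_logp_b": rng.uniform(-50, -10),
    }
    for i in range(20)
]

random_batch = select_random(pool, 5, rng)
active_batch = select_uncertain(pool, 5)
```

Note the cost difference: the random baseline needs no model scores at all, while the uncertainty rule needs policy and reference log-probabilities for every pair in the pool before a single preference query is made.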

Results

The evaluation produced three main findings:

APL yields negligible improvements in proxy win-rates compared to Random sampling.
Win-rate and general capability dissociate: proxy win-rate can improve even as general capability, measured by standard benchmarks, degrades.
APL does not effectively mitigate capability collapse or significantly reduce variance when compared to random sampling.
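
To make the win-rate versus capability dissociation concrete, the sketch below computes a proxy win-rate from paired scores and flags a checkpoint whose win-rate rose while its benchmark accuracy fell. The function names and every number are hypothetical and chosen only for illustration; none of them come from the paper.

```python
def win_rate(policy_scores, baseline_scores):
    """Fraction of prompts where the policy's response outscores the
    baseline's under a proxy judge; ties count as half a win."""
    if len(policy_scores) != len(baseline_scores):
        raise ValueError("score lists must be the same length")
    wins = 0.0
    for policy, baseline in zip(policy_scores, baseline_scores):
        if policy > baseline:
            wins += 1.0
        elif policy == baseline:
            wins += 0.5
    return wins / len(policy_scores)


def is_dissociated(win_before, win_after, bench_before, bench_after):
    """True when proxy win-rate improves while benchmark accuracy degrades."""
    return win_after > win_before and bench_after < bench_before


# Made-up proxy scores for four prompts, before and after training.
baseline = [0.2, 0.5, 0.4, 0.7]
before = [0.1, 0.5, 0.3, 0.9]
after = [0.6, 0.8, 0.3, 0.9]

win_before = win_rate(before, baseline)
win_after = win_rate(after, baseline)

# Made-up benchmark accuracies for the same two checkpoints.
flag = is_dissociated(win_before, win_after, bench_before=0.62, bench_after=0.55)
```

Tracking both quantities per checkpoint is the point of the sketch: a selection strategy judged on win-rate alone would miss the capability loss.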

Implications

The results carry several implications for post-training data selection:

In scenarios dominated by strong pre-trained priors, the computational overhead associated with active selection may not be justified.
The “cheap diversity” offered by simple random samples can match more complex selection strategies at a fraction of the cost.
Future research should consider the balance between computational efficiency and the effectiveness of selection methods in LLM training.

Conclusion

The study’s conclusions prompt a reevaluation of how active selection methods are applied in the training of modern LLMs. As the field continues to evolve, understanding the dynamics between pre-trained priors and selection strategies will be crucial for optimizing performance.

For more details, the code and data used in this research are publicly available at https://github.com/BootsofLagrangian/random-vs-apl.
