Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment
arXiv:2604.05965v1
Abstract
Aligning LLMs with diverse human values, rather than with a single preference, is pivotal for robust deployment. Contemporary Multi-Objective Preference Alignment (MPA) approaches predominantly rely on static linear scalarization or rigid gradient projection to navigate the trade-offs among these values. However, by enforcing strict conflict avoidance or simultaneous descent, these paradigms often converge prematurely to local stationary points. Although mathematically stable, such points represent a conservative compromise: the model forgoes potential global Pareto improvements to avoid transient local trade-offs.
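To make the two baseline families concrete, here is a minimal pure-Python sketch of linear scalarization (a fixed weighted sum of per-objective gradients) and of a PCGrad-style conflict projection, used here as one representative of gradient projection. The helper names `dot`, `scalarize`, and `project_conflicts` are hypothetical, and the abstract does not say which projection rule its baselines use. Gradients are assumed to be gradients of losses to be minimized.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scalarize(grads, weights):
    """Linear scalarization: a fixed weighted sum of per-objective loss gradients."""
    return [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(len(grads[0]))]

def project_conflicts(grads):
    """PCGrad-style projection (deterministic order, an assumption): whenever g_i
    conflicts with g_j (negative dot product), remove g_i's component along g_j."""
    out = []
    for i, gi in enumerate(grads):
        g = list(gi)
        for j, gj in enumerate(grads):
            if i != j:
                d = dot(g, gj)
                if d < 0:  # conflict; d < 0 also implies gj is nonzero
                    coef = d / dot(gj, gj)
                    g = [x - coef * y for x, y in zip(g, gj)]
        out.append(g)
    return out
```

With two directly opposed gradients such as `[1, 0]` and `[-1, 0]`, both the equal-weight sum and the projected sum vanish, so the update stalls even though neither objective has a zero gradient on its own. This is the kind of conservative compromise described above.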
Introduction
To break this deadlock, we propose Pareto-Lenient Consensus (PLC), a game-theoretic framework that recasts alignment as a dynamic negotiation process. Unlike rigid approaches, PLC introduces consensus-driven lenient gradient rectification, which tolerates local degradation of some objectives provided that a dominant coalition of objectives gains a sufficient surplus. This lets the optimization trajectory escape suboptimal local equilibria and explore the distal Pareto-optimal frontier.
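The abstract does not give PLC's actual rectification rule, so the sketch below is a hypothetical illustration of the general idea only. It computes first-order per-objective gains along a candidate direction, treats the improving objectives as the coalition, and accepts the direction despite some objectives degrading, provided the coalition's total gain is at least a tolerance ratio `kappa` times the total degradation; otherwise it falls back to a strict PCGrad-style projection. `lenient_direction`, `kappa`, and the surplus/deficit definitions are assumptions, not the authors' formulation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scalarize(grads, weights):
    # Fixed weighted sum of per-objective loss gradients.
    return [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(len(grads[0]))]

def project_conflicts(grads):
    # Strict fallback: PCGrad-style removal of conflicting gradient components.
    out = []
    for i, gi in enumerate(grads):
        g = list(gi)
        for j, gj in enumerate(grads):
            if i != j:
                d = dot(g, gj)
                if d < 0:
                    coef = d / dot(gj, gj)
                    g = [x - coef * y for x, y in zip(g, gj)]
        out.append(g)
    return out

def lenient_direction(grads, weights, kappa=2.0):
    """HYPOTHETICAL lenient rule, not the authors' PLC update.

    Gradients are of losses to minimize and the step is theta -= lr * d,
    so dot(g_i, d) > 0 means objective i improves to first order.
    Returns (direction, accepted_leniently).
    """
    d = scalarize(grads, weights)
    gains = [dot(g, d) for g in grads]
    surplus = sum(x for x in gains if x > 0)   # total gain of the improving coalition
    deficit = -sum(x for x in gains if x < 0)  # total first-order degradation of the rest
    if deficit == 0 or surplus >= kappa * deficit:
        return d, True  # tolerate the local degradation
    return scalarize(project_conflicts(grads), weights), False
```

With `kappa=2.0`, the opposed pair `[1, 0]` and `[-2, 0]` is accepted, because the second objective gains twice what the first loses. With `kappa=3.0` the rule falls back to projection, which here returns the zero vector: the stalemate a strict rule would produce.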
Theoretical Analysis
Theoretical analysis shows that PLC can escape optimization stalemates and asymptotically converges to a Pareto consensus equilibrium. This departs from methods built on strict conflict avoidance, which halt at the first stationary compromise: because PLC tolerates temporary degradation on some objectives, its optimization trajectory can continue past those points.
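The abstract does not define its Pareto consensus equilibrium. As a reference point, the standard first-order notion of Pareto stationarity says that some convex combination of the objective gradients vanishes; for two objectives the minimum-norm combination has the closed form used in MGDA. The sketch below checks that condition; the function names and the tolerance `tol` are illustrative assumptions.

```python
import math

def min_norm_combination(g1, g2):
    """Return (alpha, v) minimizing ||alpha*g1 + (1 - alpha)*g2|| over alpha in [0, 1]."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every combination is the same vector
    else:
        # Unconstrained minimizer: alpha* = (g2 - g1) . g2 / ||g1 - g2||^2, clipped to [0, 1].
        alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        alpha = min(1.0, max(0.0, alpha))
    v = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, v

def is_pareto_stationary(g1, g2, tol=1e-8):
    """True if some convex combination of the two gradients is (numerically) zero."""
    _, v = min_norm_combination(g1, g2)
    return math.sqrt(sum(x * x for x in v)) <= tol
```

The opposed pair `[1, 0]` and `[-1, 0]` is Pareto-stationary even though neither gradient is zero: no single direction decreases both losses, which is exactly where strict conflict avoidance halts.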
Experimental Results
Extensive experiments show that PLC outperforms the baselines on two criteria:
- Fixed-Preference Alignment: PLC aligns LLMs more closely to a specified user preference without sacrificing overall alignment quality.
- Global Pareto Frontier Quality: PLC explores the Pareto frontier more effectively, yielding solutions that better reflect diverse human values.
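The abstract does not name the metric it uses for frontier quality. One common indicator is the hypervolume: the area dominated by the frontier relative to a reference point. The sketch below computes it for two objectives, both maximized (for example, two reward-model scores); the function names and the reference point are illustrative assumptions.

```python
def pareto_front(points):
    """Non-dominated subset of 2-D points, both coordinates maximized."""
    front = []
    for p in points:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return sorted(set(front))

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    rx, ry = ref
    pts = [tuple(p) for p in points if p[0] > rx and p[1] > ry]
    front = pareto_front(pts)
    front.sort(key=lambda p: p[0], reverse=True)  # x descending, so y ascending
    area, prev_y = 0.0, ry
    for x, y in front:
        area += (x - rx) * (y - prev_y)  # horizontal strip contributed by this point
        prev_y = y
    return area
```

For the points `(3, 1)`, `(2, 2)`, `(1, 3)`, and the dominated `(1, 1)` with reference `(0, 0)`, the hypervolume is 6.0; the dominated point contributes nothing. Hypervolumes are only comparable under the same reference point, which must be worse than every solution being compared.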
Conclusion
This work highlights negotiation-driven alignment as a promising direction for MPA. By adopting a game-theoretic approach, PLC addresses the limitations of static scalarization and rigid gradient projection and opens the way for future research into more dynamic and flexible alignment strategies.
Availability
Our code is available at https://anonymous.4open.science/r/aaa-6BB8.
