Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
Summary (arXiv:2604.00536v1)
Large language models (LLMs) perform well across many tasks, in large part because abundant supervised fine-tuning (SFT) data is available. High-quality SFT data remains hard to obtain, however, in knowledge-intensive domains such as the humanities, social sciences, medicine, law, and finance. The scarcity stems from the high cost of expert curation, stringent privacy regulations, and the difficulty of ensuring consistent labeling.
To address this issue, recent research has turned to synthetic data. The typical approach prompts a generator with domain-specific documents and filters the outputs against handcrafted rubrics. Rubric design, however, often depends on expert knowledge, transfers poorly across domains, and is refined through a fragile heuristic loop: write rubrics, synthesize data, train models, inspect results, and adjust by hand. A critical limitation of this process is the lack of reliable quantitative feedback on how a rubric affects downstream performance.
To overcome these challenges, we propose Optimsyn, a framework that evaluates synthetic data by its training utility for the target model and uses this feedback to improve data generation. Drawing on influence estimation, we use an optimizer-aware estimator that leverages gradient information to quantify how much each synthetic sample contributes to the target model’s objective on specific tasks.
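To make the idea of gradient-based influence concrete, here is a minimal sketch in plain Python. It is not the estimator from the paper: it assumes a tiny logistic-regression "target model", a first-order approximation (the influence of a sample is the learning rate times the inner product of its update direction with the target-task gradient), and an Adam-style rescaling by a second-moment estimate as one possible reading of "optimizer-aware". The names `loss_grad`, `mean_grad`, `influence`, and `second_moment` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_grad(w, x, y):
    """Gradient of the logistic loss on one example (x, y) w.r.t. weights w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def mean_grad(w, examples):
    """Average per-example gradient over a list of (x, y) pairs."""
    grads = [loss_grad(w, x, y) for x, y in examples]
    return [sum(col) / len(grads) for col in zip(*grads)]


def influence(w, sample, target_examples, lr=0.1, second_moment=None, eps=1e-8):
    """First-order estimate of how much one update on `sample` lowers target loss.

    Assumption: with plain SGD the update direction is the raw gradient; if a
    per-parameter `second_moment` is given, the gradient is rescaled the way
    Adam would rescale it, so the score reflects the optimizer in use.
    """
    g_train = loss_grad(w, *sample)
    if second_moment is not None:
        g_train = [g / (math.sqrt(v) + eps) for g, v in zip(g_train, second_moment)]
    g_target = mean_grad(w, target_examples)
    return lr * sum(a * b for a, b in zip(g_train, g_target))


# Toy usage: two synthetic samples with identical inputs but different labels.
w = [0.0, 0.0]
target = [([1.0, 0.0], 1)]
helpful = ([1.0, 0.0], 1)   # label agrees with the target task
harmful = ([1.0, 0.0], 0)   # same input, conflicting label
print(influence(w, helpful, target))  # positive score
print(influence(w, harmful, target))  # negative score
```

The two toy samples share the same input yet receive scores of opposite sign, which is the kind of signal that input similarity alone cannot provide.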
Key Innovations of Optimsyn
- Influence Estimation: Our analysis reveals that even when synthetic and real samples are closely related in embedding space, their impact on learning can differ significantly. This insight drives our optimization-based approach.
- Rubric Adaptation: We adapt rubrics using feedback from the target model, so that the generated synthetic data better serves the target model's training objective.
- Lightweight Guiding Text: By providing concise guiding text, we facilitate the generation of task-conditioned rubrics.
- Reinforcement Learning Optimization: The influence score serves as the reward signal for optimizing the rubric generator with reinforcement learning.
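The reward-driven loop described in the list above can be illustrated with a toy policy-gradient example. This is a sketch under strong simplifying assumptions, not the paper's training procedure: the "rubric generator" is reduced to a softmax policy over a fixed set of candidate rubrics, the update rule is REINFORCE with a moving-average baseline (the source does not name a specific algorithm), and the influence scores are made-up stand-in numbers. The names `train_rubric_policy` and `reward_fn` are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_rubric_policy(reward_fn, n_rubrics, steps=2000, lr=0.5, seed=0):
    """REINFORCE over a fixed set of candidate rubrics.

    reward_fn(i) returns the reward for choosing rubric i; in this sketch it
    stands in for the influence score of data generated under that rubric.
    """
    rng = random.Random(seed)
    logits = [0.0] * n_rubrics
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(n_rubrics), weights=probs)[0]
        reward = reward_fn(choice)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/d logit_k of log pi(choice) is 1[k == choice] - probs[k]
        for k in range(n_rubrics):
            grad_log_prob = (1.0 if k == choice else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
    return softmax(logits)


# Stand-in influence scores for three hypothetical rubrics.
scores = [0.1, -0.2, 0.5]
probs = train_rubric_policy(lambda i: scores[i], n_rubrics=3)
print(probs)  # most probability mass should end up on the third rubric
```

In a realistic setup the policy would be a language model that writes rubrics and the reward would come from an influence estimator evaluated on the target model; the toy version only shows how a scalar reward shifts probability toward rubrics that yield more useful data.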
Experimental Validation
We validate Optimsyn in experiments across multiple domains, target models, and data generators. The results consistently show improved synthetic data quality and strong generalization without task-specific tuning.
Conclusion
Optimsyn addresses the limitations of handcrafted rubric design and its manual tuning loop. By feeding target-model feedback into rubric adaptation and optimizing the rubric generator with reinforcement learning, it improves the training utility of synthetic data. The method enhances LLM performance across domains and opens a path for future research on efficient data generation strategies.
