Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
arXiv:2511.21075v2
Abstract
Aligning Large Language Models (LLMs) with biomedical knowledge requires understanding both concepts and causal mechanisms in scientific reports. Supervised Fine-Tuning (SFT) often fails to capture these logical structures, while Reinforcement Learning (RL) is limited by sparse reward signals. We propose Balanced Fine-Tuning (BFT), a dual-scale post-training method that stabilizes training via confidence-weighted token-level optimization and adaptively emphasizes knowledge-dense hard samples using minimum group confidence.
Introduction
Large Language Models (LLMs) are increasingly used in the biomedical field to access, interpret, and generate biomedical knowledge. Their effectiveness, however, depends on how well they are aligned with the biomedical concepts and causal relationships described in the scientific literature.
The Challenges of Current Approaches
Supervised Fine-Tuning (SFT) often fails to capture the logical structures, such as causal mechanisms, found in scientific reports. Reinforcement Learning (RL), in turn, is limited by sparse reward signals, which can make learning inefficient and leave model performance suboptimal.
Introducing Balanced Fine-Tuning (BFT)
To address these limitations, we propose Balanced Fine-Tuning (BFT), a dual-scale post-training method that aligns LLMs with biomedical knowledge at two scales:
- Confidence-Weighted Token-Level Optimization: At the token scale, BFT weights the optimization of each token by the model's confidence, which stabilizes training.
- Adaptive Emphasis on Hard Samples: At the sample scale, BFT uses the minimum group confidence of a sample to adaptively emphasize knowledge-dense hard samples, so that the most challenging examples contribute more to learning.
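The two strategies above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the exact weighting formulas are not given in this summary, so the forms below are assumptions chosen only to show the idea. Token confidence is assumed to be the probability assigned to the reference token, a "group" is assumed to be a run of `group_size` consecutive tokens, and the sample weight is assumed to be one minus the minimum group confidence. The function names and the `group_size` parameter are hypothetical.

```python
import numpy as np

def token_confidences(logits, targets):
    """Probability the model assigns to each reference token.

    logits: array of shape (T, V); targets: int array of shape (T,).
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return probs[np.arange(len(targets)), targets]

def min_group_confidence(conf, group_size):
    """Lowest mean confidence over consecutive token groups (assumed grouping)."""
    groups = [conf[i:i + group_size] for i in range(0, len(conf), group_size)]
    return min(float(g.mean()) for g in groups)

def balanced_loss(batch_logits, batch_targets, group_size=4):
    """Illustrative dual-scale loss; both weighting forms are assumptions."""
    sample_losses, sample_weights = [], []
    for logits, targets in zip(batch_logits, batch_targets):
        conf = token_confidences(logits, targets)
        nll = -np.log(conf + 1e-12)
        # Token scale: confidence-weighted average of per-token losses.
        # In an autograd framework the weights would be treated as constants.
        sample_losses.append(float((conf * nll).sum() / (conf.sum() + 1e-12)))
        # Sample scale: a low minimum group confidence marks a hard sample.
        sample_weights.append(1.0 - min_group_confidence(conf, group_size))
    weights = np.array(sample_weights)
    weights = weights / (weights.sum() + 1e-12)  # normalize over the batch
    return float((weights * np.array(sample_losses)).sum()), weights
```

Under these assumptions, a sample containing even one low-confidence span receives a larger share of the batch loss than a sample the model already predicts confidently throughout, which is the sense in which hard samples are emphasized.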
Experimental Results
Extensive experiments conducted on medical and biological reasoning benchmarks demonstrate that BFT consistently outperforms traditional SFT methods. Notably, it achieves competitive or superior performance compared to specialized systems like GeneAgent. These findings highlight the effectiveness of BFT in enhancing the generative accuracy of LLMs.
Improving Fidelity in Biomedical Descriptions
In addition to improving generative accuracy, BFT significantly enhances the fidelity of LLM-generated biomedical entity descriptions. The embeddings produced by standard encoders in models fine-tuned using BFT outperform those generated by domain-specific biological foundation models, showcasing the versatility and robustness of a single post-trained LLM.
Conclusion
Overall, Balanced Fine-Tuning (BFT) provides a concise and effective framework for aligning Large Language Models with biomedical knowledge. By bridging generative capabilities with representational analysis, BFT opens new avenues for applications in biomedical research and clinical practice.
