Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making
Summary: arXiv:2604.10783v1
Abstract: Designing reward functions remains a central challenge in reinforcement learning (RL) for healthcare, where outcomes are sparse, delayed, and difficult to specify. While structured data capture physiological states, they often fail to reflect the overall quality of a patient’s clinical trajectory, including recovery dynamics, treatment burden, and stability. Clinical narratives, in contrast, summarize longitudinal reasoning and implicitly encode evaluations of treatment effectiveness.
Introduction
Reinforcement learning is increasingly applied in healthcare, but designing effective reward functions remains a significant hurdle. Traditional approaches rely on structured data, which can miss aspects of care quality such as recovery dynamics, treatment burden, and stability. This article presents Clinical Narrative-informed Preference Rewards (CN-PR), a framework that addresses this gap by learning from clinical narratives.
Framework Overview
The CN-PR framework learns reward functions directly from discharge summaries, treating these narratives as scalable supervision for trajectory-level preferences. A large language model is used to extract trajectory quality scores (TQS) and to construct pairwise preferences over patient trajectories.
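One way to picture the step from trajectory quality scores to pairwise preferences is the minimal sketch below. It assumes the TQS values have already been extracted and are plain numbers. The function name `build_preference_pairs`, the `margin` threshold for skipping near-ties, and the example scores are all hypothetical and not taken from the paper.

```python
from itertools import combinations


def build_preference_pairs(tqs, margin=0.1):
    """Return (preferred_id, other_id) pairs whose TQS gap is at least `margin`.

    `tqs` maps a trajectory id to its trajectory quality score.
    The margin rule is an illustrative assumption, not the paper's procedure.
    """
    pairs = []
    # Sort by id so the output order is deterministic.
    for (id_a, score_a), (id_b, score_b) in combinations(sorted(tqs.items()), 2):
        gap = score_a - score_b
        if abs(gap) < margin:
            continue  # scores too close to call; skip the ambiguous pair
        pairs.append((id_a, id_b) if gap > 0 else (id_b, id_a))
    return pairs


# Hypothetical TQS values, as if extracted from three discharge summaries.
example_tqs = {"traj_1": 0.82, "traj_2": 0.35, "traj_3": 0.80}
example_pairs = build_preference_pairs(example_tqs, margin=0.1)
```

With these made-up scores, `traj_1` and `traj_3` are each preferred over `traj_2`, while the `traj_1` versus `traj_3` pair is dropped because the gap falls below the margin.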
Methodology
The framework incorporates a confidence signal to account for variability in how informative each narrative is. This signal weights the supervision according to its relevance to the decision-making task, so that less relevant narratives contribute less to reward learning.
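The summary does not state the exact loss, so the following is only a sketch of how a confidence signal could weight preference supervision. It assumes a Bradley-Terry style objective, a common choice in preference-based reward learning, in which each pair contributes the negative log-sigmoid of the reward difference, scaled by its confidence. The function name and the tuple layout are hypothetical.

```python
import math


def weighted_preference_loss(pairs):
    """Confidence-weighted Bradley-Terry negative log-likelihood.

    `pairs` is an iterable of (reward_preferred, reward_other, confidence).
    Assumption: this loss form is illustrative, not confirmed by the source.
    """
    total = 0.0
    weight_sum = 0.0
    for r_pref, r_other, conf in pairs:
        diff = r_pref - r_other
        # -log(sigmoid(diff)) = log(1 + exp(-diff)), computed stably for either sign.
        if diff >= 0:
            nll = math.log1p(math.exp(-diff))
        else:
            nll = -diff + math.log1p(math.exp(diff))
        total += conf * nll
        weight_sum += conf
    # Normalise by total confidence so the scale does not depend on the weights.
    return total / weight_sum if weight_sum > 0 else 0.0
```

In this form a pair with confidence 0 has no effect on the loss, and a pair whose rewards are ordered consistently with the preference yields a smaller loss than one ordered against it.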
Results
The learned reward aligns strongly with trajectory quality, with a Spearman correlation of 0.63. Policies derived from this reward are consistently associated with improved recovery-related outcomes. Reported outcomes include:
- Increased organ support-free days.
- Faster resolution of shock conditions.
- Comparable performance on mortality outcomes.
These effects also hold under external validation, which supports the robustness of the approach.
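For readers unfamiliar with the Spearman coefficient reported above, the sketch below shows how such a rank correlation between learned rewards and trajectory quality scores could be computed. It is a plain-Python illustration that uses average ranks for ties. The function names are hypothetical, and it is not the paper's evaluation code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

Because only ranks matter, any monotonically increasing relationship gives a coefficient of 1, so a value of 0.63 indicates that the learned reward orders trajectories largely, but not perfectly, in line with their quality scores.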
Conclusion
Our findings suggest that narrative-derived supervision offers a scalable and expressive alternative to traditional reward design for dynamic treatment regimes. Clinical narratives capture aspects of patient trajectories that structured data can miss, which can support better-informed treatment decisions.
