Resource-Conscious Modeling for Next-Day Discharge Prediction Using Clinical Notes
arXiv:2604.03498v1
Abstract
Timely discharge prediction is essential for optimizing bed turnover and resource allocation in elective spine surgery units. This study evaluates the feasibility of lightweight, fine-tuned large language models (LLMs) and traditional text-based models for predicting next-day discharge using postoperative clinical notes. We compared 13 models, including TF-IDF with XGBoost and LGBM, and compact LLMs (DistilGPT-2, Bio_ClinicalBERT) fine-tuned via LoRA. TF-IDF with LGBM achieved the best balance, with an F1-score of 0.47 for the discharge class, a recall of 0.51, and the highest AUC-ROC (0.80). While LoRA improved recall in DistilGPT-2, transformer-based and generative models underperformed overall. These findings suggest interpretable, resource-efficient models may outperform compact LLMs in real-world, imbalanced clinical prediction tasks.
Introduction
The efficient management of hospital resources is a pressing concern, particularly in elective surgery departments where bed availability is critical. Predicting patient discharge is a key component in this management, enabling better planning and allocation of healthcare resources. This study focuses on the application of various modeling techniques to predict next-day discharge based on postoperative clinical notes.
Methodology
The study compared 13 models that combine traditional and modern approaches to text modeling. They fall into the following families:
- TF-IDF with XGBoost: A traditional machine learning approach using term frequency-inverse document frequency vectorization combined with XGBoost for classification.
- TF-IDF with LGBM: The same TF-IDF features as the previous method, classified with LightGBM (LGBM) instead of XGBoost.
- Lightweight LLMs: Compact pretrained language models (DistilGPT-2 and Bio_ClinicalBERT) fine-tuned via Low-Rank Adaptation (LoRA), which trains small low-rank update matrices while the pretrained weights stay frozen, keeping the number of trainable parameters low.
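The TF-IDF step shared by the first two approaches can be illustrated with a minimal pure-Python sketch. It is not the study's pipeline: the whitespace tokenizer, the smoothed IDF formula, the function name `tfidf_vectors`, and the example note snippets are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf * smoothed idf)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vec = {}
        for term, count in counts.items():
            tf = count / total  # relative term frequency within this note
            # Smoothed IDF: terms that occur in fewer notes get a larger weight.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical note snippets, invented for this example (not study data).
notes = [
    "patient ambulating pain controlled plan discharge tomorrow",
    "patient febrile drain in place continue monitoring",
    "pain controlled tolerating diet plan discharge",
]
vectors = tfidf_vectors(notes)

# In the second note, "febrile" appears in only one note while "patient"
# appears in two, so "febrile" receives the higher weight.
print(vectors[1]["febrile"] > vectors[1]["patient"])
```

Each note becomes a sparse vector of term weights, and a gradient-boosted tree classifier such as XGBoost or LightGBM would then be trained on those vectors. Because every feature is a word, the resulting model can be inspected term by term, which is one sense in which this family of models is interpretable.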
Results
Across the models compared, the results were as follows:
- TF-IDF with LGBM achieved the best balance of metrics, with an F1-score of 0.47 for the discharge class.
- The recall for this model was 0.51, meaning it identified roughly half of the patients who were actually discharged the next day.
- The model also recorded the highest area under the receiver operating characteristic curve (AUC-ROC), 0.80, indicating good discrimination between patients who were discharged the next day and those who were not.
- Although LoRA improved recall for DistilGPT-2, transformer-based and generative models underperformed overall on this task.
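To make the reported metrics concrete, the following sketch computes precision, recall, F1, and AUC-ROC on a toy example, and solves the F1 formula for precision. The function names and the toy labels and scores are hypothetical. The implied-precision step assumes the reported F1 (0.47) and recall (0.51) refer to the same class and decision threshold and ignores rounding; the source does not report precision.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (discharge) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc_roc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    return f1 * recall / (2 * recall - f1)


# Toy data, invented for illustration: 1 = discharged next day, 0 = not.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.5, 0.8]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auc_roc(y_true, scores)
p_implied = implied_precision(0.47, 0.51)
print(round(precision, 3), round(recall, 3), round(f1, 3), round(auc, 3))
print(round(p_implied, 2))
```

Under the stated assumptions, an F1 of 0.47 with a recall of 0.51 implies a precision of about 0.44, so somewhat fewer than half of the predicted discharges would be correct. AUC-ROC is threshold-independent, which is why a model can reach 0.80 on it while its F1 at a single threshold remains modest on an imbalanced dataset.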
Conclusion
This study underscores the importance of matching model choice to the clinical task, particularly in resource-constrained environments. Although LoRA fine-tuning improved recall for DistilGPT-2, the compact LLMs underperformed overall, and the findings suggest that interpretable, resource-efficient methods such as TF-IDF with LGBM may outperform them on real-world, imbalanced clinical prediction tasks. Future research should focus on refining these predictive models and testing their applicability in other clinical settings.
