A Latent Risk-Aware Machine Learning Approach for Predicting Operational Success in Clinical Trials based on TrialsBank
Clinical trials are critical in the development of new therapies, yet they face significant challenges including high costs, lengthy timelines, and considerable operational risks. The ability to reliably predict the success of a clinical trial before its initiation is essential for optimizing resources and ensuring effective outcomes. However, existing methods for such predictions often fall short, focusing on isolated metrics or specific stages of development, and frequently relying on variables that are unavailable during the trial design phase.
In response to these challenges, a new hierarchical latent risk-aware machine learning framework has been proposed, aimed at predicting operational success in clinical trials. This framework utilizes a curated subset of TrialsBank, an AI-ready database developed by Sorintellis that includes data from 13,700 trials. The operational success in this context is defined by the trial’s ability to commence, progress, and conclude within the planned timelines, recruitment goals, and protocol specifications up to the point of database lock.
Framework Overview
The proposed approach decomposes the prediction of operational success into two distinct modeling stages:
- Prediction of Intermediate Latent Operational Risk Factors: This initial stage employs over 180 drug- and trial-level features that are available before the trial begins to predict intermediate latent operational risks.
- Estimation of Operational Success Probability: The predicted latent risks from the first stage are then integrated into a downstream model to estimate the overall probability of operational success.
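The two stages above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. The three features, the two latent risk labels (`enrollment_delay`, `protocol_deviation`), the choice to pass the original features alongside the predicted risks, and the plain logistic models standing in for boosted models are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    # Logistic function written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, row):
    # w[0] is the bias; w[1:] are the per-feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the log-loss (a stand-in for a boosted model).
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = predict_proba(w, row) - target
            grad[0] += err
            for j, value in enumerate(row):
                grad[j + 1] += err * value
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def make_trial(rng):
    # Synthetic trial: 3 pre-trial features, 2 hypothetical latent risks, 1 outcome.
    x = [rng.gauss(0, 1) for _ in range(3)]
    enrollment_delay = int(x[0] + rng.gauss(0, 0.5) > 0)
    protocol_deviation = int(x[1] - x[2] + rng.gauss(0, 0.5) > 0)
    success = int(enrollment_delay + protocol_deviation == 0)
    return x, [enrollment_delay, protocol_deviation], success

rng = random.Random(0)
trials = [make_trial(rng) for _ in range(600)]
stage1_train, stage2_train, test = trials[:250], trials[250:500], trials[500:]

# Stage 1: one model per latent risk, using only pre-trial features.
X1 = [x for x, _, _ in stage1_train]
risk_models = [fit_logistic(X1, [r[k] for _, r, _ in stage1_train]) for k in range(2)]

def stage2_features(x):
    # Pre-trial features augmented with the predicted latent risks.
    return x + [predict_proba(w, x) for w in risk_models]

# Stage 2: success model trained on trials that stage 1 never saw.
X2 = [stage2_features(x) for x, _, _ in stage2_train]
success_model = fit_logistic(X2, [s for _, _, s in stage2_train])

# Estimated probability of operational success for held-out trials.
success_probs = [predict_proba(success_model, stage2_features(x)) for x, _, _ in test]
```

At inference time only the pre-trial features are needed: the latent risks are predicted by stage 1 rather than observed, which is what makes the forecast usable before a trial starts.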
Data Handling and Model Benchmarking
To prevent information leakage between model training and testing, a staged data-splitting strategy was implemented. The framework was benchmarked with several machine learning models, including:
- XGBoost
- CatBoost
- Explainable Boosting Machines
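One way a staged split of this kind could be realized is to partition trials into disjoint subsets, one per modeling stage plus a held-out test set, so that no trial informs both the latent risk models and the evaluation. The sketch below is an assumption about how such a split might look; the function name `staged_split`, the 40/40/20 fractions, and the trial identifiers are hypothetical and do not come from the source.

```python
import random

def staged_split(trial_ids, fractions=(0.4, 0.4, 0.2), seed=0):
    """Partition trial IDs into disjoint stage-1, stage-2, and test subsets."""
    ids = sorted(set(trial_ids))       # de-duplicate so no trial lands in two subsets
    random.Random(seed).shuffle(ids)   # reproducible shuffle
    n1 = int(len(ids) * fractions[0])
    n2 = n1 + int(len(ids) * fractions[1])
    return ids[:n1], ids[n1:n2], ids[n2:]

trial_ids = [f"TRIAL-{i:04d}" for i in range(1000)]  # hypothetical identifiers
stage1_ids, stage2_ids, test_ids = staged_split(trial_ids)

# Leakage check: no trial may appear in more than one subset.
assert not set(stage1_ids) & set(stage2_ids)
assert not set(stage1_ids) & set(test_ids)
assert not set(stage2_ids) & set(test_ids)
```

Splitting on the trial identifier rather than on individual rows keeps every record of a given trial on the same side of the split.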
Performance Metrics
The framework reports out-of-sample F1-scores of 0.91 or higher in every clinical trial phase:
- Phase I: 0.93
- Phase II: 0.92
- Phase III: 0.91
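For reference, the F1-score is the harmonic mean of precision and recall. The short sketch below computes it from scratch on made-up labels; treating operational success as the positive class (1) is an assumption, and the example labels are not drawn from TrialsBank.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical outcomes for eight trials: 1 = operational success, 0 = failure.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(f1_score(y_true, y_pred), 3))  # 4 TP, 1 FP, 1 FN -> 0.8
```

Because F1 ignores true negatives, it is best read alongside the share of successful trials in each phase.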
Incorporating the predicted latent risk drivers significantly improves the model’s ability to discriminate between operational successes and failures. Performance also remains robust under independent inference evaluation, which supports the framework’s reliability.
Conclusion
The study presents a latent risk-aware AI framework for prospectively forecasting the operational success of clinical trials. By enabling risk assessment before a trial begins, the approach supports data-driven decision-making in clinical development and more efficient use of trial resources.
