Bayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data
arXiv:2603.27142v1 (Announce Type: cross)
Abstract: Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring.
Multiple Imputation by Chained Equations (MICE) has been a prominent method for imputing missing values through fully conditional specification: each incomplete variable is imputed from a regression model conditioned on the other variables, cycling through the variables in turn. The authors extend MICE within a Bayesian framework, termed Bayes-MICE, which uses Markov chain Monte Carlo (MCMC) sampling to impute missing values. This approach accounts for uncertainty in both the MICE model parameters and the imputed values.
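To make fully conditional specification concrete, here is a minimal NumPy sketch of chained-equations imputation with Bayesian parameter draws. It is not the Bayes-MICE implementation. As an assumption, it replaces the paper's MCMC sampling with the closed-form posterior of a Gaussian linear regression under a flat prior (the classic Bayesian linear-regression imputation step). The function name `mice_bayes_linreg` and its parameters are hypothetical.

```python
import numpy as np

def mice_bayes_linreg(X, n_iter=10, seed=None):
    """Hypothetical chained-equations imputer with posterior draws.

    X: 2-D float array, np.nan marks missing entries (each column needs
    at least one observed value). Returns one completed copy of X.
    Conjugate posterior draws stand in for the paper's MCMC sampler.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)          # work on a copy
    miss = np.isnan(X)
    # Simple starting values: fill each column with its observed mean.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.nonzero(miss)[1])
    n, p = X.shape
    for _ in range(n_iter):
        for j in range(p):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            # Predictors: intercept plus all other (currently filled) columns.
            Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            Zo, yo = Z[obs], X[obs, j]
            k = Z.shape[1]
            ZtZ_inv = np.linalg.inv(Zo.T @ Zo + 1e-6 * np.eye(k))
            beta_hat = ZtZ_inv @ Zo.T @ yo
            resid = yo - Zo @ beta_hat
            dof = max(int(obs.sum()) - k, 1)
            # sigma^2 | y ~ scaled inverse chi-square; beta | sigma^2 ~ normal.
            sigma2 = resid @ resid / rng.chisquare(dof)
            cov = sigma2 * (ZtZ_inv + ZtZ_inv.T) / 2
            beta = rng.multivariate_normal(beta_hat, cov)
            # Impute missing entries from the posterior predictive.
            Zm = Z[~obs]
            X[~obs, j] = Zm @ beta + rng.normal(0.0, np.sqrt(sigma2), size=Zm.shape[0])
    return X
```

Calling the function several times with different seeds yields several completed datasets, which is what makes the procedure a multiple imputation; the spread of the imputed values across those copies reflects both parameter and imputation uncertainty.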
Key Features of Bayes-MICE
- Bayesian Inference: Uses MCMC sampling to impute missing values while accounting for uncertainty in both the data and the model parameters.
- Temporally Informed Initialization: Incorporates information about the time-series structure of the data into the initialization, so that the imputation starts from values consistent with the temporal dynamics.
- Time-lagged Features: Includes lagged values of the variables as predictors to improve the predictive power of the imputation model.
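The two temporal features can be illustrated with a small NumPy sketch. The summary does not say exactly how Bayes-MICE initializes, so as an assumption `interpolate_init` fills gaps by linear interpolation over the time index, and `add_lags` appends lagged copies of each column so an imputation model can condition on earlier time steps. Both function names are hypothetical.

```python
import numpy as np

def interpolate_init(x):
    """Hypothetical initializer: fill NaNs in a 1-D series by linear
    interpolation over time (requires at least one observed value)."""
    x = np.asarray(x, dtype=float).copy()
    t = np.arange(len(x))
    obs = ~np.isnan(x)
    # np.interp holds the first/last observed value constant at the edges.
    x[~obs] = np.interp(t[~obs], t[obs], x[obs])
    return x

def add_lags(X, lags=(1,)):
    """Append lagged copies of each column of a 2-D array (lags must be >= 1).
    Rows without an earlier value for a given lag are filled with NaN."""
    X = np.asarray(X, dtype=float)
    cols = [X]
    for k in lags:
        lagged = np.full_like(X, np.nan)
        lagged[k:] = X[:-k]              # row t holds the values from row t - k
        cols.append(lagged)
    return np.hstack(cols)
```

One plausible pipeline would interpolate each column, build lagged columns from the initialized series, and pass the result to a chained-equations imputer; whether Bayes-MICE orders the steps this way is not stated in the summary.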
Methodology
The authors evaluated Bayes-MICE on two real-world datasets, AirQuality and PhysioNet, using two MCMC samplers:
- Random Walk Metropolis (RWM): A classic MCMC method that proposes a new state by adding random noise to the current state and accepts or rejects it with the Metropolis acceptance rule.
- Metropolis-Adjusted Langevin Algorithm (MALA): A sampler that shifts each proposal along the gradient of the log posterior before adding noise and then applies a Metropolis-Hastings correction, which typically improves sampling efficiency.
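The difference between the two samplers lies in the proposal step. The sketch below implements one RWM step and one MALA step for a generic log-density and runs both on a toy two-dimensional standard-normal target. In Bayes-MICE the target would be the posterior of the imputation model, which is not reproduced here; the function names and the `step` parameter are illustrative assumptions.

```python
import numpy as np

def rwm_step(x, rng, log_p, step):
    """One Random Walk Metropolis step with an isotropic Gaussian proposal."""
    prop = x + step * rng.normal(size=x.shape)
    # The proposal is symmetric, so only the target ratio enters the test.
    if np.log(rng.uniform()) < log_p(prop) - log_p(x):
        return prop
    return x

def mala_step(x, rng, log_p, grad_log_p, step):
    """One Metropolis-adjusted Langevin step."""
    def log_q(to, frm):
        # Log proposal density of moving frm -> to, up to an additive constant.
        mean = frm + 0.5 * step**2 * grad_log_p(frm)
        return -np.sum((to - mean) ** 2) / (2 * step**2)

    # Drift along the gradient of the log density, then add Gaussian noise.
    prop = x + 0.5 * step**2 * grad_log_p(x) + step * rng.normal(size=x.shape)
    # The proposal is asymmetric, so the Metropolis-Hastings correction is needed.
    log_alpha = log_p(prop) + log_q(x, prop) - log_p(x) - log_q(prop, x)
    if np.log(rng.uniform()) < log_alpha:
        return prop
    return x

def run_chain(step_fn, x0, n_steps, seed=0, **kwargs):
    """Run a chain and return the visited states, shape (n_steps, dim)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = step_fn(x, rng, **kwargs)
        samples.append(x)
    return np.array(samples)

# Toy target: standard normal in 2-D, started far from the mode.
log_p = lambda x: -0.5 * np.sum(x**2)
grad_log_p = lambda x: -x

rwm = run_chain(rwm_step, [3.0, 3.0], 10_000, log_p=log_p, step=1.0)
mala = run_chain(mala_step, [3.0, 3.0], 10_000,
                 log_p=log_p, grad_log_p=grad_log_p, step=0.9)
```

Comparing acceptance rates or autocorrelation of the two chains is one way to see MALA's gradient-guided proposals at work; the step sizes here were chosen for the toy target only and would need tuning for a real imputation posterior.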
Results and Findings
Bayes-MICE achieved significantly lower imputation errors than the baseline methods across all variables analyzed. The method also quantified the uncertainty in the imputation process, yielding a more accurate assessment of imputation error. Notably, the findings indicated that:
- MALA converged faster than RWM, reaching comparable accuracy in less time.
- MALA explored the posterior more consistently, which made the imputation results more robust.
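The summary does not spell out how imputation error and uncertainty were measured. A common setup, assumed here, is to hide some observed entries, impute them, score the imputations against the hidden values with RMSE, and check how often the true values fall inside central posterior intervals formed from the imputation draws. The function names below are hypothetical.

```python
import numpy as np

def imputation_rmse(X_true, X_imp, mask):
    """RMSE over the entries that were artificially hidden (mask == True)."""
    diff = X_imp[mask] - X_true[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def interval_coverage(X_true, draws, mask, level=0.95):
    """Fraction of hidden entries whose true value lies inside the central
    posterior interval built from imputation draws of shape (S, n, p)."""
    alpha = (1 - level) / 2
    lo = np.quantile(draws, alpha, axis=0)
    hi = np.quantile(draws, 1 - alpha, axis=0)
    inside = (X_true >= lo) & (X_true <= hi)
    return float(inside[mask].mean())
```

Low RMSE says the point imputations are close; coverage near the nominal level says the reported uncertainty is well calibrated, which is the property a Bayesian imputer is meant to add.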
Conclusion
Overall, Bayes-MICE represents a significant advance in time-series imputation. By combining improved accuracy with meaningful uncertainty quantification, it shows potential for use in environmental and clinical settings. Beyond improving the quality of imputed data, the approach can make subsequent analyses of time-series datasets more reliable.
