Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation
arXiv:2604.05257v1
Abstract
Diffusion models are increasingly used to create synthetic tabular and time-series data for privacy-preserving augmentation. Tabular Denoising Diffusion Probabilistic Models (TabDDPM) generate high-quality synthetic data from heterogeneous tabular datasets but assume that samples are independent, which limits their applicability to time-series domains where temporal dependencies are critical. To address this, we propose a temporal extension of TabDDPM that introduces sequence awareness through lightweight temporal adapters and context-aware embedding modules.
Introduction
The ability to generate synthetic data is crucial in many applications, especially in fields such as healthcare, finance, and the social sciences, where data privacy is paramount. Traditional methods for generating synthetic data often fail to preserve the temporal dependencies present in time-series data. This article describes the enhancements made to TabDDPM and how they support more effective time-series data generation.
Methodology
Our approach reformulates sensor data into windowed sequences and models temporal context explicitly. It combines the following components:
- Lightweight Temporal Adapters: These components allow the model to account for temporal relationships in the data.
- Context-Aware Embedding Modules: These modules enhance the representation of data by incorporating contextual information.
- Windowed Sequences: By breaking down the data into sequences, we enable the model to understand and generate coherent temporal patterns.
- Timestep Embeddings: These embeddings help the model track the progression of time within the sequence.
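- Conditional Activity Labels and Observed/Missing Masks: These conditioning inputs indicate which activity a sequence should represent and which readings were observed rather than missing, which further refines the model’s ability to produce realistic time-series data.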
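The windowing, masking, and timestep-embedding steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `make_windows` and `sinusoidal_embedding`, the use of `None` to mark a missing reading, zero-filling of missing values, majority-vote window labels, and the sinusoidal form of the embedding are all choices made here for the example. The source does not specify any of them.

```python
from collections import Counter
from math import cos, sin


def make_windows(samples, labels, window_size, stride):
    """Slice a per-timestep recording into fixed-length windows.

    samples: list of per-timestep feature lists; None marks a missing reading
             (an assumption of this sketch).
    labels:  list of per-timestep activity labels, same length as samples.
    Returns a list of dicts holding values, an observed/missing mask, and a label.
    """
    windows = []
    for start in range(0, len(samples) - window_size + 1, stride):
        chunk = samples[start:start + window_size]
        # 1 where a reading was observed, 0 where it is missing.
        mask = [[0 if v is None else 1 for v in row] for row in chunk]
        # Missing readings are zero-filled; the mask records which ones they are.
        values = [[0.0 if v is None else float(v) for v in row] for row in chunk]
        # Assumption: a window takes the most frequent activity label inside it.
        window_labels = labels[start:start + window_size]
        label = Counter(window_labels).most_common(1)[0][0]
        windows.append({"values": values, "mask": mask, "label": label})
    return windows


def sinusoidal_embedding(position, dim):
    """Sinusoidal encoding of an integer position within a window (dim even)."""
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.append(sin(position * freq))
        emb.append(cos(position * freq))
    return emb


# Toy three-axis accelerometer recording with one missing x reading.
samples = [
    [1.0, 2.0, 3.0],
    [None, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
]
labels = ["walk", "walk", "jog", "jog", "jog"]
windows = make_windows(samples, labels, window_size=3, stride=2)
```

Each window then carries the three inputs named in the list: the values, the observed/missing mask, and the conditioning label. A position embedding such as `sinusoidal_embedding(t, dim)` could be attached to each timestep `t` of the window.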
Results
To validate our approach, we conducted experiments using the WISDM accelerometer dataset. Our results indicate that the proposed system generates synthetic time-series data that closely resembles real-world sensor patterns. Specifically, we achieved:
- Enhanced temporal realism as evidenced by bigram transition matrices.
- Improved diversity and coherence of generated sequences.
- Comparable classification performance, with a macro F1-score of 0.64 and an accuracy of 0.71.
- Effective representation of minority classes and preservation of statistical alignment with real distributions.
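For readers unfamiliar with the metrics above, the following sketch shows how a bigram transition matrix, macro F1, and accuracy are commonly computed. It is a generic illustration, not the paper's evaluation code. The source does not say which symbols the bigrams are taken over, so the example assumes sequences of activity labels. The function names and toy data are hypothetical.

```python
def bigram_transition_matrix(sequence, classes):
    """Row-normalized counts of consecutive label pairs (a followed by b)."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for a, b in zip(sequence, sequence[1:]):
        counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions are left as all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


classes = ["walk", "jog"]

# Toy label sequence: walk -> walk -> jog -> walk.
transitions = bigram_transition_matrix(["walk", "walk", "jog", "walk"], classes)

# Toy classifier output on four windows.
y_true = ["walk", "walk", "jog", "jog"]
y_pred = ["walk", "jog", "jog", "jog"]
f1 = macro_f1(y_true, y_pred, classes)
acc = accuracy(y_true, y_pred)
```

Comparing the transition matrix of generated sequences against that of real sequences gives a simple check of temporal realism. Because macro F1 averages per-class scores without weighting by class frequency, it is sensitive to how well minority classes are represented.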
Conclusion
These results illustrate that diffusion-based models can serve as effective and adaptable solutions for sequential data synthesis, particularly when equipped for temporal reasoning. Temporal adapters and context-aware embedding modules improve the model’s ability to generate high-quality, temporally coherent data. Future work will focus on scaling these methods to longer sequences and integrating more robust temporal architectures.
