Stress Classification from ECG Signals Using Vision Transformer
arXiv:2603.26721v1 (cross-list)
Abstract
Vision Transformers have shown tremendous success in numerous computer vision applications; however, they have not yet been exploited for stress assessment from physiological signals such as the electrocardiogram (ECG). To get the most out of the vision transformer for multilevel stress assessment, we transform the raw ECG data into 2D spectrograms using the short-time Fourier transform (STFT).
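As an illustration of the STFT step, the following is a minimal NumPy sketch, not the authors' implementation. The window length, hop size, Hann window, log scaling, the 700 Hz sampling rate, and the synthetic sinusoid standing in for an ECG segment are all assumptions chosen for the example; the source does not state these settings.

```python
import numpy as np

def stft_spectrogram(signal, fs, win_len=256, hop=64):
    """Return (freqs, times, log-magnitude spectrogram) of a 1D signal.

    win_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < win_len:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(win_len)
    n_frames = 1 + (signal.size - win_len) // hop
    # Stack overlapping frames: shape (n_frames, win_len)
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] for i in range(n_frames)]
    )
    # One-sided FFT of each windowed frame: (n_frames, win_len // 2 + 1)
    spectrum = np.fft.rfft(frames * window, axis=1)
    # Log-compress the magnitude and put frequency on the first axis
    log_mag = np.log1p(np.abs(spectrum)).T
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return freqs, times, log_mag

# Synthetic stand-in for an ECG segment (assumption): 10 s of a 10 Hz tone
fs = 700
t = np.arange(10 * fs) / fs
demo = np.sin(2 * np.pi * 10.0 * t)
freqs, times, spec = stft_spectrogram(demo, fs)
```

The resulting `spec` array has one row per frequency bin and one column per time frame, which is the 2D image-like input that a vision model can consume.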
Methodology
The 2D spectrograms are divided into patches that are fed to the transformer encoder. For comparison, we also run experiments with a 1D CNN and ResNet-18 (a CNN model). The methodology includes:
- Transformation of raw ECG data into 2D spectrograms using STFT.
- Patch division of spectrograms for transformer encoder input.
- Comparative analysis with 1D CNN and ResNet-18 models.
- Implementation of leave-one-subject-out cross-validation (LOSOCV) on WESAD and Ryerson Multimedia Lab (RML) datasets.
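The patch-division step can be sketched as follows. This is a hedged example of the standard non-overlapping tiling used by vision transformers; the patch size of 16, the cropping of ragged edges, and the helper name `to_patches` are assumptions, since the source does not give these details.

```python
import numpy as np

def to_patches(image, patch=16):
    """Split a 2D array into flattened, non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch), ordered row by row.
    """
    image = np.asarray(image)
    h, w = image.shape
    # Drop rows/columns that do not fill a whole patch (an assumption;
    # padding or resizing would be equally valid choices)
    h_crop, w_crop = h - h % patch, w - w % patch
    if h_crop == 0 or w_crop == 0:
        raise ValueError("image is smaller than one patch")
    img = image[:h_crop, :w_crop]
    # (rows, patch, cols, patch) -> (rows, cols, patch, patch)
    tiles = img.reshape(h_crop // patch, patch, w_crop // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3)
    return tiles.reshape(-1, patch * patch)

# Toy "spectrogram" of size 64 x 48 gives a 4 x 3 grid of 16 x 16 patches
toy_spec = np.arange(64 * 48, dtype=float).reshape(64, 48)
patches = to_patches(toy_spec, patch=16)
```

In a full model, each flattened patch would then be linearly projected to an embedding and combined with a position embedding before entering the transformer encoder.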
Challenges and Solutions
One of the biggest challenges in LOSOCV-based experiments is intersubject variability. In this research, we address intersubject variability using 2D spectrograms and the attention mechanism of the transformer. The key points include:
- Development of a robust method that deals with intersubject variability effectively.
- Utilization of the attention mechanism in transformers to enhance model performance.
- Demonstration of the vision transformer’s superior handling of variability compared to CNN-based models.
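To make the evaluation protocol concrete, here is a minimal sketch of leave-one-subject-out splitting. The subject labels and the helper name `loso_splits` are hypothetical; the point is that every fold holds out all samples from exactly one subject, so the model is always tested on a person it never saw during training, which is why intersubject variability matters so much here.

```python
import numpy as np

def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test_mask = subject_ids == held_out
        # Train on every other subject, test on the held-out one
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical per-sample subject labels (one entry per spectrogram)
subjects = np.array(["S2", "S2", "S3", "S3", "S3", "S4"])
folds = list(loso_splits(subjects))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx.tolist(), "test:", test_idx.tolist())
```

The reported accuracy under this protocol would typically be aggregated over all folds, one per subject.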
Results
Experiments show that the vision transformer handles intersubject variability much better than CNN-based models and outperforms previous state-of-the-art methods by a considerable margin. The proposed method achieved:
- 71.01% accuracy with the RML dataset.
- 76.7% accuracy with the WESAD dataset for three-class classification.
- 88.3% accuracy for binary classification on the WESAD dataset.
Conclusion
The method is end-to-end, does not require handcrafted features, and can learn robust representations. This research highlights the potential of vision transformers for physiological signal analysis and opens avenues for further work on stress assessment. The findings suggest that vision transformers can improve classification accuracy in challenging settings such as subject-independent evaluation.
