Stress Detection from ECG Using Vision Transformer


Stress Classification from ECG Signals Using Vision Transformer

Summary: arXiv:2603.26721v1

Abstract

Vision Transformers have shown tremendous success in numerous computer vision applications; however, they have not been exploited for stress assessment using physiological signals such as the electrocardiogram (ECG). To apply the vision transformer to multilevel stress assessment, in this paper we transform the raw ECG data into 2D spectrograms using the short-time Fourier transform (STFT).
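The STFT step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the window length, hop size, log scaling, 700 Hz sampling rate, and the synthetic signal standing in for ECG are all assumptions chosen for illustration, and `stft_spectrogram` is a hypothetical helper name.

```python
import numpy as np

def stft_spectrogram(signal, fs, win_len=256, hop=64):
    """Return (freqs, times, log-magnitude spectrogram) for a 1D signal.

    win_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < win_len:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(win_len)
    n_frames = 1 + (signal.size - win_len) // hop
    # Slice the signal into overlapping frames: shape (n_frames, win_len).
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] for i in range(n_frames)]
    )
    # One-sided FFT of each windowed frame; transpose to (freq_bins, n_frames).
    spectrum = np.fft.rfft(frames * window, axis=1).T
    # Log compression is an assumed choice to tame the dynamic range.
    log_mag = np.log1p(np.abs(spectrum))
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return freqs, times, log_mag

# Synthetic stand-in for a 10 s ECG segment (assumed 700 Hz sampling rate).
fs = 700
t = np.arange(10 * fs) / fs
fake_ecg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.sin(2 * np.pi * 50 * t)
freqs, times, spec = stft_spectrogram(fake_ecg, fs)
```

The result `spec` is a 2D array with frequency along the rows and time along the columns, which is what lets an image model treat a 1D signal as a picture.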

Methodology

The 2D spectrograms are divided into patches that are fed to the transformer encoder. For comparison, we also perform experiments with a 1D CNN and ResNet-18 (a CNN model). The methodology includes:

  • Transformation of raw ECG data into 2D spectrograms using STFT.
  • Patch division of spectrograms for transformer encoder input.
  • Comparative analysis with 1D CNN and ResNet-18 models.
  • Implementation of leave-one-subject-out cross-validation (LOSOCV) on WESAD and Ryerson Multimedia Lab (RML) datasets.
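The patch-division step listed above can be sketched as follows. This is a hedged illustration only: the patch size of 16, the cropping of leftover rows and columns, and the helper name `to_patches` are assumptions, and the source does not state the patch size or how edges are handled.

```python
import numpy as np

def to_patches(image, patch=16):
    """Split a (H, W) array into flattened, non-overlapping patch x patch tiles."""
    h, w = image.shape
    # Crop so both sides are multiples of the patch size (a simplification).
    h, w = h - h % patch, w - w % patch
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    # Reorder to (tile_row, tile_col, patch, patch), then flatten each tile.
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

# Toy "spectrogram" whose values encode their own position, for easy checking.
spec = np.arange(64 * 48, dtype=float).reshape(64, 48)
tokens = to_patches(spec, patch=16)
```

Each row of `tokens` is one flattened patch. In a vision transformer, rows like these are typically passed through a linear projection and combined with position embeddings before entering the encoder.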

Challenges and Solutions

One of the biggest challenges in LOSOCV-based experiments is intersubject variability: in every fold the model is tested on a subject whose data it never saw during training. In this research, we address intersubject variability using 2D spectrograms and the attention mechanism of the transformer. The key points include:

  • Development of a robust method that deals with intersubject variability effectively.
  • Utilization of the attention mechanism in transformers to enhance model performance.
  • Demonstration of the vision transformer’s superior handling of variability compared to CNN-based models.
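To make the LOSOCV protocol concrete, here is a minimal sketch of how the folds can be generated. The subject labels and the helper name `loso_splits` are hypothetical, and the sketch only shows the splitting logic, not the authors' training pipeline.

```python
def loso_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx

# Hypothetical subject label for each sample in a dataset.
subjects = ["S2", "S2", "S3", "S4", "S4", "S4"]
folds = list(loso_splits(subjects))
```

Because every sample from the held-out subject lands in the test set, nothing the model learns about that person's individual ECG characteristics can leak into training, which is why this protocol exposes intersubject variability so directly.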

Results

Experiments show that the vision transformer handles the effect of intersubject variability much better than CNN-based models and beats all previous state-of-the-art methods by a considerable margin. The proposed method achieved:

  • 71.01% accuracy on the RML dataset.
  • 76.7% accuracy on the WESAD dataset for three-class classification.
  • 88.3% accuracy for binary classification on the WESAD dataset.

Conclusion

Our method is end-to-end, does not require handcrafted features, and can learn robust representations. This research highlights the potential of vision transformers for physiological signal analysis and opens avenues for further exploration in stress assessment. The findings suggest that vision transformers can improve classification accuracy in challenging settings such as subject-independent evaluation.



Lazarus Omolua