Accurate LIBRAS Gesture Recognition Using CNN & MediaPipe

Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation

Source: arXiv:2603.25863v1 (cross-listed)

Abstract

This paper proposes a method for dynamic hand gesture recognition based on the composition of two models: the MediaPipe Hand Landmarker, responsible for extracting 21 skeletal keypoints of the hand, and a convolutional neural network (CNN) trained to classify gestures from a spatiotemporal matrix representation of dimensions 90 by 21 of those keypoints. The method is applied to the recognition of LIBRAS (Brazilian Sign Language) gestures for device control in a home automation system, covering 11 classes of static and dynamic gestures.

Introduction

Gesture recognition has gained traction in recent years, particularly for accessibility and for interaction with smart devices. This paper presents an approach to recognizing gestures from LIBRAS, the sign language of the Brazilian deaf community, and uses the recognized gestures as control commands in a home automation system.

Methodology

The proposed method integrates two primary components:

MediaPipe Hand Landmarker: Extracts 21 skeletal keypoints from the user’s hand in each video frame. These keypoints describe the pose of the hand, so tracking them across frames captures the movement that gesture recognition depends on.
Convolutional Neural Network (CNN): Classifies gestures from the spatiotemporal matrix representation of those keypoints. The matrix has dimensions 90 by 21, so convolving over it lets the network learn temporal dynamics and spatial hand configuration together.
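To make the 90 by 21 representation concrete, here is a minimal sketch of how such a matrix could be assembled from per-frame keypoints. The summary does not say how coordinates are encoded or normalized, so several choices here are assumptions rather than the authors' method: each keypoint is kept as an (x, y, z) tuple (MediaPipe does report three coordinates per landmark), coordinates are expressed relative to the wrist (landmark index 0 in MediaPipe), and short sequences are padded by repeating the last frame. The names `normalize_frame` and `build_matrix` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

NUM_KEYPOINTS = 21  # keypoints per hand, as stated in the paper
NUM_ROWS = 90       # temporal dimension of the matrix, as stated in the paper
WRIST = 0           # MediaPipe hand landmark index 0 is the wrist


def normalize_frame(frame: Sequence[Point]) -> List[Point]:
    """Express every keypoint relative to the wrist (an assumed normalization)."""
    if len(frame) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(frame)}")
    wx, wy, wz = frame[WRIST]
    return [(x - wx, y - wy, z - wz) for x, y, z in frame]


def build_matrix(frames: Sequence[Sequence[Point]],
                 num_rows: int = NUM_ROWS) -> List[List[Point]]:
    """Stack frames into a num_rows x 21 matrix, one row per time step.

    Keeps the most recent num_rows frames and pads shorter sequences by
    repeating the last frame (padding strategy is an assumption).
    """
    if not frames:
        raise ValueError("need at least one frame")
    rows = [normalize_frame(f) for f in list(frames)[-num_rows:]]
    while len(rows) < num_rows:
        rows.append(rows[-1])
    return rows


# Usage: 40 synthetic frames in which every keypoint sits at the same position.
frames = [[(0.5, 0.5, 0.0)] * NUM_KEYPOINTS for _ in range(40)]
matrix = build_matrix(frames)
print(len(matrix), len(matrix[0]))
```

Read this way, the matrix resembles a 90 by 21 image with three channels, which is the kind of input a 2D CNN handles directly.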

Results

The method was evaluated on LIBRAS gestures used for device control, covering 11 classes of static and dynamic gestures. For real-time inference, a sliding window with temporal frame triplication feeds the CNN, which removes the need for recurrent networks.
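The sliding window with frame triplication can be sketched as follows. This is one plausible reading, not a confirmed description of the authors' pipeline: it assumes the window holds the 30 most recent frames and that "triplication" means repeating each frame three times along the time axis, which yields 90 rows. The class name `SlidingWindow` and both parameters are hypothetical.

```python
from collections import deque
from typing import Any, List, Optional

WINDOW_FRAMES = 30  # assumed window length, chosen so that 30 * 3 = 90 rows
REPEAT = 3          # assumed meaning of "triplication": repeat each frame 3 times


class SlidingWindow:
    """Keeps the most recent frames and emits a triplicated window per new frame."""

    def __init__(self, window_frames: int = WINDOW_FRAMES, repeat: int = REPEAT):
        self.buffer: deque = deque(maxlen=window_frames)
        self.repeat = repeat

    def push(self, frame: Any) -> Optional[List[Any]]:
        """Add one frame; return the expanded window once the buffer is full."""
        self.buffer.append(frame)  # the oldest frame is dropped automatically
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough history yet, skip inference
        # Repeat every frame so the time axis grows from 30 to 90 rows.
        return [f for f in self.buffer for _ in range(self.repeat)]


# Usage: integers stand in for per-frame keypoint rows.
window = SlidingWindow()
for frame_id in range(32):
    rows = window.push(frame_id)
    if rows is not None:
        print(frame_id, len(rows), rows[:4])
```

Because each incoming frame produces a full fixed-size input, the classifier can run on every frame with no recurrent state to carry between calls.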

Tests achieved 95% accuracy under low-light conditions and 92% under normal lighting. Accuracy did not drop in low light in these tests, which suggests that the keypoint-based pipeline tolerates changes in lighting and is suitable for practical use.

Discussion

While the results are promising, the study acknowledges that further investigation is necessary. Systematic experiments with a more diverse set of users are still needed to evaluate how well the model generalizes, since recognition accuracy may vary from user to user.
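One common way to measure generalization across users is a leave-one-user-out split, in which all samples from one person are held out for testing in each fold. The sketch below illustrates that idea only; the summary does not say which protocol the authors plan to use, and the function name `leave_one_user_out` and the user labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_user_out(
    user_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_user, train_indices, test_indices), one fold per user.

    user_ids[i] is the person who performed sample i.
    """
    by_user: Dict[Hashable, List[int]] = defaultdict(list)
    for index, user in enumerate(user_ids):
        by_user[user].append(index)
    for held_out, test_idx in by_user.items():
        # Train on every sample that was not recorded by the held-out user.
        train_idx = [i for i, u in enumerate(user_ids) if u != held_out]
        yield held_out, train_idx, test_idx


# Usage: five gesture samples recorded by three hypothetical users.
sample_users = ["ana", "ana", "bruno", "carla", "bruno"]
for user, train_idx, test_idx in leave_one_user_out(sample_users):
    print(user, train_idx, test_idx)
```

Accuracy measured this way reflects performance on people the model has never seen, which is the situation a deployed home automation system faces.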

Conclusion

This research applies gesture recognition to LIBRAS by combining the MediaPipe Hand Landmarker with a CNN. The resulting method recognizes gestures in real time and can serve as a control interface for smart home devices. Future work will focus on expanding the dataset and optimizing the model for greater accuracy and user adaptability.

Keywords

Gesture Recognition, LIBRAS, Convolutional Neural Network, MediaPipe, Home Automation, Spatiotemporal Matrix.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
