Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation
arXiv:2603.25863v1 (cross-listed)
Abstract
This paper proposes a method for dynamic hand gesture recognition that composes two models: the MediaPipe Hand Landmarker, which extracts 21 skeletal keypoints of the hand, and a convolutional neural network (CNN) that classifies gestures from a 90 × 21 spatiotemporal matrix built from those keypoints. The method is applied to recognizing LIBRAS (Brazilian Sign Language) gestures for device control in a home automation system, covering 11 classes of static and dynamic gestures.
Introduction
Gesture recognition has gained traction in recent years, particularly in accessibility applications and in interaction with smart devices. This paper introduces an approach to recognizing LIBRAS gestures, the signs of the language used by the Brazilian deaf community. By pairing a pretrained hand-landmark detector with a CNN classifier, the proposed method aims to let users control home automation devices through signing.
Methodology
The proposed method integrates two primary components:
- MediaPipe Hand Landmarker: extracts 21 skeletal keypoints from the user's hand in each video frame. Tracked over time, these keypoints give a compact description of hand shape and motion, which is what the classifier relies on.
- Convolutional Neural Network (CNN): trained to classify gestures from the spatiotemporal matrix built from the keypoints. The matrix has dimensions 90 × 21, so convolutions over it can capture the temporal dynamics and the spatial configuration of the hand together.
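To make the input format concrete, the following pure-Python sketch packs per-frame hand landmarks into a 90 × 21 matrix and applies one convolution to it. The summary does not say how the 90 rows arise, so this sketch assumes, purely for illustration, 30 consecutive frames with three rows per frame (the x, y and z coordinates of the 21 landmarks), giving columns that index keypoints and rows that interleave time and coordinate axis. The names `NUM_FRAMES`, `frames_to_matrix` and `conv2d_valid` are hypothetical and not taken from the paper. The second function shows why a 2D convolution suits this layout: a single kernel slides over rows (time) and columns (keypoints) at once.

```python
from typing import List, Sequence, Tuple

NUM_KEYPOINTS = 21  # MediaPipe Hand Landmarker returns 21 landmarks per hand
NUM_FRAMES = 30     # ASSUMPTION (hypothetical): 30 frames x 3 coordinates = 90 rows
COORDS = 3          # x, y, z per landmark

Landmark = Tuple[float, float, float]
Matrix = List[List[float]]


def frames_to_matrix(frames: Sequence[Sequence[Landmark]]) -> Matrix:
    """Stack per-frame landmarks into a (NUM_FRAMES * COORDS) x NUM_KEYPOINTS matrix.

    Hypothetical layout: for frame t, rows 3t, 3t+1 and 3t+2 hold the x, y and z
    coordinates of all 21 keypoints; column k always refers to keypoint k.
    """
    if len(frames) != NUM_FRAMES:
        raise ValueError(f"expected {NUM_FRAMES} frames, got {len(frames)}")
    matrix: Matrix = []
    for frame in frames:
        if len(frame) != NUM_KEYPOINTS:
            raise ValueError(f"expected {NUM_KEYPOINTS} landmarks per frame")
        for axis in range(COORDS):
            matrix.append([float(point[axis]) for point in frame])
    return matrix


def conv2d_valid(matrix: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation without padding, as in a CNN layer.

    Each output cell mixes a block of neighbouring rows (time/coordinates) and
    neighbouring columns (keypoints), so one filter sees both axes together.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * matrix[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    # Synthetic landmarks: keypoint k in frame t sits at (t, k, 0).
    frames = [[(float(t), float(k), 0.0) for k in range(NUM_KEYPOINTS)] for t in range(NUM_FRAMES)]
    m = frames_to_matrix(frames)
    print(len(m), len(m[0]))  # 90 21
    feature_map = conv2d_valid(m, [[1.0, 1.0, 1.0]] * 3)
    print(len(feature_map), len(feature_map[0]))  # 88 19
```

A trained CNN would stack many such filters with learned weights, plus nonlinearities, pooling and a final layer over the 11 gesture classes; the sketch only illustrates the data layout and the convolution itself.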
Results
The method was evaluated on recognizing LIBRAS gestures for device control, across 11 classes of static and dynamic gestures. For real-time inference, the system uses a sliding window with temporal frame triplication, which avoids the need for recurrent networks.
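The sliding-window inference described above can be sketched as a fixed-length buffer that drops the oldest frame as each new frame arrives and, once full, classifies the current window in a single feed-forward pass, so no recurrent state is carried between windows. This is a minimal illustration under stated assumptions, not the authors' implementation: `SlidingWindowClassifier`, `window_size`, `run_stream` and the `classify` callable (a stand-in for the trained CNN) are hypothetical, and the temporal frame triplication step is not reproduced because the summary does not describe it.

```python
from collections import deque
from typing import Callable, Deque, Generic, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


class SlidingWindowClassifier(Generic[Frame]):
    """Hypothetical streaming wrapper around a feed-forward classifier.

    Keeps only the most recent `window_size` frames; once the buffer is full,
    every new frame triggers one independent classification of the window.
    """

    def __init__(self, window_size: int, classify: Callable[[List[Frame]], str]) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.classify = classify
        # deque(maxlen=...) discards the oldest frame automatically on append.
        self._buffer: Deque[Frame] = deque(maxlen=window_size)

    def push(self, frame: Frame) -> Optional[str]:
        """Add one frame; return a label once the window is full, else None."""
        self._buffer.append(frame)
        if len(self._buffer) < self.window_size:
            return None
        # The classifier sees a snapshot of the window; no hidden state
        # survives between calls, unlike a recurrent network.
        return self.classify(list(self._buffer))


def run_stream(frames: Sequence[Frame], clf: "SlidingWindowClassifier[Frame]") -> List[Optional[str]]:
    """Feed frames one at a time and collect the per-frame outputs."""
    return [clf.push(f) for f in frames]


if __name__ == "__main__":
    # Toy stand-in for the CNN: label a window by whether its values rise.
    def toy_classify(window: List[int]) -> str:
        return "rising" if window[-1] > window[0] else "other"

    clf = SlidingWindowClassifier(window_size=3, classify=toy_classify)
    print(run_stream([1, 2, 3, 2, 1], clf))
    # [None, None, 'rising', 'other', 'other']
```

In the paper's setting, each frame would be a set of hand landmarks and `classify` would assemble the 90 × 21 matrix and run the CNN; the window length would then follow from how that matrix is built.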
In tests, the method reached 95% accuracy under low-light conditions and 92% under normal lighting. These results suggest that recognition holds up across the two lighting conditions tested, which is encouraging for practical deployment.
Discussion
While the results are promising, further investigation is needed. Systematic experiments with a more diverse group of users will be necessary to evaluate how well the model generalizes, and understanding how differences between users affect recognition accuracy will help refine the system for broader use.
Conclusion
This research contributes a gesture recognition method for LIBRAS. By combining the MediaPipe Hand Landmarker with a CNN, the proposed method supports real-time gesture recognition without recurrent networks, which can improve user interaction with smart home devices. Future work will focus on expanding the dataset and optimizing the model for higher accuracy and better adaptability across users.
Keywords
Gesture Recognition, LIBRAS, Convolutional Neural Network, MediaPipe, Home Automation, Spatiotemporal Matrix.
