Sign Language Recognition Using Video, Skeleton Data and Deep Learning
Date
2025-10-20
Authors
Mederos, Boris
Mejia, Jose
Díaz Román, José David
Rascon Madrigal, Lidia Hortencia
Cota Ruiz, Juan De Dios
Medina Reyes, Alejandro
Abstract
Isolated Sign Language Recognition (SLR) focuses on classifying individual signs from video, a task typically addressed with accurate but computationally intensive vision-based models. This work explores skeleton-based representations extracted from RGB sequences, which capture the essential motion patterns at much lower dimensionality. We propose four deep learning models that combine convolutional layers with GRU or minGRU units, processing skeletons either as 1D vectors or as 3D joint trajectories, and use ensembles to improve robustness. Results show that skeleton-based models achieve accuracy comparable to video-based approaches while requiring far fewer resources. Notably, the Conv1D+GRU ensemble reaches 88.32% Top-1 accuracy, nearly matching the 88.90% of ResNet(2+1)D, while cutting training time from over 43 h to about 1 h and reducing inference on the test set from hundreds of seconds to under one second. Ensembles consistently improve performance across architectures. These findings show that skeleton-based modeling retains the discriminative information needed for SLR, providing a fast and efficient solution.
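
To make the abstract's architecture concrete, the following is a minimal PyTorch sketch of the 1D-vector variant: a temporal Conv1D front-end over per-frame flattened skeleton coordinates feeding a GRU, plus a logit-averaging ensemble. All names (Conv1DGRUClassifier, ensemble_logits), layer sizes, the joint count, and the class count are illustrative assumptions, not the authors' implementation; the minGRU and 3D joint-trajectory variants are omitted.

import torch
import torch.nn as nn

class Conv1DGRUClassifier(nn.Module):
    """Hypothetical sketch: temporal Conv1D layers over per-frame
    skeleton vectors, followed by a GRU and a linear classifier."""

    def __init__(self, num_joints=67, coords=2, hidden=256, num_classes=100):
        super().__init__()
        in_ch = num_joints * coords          # each frame flattened to one 1D vector
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.gru = nn.GRU(128, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                    # x: (batch, frames, joints*coords)
        x = x.transpose(1, 2)                # -> (batch, channels, frames) for Conv1d
        x = self.conv(x).transpose(1, 2)     # -> (batch, frames, 128)
        _, h = self.gru(x)                   # h: (1, batch, hidden), last hidden state
        return self.head(h.squeeze(0))       # per-class logits

# Simple ensemble: average the logits of independently trained models.
def ensemble_logits(models, x):
    return torch.stack([m(x) for m in models]).mean(dim=0)

# Toy usage: a batch of 8 clips, 64 frames, 67 joints with (x, y) coordinates.
models = [Conv1DGRUClassifier() for _ in range(3)]
clip = torch.randn(8, 64, 67 * 2)
pred = ensemble_logits(models, clip).argmax(dim=1)

Averaging logits across independently trained copies is one common way to realize the ensembling the abstract credits with consistent accuracy gains; it adds inference cost only linearly in the number of members, which stays negligible next to a video backbone.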
Collections
- Memoria en extenso [336]