A Survey of Sign Language Recognition Technology Based on Sign Language Expression Content and Expression Characteristics
-
Abstract: Sign Language Recognition (SLR) technology is an important means of breaking down the communication barrier between deaf or hard-of-hearing people and hearing people. This paper surveys the sign language datasets, evaluation metrics, and SLR methods of recent years. First, existing sign language datasets are systematically reviewed and the development trends of datasets for SLR methods are analyzed. Second, the evaluation metrics used for SLR methods are introduced in detail. Then, SLR methods are classified and analyzed along two axes: by the content being signed, into isolated-word and continuous-sentence recognition methods; and by the features used, into methods that rely only on hand features and methods that fuse multiple features. Finally, the challenges facing SLR technology and its future directions are discussed.
-
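Table 1 Isolated-word sign language datasets
| Year | Dataset | Language | Samples | Labels | Modality | Sample type | Signers | Availability | Real-world scenes |
|---|---|---|---|---|---|---|---|---|---|
| 2007 | GSL-20[26] | Greek | 840 | 20 | RGB | Word | 6 | On request | × |
| 2007 | GSL isol.[27] | Greek | 40785 | 310 | RGB-D | Word | 7 | Registration | × |
| 2011 | ASLLVD[28] | English | 9800 | 3300 | RGB | Word | 6 | Open | × |
| 2012 | DGS Kinect 40[29] | German | 3000 | 40 | RGB-D/skeleton | Word | 15 | On request | × |
| 2013 | PSL TOF 84[30] | Polish | 1680 | 84 | RGB-D | Word | 1 | Open | × |
| 2014 | PSL Kinect 30[30] | | 300 | 30 | | | | | |
| | DEVISIGN-G[23] | Chinese | 432 | 36 | RGB | Word | 8 | On request | × |
| | DEVISIGN-D[23] | | 6000 | 200 | | | | | |
| | DEVISIGN-L[23] | | 24000 | 500 | | | | | |
| 2014 | ChaLearn[31] | English | 50000 | 249 | RGB-D | Word | 7 | Partially open | × |
| 2015 | CSL-500[9] | Chinese | 125000 | 500 | RGB-D/skeleton | Word | 50 | Open | × |
| 2016 | LSA64[32] | Spanish | 3200 | 64 | RGB | Word | 10 | Open | × |
| 2019 | GSL[33] | Greek | 40785 | 310 | RGB-D | Word | 7 | On request | × |
| 2019 | WLASL2000[34] | English | 21083 | 2000 | RGB | Word | 119 | Open | Varied backgrounds |
| 2020 | RKS-PERSIANSIGN[35] | Persian | 10000 | 100 | RGB | Word | 10 | Open | √(10) |
| 2020 | KSL[36] | Korean | 1229 | 77 | RGB/optical flow | Word | 20 | Open | √ |
| 2021 | NCSL[24] | Chinese | 90000 | 300 | RGB | Word | 30 | On request | × |
| 2021 | NMFs-CSL[25] | Chinese | 32010 | 1067 | RGB | Word | 10 | On request | × |
| 2022 | ASL-SKELETON3D[21]; ASL-Phono[21] | English | 9747 | 3300 | 3D; RGB | Word | 6 | On request | × |
| 2022 | ASLLRP Sign Bank[37] | English | 41830 | 6000 | RGB | Word | | Open | × |

Table 2 Continuous-sentence sign language datasets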
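| Year | Dataset | Language | Samples | Labels | Modality | Sample type | Signers | Availability | Real-world scenes |
|---|---|---|---|---|---|---|---|---|---|
| 2007 | SIGNUM[38] | German | 33210 | 780 | RGB | Sentence | 20 | Open | × |
| 2007 | GSL SD[27] | Greek | 10290 | 310 | RGB | Sentence | 7 | On request | × |
| | GSL SI[27] | | 10290 | 310 | | Sentence | 7 | | × |
| 2012 | RWTH-PHOENIX-Weather[39] | German | 45760 | 1200 | RGB | Sentence | 9 | Open | × |
| 2015 | CSL-100[9] | Chinese | 25000 | 100 | RGB-D/skeleton | Sentence | 50 | Open | × |
| 2016 | LSE-Sign[40] | Spanish | 2400 | 2400 | RGB | Sentence | 2 | Registration | √ |
| 2016 | MSR[22] | German | 33210 | 450 | RGB | Sentence | 25 | | × |
| 2018 | RWTH-PHOENIX-Weather 2014T[41] | German | 67781 | 1066 | RGB | Sentence | 9 | Open | √ |
| 2019 | GSL[33] | Greek | 10295 | 310 | RGB-D | Sentence | 7 | On request | × |
| 2019 | How2Sign[42] | English | 36773 | 16000 | RGB-D/skeleton/speech, etc. | Sentence | 11 | Open | Multiple scenes |

Table 3 Isolated-word sign language recognition methods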
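| Model category | Approach | Model | Dataset | Acc (%) | Notes (focus of the work) |
|---|---|---|---|---|---|
| Traditional models | Image processing | Canny edge detection[48] | ASL Alphabet; ASL | 99.00; 84.30 | |
| | | Bag Of Features[49] | Self-collected English alphabet dataset | 85.20 | Thresholding, color detection, etc. |
| | Feature extraction | Bag Of Features[49] | Self-collected English alphabet dataset | 85.20 | SURF, k-nearest neighbors, etc. |
| | | HOG-PCA[50] | Arabic alphabet dataset | 99.20 | RGB |
| | | SURF, SIFT[51] | Kinect Depth Datasets | >80.00 | Compares the two transforms |
| | Classification | Quadratic SVM[49] | Self-collected English alphabet dataset | 85.20 | Compares quadratic and cubic SVMs |
| | | SVM[50] | Arabic alphabet dataset | 99.20 | |
| | | DTW-HMM[52] | AUSL dataset | 87.40, 92.40 | |
| | | CTC[53] | Real-time | 42.00 | Performed outdoors |
| Neural networks | Convolutional neural networks | CNN, diffGrad optimizer[54] | Self-collected Indian dataset | 99.64 | With data augmentation |
| | | C3D, 2DCNN[55] | Graffiti dataset; In-house | 92.60; 89.70 | OFMT |
| | | C3D[56] | Kinect Datasets | 94.20 | Multi-modal information |
| | | I3D[57] | WLASL 2000; MS-ASL100 | 87.47; 96.66 | Multi-feature, multi-modal |
| | | R(2+1)D[58] | CSL-500 | 97.45 | Pre-training, attention |
| | Recurrent neural networks | LSTM[59] | CSL-500 | 63.30 | |
| | | LSTM-RNN with k-nearest neighbors[60] | ASL Fingerspelling | 99.44 | |
| | | BiLSTM[61] | ASL Datasets | 97.98 | With transfer learning |
| | | Bi-ConvLSTM[62] | ASL Datasets | 98.81 | 90% Acc with a live camera; with transfer learning |
| | | FFV-Bi-LSTM[63] | ASL Datasets | 98.33 | Motion-sensing system |
| | Graph neural networks | STGC-Transformer[64] | Self-collected Japanese dataset | 12.14 (WER) | CTC combined with cross-entropy |
| | | MS-G3D AUTSL[65]; MS-G3D LSE[65] | WLASL2000 | 95.24; 93.91 | Transfer learning |
| | GAN | H-GANs[66] | ASLLVD; RWTH-PHOENIX-Weather 2014 | 1.40 (CER); 20.70 (WER) | Fusion of 20 features |
| Attention mechanisms | Spatial, temporal | 3D CNN[67] | CSL-500; ChaLearn14 | 88.70; 95.30 (Jaccard Index) | 3D CNN extracts spatio-temporal features; combines temporal and spatial attention |
| | Hierarchical temporal | HTAN[68] | CSL-500 | 93.10 | Hierarchical temporal attention network |
| | Global, local | Res-C3D[69] | CSL-500; DEVISIGN-D | 89.20; 91.00 | Global: video time series; local: object detection and localization |
| | Transformer | Transformer[70] | WLASL-100; WLASL-300; LSA64 | 63.20 (Top-1); 43.80 (Top-1); 100.00 | Targets a low-compute model |
| | BERT | BERT, 3DCNN, LSTM[71] | RKS-PERSIANSIGN; ASLLVD | 74.60; 68.80 | Feature extraction, multi-modal weighting, feature mapping |
| Transfer learning | Features | I3D[72] | ChaLearn249 | 62.09 | Transfers spatio-temporal features |
| | Shared parameters | Alexnet, R-CNN[73] | Turkish Sign Language | 99.70 | |
| | | TensorFlow Object Detection API[74] | Self-collected Indian sign language dataset | 85.45 | |
| Zero-shot learning | Zero-Shot | 3DCNN, LSTM[75] | ASL-Text | 51.40 | |
| | | LSTM, BERT, C3D, VSD[71] | RKS-PERSIANSIGN; First-Person; ASLVID; isoGD | 74.60; 67.20; 68.80; 60.20 | Multi-modal |

Table 4 Continuous-sentence sign language recognition methods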
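| Year | Model | Dataset | WER (%) | Notes (focus of the work) |
|---|---|---|---|---|
| 2002 | HMM[89] | 97 German signs | 91.70 (Acc) | With beam search |
| 2012 | GHMM[90] | SIGNUM | 13.00 | MLP |
| 2017 | CNN, LSTM[91] | Video teaching materials | 98.43 | Two-stream 2D CNN |
| 2019 | CNN-Transformer-CTC[92] | PHOENIX-2014; PHOENIX-2014-T | 26.00; 26.10 | Feature extraction, contextual information |
| 2020 | CNN-Transformer-CTC[93] | PHOENIX-2014-T | 24.59 | Sign language recognition plus spoken-language translation |
| 2021 | CNN-BiLSTM-CTC[94] | Indian Sign Language | 15.14 | Transfer from isolated words to sentences |
| 2021 | RNN-Transducer[83] | CSL-100 | 6.10 | H2SNet |
| 2021 | HST-GNN[95] | PHOENIX-2014-T; CSL-100 | 19.50; 27.60 | Graph convolution, graph self-attention |
| 2021 | SLRGAN[96] | PHOENIX-2014; CSL-100 | 23.40; 2.10 | Contextual information |
| 2021 | CNN-Transformer-CTC[97] | PHOENIX-2014 | 29.78 | Multi-modal, attention |
| 2022 | CNN-Transformer-CTC[98] | PHOENIX-2014-T; PHOENIX-2014 | 22.90; 23.20 | Relative position encoding |
| 2022 | GoogLeNet-Tconvs-CTC[91] | PHOENIX-2014, CSL-100 | 46.41, 2.41 | Comparative study of convolutional networks |
| | 3D-ResNet-BLSTM-CTC[91] | PHOENIX-2014, CSL-100 | 50.98, 13.36 | |
| | I3D-BLSTM-CTC[91] | PHOENIX-2014, CSL-100 | 52.71, 2.72 | |

Table 5 Sign language recognition methods grouped by the expressive features the model uses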
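| Feature region | Year [Ref.] | Model | Dataset | Acc (%) | Notes (focus of the work) |
|---|---|---|---|---|---|
| Hands | 2018[106] | RBM | NYU; ASL Fingerspelling A | 90.01; 98.13 | Multi-modal |
| | 2018[44] | CNN | STB; Dexter; EgoDexter | 96.5 (AUC); 64 (AUC); 54 (AUC) | Real-time; pose estimation; hand tracking |
| | 2019[104] | CNN | RWTH-BOSTON-50; ASLLVD | 89.33; 31.50 | Hand tracking; pre-trained |
| | 2019[108] | CNN | Kinect and LM data | 97.66 | Multi-modal |
| | 2020[109] | ASLNN | DGSLR dataset | 96.78 | Hand pose tracking |
| | 2021[48] | CNN | ASL Alphabet | 99.00 | Image processing |
| | 2021[105] | HMM, Camshift | ASSLRP dataset | 77.75 | Hand tracking |
| | 2021[107] | S2VT | USTC-SLR | 95.60 (98.40) | Fewer training parameters |
| | 2021[71] | LSTM, BERT, C3D, VSD | RKS-PERSIANSIGN; First-Person; ASLVID; isoGD | 74.60; 67.20; 68.80; 60.20 | Zero-shot; Transformer; multi-modal |
| | 2021[100] | MPH, SVM, GBM | Massey; ASL Alphabet; Finger Spelling A | 99.39; 87.60; 98.45 | Hand pose estimation (webcam) |
| | 2022[102] | CNN, SVD | ASLVID | 93.00 | Hand keypoints |
| | 2022[101] | MPH | Thai Finger Spelling schemes | 84.57 (S1, S2); 23.66 (P1) | Hand-keypoint detection |
| | 2022[73] | Alexnet (pre-trained), R-CNN | Turkish Sign Language | 99.70 (AP) | Transfer learning |
| | 2022[110] | mRMR-PSO | ASL; ISL dataset; NUS Dataset II; Arabic Dataset | 84.30; 98.70; 92.06; 85.60 | Multi-model combination; complex backgrounds |
| | 2022[74] | TensorFlow Object Detection API | Indian Sign Language | 85.45 | Real-time; transfer learning |
| Hands, mouth shape, facial expression, body pose, etc. | 2015[48] | CNN, HMM | RWTH-PHOENIX-Weather | 55.70 (precision) | Hands, mouth shape |
| | 2020[111] | 3DCNN | Bosphorus-Sign22k Turkish Isolated SL dataset | 99.78 | Hands, face, and body fused by weight during training |
| | 2020[15] | SMPL reverse | SURREAL; Human3.6M datasets | 62.3, 40.8 (mm); improvement of 25 (mm) | Pose recovery (RGB-3D); body pose |
| | 2021[112] | SMPL-X | GSLL Dataset | 94.77 | Hands, face, and body combined; optical flow + RGB, pose recovery |
| | 2021[66] | H-GANs | RWTH-PHOENIX-Weather 2014; ASLLVD | 20.70 (WER); 1.40 (CER) | 20 features including hands, face shape, head, and eyes; parameter optimization, continuous sign language, dimensionality reduction |
| | 2020[25] | GLE-Net | NMFs-CSL; SLR500 | 90.50; 96.80 | Contextual relations, discriminative fine-grained cues |
-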
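Several entries in the tables above report WER or CER rather than accuracy. Both are edit-distance metrics. The following is a minimal Python sketch, assuming the usual definition WER = 100 × (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions needed to turn the reference into the prediction and N is the reference length. The function names and the example gloss sequences are illustrative assumptions and are not taken from any of the surveyed papers.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between a reference and a hypothesis sequence."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Error rate in percent: edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical gloss sequences, used only to illustrate the computation.
reference = "TOMORROW RAIN NORTH WIND".split()
prediction = "TOMORROW SNOW NORTH".split()
wer = error_rate(reference, prediction)          # word (gloss) level
cer = error_rate(list("hello"), list("hallo"))   # character level
```

Passing lists of glosses gives WER, and passing lists of characters gives CER. Because insertions are counted against the reference length, the value can exceed 100%.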
[1] MURRAY J. World federation of the deaf[EB/OL]. http://wfdeaf.org/our-work/, 2020.
[2] JIA Yongqiang. More than 30 million people in China need rehabilitation for language disorders[EB/OL]. https://m.btime.com/item/router?gid=40ea0atodav8fk8lcqahek43ilu, 2020. (in Chinese)
[3] Wangxin Zhengzhou. "Healthy living, you and me": In China, the lives of 27.8 million people have been put on mute[EB/OL]. https://baijiahao.baidu.com/s?id=1678969855122384979&wfr=spider&for=pc, 2020. (in Chinese)
[4] ZHOU Y. Research on signer adaptation in Chinese sign language recognition[D]. [Ph. D. dissertation], Harbin Institute of Technology, 2009.
[5] GAO Wen, FANG Gaolin, ZHAO Debin, et al. A Chinese sign language recognition system based on SOFM/SRN/HMM[J]. Pattern Recognition, 2004, 37(12): 2389–2402. doi: 10.1016/S0031-3203(04)00165-7
[6] MAZUMDAR D, TALUKDAR A K, and SARMA K K. Gloved and free hand tracking based hand gesture recognition[C]. The 1st International Conference on Emerging Trends and Applications in Computer Science, Shillong, India, 2013: 197–202.
[7] WANG R Y and POPOVIĆ J. Real-time hand-tracking with a color glove[J]. ACM Transactions on Graphics, 2009, 28(3): 63. doi: 10.1145/1531326.1531369
[8] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using real-sense[C]. 2015 IEEE China Summit and International Conference on Signal and Information Processing, Chengdu, China, 2015: 166–170.
[9] HUANG Jie, ZHOU Wengang, ZHANG Qilin, et al. Video-based sign language recognition without temporal segmentation[C]. The Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, Louisiana, USA, 2018: 275.
[10] MITTAL A, KUMAR P, ROY P P, et al. A modified LSTM model for continuous sign language recognition using leap motion[J]. IEEE Sensors Journal, 2019, 19(16): 7056–7063. doi: 10.1109/JSEN.2019.2909837
[11] WANG Hanjie, CHAI Xiujuan, and CHEN Xilin. A novel sign language recognition framework using hierarchical Grassmann covariance matrix[J]. IEEE Transactions on Multimedia, 2019, 21(11): 2806–2814. doi: 10.1109/TMM.2019.2915032
[12] WANG Qi, CHEN Xilin, WANG Chunli, et al. A data-deficiency-tolerated method for viewpoint independent sign language recognition[J]. Chinese Journal of Computers, 2009, 32(5): 953–961. doi: 10.3724/SP.J.1016.2009.00953
[13] LIANG R H and OUHYOUNG M. A real-time continuous gesture recognition system for sign language[C]. The 3rd IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998: 558–567.
[14] YU S H, HUANG C L, HSU S C, et al. Vision-based continuous sign language recognition using product HMM[C]. The 1st Asian Conference on Pattern Recognition, Beijing, China, 2011: 510–514.
[15] MADADI M, BERTICHE H, and ESCALERA S. SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery[J]. Pattern Recognition, 2020, 106: 107472. doi: 10.1016/j.patcog.2020.107472
[16] CELEBI S, AYDIN A S, TEMIZ T T, et al. Gesture recognition using skeleton data with weighted dynamic time warping[C]. The International Conference on Computer Vision Theory and Applications, Barcelona, Spain, 2013: 620–625.
[17] SUN Chao, ZHANG Tianzhu, and XU Changsheng. Latent support vector machine modeling for sign language recognition with Kinect[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 20. doi: 10.1145/2629481
[18] ZHANG Shujun, ZHANG Qun, and LI Hui. Review of sign language recognition based on deep learning[J]. Journal of Electronics & Information Technology, 2020, 42(4): 1021–1032. doi: 10.11999/JEIT190416
[19] MINAWAER·ABULA, ALIFU·KUERBAN, XIE Qina, et al. Review of sign language recognition methods and techniques[J]. Computer Engineering and Applications, 2021, 57(18): 1–12. doi: 10.3778/j.issn.1002-8331.2104-0220
[20] GUO Dan, TANG Shengeng, HONG Richang, et al. Review of sign language recognition, translation and generation[J]. Computer Science, 2021, 48(3): 60–70. doi: 10.11896/jsjkx.210100227
[21] DE AMORIM C C and ZANCHETTIN C. ASL-Skeleton3D and ASL-phono: Two novel datasets for the American sign language[J]. arXiv: 2201.02065, 2022.
[22] CHEN Chen, ZHANG Baochang, HOU Zhenjie, et al. Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features[J]. Multimedia Tools and Applications, 2017, 76(3): 4651–4669. doi: 10.1007/s11042-016-3284-7
[23] CHAI X, WANG H, and CHEN X. The DEVISIGN large vocabulary of Chinese sign language database and baseline evaluations[R]. Technical Report VIPL-TR-14-SLR-001, 2014.
[24] WANG Fei, DU Yuxuan, WANG Guorui, et al. (2+1) D-SLR: An efficient network for video sign language recognition[J]. Neural Computing and Applications, 2022, 34(3): 2413–2423. doi: 10.1007/s00521-021-06467-9
[25] HU Hezhen, ZHOU Wengang, PU Junfu, et al. Global-local enhancement network for NMF-aware sign language recognition[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(3): 80. doi: 10.1145/3436754
[26] EFTHIMIOU E and FOTINEA S E. GSLC: Creation and annotation of a Greek sign language corpus for HCI[C]. The 4th International Conference on Universal Access in Human-Computer Interaction, Beijing, China, 2007: 657–666.
[27] ADALOGLOU N, CHATZIS T, PAPASTRATIS I, et al. A comprehensive study on sign language recognition methods[J]. arXiv: 2007.12530, 2020.
[28] NEIDLE C, THANGALI A, and SCLAROFF S. Challenges in development of the American sign language lexicon video dataset (ASLLVD) corpus[C]. The 5th Workshop on the Representation and Processing of Sign Languages: Interactions Between Corpus and Lexicon, Istanbul, Turkey, 2012: 1–8.
[29] COOPER H, ONG E J, PUGEAULT N, et al. Sign language recognition using sub-units[J]. The Journal of Machine Learning Research, 2012, 13(1): 2205–2231.
[30] OSZUST M and WYSOCKI M. Polish sign language words recognition with Kinect[C]. The 6th International Conference on Human System Interactions, Sopot, Poland, 2013: 219–226.
[31] ESCALERA S, BARÓ X, GONZÀLEZ J, et al. ChaLearn looking at people challenge 2014: Dataset and results[C]. The European Conference on Computer Vision, Zurich, Switzerland, 2015: 459–473.
[32] RONCHETTI F, QUIROGA F, ESTREBOU C A, et al. LSA64: An Argentinian sign language dataset[C]. The XXII Congreso Argentino de Ciencias de la Computación, Córdoba, Argentina, 2016: 794–803.
[33] ADALOGLOU N, CHATZIS T, PAPASTRATIS I, et al. A comprehensive study on deep learning-based methods for sign language recognition[J]. IEEE Transactions on Multimedia, 2022, 24: 1750–1762. doi: 10.1109/TMM.2021.3070438
[34] LI Dongxu, OPAZO C R, YU Xin, et al. Word-Level deep sign language recognition from video: A new large-scale dataset and methods comparison[C]. 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, 2020: 1459–1469.
[35] RASTGOO R, KIANI K, and ESCALERA S. Hand sign language recognition using multi-view hand skeleton[J]. Expert Systems with Applications, 2020, 150: 113336. doi: 10.1016/j.eswa.2020.113336
[36] YANG S, JUNG S, KANG H, et al. The Korean sign language dataset for action recognition[C]. The 26th International Conference on Multimedia Modeling, Daejeon, South Korea, 2020: 532–542.
[37] NEIDLE C, OPOKU A, and METAXAS D. ASL video corpora & sign bank: Resources available through the American sign language linguistic research project (ASLLRP)[J]. arXiv: 2201.07899, 2022.
[38] VON AGRIS U and KRAISS K F. Towards a video corpus for signer-independent continuous sign language recognition[C]. The 7th International Workshop on Gesture in Human-Computer Interaction and Simulation, Lisbon, Portugal, 2007, 11: 2.
[39] FORSTER J, SCHMIDT C, HOYOUX T, et al. RWTH-PHOENIX-weather: A large vocabulary sign language recognition and translation corpus[C]. The Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey, 2012: 3785–3789.
[40] GUTIERREZ-SIGUT E, COSTELLO B, BAUS C, et al. LSE-Sign: A lexical database for Spanish sign language[J]. Behavior Research Methods, 2016, 48(1): 123–137. doi: 10.3758/s13428-014-0560-1
[41] CAMGOZ N C, HADFIELD S, KOLLER O, et al. Neural sign language translation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7784–7793.
[42] DUARTE A, PALASKAR S, VENTURA L, et al. How2Sign: A large-scale multimodal dataset for continuous American sign language[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2735–2744.
[43] PUGEAULT N and BOWDEN R. Spelling it out: Real-time ASL fingerspelling recognition[C]. 2011 IEEE International Conference on Computer Vision workshops (ICCV Workshops), Barcelona, Spain, 2011: 1114–1119.
[44] MUELLER F, BERNARD F, SOTNYCHENKO O, et al. GANerated hands for real-time 3D hand tracking from monocular RGB[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, United States, 2018: 49–59.
[45] HAQUE A, PENG Boya, LUO Zelun, et al. Towards viewpoint invariant 3D human pose estimation[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 160–177.
[46] TANG Ao, LU Ke, WANG Yufei, et al. A real-time hand posture recognition system using deep neural networks[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 21. doi: 10.1145/2735952
[47] KOLLER O, NEY H, and BOWDEN R. Deep learning of mouth shapes for sign language[C]. 2015 IEEE International Conference on Computer Vision Workshop, Santiago, Chile, 2015: 85–91.
[48] SARASWATHI S and KUMAR K A. Predicting American sign language from hand gestures using image processing and deep learning[M]. TRIPATHY A K, SARKAR M, SAHOO J P, et al. Advances in Distributed Computing and Machine Learning. Singapore: Springer, 2021: 423–431.
[49] SRIDEVI P, ISLAM T, DEBNATH U, et al. Sign language recognition for speech and hearing impaired by image processing in MATLAB[C]. 2018 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), Malambe, Sri Lanka, 2018: 1–4.
[50] HAMED A, BELAL N A, and MAHAR K M. Arabic sign language alphabet recognition based on HOG-PCA using microsoft kinect in complex backgrounds[C]. The 6th International Conference on Advanced Computing, Bhimavaram, India, 2016: 451–458.
[51] SYKORA P, KAMENCAY P, and HUDEC R. Comparison of SIFT and SURF methods for use on hand gesture recognition based on depth map[J]. AASRI Procedia, 2014, 9: 19–24. doi: 10.1016/j.aasri.2014.09.005
[52] MA Xiang, YUAN Lin, WEN Ruoshi, et al. Sign language recognition based on concept learning[C]. 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Dubrovnik, Croatia, 2020: 1–6.
[53] SHI Bowen, DEL RIO A M, KEANE J, et al. American sign language fingerspelling recognition in the wild[C]. 2018 IEEE Spoken Language Technology Workshop, Athens, Greece, 2018: 145–152.
[54] NANDI U, GHORAI A, SINGH M M, et al. Indian sign language alphabet recognition system using CNN with diffGrad optimizer and stochastic pooling[J]. Multimedia Tools and Applications, 2022, 82: 9627–9648. doi: 10.1007/S11042-021-11595-4
[55] SARMA D, KAVYASREE V, and BHUYAN M K. Two-stream fusion model for dynamic hand gesture recognition using 3D-CNN and 2D-CNN optical flow guided motion template[J]. arXiv: 2007.08847, 2020.
[56] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using 3D convolutional neural networks[C]. 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy, 2015: 1–6.
[57] MARUYAMA M, GHOSE S, INOUE K, et al. Word-level sign language recognition with multi-stream neural networks focusing on local regions[J]. arXiv: 2106.15989, 2021.
[58] HAN Xiangzu, LU Fei, YIN Jianqin, et al. Sign language recognition based on R(2+1)D with spatial–temporal–channel attention[J]. IEEE Transactions on Human-Machine Systems, 2022, 52(4): 687–698. doi: 10.1109/THMS.2022.3144000
[59] LIU Tao, ZHOU Wengang, and LI Houqiang. Sign language recognition with long short-term memory[C]. 2016 IEEE International Conference on Image Processing, Phoenix, USA, 2016: 25–28.
[60] LEE C K M, NG K K H, CHEN C H, et al. American sign language recognition and training method with recurrent neural network[J]. Expert Systems with Applications, 2021, 167: 114403. doi: 10.1016/j.eswa.2020.114403
[61] ABDULLAHI S B and CHAMNONGTHAI K. American sign language words recognition of skeletal videos using processed video driven multi-stacked deep LSTM[J]. Sensors, 2022, 22(4): 1406. doi: 10.3390/s22041406
[62] BENDARKAR D S, SOMASE P A, REBARI P K, et al. Web based recognition and translation of American sign language with CNN and RNN[J]. International Journal of Online and Biomedical Engineering, 2021, 17(1): 34–50. doi: 10.3991/ijoe.v17i01.18585
[63] ABDULLAHI S B and CHAMNONGTHAI K. American sign language words recognition using spatio-temporal prosodic and angle features: A sequential learning approach[J]. IEEE Access, 2022, 10: 15911–15923. doi: 10.1109/ACCESS.2022.3148132
[64] TAKAYAMA N, BENITEZ-GARCIA G, and TAKAHASHI H. Sign language recognition based on spatial-temporal graph convolution-transformer[J]. Journal of the Japan Society for Precision Engineering, 2021, 87(12): 1028–1035. doi: 10.2493/jjspe.87.1028
[65] VÁZQUEZ-ENRÍQUEZ M, ALBA-CASTRO J L, DOCÍO-FERNÁNDEZ L, et al. Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, USA, 2021: 3457–3466.
[66] ELAKKIYA R, VIJAYAKUMAR P, and KUMAR N. An optimized Generative Adversarial Network based continuous sign language classification[J]. Expert Systems with Applications, 2021, 182: 115276. doi: 10.1016/J.ESWA.2021.115276
[67] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Attention-based 3D-CNNs for large-vocabulary sign language recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(9): 2822–2832. doi: 10.1109/TCSVT.2018.2870740
[68] HUANG Jie. Deep learning based sign language recognition[D]. [Ph. D. dissertation], University of Science and Technology of China, 2018.
[69] ZHANG Shujun and ZHANG Qun. Sign language recognition based on global-local attention[J]. Journal of Visual Communication and Image Representation, 2021, 80: 103280. doi: 10.1016/j.jvcir.2021.103280
[70] BOHÁČEK M and HRÚZ M. Sign pose-based transformer for word-level sign language recognition[C]. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, Waikoloa, USA, 2022: 182–191.
[71] RASTGOO R, KIANI K, ESCALERA S, et al. Multi-modal zero-shot sign language recognition[J]. arXiv: 2109.00796, 2021.
[72] SARHAN N and FRINTROP S. Transfer learning for videos: From action recognition to sign language recognition[C]. 2020 IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 1811–1815.
[73] YIRTICI T and YURTKAN K. Regional-CNN-based enhanced Turkish sign language recognition[J]. Signal, Image and Video Processing, 2022, 16(5): 1305–1311. doi: 10.1007/s11760-021-02082-2
[74] SRIVASTAVA S, GANGWAR A, MISHRA R, et al. Sign language recognition system using TensorFlow object detection API[C]. The 1st International Conference on Advanced Network Technologies and Intelligent Computing, Varanasi, India, 2021: 634–646.
[75] BILGE Y C, IKIZLER-CINBIS N, and CINBIS R G. Zero-shot sign language recognition: Can textual data uncover sign languages?[C]. BMVC, Cardiff, UK, 2019: 169–182.
[76] HAN Mengmeng, CHEN Jiajun, LI Ling, et al. Visual hand gesture recognition with convolution neural network[C]. The 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Shanghai, China, 2016: 287–291.
[77] LAHIANI H, ELLEUCH M, and KHERALLAH M. Real time hand gesture recognition system for android devices[C]. The 15th International Conference on Intelligent Systems Design and Applications (ISDA), Marrakech, Morocco, 2015: 591–596.
[78] YAMAGUCHI Y, YOSHITOMI Y, and FUSHIMI H. Recognition of words expressed by sign language using thermal-image processing[J]. Artificial Life and Robotics, 2007, 11(1): 18–22. doi: 10.1007/s10015-006-0391-y
[79] CUI Runpeng, LIU Hu, and ZHANG Changshui. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7361–7369.
[80] LIPTON Z C, BERKOWITZ J, and ELKAN C. A critical review of recurrent neural networks for sequence learning[J]. arXiv: 1506.00019, 2015.
[81] MOLCHANOV P, GUPTA S, KIM K, et al. Hand gesture recognition with 3D convolutional neural networks[C]. The 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, USA, 2015: 1–7.
[82] QIU Zhaofan, YAO Ting, and MEI Tao. Learning spatio-temporal representation with pseudo-3D residual networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 5533–5541.
[83] GAO Liqing, LI Haibo, LIU Zhijian, et al. RNN-Transducer based Chinese sign language recognition[J]. Neurocomputing, 2021, 434: 45–54. doi: 10.1016/j.neucom.2020.12.006
[84] YAN Sijie, XIONG Yuanjun, and LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]. The Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, Louisiana, USA, 2018: 912.
[85] WEN Shuhuan, TIAN Wenbo, ZHANG Hong, et al. Semantic segmentation using a GAN and a weakly supervised method based on deep transfer learning[J]. IEEE Access, 2020, 8: 176480–176494. doi: 10.1109/ACCESS.2020.3026684
[86] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4489–4497.
[87] ZHOU Yizhou, SUN Xiaoyan, ZHA Zhengjun, et al. MiCT: Mixed 3D/2D convolutional tube for human action recognition[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 449–458.
[88] PINTO R F, BORGES C D B, ALMEIDA A M A, et al. Static hand gesture recognition based on convolutional neural networks[J]. Journal of Electrical and Computer Engineering, 2019, 2019: 4167890. doi: 10.1155/2019/4167890
[89] BAUER B and HIENZ H. Relevant features for video-based continuous sign language recognition[C]. The Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000: 440–445.
[90] GWETH Y L, PLAHL C, and NEY H. Enhanced continuous sign language recognition using PCA and neural network features[C]. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, USA, 2012: 55–60.
[91] YANG Su and ZHU Qing. Continuous Chinese sign language recognition with CNN-LSTM[C]. Proceedings of SPIE 10420, Ninth International Conference on Digital Image Processing (ICDIP 2017), Hong Kong, China, 2017.
[92] ZHOU Hao, ZHOU Wengang, and LI Houqiang. Dynamic pseudo label decoding for continuous sign language recognition[C]. 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 2019: 1282–1287.
[93] CAMGÖZ N C, KOLLER O, HADFIELD S, et al. Sign language transformers: Joint end-to-end sign language recognition and translation[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10023–10033.
[94] SHARMA S, GUPTA R, and KUMAR A. Continuous sign language recognition using isolated signs data and deep transfer learning[J]. Journal of Ambient Intelligence and Humanized Computing, 2021, 14: 1531–1542. doi: 10.1007/s12652-021-03418-z
[95] KAN Jichao, HU Kun, HAGENBUCHNER M, et al. Sign language translation with hierarchical spatio-temporal graph neural network[C]. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2022: 2131–2140.
[96] PAPASTRATIS I, DIMITROPOULOS K, and DARAS P. Continuous sign language recognition through a context-aware generative adversarial network[J]. Sensors, 2021, 21(7): 2437. doi: 10.3390/s21072437
[97] BEN SLIMANE F and BOUGUESSA M. Context matters: Self-attention for sign language recognition[C]. 2020 25th International Conference on Pattern Recognition, Milan, Italy, 2021: 7884–7891.
[98] XIE Pan, ZHAO Mengyi, and HU Xiaohui. PiSLTRc: Position-informed sign language transformer with content-aware convolution[J]. IEEE Transactions on Multimedia, 2022, 24: 3908–3919. doi: 10.1109/TMM.2021.3109665
[99] HAN Xiangzu, LU Fei, and TIAN Guohui. Efficient 3D CNNs with knowledge transfer for sign language recognition[J]. Multimedia Tools and Applications, 2022, 81(7): 10071–10090. doi: 10.1007/s11042-022-12051-7
[100] SHIN J, MATSUOKA A, HASAN A M, et al. American sign language alphabet recognition by extracting feature from hand pose estimation[J]. Sensors, 2021, 21(17): 5856. doi: 10.3390/s21175856
[101] SANALOHIT J and KATANYUKUL T. Thai finger spelling recognition: Investigating MediaPipe Hands potentials[J]. arXiv: 2201.03170, 2022.
[102] RASTGOO R, KIANI K, and ESCALERA S. Real-time isolated hand sign language recognition using deep networks and SVD[J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13(1): 591–611. doi: 10.1007/s12652-021-02920-8
[103] AIOUEZ S, HAMITOUCHE A, BELMADOUI M, et al. Real-time Arabic sign language recognition based on YOLOv5[C/OL]. The 2nd International Conference on Image Processing and Vision Engineering - IMPROVE, 2022: 17–25.
[104] LIM K M, TAN A W C, LEE C P, et al. Isolated sign language recognition using Convolutional Neural Network hand modelling and Hand Energy Image[J]. Multimedia Tools and Applications, 2019, 78(14): 19917–19944. doi: 10.1007/s11042-019-7263-7
[105] ROY P P, KUMAR P, and KIM B G. An efficient sign language recognition (SLR) system using camshift tracker and hidden Markov model (HMM)[J]. SN Computer Science, 2021, 2(2): 79. doi: 10.1007/S42979-021-00485-Z
[106] RASTGOO R, KIANI K, and ESCALERA S. Multi-modal deep hand sign language recognition in still images using restricted Boltzmann machine[J]. Entropy, 2018, 20(11): 809. doi: 10.3390/e20110809
[107] XU Biao, HUANG Shiliang, and YE Zhongfu. Application of tensor train decomposition in S2VT model for sign language recognition[J]. IEEE Access, 2021, 9: 35646–35653. doi: 10.1109/ACCESS.2021.3059660
[108] FERREIRA P M, CARDOSO J S, and REBELO A. On the role of multimodal learning in the recognition of sign language[J]. Multimedia Tools and Applications, 2019, 78(8): 10035–10056. doi: 10.1007/s11042-018-6565-5
[109] KOLIVAND H, JOUDAKI S, SUNAR M S, et al. A new framework for sign language alphabet hand posture recognition using geometrical features through artificial neural network (part 1)[J]. Neural Computing and Applications, 2021, 33(10): 4945–4963. doi: 10.1007/s00521-020-05279-7
[110] BANSAL S R, WADHAWAN S, and GOEL R. mRMR-PSO: A hybrid feature selection technique with a multiobjective approach for sign language recognition[J]. Arabian Journal for Science and Engineering, 2022, 47(8): 10365–10380. doi: 10.1007/s13369-021-06456-z
[111] GÖKÇE Ç, ÖZDEMIR O, KINDIROĞLU A A, et al. Score-level multi cue fusion for sign language recognition[C]. Proceedings of the European Conference on Computer Vision, Glasgow, UK, 2020: 294–309.
[112] KRATIMENOS A, PAVLAKOS G, and MARAGOS P. Independent sign language recognition with 3d body, hands, and face reconstruction[C]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021: 4270–4274.