Review of Sign Language Recognition Based on Deep Learning
Abstract: Sign language recognition lies at the intersection of computer vision, pattern recognition, and human-computer interaction, and has significant research and application value. The rapid development of deep learning brings new opportunities for more accurate, real-time sign language recognition. This paper reviews recent deep-learning-based sign language recognition techniques, describing and analyzing the algorithms along two branches: isolated words and continuous sentences. Isolated-word recognition methods are divided into three architectures: those based on Convolutional Neural Networks (CNN), Three-Dimensional Convolutional Neural Networks (3D-CNN), and Recurrent Neural Networks (RNN). Models for continuous-sentence recognition are more complex and usually need an auxiliary algorithm for long-term temporal modeling; by their main structure they fall into three categories: bidirectional Long Short-Term Memory (BLSTM) models, 3D convolutional network models, and hybrid models. Commonly used Chinese and international sign language datasets are summarized. Finally, the research challenges and development trends of sign language recognition are discussed: robustness and practicality, while maintaining high accuracy, still need to be improved.
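What separates a 2D CNN applied frame by frame from a 3D-CNN is whether the kernel also extends over time. The pure-Python sketch below is illustrative only and is not the implementation of any cited work: it computes a single-channel 3D convolution with stride 1 and no padding (like most deep-learning frameworks, it actually computes cross-correlation). The toy clip generator `moving_dot_clip` and the two-frame temporal-difference kernel are hypothetical examples, chosen so that the output is nonzero only where something changes between frames.

```python
from typing import List

# A single-channel video clip indexed as clip[t][y][x].
Clip = List[List[List[float]]]


def conv3d_valid(clip: Clip, kernel: Clip) -> Clip:
    """Naive 3D cross-correlation, stride 1, 'valid' padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Clip = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans kt consecutive frames, so each output
                # value mixes information across time as well as space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def moving_dot_clip(frames: int, size: int) -> Clip:
    """Hypothetical toy clip: one bright pixel moves one column per frame
    along the middle row of a size x size frame."""
    clip: Clip = []
    for t in range(frames):
        frame = [[0.0] * size for _ in range(size)]
        frame[size // 2][t % size] = 1.0
        clip.append(frame)
    return clip


if __name__ == "__main__":
    clip = moving_dot_clip(frames=3, size=3)
    diff_kernel = [[[-1.0]], [[1.0]]]  # shape (2, 1, 1): frame t+1 minus frame t
    for t, plane in enumerate(conv3d_valid(clip, diff_kernel)):
        print(t, plane[1])  # prints: 0 [-1.0, 1.0, 0.0] then 1 [0.0, -1.0, 1.0]
```

With a single-frame kernel (kt = 1) the same function reduces to a 2D convolution applied independently to each frame, which cannot respond to motion; practical 3D-CNNs stack many learned, multi-channel 3D kernels rather than one hand-set filter.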
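Table 1 Deep-learning-based isolated-word sign language recognition techniques and representative works

| Author/Affiliation | Year | Technical features | Accuracy (%) | Dataset | Sample size |
|---|---|---|---|---|---|
| Tang Ao, Li Houqiang, Huang Jie, Li Xiaoxu, Huang Shiliang / University of Science and Technology of China | 2013 | CNN (RGB-D input, with hand segmentation and tracking) [4] | 98.12 | American Sign Language (ASL) | 50700 frames |
| | 2015 | 3D-CNN (multimodal input) [17] | 94.20 | Chinese Sign Language (CSL) | 25 classes |
| | 2016 | RNN (with trajectory data) [27] | 85.60 | CSL | 500 classes |
| | 2017 | LSTM (with hand-shape descriptor) [28] | 86.20 | CSL | 100 classes |
| | 2018 | RNN (key-frame selection from video sequences) [29] | 91.18 | CSL | 310 classes |
| | 2018 | 3D-CNN (attention-based) [18] | 88.70 | CSL | 500 classes |
| Pigou L / Ghent University | 2014 | CNN [5] | 91.70 | Chalearn 2014 | 20 classes |
| | 2016 | 3D-CNN (feature fusion of multimodal data) [16] | 81.00 | Chalearn 2014 | 20 classes |
| Molchanov P, Garcia B, Hardie Cate / Stanford University | 2015 | 3D-CNN (multi-scale data) [15] | 77.50 | VIVA Dataset | - |
| | 2015 | RNN [25] | 90.80 | University of South Wales dataset | 95 classes |
| | 2016 | CNN [9] | 91.63 | ASL fingerspelling | - |
| Kang B / University of California | 2015 | CNN [6] | 99.99 | ASL fingerspelling | 31 classes |
| Miao Qiguang / Xidian University | 2016 | 3D-CNN (RGB-D input) [19] | 56.90 | Chalearn | - |
| | 2017 | 3D-CNN (saliency features and RGB-D) [20] | 59.43 | Chalearn | - |
| | 2017 | 3D-CNN (multimodal data and hand-feature enhancement) [21] | 67.71 | Chalearn | - |
| Koller O / RWTH Aachen University | 2016 | CNN (focusing on hand-shape changes) [8] | - | Danish Sign Language | Resolution 4730×22 |
| Chai Xiujuan / Institute of Computing Technology, Chinese Academy of Sciences | 2017 | Improved RNN (hand segmentation and localization) [26] | 99.00 | CSL | 40 classes |
| Yang Su / Beijing University of Technology | 2017 | RNN combined with CNN [30] | 98.43 | CSL | 40 classes |
| | 2017 | RNN (data preprocessing) [31] | 99.00 | CSL | 40 classes |
| Hossen M A / Teswara Institute of Engineering | 2017 | CNN [7] | 100.00 | Kinect recordings | 10 classes |
| ElBadawy M / Ain Shams University, Egypt | 2017 | 3D-CNN [22] | 98.00 | Arabic dataset | 25 classes |
| Kim S / Seoul National University, Korea | 2017 | CNN (inter-frame sampling) [10] | 86.00 | Camera-captured | 20 classes |
| | 2018 | CNN (hand segmentation) [11] | 98.00 | Camera-captured | 12 classes |
| Kopuklu O / University of Munich, Germany | 2018 | CNN (spatiotemporal feature fusion) [12] | 96.28 / 57.40 | Jester / Chalearn | - |
| Konstantinidis D / Greek university | 2018 | CNN (RGB and skeleton data) [13] | 98.09 | Argentinian dataset LSA64 | - |
| | 2018 | RNN (multimodal data fusion) [36] | 89.50 | Indian sign language dataset (IIT) | - |
| Devineau G / Paris Saint-Michel Research University | 2018 | CNN (skeleton data, with hand-joint position sequences) [14] | 84.35 | DHG Dataset | 28 classes |
| Ye Yuancheng / City University of New York | 2018 | 3D-CNN (feature fusion) [23] | 69.20 | ASL | 27 classes |
| Liang Zhijie / Central China Normal University | 2018 | 3D-CNN (skeleton, contour, and depth data) [24] | 83.60 | Chalearn | - |
| Lin Chi / Institute of Automation, Chinese Academy of Sciences | 2018 | Masked ResC3D network combined with RNN [32] | 68.42 | Chalearn | - |
| Halim K / University of Indonesia | 2018 | RNN (feature set based on SIBI signs with part-of-speech inflections) [33] | 96.15 | Indonesian sign language dataset | - |
| Masood S / University of New Delhi | 2018 | RNN combined with CNN [34] | 95.20 | Argentinian dataset LSA64 | 46 classes |
| Bantupalli K / Kennesaw State University, USA | 2018 | RNN combined with CNN [35] | 93.00 | ASL | 100 classes |
| Hernandez V / Tokyo University of Agriculture | 2019 | CNN combined with LSTM [37] | 89.30 | ASL | 19 classes |
| Liao Yanqiu / Nanchang University | 2019 | RNN combined with 3D-CNN [38] | 86.90 | CSL | 500 classes |

Table 2 Deep-learning-based continuous-sentence sign language recognition techniques and representative works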
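| Author/Affiliation | Year | Technical features | Evaluation metric (%) | Dataset | Sample size |
|---|---|---|---|---|---|
| Camgoz NC, Koller O / RWTH Aachen University | 2016 | 3D-CNN (temporal features extracted from RGB data) [45] | Jaccard index: 26.9 | Chalearn | - |
| | 2016 | Hybrid CNN-HMM model [49] | WER: 39.7 | RWTH-PHOENIX-Weather | - |
| | 2017 | CNN, HMM, and CTC [50] | WER: 38.8 | RWTH-PHOENIX-Weather | - |
| | 2017 | BLSTM (CTC-based) [39] | WER: 43.1 | RWTH-PHOENIX-Weather | Resolution: 5000×90 |
| | 2018 | Hybrid CNN, HMM, and RNN model [51] | - | RWTH-PHOENIX-Weather | - |
| Pigou L / Ghent University | 2017 | Hybrid 3D-CNN and LSTM model (RGB-D) [52] | Jaccard index: 31.6 | Chalearn | - |
| Cui Runpeng / Tsinghua University | 2017 | CNN and BLSTM (CTC-based) [53] | WER: 38.7 | RWTH-PHOENIX-Weather | Resolution: 16000×20 |
| | 2018 | BLSTM (multimodal data) [40] | WER: 46.9 | RWTH-PHOENIX-Weather | - |
| Shi B / University of Chicago, USA | 2018 | Attention-based LSTM [41] | WER: 41.9 | American Sign Language (ASL) | - |
| Ko S K / Korea Electronics Research Institute | 2018 | RNN (with skeleton keypoint data) [42] | Acc: 89.5 | KETI Korean sign language dataset | 100 classes |
| Zhang Qian / Shanghai Jiao Tong University | 2018 | BLSTM [43] | Acc: 93.1 | American Sign Language (ASL) | 100 classes |
| Li Houqiang, Huang Jie / University of Science and Technology of China | 2018 | 3D-CNN (temporal-classification alignment algorithm) [46] | WER: 37.3 | RWTH-PHOENIX-Weather | - |
| | 2018 | Two-stream 3D-CNN (with LSTM) [47] | Acc: 82.7 | Chinese Sign Language (CSL) | 100 classes |
| Guo Dan / Hefei University of Technology; University of Science and Technology of China | 2018 | 3D-CNN (temporal convolution, CTC, late-fusion strategy) [48] | WER: 37.8 | RWTH-PHOENIX-Weather | - |
| | 2018 | 3D-CNN combined with RNN (key frames selected by adaptive variable-length online key-clip mining) [55] | Acc: 92.9 | Chinese Sign Language (CSL) | 100 classes |
| Ariesta M C / University of Jakarta | 2018 | 3D-CNN combined with RNN (CTC-based) [54] | - | SIBI | 30 classes |
| Mittal A / Indonesian University of Science and Technology | 2019 | Modified LSTM [44] | Acc: 72.3 | Indian sign language dataset (ISL) | 942 classes |

Table 3 Classification of sign language datasets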
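| Name | Country | Classes | Scenes | Samples | Data characteristics | Data type | Availability |
|---|---|---|---|---|---|---|---|
| RWTH-PHOENIX-Weather [56] | Germany | 1200 | 9 | 45760 | RGB | Sentence | Public |
| Chalearn [57] | USA | 249 | 7 | 50000 | RGB/depth | Word | Partially public |
| DGS Kinect 40 [58] | Germany | 40 | 15 | 3000 | Multi-view | Isolated word | - |
| CSL [47] | China | 500/100 | 1 | 25000 | Depth/skeleton/RGB | Isolated word/sentence | Public |
| SIGNUM [59] | Germany | 450 | 25 | 33210 | RGB | Sentence | Public |
| GSL 20 [60] | Greece | 20 | 6 | 840 | RGB | Word | - |
| Boston ASLLVD [61] | USA | 3300+ | 6 | 9800 | RGB | Word | Public |
| PSL Kinect 30 [62] | Poland | 30 | 1 | 300 | RGB/depth | Word | Public |
| LSA64 [63] | Argentina | 64 | 10 | 3200 | RGB | Word | Public |
| DEVISIGN-G [64] | China | 36 | 8 | 432 | RGB | Word | - |
| DEVISIGN-D [64] | China | 500 | 8 | 6000 | RGB | Word | - |
| DEVISIGN-L [64] | China | 2000 | 8 | 24000 | RGB | Word | - |
| CUNY ASL [65] | USA | - | 8 | - | RGB | Sentence | - |
| SignsWorld Atlas [66] | Arab | 32 | 10 | - | RGB | Word | Public |
| ASL Fingerspelling [67] | USA | 24 | 5 | 131000 | RGB/depth | Word | Public |

Table 4 RWTH-PHOENIX-Weather parameters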
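| Parameter | 2012 version | 2014 version |
|---|---|---|
| Number of signers | 7 | 9 |
| Number of samples | 190 | 645 |
| Number of frames | 293077 | 965940 |
| Number of sentences | 1980 | 6861 |
| Vocabulary size | 911 | 1558 |
| Resolution | 210×260 | 720×576 |

Table 5 CSL dataset parameters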
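| Parameter | Value |
|---|---|
| RGB resolution | 1920×1080 |
| Depth resolution | 512×424 |
| Video duration (s) | 10–14 |
| Average number of samples | 7 |
| Total samples | 25000 |
| Number of signers | 50 |
| Vocabulary size | 178 |
| Number of skeleton joints | 21 |
| Frame rate (fps) | 25 |
| Total duration | 100+ |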
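Several entries in Table 2 ([39], [48], [50], [53], [54]) rely on Connectionist Temporal Classification (CTC), and the RWTH-PHOENIX-Weather results are reported as word error rate (WER). The pure-Python sketch below is not taken from any cited work; it shows both steps in their simplest form, assuming a network has already produced one best label per frame. Greedy (best-path) CTC decoding merges consecutive repeated labels and then removes blanks; WER is the Levenshtein distance between the decoded and reference gloss sequences divided by the reference length. The blank token name `<blank>` and the example glosses are hypothetical.

```python
from typing import List, Optional, Sequence

BLANK = "<blank>"  # hypothetical name for the CTC blank label


def ctc_greedy_decode(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks."""
    glosses: List[str] = []
    prev: Optional[str] = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame's label;
        # a blank between two identical labels therefore keeps both of them.
        if label != prev and label != blank:
            glosses.append(label)
        prev = label
    return glosses


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one gloss")
    # prev_row[j] = edit distance between the first i-1 reference glosses
    # and the first j hypothesis glosses (classic Levenshtein DP).
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_gloss in enumerate(reference, start=1):
        row = [i] + [0] * len(hypothesis)
        for j, hyp_gloss in enumerate(hypothesis, start=1):
            cost = 0 if ref_gloss == hyp_gloss else 1
            row[j] = min(
                prev_row[j] + 1,         # deletion of a reference gloss
                row[j - 1] + 1,          # insertion of a hypothesis gloss
                prev_row[j - 1] + cost,  # substitution or match
            )
        prev_row = row
    return prev_row[-1] / len(reference)


if __name__ == "__main__":
    # Hypothetical per-frame argmax labels for one sentence.
    frames = [BLANK, "TOMORROW", "TOMORROW", BLANK, "SUN", BLANK, "NORTH", "NORTH"]
    hypothesis = ctc_greedy_decode(frames)
    reference = ["TOMORROW", "RAIN", "NORTH"]
    print(hypothesis)  # ['TOMORROW', 'SUN', 'NORTH']
    print(f"WER = {100 * word_error_rate(reference, hypothesis):.1f}%")  # WER = 33.3%
```

Lower WER is better, and because insertions are counted, WER can exceed 100%. Greedy decoding is only the simplest CTC decoder; more elaborate decoders such as beam search also exist.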
References
[1] HINTON G E, OSINDERO S, and TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527–1554. doi: 10.1162/neco.2006.18.7.1527
[2] ZHOU Yu. Research on signer adaptation in Chinese sign language recognition[D]. [Ph.D. dissertation], Harbin Institute of Technology, 2009.
[3] CHEOK M J, OMAR Z, and JAWARD M H. A review of hand gesture and sign language recognition techniques[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(1): 131–153. doi: 10.1007/s13042-017-0705-5
[4] TANG Ao, LU Ke, WANG Yufei, et al. A real-time hand posture recognition system using deep neural networks[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 1–23. doi: 10.1145/2735952
[5] PIGOU L, DIELEMAN S, KINDERMANS P J, et al. Sign language recognition using convolutional neural networks[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 572–578.
[6] KANG B, TRIPATHI S, and NGUYEN T Q. Real-time sign language fingerspelling recognition using convolutional neural networks from depth map[C]. The 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 2015: 136–140.
[7] HOSSEN M A, GOVINDAIAH A, SULTANA S, et al. Bengali sign language recognition using Deep Convolutional Neural Network[C]. The 7th Joint International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Kitakyushu, Japan, 2018: 369–373.
[8] KOLLER O, BOWDEN R, and NEY H. Automatic alignment of HamNoSys subunits for continuous sign language recognition[C]. The 10th Edition of the Language Resources and Evaluation Conference, Portorož, Slovenia, 2016: 121–128.
[9] GARCIA B and VIESCA S A. Real-time American sign language recognition with convolutional neural networks[J]. Convolutional Neural Networks for Visual Recognition, 2016, 2: 225–232.
[10] JI Y, KIM S, and LEE K B. Sign language learning system with image sampling and convolutional neural network[C]. The 1st IEEE International Conference on Robotic Computing (IRC), Taichung, China, 2017: 371–375.
[11] KIM S, JI Y, and LEE K B. An effective sign language learning with object detection based ROI segmentation[C]. The 2nd IEEE International Conference on Robotic Computing (IRC), Laguna Hills, USA, 2018: 330–333.
[12] KÖPÜKLÜ O, KÖSE N, and RIGOLL G. Motion fused frames: Data level fusion strategy for hand gesture recognition[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018: 2103–2111.
[13] KONSTANTINIDIS D, DIMITROPOULOS K, and DARAS P. Sign language recognition based on hand and body skeletal data[C]. 2018-3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), Helsinki, Finland, 2018: 1–4.
[14] DEVINEAU G, MOUTARDE F, WANG Xi, et al. Deep learning for hand gesture recognition on skeletal data[C]. The 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi'an, China, 2018: 106–113.
[15] MOLCHANOV P, GUPTA S, KIM K, et al. Hand gesture recognition with 3D convolutional neural networks[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, USA, 2015: 1–7.
[16] WU Di, PIGOU L, KINDERMANS P J, et al. Deep dynamic neural networks for multimodal gesture segmentation and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(8): 1583–1597. doi: 10.1109/TPAMI.2016.2537340
[17] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using 3D convolutional neural networks[C]. 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy, 2015: 1–6.
[18] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Attention-based 3D-CNNs for large-vocabulary sign language recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(9): 2822–2832. doi: 10.1109/TCSVT.2018.2870740
[19] LI Yunan, MIAO Qiguang, TIAN Kuan, et al. Large-scale gesture recognition with a fusion of RGB-D data based on the C3D model[C]. The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016: 25–30.
[20] LI Yunan, MIAO Qiguang, TIAN Kuan, et al. Large-scale gesture recognition with a fusion of RGB-D data based on saliency theory and C3D model[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(10): 2956–2964. doi: 10.1109/TCSVT.2017.2749509
[21] MIAO Qiguang, LI Yunan, OUYANG Wanli, et al. Multimodal gesture recognition based on the ResC3D network[C]. 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 3047–3055.
[22] ELBADAWY M, ELONS A S, SHEDEED H A, et al. Arabic sign language recognition with 3D convolutional neural networks[C]. The 8th International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 2017: 66–71.
[23] YE Yuancheng, TIAN Yingli, HUENERFAUTH M, et al. Recognizing American sign language gestures from within continuous videos[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018: 2064–2073.
[24] LIANG Zhijie, LIAO Shengbin, and HU Bingzhang. 3D convolutional neural networks for dynamic sign language recognition[J]. The Computer Journal, 2018, 61(11): 1724–1736. doi: 10.1093/comjnl/bxy049
[25] CATE H, DALVI F, and HUSSAIN Z. Sign language recognition using temporal classification[EB/OL]. http://arxiv.org/abs/1701.01875v1, 2017.
[26] CHAI Xiujuan, LIU Zhipeng, YIN Fang, et al. Two streams recurrent neural networks for large-scale continuous gesture recognition[C]. The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016: 31–36.
[27] LIU Tao, ZHOU Wengang, and LI Houqiang. Sign language recognition with long short-term memory[C]. 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, USA, 2016: 2871–2875.
[28] LI Xiaoxu, MAO Chensi, HUANG Shiliang, et al. Chinese sign language recognition based on SHS descriptor and encoder-decoder LSTM model[C]. The 12th Chinese Conference on Biometric Recognition, Shenzhen, China, 2017: 719–728.
[29] HUANG Shiliang, MAO Chensi, TAO Jinxu, et al. A novel Chinese sign language recognition method based on keyframe-centered clips[J]. IEEE Signal Processing Letters, 2018, 25(3): 442–446. doi: 10.1109/LSP.2018.2797228
[30] YANG Su and ZHU Qing. Continuous Chinese sign language recognition with CNN-LSTM[J]. SPIE, 2017, 10420.
[31] YANG Su and ZHU Qing. Video-based Chinese sign language recognition using convolutional neural network[C]. The 9th IEEE International Conference on Communication Software and Networks (ICCSN), Guangzhou, China, 2017: 929–934.
[32] LIN Chi, WAN Jun, LIANG Yanyan, et al. Large-scale isolated gesture recognition using a refined fused model based on masked Res-C3D network and skeleton LSTM[C]. The 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi'an, China, 2018: 52–58.
[33] HALIM K and RAKUN E. Sign language system for Bahasa Indonesia (Known as SIBI) recognizer using TensorFlow and Long Short-Term Memory[C]. 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Yogyakarta, Indonesia, 2018: 403–407.
[34] BHATEJA V, COELLO C A C, and SATAPATHY S C. Intelligent Engineering Informatics[C]. The 6th International Conference on FICTA, Singapore, 2018: 623–632.
[35] BANTUPALLI K and XIE Ying. American Sign Language recognition using deep learning and computer vision[C]. 2018 IEEE International Conference on Big Data (Big Data), Seattle, USA, 2018: 4896–4899.
[36] KONSTANTINIDIS D, DIMITROPOULOS K, and DARAS P. A deep learning approach for analyzing video and skeletal features in sign language recognition[C]. 2018 IEEE International Conference on Imaging Systems and Techniques (IST), Krakow, Poland, 2018: 1–6.
[37] VINCENT H, TOMOYA S, and GENTIANE V. Convolutional and recurrent neural network for human action recognition: Application on American sign language[EB/OL]. http://biorxiv.org/content/10.1101/535492v1, 2019.
[38] LIAO Yanqiu, XIONG Pengwen, MIN Weidong, et al. Dynamic sign language recognition based on video sequence with BLSTM-3D residual networks[J]. IEEE Access, 2019, 7: 38044–38054. doi: 10.1109/ACCESS.2019.2904749
[39] CAMGOZ N C, HADFIELD S, KOLLER O, et al. SubUNets: End-to-end hand shape and continuous sign language recognition[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 3075–3084.
[40] CUI Runpeng, LIU Hu, and ZHANG Changshui. A deep neural framework for continuous sign language recognition by iterative training[J]. IEEE Transactions on Multimedia, 2019, 21(7): 1880–1891. doi: 10.1109/TMM.2018.2889563
[41] SHI Bowen, DEL RIO A M, KEANE J, et al. American Sign Language fingerspelling recognition in the wild[C]. 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 2018: 145–152.
[42] KO S K, SON J G, and JUNG H. Sign language recognition with recurrent neural network using human keypoint detection[C]. 2018 Conference on Research in Adaptive and Convergent Systems, Honolulu, USA, 2018: 326–328.
[43] ZHANG Qian, WANG Dong, ZHAO Run, et al. MyoSign: Enabling end-to-end sign language recognition with wearables[C]. The 24th International Conference on Intelligent User Interfaces, Marina del Rey, USA, 2019: 650–660.
[44] MITTAL A, KUMAR P, ROY P P, et al. A modified LSTM model for continuous sign language recognition using leap motion[J]. IEEE Sensors Journal, 2019, 19(16): 7056–7063. doi: 10.1109/JSEN.2019.2909837
[45] CAMGOZ N C, HADFIELD S, KOLLER O, et al. Using convolutional 3D neural networks for user-independent continuous gesture recognition[C]. The 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016: 49–54.
[46] PU Junfu, ZHOU Wengang, and LI Houqiang. Dilated convolutional network with iterative optimization for continuous sign language recognition[C]. The 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 885–891.
[47] HUANG Jie, ZHOU Wengang, ZHANG Qilin, et al. Video-based sign language recognition without temporal segmentation[C]. The 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 2257–2264.
[48] WANG Shuo, GUO Dan, ZHOU Wengang, et al. Connectionist temporal fusion for sign language translation[C]. The 26th ACM International Conference on Multimedia, Seoul, Korea, 2018: 1483–1491.
[49] KOLLER O, ZARGARAN S, NEY H, et al. Deep sign: Hybrid CNN-HMM for continuous sign language recognition[C]. 2016 British Machine Vision Conference, York, UK, 2016: 1–2.
[50] KOLLER O, ZARGARAN S, and NEY H. Re-sign: Re-aligned end-to-end sequence modelling with deep recurrent CNN-HMMs[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4297–4305.
[51] KOLLER O, ZARGARAN S, NEY H, et al. Deep sign: Enabling robust statistical continuous sign language recognition via hybrid CNN-HMMs[J]. International Journal of Computer Vision, 2018, 126(12): 1311–1325. doi: 10.1007/s11263-018-1121-3
[52] PIGOU L, VAN HERREWEGHE M, and DAMBRE J. Gesture and sign language recognition with temporal residual networks[C]. 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 3086–3093.
[53] CUI Runpeng, LIU Hu, and ZHANG Changshui. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7361–7369.
[54] ARIESTA M C, WIRYANA F, SUHARJITO, et al. Sentence level Indonesian sign language recognition using 3D convolutional neural network and bidirectional recurrent neural network[C]. 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), Jakarta, Indonesia, 2018: 16–22.
[55] GUO Dan, ZHOU Wengang, LI Houqiang, et al. Hierarchical LSTM for sign language translation[C]. The 32nd AAAI Conference on Artificial Intelligence, the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, USA, 2018: 6845–6852.
[56] FORSTER J, SCHMIDT C, HOYOUX T, et al. RWTH-PHOENIX-Weather: A large vocabulary sign language recognition and translation corpus[C]. The 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 2012: 3785–3789.
[57] ESCALERA S, BARÓ X, GONZÀLEZ J, et al. ChaLearn looking at people challenge 2014: Dataset and results[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 459–473.
[58] ONG E J, COOPER H, PUGEAULT N, et al. Sign language recognition using sequential pattern trees[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 2200–2207.
[59] VON AGRIS U, ZIEREN J, CANZLER U, et al. Recent developments in visual sign language recognition[J]. Universal Access in the Information Society, 2008, 6(4): 323–362. doi: 10.1007/s10209-007-0104-x
[60] EFTHIMIOU E and FOTINEA S E. GSLC: Creation and annotation of a Greek sign language corpus for HCI[C]. The 4th International Conference on Universal Access in Human-Computer Interaction, Beijing, China, 2007: 657–666.
[61] NEIDLE C, THANGALI A, and SCLAROFF S. Challenges in development of the American Sign Language lexicon video dataset (ASLLVD) corpus[C]. The 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon, Istanbul, Turkey, 2012: 1–8.
[62] OSZUST M and WYSOCKI M. Polish sign language words recognition with Kinect[C]. The 6th International Conference on Human System Interactions (HSI), Sopot, Poland, 2013: 219–226.
[63] RONCHETTI F, QUIROGA F, ESTREBOU C A, et al. LSA64: An Argentinian sign language dataset[C]. The 22nd Congreso Argentino de Ciencias de la Computación (CACIC 2016), San Luis, Argentina, 2016: 794–803.
[64] CHAI Xiujuan, WANG Hanjie, and CHEN Xilin. The DEVISIGN large vocabulary of Chinese sign language database and baseline evaluations[R]. Technical Report VIPL-TR-14-SLR-001, 2014.
[65] LU Pengfei and HUENERFAUTH M. Collecting and evaluating the CUNY ASL corpus for research on American sign language animation[J]. Computer Speech & Language, 2014, 28(3): 812–831. doi: 10.1016/j.csl.2013.10.004
[66] SHOHIEB S M, ELMINIR H K, and RIAD A M. SignsWorld atlas; a benchmark Arabic sign language database[J]. Journal of King Saud University-Computer and Information Sciences, 2015, 27(1): 68–76. doi: 10.1016/j.jksuci.2014.03.011
[67] PUGEAULT N and BOWDEN R. Spelling it out: Real-time ASL fingerspelling recognition[C]. 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 2011: 1114–1119.
[68] PRABHAVALKAR R, SAINATH T N, WU Yonghui, et al. Minimum word error rate training for attention-based sequence-to-sequence models[C]. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 2018: 4839–4843.
[69] KOLLER O, FORSTER J, and NEY H. Continuous sign language recognition: Towards large vocabulary statistical recognition systems handling multiple signers[J]. Computer Vision and Image Understanding, 2015, 141: 108–125. doi: 10.1016/j.cviu.2015.09.013