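A Survey of Sign Language Recognition Technology Based on Sign Language Expression Content and Expression Characteristics

TAO Tangfei^{1, 2}, LIU Tianyu²

doi: 10.11999/JEIT221051 cstr: 32379.14.JEIT221051

1. Key Laboratory of Education Ministry for Modern Design & Rotor-Bearing System, Xi'an 710049, China
2. School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an 710049, China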

Funds: The Key Research and Development Program in Shaanxi Province of China (2020KWZ-003)

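About the authors:
TAO Tangfei: male, associate professor; research interests include image processing and machine vision for intelligent manufacturing and intelligent diagnosis

LIU Tianyu: male, master's student; research interest: computer vision

Corresponding author:
TAO Tangfei, taotangfei@mail.xjtu.edu.cn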

CLC number: TP3-05
Publication history
- Received: 2022-08-10
- Revised: 2022-10-27
- Published online: 2022-11-07
- Issue date: 2023-10-31


Abstract

Sign Language Recognition (SLR) technology is an important means of breaking down the communication barrier between hearing-impaired people and hearing people. This paper reviews the sign language datasets, evaluation metrics, and sign language recognition methods of recent years. First, sign language datasets are systematically surveyed and the development trends of datasets for sign language recognition are analyzed. Second, the evaluation metrics for sign language recognition methods are introduced in detail. Then, according to the content expressed in sign language and the features used by recognition methods, isolated-word and continuous-sentence recognition methods, as well as methods that rely only on hand features and methods based on multi-feature fusion, are summarized and analyzed. Finally, the challenges facing sign language recognition technology and its development directions are discussed.

Keywords:
- Sign language recognition technology /
- Sign language datasets /
- Isolated-word sign language recognition /
- Continuous sign language recognition /
- Multi-feature fusion sign language recognition

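Figure 1 Schematic of zero-shot learning for sign language

Figure 2 Schematic of multi-feature fusion

Figure 3 Recognition performance of the sign language recognition models covered in this paper on several typical datasets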

Table 1 Isolated-word sign language datasets

Year	Dataset	Language	Samples	Labels	Modality	Sample type	Signers	Availability	Real-world scenes
2007	GSL-20^[26]	Greek	840	20	RGB	Word	6	On request	×
2007	GSL isol.^[27]	Greek	40785	310	RGB-D	Word	7	Registration	×
2011	ASLLVD^[28]	English	9800	3300	RGB	Word	6	Open	×
2012	DGS Kinect 40^[29]	German	3000	40	RGB-D/skeleton	Word	15	On request	×
2013	PSL TOF 84^[30]	Polish	1680	84	RGB-D	Word	1	Open	×
2014	PSL Kinect 30^[30]		300	30
	DEVISIGN-G^[23]	Chinese	432	36	RGB	Word	8	On request	×
	DEVISIGN-D^[23]		6000	200
	DEVISIGN-L^[23]		24000	500
2014	ChaLearn^[31]	English	50000	249	RGB-D	Word	7	Partially open	×
2015	CSL-500^[9]	Chinese	125000	500	RGB-D/skeleton	Word	50	Open	×
2016	LSA64^[32]	Spanish	3200	64	RGB	Word	10	Open	×
2019	GSL^[33]	Greek	40785	310	RGB-D	Word	7	On request	×
2019	WLASL2000^[34]	English	21083	2000	RGB	Word	119	Open	Multiple backgrounds
2020	RKS-PERSIANSIGN^[35]	Persian	10000	100	RGB	Word	10	Open	√(10)
2020	KSL^[36]	Korean	1229	77	RGB/optical flow	Word	20	Open	√
2021	NCSL^[24]	Chinese	90000	300	RGB	Word	30	On request	×
2021	NMFs-CSL^[25]	Chinese	32010	1067	RGB	Word	10	On request	×
2022	ASL-SKELETON3D^[21] ASL-Phono^[21]	English	9747	3300	3D RGB	Word	6	On request	×
2022	ASLLRP Sign Bank^[37]	English	41830	6000	RGB	Word		Open	×


Table 2 Continuous-sentence sign language datasets

Year	Dataset	Language	Samples	Labels	Modality	Sample type	Signers	Availability	Real-world scenes
2007	SIGNUM^[38]	German	33 210	780	RGB	Sentence	20	Open	×
2007	GSL SD^[27]	Greek	10 290	310	RGB	Sentence	7	On request	×
2007	GSL SI^[27]		10 290	310		Sentence	7		×
2012	RWTH-PHOENIX-Weather^[39]	German	45 760	1 200	RGB	Sentence	9	Open	×
2015	CSL-100^[9]	Chinese	25 000	100	RGB-D/skeleton	Sentence	50	Open	×
2016	LSE-Sign^[40]	Spanish	2 400	2 400	RGB	Sentence	2	Registration	√
2016	MSR^[22]	German	33 210	450	RGB	Sentence	25		×
2018	RWTH-PHOENIX-Weather 2014T^[41]	German	67 781	1 066	RGB	Sentence	9	Open	√
2019	GSL^[33]	Greek	10 295	310	RGB-D	Sentence	7	On request	×
2019	How2Sign^[42]	English	36 773	16 000	RGB-D/skeleton/speech, etc.	Sentence	11	Open	Multiple scenes


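Table 3 Isolated-word sign language recognition methods

Model category	Approach	Model/method	Dataset	Acc(%)	Notes (focus of the work)
Traditional models	Image processing	Canny edge detection^[48]	ASL Alphabet ASL	99.00 84.30
	Image processing	Bag of Features^[49]	Self-collected English alphabet dataset	85.20	Thresholding, color detection, etc.
	Feature extraction	Bag of Features^[49]	Self-collected English alphabet dataset	85.20	SURF, k-nearest neighbors, etc.
		HOG-PCA^[50]	Arabic alphabet dataset	99.20	RGB
		SURF, SIFT^[51]	Kinect Depth Datasets	>80.00	Comparison of the two transforms
	Classification	Quadratic SVM^[49]	Self-collected English alphabet dataset	85.20	Comparison of quadratic and cubic SVMs
		SVM^[50]	Arabic alphabet dataset	99.20
		DTW-HMM^[52]	AUSL dataset	87.40, 92.40
		CTC^[53]	Real-time	42.00	Conducted outdoors
Neural networks	Convolutional neural networks	CNN with diffGrad optimizer^[54]	Self-collected Indian dataset	99.64	With data augmentation
		C3D, 2DCNN^[55]	Graffiti dataset In-house	92.60 89.70	OFMT
		C3D^[56]	Kinect Datasets	94.20	Multimodal information
		I3D^[57]	WLASL 2000 MS-ASL100	87.47 96.66	Multi-feature, multimodal
		R(2+1)D^[58]	CSL-500	97.45	Pre-training, attention
	Recurrent neural networks	LSTM^[59]	CSL-500	63.30
		LSTM-RNN with k-nearest neighbors^[60]	ASL Fingerspelling	99.44
		BiLSTM^[61]	ASL Datasets	97.98	With transfer learning
		Bi-ConvLSTM^[62]	ASL Datasets	98.81	90% accuracy with a live camera; with transfer learning
		FFV-Bi-LSTM^[63]	ASL Datasets	98.33	Motion-sensing system
	Graph neural networks	STGC-Transformer^[64]	Self-collected Japanese dataset	12.14(WER)	CTC combined with cross-entropy
	Graph neural networks	MS-G3D AUTSL^[65] MS-G3D LSE^[65]	WLASL2000	95.24 93.91	Transfer learning
	GANs	H-GANs^[66]	ASLLVD RWTH-PHOENIX Weather 2014	1.40(CER) 20.70(WER)	Fusion of 20 features
Attention mechanisms	Spatial Temporal	3D CNN^[67]	CSL-500 ChaLearn14	88.70 95.30(Jaccard Index)	3D CNN extracts spatio-temporal features, combined with temporal and spatial attention
	Hierarchical Temporal	HTAN^[68]	CSL-500	93.10	Hierarchical temporal attention network
	Global local	Res-C3D^[69]	CSL-500 DEVISIGN-D	89.20 91.00	Global: video temporal sequence; local: object detection and localization
	Transformer	Transformer^[70]	WLASL-100 WLASL-300 LSA64	63.20 (TOP1) 43.80 (TOP1) 100.00	Targets a low-computation model
	BERT	BERT, 3DCNN, LSTM^[71]	RKS-PERSIANSIGN ASLLVD	74.60 68.80	Feature extraction, multimodal weighting, feature mapping
Transfer learning	Features	I3D^[72]	ChaLearn249	62.09	Transfers spatio-temporal features
	Shared parameters	AlexNet, R-CNN^[73]	Turkish Sign Language	99.70
	Shared parameters	TensorFlow Object Detection API^[74]	Self-collected Indian sign language dataset	85.45
Zero-shot learning	Zero-Shot	3DCNN, LSTM^[75]	ASL-Text	51.40
Zero-shot learning	Zero-Shot	LSTM, BERT^[71] C3D, VSD	RKS-PERSIANSIGN First-Person ASLVID isoGD	74.60 67.20 68.80 60.20	Multi-modal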


Table 4 Continuous-sentence sign language recognition methods

Year	Model/method	Dataset	WER(%)	Notes (focus of the work)
2002	HMM^[89]	97 German signs	91.70(ACC)	With beam search
2012	GHMM^[90]	SIGNUM	13.00	MLP
2017	CNN, LSTM^[91]	Instructional videos	98.43	Two-stream 2D CNN
2019	CNN-Transformer-CTC^[92]	PHOENIX-2014 PHOENIX-2014-T	26.00 26.10	Feature extraction, contextual information
2020	CNN-Transformer-CTC^[93]	PHOENIX-2014-T	24.59	Sign language recognition + spoken-language translation
2021	CNN-BiLSTM-CTC^[94]	Indian Sign Language	15.14	Transfer from isolated words to sentences
2021	RNN-Transducer^[83]	CSL-100	6.10	H2SNet
2021	HST-GNN^[95]	PHOENIX-2014-T CSL-100	19.50 27.60	Graph convolution, graph self-attention
2021	SLRGAN^[96]	PHOENIX-2014 CSL-100	23.40 2.10	Context information
2021	CNN-Transformer-CTC^[97]	PHOENIX-2014	29.78	Multimodal, attention
2022	CNN-Transformer-CTC^[98]	PHOENIX-2014-T PHOENIX-2014	22.90 23.20	Relative position encoding
2022	GoogLeNet-Tconvs-CTC^[91] 3D-ResNet-BLSTM-CTC^[91] I3D-BLSTM-CTC^[91]	PHOENIX-2014, CSL-100	46.41, 2.41 50.98, 13.36 52.71, 2.72	Comparative study of convolutional networks

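The WER(%) column in Table 4 can be illustrated with a short computation. The following is a minimal sketch, assuming the commonly used definition of word error rate: the minimum number of substitutions, deletions, and insertions needed to turn the predicted gloss sequence into the reference sequence, divided by the reference length. The function name `word_error_rate` and the example gloss sequences are hypothetical and do not come from any of the surveyed papers, whose exact scoring procedures may differ.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return WER in percent, assuming WER = (S + D + I) / N * 100.

    S, D, I are the substitutions, deletions, and insertions in a minimum-cost
    alignment (Levenshtein distance over words); N is the reference length.
    """
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical gloss sequences: one substitution and one deletion.
    reference = ["TOMORROW", "RAIN", "NORTH", "STRONG"]
    hypothesis = ["TOMORROW", "SNOW", "NORTH"]
    print(word_error_rate(reference, hypothesis))
```

Under this definition, WER can exceed 100% when the hypothesis contains many insertions, which is one reason it is not interchangeable with the accuracy values that appear in some rows of the table.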

Table 5 Sign language recognition methods grouped by the expression features used by the model

Feature region(s)	Year	Method/model	Dataset	Acc(%)	Notes (focus of the work)
Hands	2018^[106]	RBM	NYU ASL Fingerspelling A	90.01 98.13	Multi-modal
	2018^[44]	CNN	STB Dexter EgoDexter	96.5(AUC) 64(AUC) 54(AUC)	Real-time pose estimation, hand tracking
	2019^[104]	CNN	RWTH-BOSTON-50 ASLLVD	89.33 31.50	Hand tracking, pre-trained
	2019^[108]	CNN	Kinect and LM data	97.66	Multi-modal
	2020^[109]	ASLNN	DGSLR dataset	96.78	Hand pose tracking
	2021^[48]	CNN	ASL Alphabet	99.00	Image processing
	2021^[105]	HMM, Camshift	ASSLRP dataset	77.75	Hand tracking
	2021^[107]	S2VT	USTC-SLR	95.60(98.40)	Fewer training parameters
	2021^[71]	LSTM, BERT C3D, VSD	RKS-PERSIANSIGN First-Person ASLVID isoGD	74.60 67.20 68.80 60.20	Zero-shot, Transformer, multi-modal
	2021^[100]	MPH, SVM, GBM	Massey ASL Alphabet Finger Spelling A	99.39 87.60 98.45	Hand pose estimation (webcam)
	2022^[102]	CNN, SVD	ASLVID	93.00	Hand keypoints
	2022^[101]	MPH	Thai Finger Spelling schemes	84.57(S1,S2) 23.66(P1)	Hand-keypoint detection
	2022^[73]	AlexNet (pre-trained) R-CNN	Turkish Sign Language	99.70(AP)	Transfer learning
	2022^[110]	mRMR-PSO	ASL ISL dataset NUS Dataset II Arabic Dataset	84.30 98.70 92.06 85.60	Multi-model combination; complex backgrounds
	2022^[74]	TensorFlow Object Detection API	Indian Sign Language	85.45	Real-time, transfer learning
Hands, mouth shape, facial expression, body pose, etc.	2015^[48]	CNN, HMM	RWTH-PHOENIX-Weather	55.70(precision)	Hands, mouth shape
	2020^[111]	3DCNN	Bosphorus-Sign22k Turkish Isolated SL dataset	99.78	Hands, face, and body fused with weights for training
	2020^[15]	SMPL reverse	SURREAL Human3.6M datasets	62.3, 40.8 (mm); improvement of 25 (mm)	Pose recovery (RGB to 3D); body pose
	2021^[112]	SMPL-X	GSLL Dataset	94.77	Hands, face, and body; optical flow + RGB; pose recovery
	2021^[66]	H-GANs	RWTH-PHOENIX-Weather 2014 ASLLVD	20.70(WER) 1.40(CER)	20 features including hands, face shape, head, and eyes; parameter optimization; continuous sign language; dimensionality reduction
	2020^[25]	GLE-Net	NMFs-CSL SLR500	90.50 96.80	Contextual relations; discriminative fine-grained cues


References (112)

[1]	MURRAY J. World federation of the deaf[EB/OL]. http://wfdeaf.org/our-work/, 2020.
[2]	JIA Yongqiang. More than 30 million people in China need rehabilitation for language disorders[EB/OL]. https://m.btime.com/item/router?gid=40ea0atodav8fk8lcqahek43ilu, 2020.
[3]	Wangxin Zhengzhou. "Healthy living, together": In China, the lives of 27.8 million people have been put on mute[EB/OL]. https://baijiahao.baidu.com/s?id=1678969855122384979&wfr=spider&for=pc, 2020.
[4]	ZHOU Y. Research on signer adaptation in Chinese sign language recognition[D]. [Ph. D. dissertation], Harbin Institute of Technology, 2009.
[5]	GAO Wen, FANG Gaolin, ZHAO Debin, et al. A Chinese sign language recognition system based on SOFM/SRN/HMM[J]. Pattern Recognition, 2004, 37(12): 2389–2402. doi: 10.1016/S0031-3203(04)00165-7
[6]	MAZUMDAR D, TALUKDAR A K, and SARMA K K. Gloved and free hand tracking based hand gesture recognition[C]. The 1st International Conference on Emerging Trends and Applications in Computer Science, Shillong, India, 2013: 197–202.
[7]	WANG R Y and POPOVIĆ J. Real-time hand-tracking with a color glove[J]. ACM Transactions on Graphics, 2009, 28(3): 63. doi: 10.1145/1531326.1531369
[8]	HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using real-sense[C]. 2015 IEEE China Summit and International Conference on Signal and Information Processing, Chengdu, China, 2015: 166–170.
[9]	HUANG Jie, ZHOU Wengang, ZHANG Qilin, et al. Video-based sign language recognition without temporal segmentation[C]. The Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, Louisiana, USA, 2018: 275.
[10]	MITTAL A, KUMAR P, ROY P P, et al. A modified LSTM model for continuous sign language recognition using leap motion[J]. IEEE Sensors Journal, 2019, 19(16): 7056–7063. doi: 10.1109/JSEN.2019.2909837
[11]	WANG Hanjie, CHAI Xiujuan, and CHEN Xilin. A novel sign language recognition framework using hierarchical Grassmann covariance matrix[J]. IEEE Transactions on Multimedia, 2019, 21(11): 2806–2814. doi: 10.1109/TMM.2019.2915032
[12]	WANG Qi, CHEN Xilin, WANG Chunli, et al. A data-deficiency-tolerated method for viewpoint independent sign language recognition[J]. Chinese Journal of Computers, 2009, 32(5): 953–961. doi: 10.3724/SP.J.1016.2009.00953
[13]	LIANG R H and OUHYOUNG M. A real-time continuous gesture recognition system for sign language[C]. The 3rd IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998: 558–567.
[14]	YU S H, HUANG C L, HSU S C, et al. Vision-based continuous sign language recognition using product HMM[C]. The 1st Asian Conference on Pattern Recognition, Beijing, China, 2011: 510–514.
[15]	MADADI M, BERTICHE H, and ESCALERA S. SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery[J]. Pattern Recognition, 2020, 106: 107472. doi: 10.1016/j.patcog.2020.107472
[16]	CELEBI S, AYDIN A S, TEMIZ T T, et al. Gesture recognition using skeleton data with weighted dynamic time warping[C]. The International Conference on Computer Vision Theory and Applications, Barcelona, Spain, 2013: 620–625.
[17]	SUN Chao, ZHANG Tianzhu, and XU Changsheng. Latent support vector machine modeling for sign language recognition with Kinect[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 20. doi: 10.1145/2629481
[18]	ZHANG Shujun, ZHANG Qun, and LI Hui. Review of sign language recognition based on deep learning[J]. Journal of Electronics & Information Technology, 2020, 42(4): 1021–1032. doi: 10.11999/JEIT190416
[19]	MINAWAER·ABULA, ALIFU·KUERBAN, XIE Qina, et al. Review of sign language recognition methods and techniques[J]. Computer Engineering and Applications, 2021, 57(18): 1–12. doi: 10.3778/j.issn.1002-8331.2104-0220
[20]	GUO Dan, TANG Shengeng, HONG Richang, et al. Review of sign language recognition, translation and generation[J]. Computer Science, 2021, 48(3): 60–70. doi: 10.11896/jsjkx.210100227
[21]	DE AMORIM C C and ZANCHETTIN C. ASL-Skeleton3D and ASL-phono: Two novel datasets for the American sign language[J]. arXiv: 2201.02065, 2022.
[22]	CHEN Chen, ZHANG Baochang, HOU Zhenjie, et al. Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features[J]. Multimedia Tools and Applications, 2017, 76(3): 4651–4669. doi: 10.1007/s11042-016-3284-7
[23]	CHAI X, WANG H, and CHEN X. The DEVISIGN large vocabulary of Chinese sign language database and baseline evaluations[R]. Technical Report VIPL-TR-14-SLR-001, 2014.
[24]	WANG Fei, DU Yuxuan, WANG Guorui, et al. (2+1) D-SLR: An efficient network for video sign language recognition[J]. Neural Computing and Applications, 2022, 34(3): 2413–2423. doi: 10.1007/s00521-021-06467-9
[25]	HU Hezhen, ZHOU Wengang, PU Junfu, et al. Global-local enhancement network for NMF-aware sign language recognition[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(3): 80. doi: 10.1145/3436754
[26]	EFTHIMIOU E and FOTINEA S E. GSLC: Creation and annotation of a Greek sign language corpus for HCI[C]. The 4th International Conference on Universal Access in Human-Computer Interaction, Beijing, China, 2007: 657–666.
[27]	ADALOGLOU N, CHATZIS T, PAPASTRATIS I, et al. A comprehensive study on sign language recognition methods[J]. arXiv: 2007.12530, 2020.
[28]	NEIDLE C, THANGALI A, and SCLAROFF S. Challenges in development of the American sign language lexicon video dataset (ASLLVD) corpus[C]. The 5th Workshop on the Representation and Processing of Sign Languages: Interactions Between Corpus and Lexicon, Istanbul, Turkey, 2012: 1–8.
[29]	COOPER H, ONG E J, PUGEAULT N, et al. Sign language recognition using sub-units[J]. The Journal of Machine Learning Research, 2012, 13(1): 2205–2231.
[30]	OSZUST M and WYSOCKI M. Polish sign language words recognition with Kinect[C]. The 6th International Conference on Human System Interactions, Sopot, Poland, 2013: 219–226.
[31]	ESCALERA S, BARÓ X, GONZÀLEZ J, et al. ChaLearn looking at people challenge 2014: Dataset and results[C]. The European Conference on Computer Vision, Zurich, Switzerland, 2015: 459–473.
[32]	RONCHETTI F, QUIROGA F, ESTREBOU C A, et al. LSA64: An Argentinian sign language dataset[C]. The XXII Congreso Argentino de Ciencias de la Computación, Córdoba, Argentina, 2016: 794–803.
[33]	ADALOGLOU N, CHATZIS T, PAPASTRATIS I, et al. A comprehensive study on deep learning-based methods for sign language recognition[J]. IEEE Transactions on Multimedia, 2022, 24: 1750–1762. doi: 10.1109/TMM.2021.3070438
[34]	LI Dongxu, OPAZO C R, YU Xin, et al. Word-Level deep sign language recognition from video: A new large-scale dataset and methods comparison[C]. 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, 2020: 1459–1469.
[35]	RASTGOO R, KIANI K, and ESCALERA S. Hand sign language recognition using multi-view hand skeleton[J]. Expert Systems with Applications, 2020, 150: 113336. doi: 10.1016/j.eswa.2020.113336
[36]	YANG S, JUNG S, KANG H, et al. The Korean sign language dataset for action recognition[C]. The 26th International Conference on Multimedia Modeling, Daejeon, South Korea, 2020: 532–542.
[37]	NEIDLE C, OPOKU A, and METAXAS D. ASL video corpora & sign bank: Resources available through the American sign language linguistic research project (ASLLRP)[J]. arXiv: 2201.07899, 2022.
[38]	VON AGRIS U and KRAISS K F. Towards a video corpus for signer-independent continuous sign language recognition[C]. The 7th International Workshop on Gesture in Human-Computer Interaction and Simulation, Lisbon, Portugal, 2007, 11: 2.
[39]	FORSTER J, SCHMIDT C, HOYOUX T, et al. RWTH-PHOENIX-weather: A large vocabulary sign language recognition and translation corpus[C]. The Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey, 2012: 3785–3789.
[40]	GUTIERREZ-SIGUT E, COSTELLO B, BAUS C, et al. LSE-Sign: A lexical database for Spanish sign language[J]. Behavior Research Methods, 2016, 48(1): 123–137. doi: 10.3758/s13428-014-0560-1
[41]	CAMGOZ N C, HADFIELD S, KOLLER O, et al. Neural sign language translation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7784–7793.
[42]	DUARTE A, PALASKAR S, VENTURA L, et al. How2Sign: A large-scale multimodal dataset for continuous American sign language[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2735–2744.
[43]	PUGEAULT N and BOWDEN R. Spelling it out: Real-time ASL fingerspelling recognition[C]. 2011 IEEE International Conference on Computer Vision workshops (ICCV Workshops), Barcelona, Spain, 2011: 1114–1119.
[44]	MUELLER F, BERNARD F, SOTNYCHENKO O, et al. GANerated hands for real-time 3D hand tracking from monocular RGB[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, United States, 2018: 49–59.
[45]	HAQUE A, PENG Boya, LUO Zelun, et al. Towards viewpoint invariant 3D human pose estimation[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 160–177.
[46]	TANG Ao, LU Ke, WANG Yufei, et al. A real-time hand posture recognition system using deep neural networks[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 21. doi: 10.1145/2735952
[47]	KOLLER O, NEY H, and BOWDEN R. Deep learning of mouth shapes for sign language[C]. 2015 IEEE International Conference on Computer Vision Workshop, Santiago, Chile, 2015: 85–91.
[48]	SARASWATHI S and KUMAR K A. Predicting American sign language from hand gestures using image processing and deep learning[M]. TRIPATHY A K, SARKAR M, SAHOO J P, et al. Advances in Distributed Computing and Machine Learning. Singapore: Springer, 2021: 423–431.
[49]	SRIDEVI P, ISLAM T, DEBNATH U, et al. Sign language recognition for speech and hearing impaired by image processing in MATLAB[C]. 2018 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), Malambe, Sri Lanka, 2018: 1–4.
[50]	HAMED A, BELAL N A, and MAHAR K M. Arabic sign language alphabet recognition based on HOG-PCA using microsoft kinect in complex backgrounds[C]. The 6th International Conference on Advanced Computing, Bhimavaram, India, 2016: 451–458.
[51]	SYKORA P, KAMENCAY P, and HUDEC R. Comparison of SIFT and SURF methods for use on hand gesture recognition based on depth map[J]. AASRI Procedia, 2014, 9: 19–24. doi: 10.1016/j.aasri.2014.09.005
[52]	MA Xiang, YUAN Lin, WEN Ruoshi, et al. Sign language recognition based on concept learning[C]. 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Dubrovnik, Croatia, 2020: 1–6.
[53]	SHI Bowen, DEL RIO A M, KEANE J, et al. American sign language fingerspelling recognition in the wild[C]. 2018 IEEE Spoken Language Technology Workshop, Athens, Greece, 2018: 145–152.
[54]	NANDI U, GHORAI A, SINGH M M, et al. Indian sign language alphabet recognition system using CNN with diffGrad optimizer and stochastic pooling[J]. Multimedia Tools and Applications, 2022, 82: 9627–9648. doi: 10.1007/S11042-021-11595-4
[55]	SARMA D, KAVYASREE V, and BHUYAN M K. Two-stream fusion model for dynamic hand gesture recognition using 3D-CNN and 2D-CNN optical flow guided motion template[J]. arXiv: 2007.08847, 2020.
[56]	HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using 3D convolutional neural networks[C]. 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy, 2015: 1–6.
[57]	MARUYAMA M, GHOSE S, INOUE K, et al. Word-level sign language recognition with multi-stream neural networks focusing on local regions[J]. arXiv: 2106.15989, 2021.
[58]	HAN Xiangzu, LU Fei, YIN Jianqin, et al. Sign language recognition based on R(2+1)D with spatial–temporal–channel attention[J]. IEEE Transactions on Human-Machine Systems, 2022, 52(4): 687–698. doi: 10.1109/THMS.2022.3144000
[59]	LIU Tao, ZHOU Wengang, and LI Houqiang. Sign language recognition with long short-term memory[C]. 2016 IEEE International Conference on Image Processing, Phoenix, USA, 2016: 25–28.
[60]	LEE C K M, NG K K H, CHEN C H, et al. American sign language recognition and training method with recurrent neural network[J]. Expert Systems with Applications, 2021, 167: 114403. doi: 10.1016/j.eswa.2020.114403
[61]	ABDULLAHI S B and CHAMNONGTHAI K. American sign language words recognition of skeletal videos using processed video driven multi-stacked deep LSTM[J]. Sensors, 2022, 22(4): 1406. doi: 10.3390/s22041406
[62]	BENDARKAR D S, SOMASE P A, REBARI P K, et al. Web based recognition and translation of American sign language with CNN and RNN[J]. International Journal of Online and Biomedical Engineering, 2021, 17(1): 34–50. doi: 10.3991/ijoe.v17i01.18585
[63]	ABDULLAHI S B and CHAMNONGTHAI K. American sign language words recognition using spatio-temporal prosodic and angle features: A sequential learning approach[J]. IEEE Access, 2022, 10: 15911–15923. doi: 10.1109/ACCESS.2022.3148132
[64]	TAKAYAMA N, BENITEZ-GARCIA G, and TAKAHASHI H. Sign language recognition based on spatial-temporal graph convolution-transformer[J]. Journal of the Japan Society for Precision Engineering, 2021, 87(12): 1028–1035. doi: 10.2493/jjspe.87.1028
[65]	VÁZQUEZ-ENRÍQUEZ M, ALBA-CASTRO J L, DOCÍO-FERNÁNDEZ L, et al. Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, USA, 2021: 3457–3466.
[66]	ELAKKIYA R, VIJAYAKUMAR P, and KUMAR N. An optimized Generative Adversarial Network based continuous sign language classification[J]. Expert Systems with Applications, 2021, 182: 115276. doi: 10.1016/J.ESWA.2021.115276
[67]	HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Attention-based 3D-CNNs for large-vocabulary sign language recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(9): 2822–2832. doi: 10.1109/TCSVT.2018.2870740
[68]	HUANG Jie. Deep learning based sign language recognition[D]. [Ph. D. dissertation], University of Science and Technology of China, 2018.
[69]	ZHANG Shujun and ZHANG Qun. Sign language recognition based on global-local attention[J]. Journal of Visual Communication and Image Representation, 2021, 80: 103280. doi: 10.1016/j.jvcir.2021.103280
[70]	BOHÁČEK M and HRÚZ M. Sign pose-based transformer for word-level sign language recognition[C]. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, Waikoloa, USA, 2022: 182–191.
[71]	RASTGOO R, KIANI K, ESCALERA S, et al. Multi-modal zero-shot sign language recognition[J]. arXiv: 2109.00796, 2021.
[72]	SARHAN N and FRINTROP S. Transfer learning for videos: From action recognition to sign language recognition[C]. 2020 IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 1811–1815.
[73]	YIRTICI T and YURTKAN K. Regional-CNN-based enhanced Turkish sign language recognition[J]. Signal, Image and Video Processing, 2022, 16(5): 1305–1311. doi: 10.1007/s11760-021-02082-2
[74]	SRIVASTAVA S, GANGWAR A, MISHRA R, et al. Sign language recognition system using TensorFlow object detection API[C]. The 1st International Conference on Advanced Network Technologies and Intelligent Computing, Varanasi, India, 2021: 634–646.
[75]	BILGE Y C, IKIZLER-CINBIS N, and CINBIS R G. Zero-shot sign language recognition: Can textual data uncover sign languages?[C]. BMVC, Cardiff, UK, 2019: 169–182.
[76]	HAN Mengmeng, CHEN Jiajun, LI Ling, et al. Visual hand gesture recognition with convolution neural network[C]. The 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Shanghai, China, 2016: 287–291.
[77]	LAHIANI H, ELLEUCH M, and KHERALLAH M. Real time hand gesture recognition system for android devices[C]. The 15th International Conference on Intelligent Systems Design and Applications (ISDA), Marrakech, Morocco, 2015: 591–596.
[78]	YAMAGUCHI Y, YOSHITOMI Y, and FUSHIMI H. Recognition of words expressed by sign language using thermal-image processing[J]. Artificial Life and Robotics, 2007, 11(1): 18–22. doi: 10.1007/s10015-006-0391-y
[79]	CUI Runpeng, LIU Hu, and ZHANG Changshui. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7361–7369.
[80]	LIPTON Z C, BERKOWITZ J, and ELKAN C. A critical review of recurrent neural networks for sequence learning[J]. arXiv: 1506.00019, 2015.
[81]	MOLCHANOV P, GUPTA S, KIM K, et al. Hand gesture recognition with 3D convolutional neural networks[C]. The 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, USA, 2015: 1–7.
[82]	QIU Zhaofan, YAO Ting, and MEI Tao. Learning spatio-temporal representation with pseudo-3D residual networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 5533–5541.
[83]	GAO Liqing, LI Haibo, LIU Zhijian, et al. RNN-Transducer based Chinese sign language recognition[J]. Neurocomputing, 2021, 434: 45–54. doi: 10.1016/j.neucom.2020.12.006
[84]	YAN Sijie, XIONG Yuanjun, and LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]. The Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, Louisiana, USA, 2018: 912.
[85]	WEN Shuhuan, TIAN Wenbo, ZHANG Hong, et al. Semantic segmentation using a GAN and a weakly supervised method based on deep transfer learning[J]. IEEE Access, 2020, 8: 176480–176494. doi: 10.1109/ACCESS.2020.3026684
[86]	TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4489–4497.
[87]	ZHOU Yizhou, SUN Xiaoyan, ZHA Zhengjun, et al. MiCT: Mixed 3D/2D convolutional tube for human action recognition[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 449–458.
[88]	PINTO R F, BORGES C D B, ALMEIDA A M A, et al. Static hand gesture recognition based on convolutional neural networks[J]. Journal of Electrical and Computer Engineering, 2019, 2019: 4167890. doi: 10.1155/2019/4167890
[89]	BAUER B and HIENZ H. Relevant features for video-based continuous sign language recognition[C]. The Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000: 440–445.
[90]	GWETH Y L, PLAHL C, and NEY H. Enhanced continuous sign language recognition using PCA and neural network features[C]. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, USA, 2012: 55–60.
[91]	YANG Su and ZHU Qing. Continuous Chinese sign language recognition with CNN-LSTM[C]. Proceedings of SPIE 10420, Ninth International Conference on Digital Image Processing (ICDIP 2017), Hong Kong, China, 2017.
[92]	ZHOU Hao, ZHOU Wengang, and LI Houqiang. Dynamic pseudo label decoding for continuous sign language recognition[C]. 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 2019: 1282–1287.
[93]	CAMGÖZ N C, KOLLER O, HADFIELD S, et al. Sign language transformers: Joint end-to-end sign language recognition and translation[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10023–10033.
[94]	SHARMA S, GUPTA R, and KUMAR A. Continuous sign language recognition using isolated signs data and deep transfer learning[J]. Journal of Ambient Intelligence and Humanized Computing, 2021, 14: 1531–1542. doi: 10.1007/s12652-021-03418-z
[95]	KAN Jichao, HU Kun, HAGENBUCHNER M, et al. Sign language translation with hierarchical spatio-temporal graph neural network[C]. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2022: 2131–2140.
[96]	PAPASTRATIS I, DIMITROPOULOS K, and DARAS P. Continuous sign language recognition through a context-aware generative adversarial network[J]. Sensors, 2021, 21(7): 2437. doi: 10.3390/s21072437
[97]	BEN SLIMANE F and BOUGUESSA M. Context matters: Self-attention for sign language recognition[C]. 2020 25th International Conference on Pattern Recognition, Milan, Italy, 2021: 7884–7891.
[98]	XIE Pan, ZHAO Mengyi, and HU Xiaohui. PiSLTRc: Position-informed sign language transformer with content-aware convolution[J]. IEEE Transactions on Multimedia, 2022, 24: 3908–3919. doi: 10.1109/TMM.2021.3109665
[99]	HAN Xiangzu, LU Fei, and TIAN Guohui. Efficient 3D CNNs with knowledge transfer for sign language recognition[J]. Multimedia Tools and Applications, 2022, 81(7): 10071–10090. doi: 10.1007/s11042-022-12051-7
[100]	SHIN J, MATSUOKA A, HASAN A M, et al. American sign language alphabet recognition by extracting feature from hand pose estimation[J]. Sensors, 2021, 21(17): 5856. doi: 10.3390/s21175856
[101]	SANALOHIT J and KATANYUKUL T. Thai finger spelling recognition: Investigating MediaPipe Hands potentials[J]. arXiv: 2201.03170, 2022.
[102]	RASTGOO R, KIANI K, and ESCALERA S. Real-time isolated hand sign language recognition using deep networks and SVD[J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13(1): 591–611. doi: 10.1007/s12652-021-02920-8
[103]	AIOUEZ S, HAMITOUCHE A, BELMADOUI M, et al. Real-time Arabic sign language recognition based on YOLOv5[C/OL]. The 2nd International Conference on Image Processing and Vision Engineering - IMPROVE, 2022: 17–25.
[104]	LIM K M, TAN A W C, LEE C P, et al. Isolated sign language recognition using Convolutional Neural Network hand modelling and Hand Energy Image[J]. Multimedia Tools and Applications, 2019, 78(14): 19917–19944. doi: 10.1007/s11042-019-7263-7
[105]	ROY P P, KUMAR P, and KIM B G. An efficient sign language recognition (SLR) system using camshift tracker and hidden Markov model (HMM)[J]. SN Computer Science, 2021, 2(2): 79. doi: 10.1007/S42979-021-00485-Z
[106]	RASTGOO R, KIANI K, and ESCALERA S. Multi-modal deep hand sign language recognition in still images using restricted Boltzmann machine[J]. Entropy, 2018, 20(11): 809. doi: 10.3390/e20110809
[107]	XU Biao, HUANG Shiliang, and YE Zhongfu. Application of tensor train decomposition in S2VT model for sign language recognition[J]. IEEE Access, 2021, 9: 35646–35653. doi: 10.1109/ACCESS.2021.3059660
[108]	FERREIRA P M, CARDOSO J S, and REBELO A. On the role of multimodal learning in the recognition of sign language[J]. Multimedia Tools and Applications, 2019, 78(8): 10035–10056. doi: 10.1007/s11042-018-6565-5
[109]	KOLIVAND H, JOUDAKI S, SUNAR M S, et al. A new framework for sign language alphabet hand posture recognition using geometrical features through artificial neural network (part 1)[J]. Neural Computing and Applications, 2021, 33(10): 4945–4963. doi: 10.1007/s00521-020-05279-7
[110]	BANSAL S R, WADHAWAN S, and GOEL R. mRMR-PSO: A hybrid feature selection technique with a multiobjective approach for sign language recognition[J]. Arabian Journal for Science and Engineering, 2022, 47(8): 10365–10380. doi: 10.1007/s13369-021-06456-z
[111]	GÖKÇE Ç, ÖZDEMIR O, KINDIROĞLU A A, et al. Score-level multi cue fusion for sign language recognition[C]. Proceedings of the European Conference on Computer Vision, Glasgow, UK, 2020: 294–309.
[112]	KRATIMENOS A, PAVLAKOS G, and MARAGOS P. Independent sign language recognition with 3d body, hands, and face reconstruction[C]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021: 4270–4274.