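A Survey of Sign Language Recognition Technology Based on Sign Language Expression Content and Expression Characteristics

TAO Tangfei, LIU Tianyu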

Citation: TAO Tangfei, LIU Tianyu. A Survey of Sign Language Recognition Technology Based on Sign Language Expression Content and Expression Characteristics[J]. Journal of Electronics & Information Technology, 2023, 45(10): 3439-3457. doi: 10.11999/JEIT221051

doi: 10.11999/JEIT221051
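Funding: The Key Research and Development Program in Shaanxi Province of China (2020KWZ-003)
Details
    About the authors:

    TAO Tangfei: male, associate professor. Research interests: image processing and machine vision for intelligent manufacturing and intelligent diagnosis.

    LIU Tianyu: male, master's student. Research interest: computer vision.

    Corresponding author:

    TAO Tangfei taotangfei@mail.xjtu.edu.cn

  • CLC number: TP3-05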

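  • Abstract: Sign language recognition (SLR) is an important technical means of breaking down the communication barrier between deaf and hard-of-hearing people and hearing people. This paper reviews the sign language datasets, evaluation metrics, and recognition methods of recent years. First, it systematically surveys sign language datasets and analyses the direction in which datasets for SLR methods are developing. Second, it describes the evaluation metrics for SLR methods in detail. Then, classifying by the content expressed and by the features the methods use, it summarizes and analyses isolated-word versus continuous-sentence recognition methods, and methods that rely on hand features alone versus methods that fuse multiple features. Finally, it discusses the challenges facing SLR technology and its directions of development.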
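  • Figure 1  Schematic of zero-shot learning for sign language

    Figure 2  Schematic of multi-feature fusion

    Figure 3  Recognition performance of the sign language recognition models covered in this paper on several typical datasets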

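    Table 1  Isolated-word sign language datasets

    | Year | Dataset | Language | Samples | Labels | Data modality | Sample type | Signers | Availability | Real-world scenes |
    |---|---|---|---|---|---|---|---|---|---|
    | 2007 | GSL-20[26] | Greek | 840 | 20 | RGB | Word | 6 | On request | × |
    | 2007 | GSL isol.[27] | Greek | 40785 | 310 | RGB-D | Word | 7 | Registration | × |
    | 2011 | ASLLVD[28] | English | 9800 | 3300 | RGB | Word | 6 | Open | × |
    | 2012 | DGS Kinect 40[29] | German | 3000 | 40 | RGB-D/skeleton | Word | 15 | On request | × |
    | 2013 | PSL TOF 84[30] | Polish | 1680 | 84 | RGB-D | Word | 1 | Open | × |
    | 2014 | PSL Kinect 30[30] | | 300 | 30 | | | | | |
    | | DEVISIGN-G[23] | Chinese | 432 | 36 | RGB | Word | 8 | On request | × |
    | | DEVISIGN-D[23] | | 6000 | 200 | | | | | |
    | | DEVISIGN-L[23] | | 24000 | 500 | | | | | |
    | 2014 | ChaLearn[31] | English | 50000 | 249 | RGB-D | Word | 7 | Partly open | × |
    | 2015 | CSL-500[9] | Chinese | 125000 | 500 | RGB-D/skeleton | Word | 50 | Open | × |
    | 2016 | LSA64[32] | Spanish | 3200 | 64 | RGB | Word | 10 | Open | × |
    | 2019 | GSL[33] | Greek | 40785 | 310 | RGB-D | Word | 7 | On request | × |
    | 2019 | WLASL2000[34] | English | 21083 | 2000 | RGB | Word | 119 | Open | Multiple backgrounds |
    | 2020 | RKS-PERSIANSIGN[35] | Persian | 10000 | 100 | RGB | Word | 10 | Open | √ (10) |
    | 2020 | KSL[36] | Korean | 1229 | 77 | RGB/optical flow | Word | 20 | Open | |
    | 2021 | NCSL[24] | Chinese | 90000 | 300 | RGB | Word | 30 | On request | × |
    | 2021 | NMFs-CSL[25] | Chinese | 32010 | 1067 | RGB | Word | 10 | On request | × |
    | 2022 | ASL-SKELETON3D[21]; ASL-Phono[21] | English | 9747 | 3300 | 3D; RGB | Word | 6 | On request | × |
    | 2022 | ASLLRP Sign Bank[37] | English | 41830 | 6000 | RGB | Word | | Open | × |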

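    Table 2  Continuous-sentence sign language datasets

    | Year | Dataset | Language | Samples | Labels | Data modality | Sample type | Signers | Availability | Real-world scenes |
    |---|---|---|---|---|---|---|---|---|---|
    | 2007 | SIGNUM[38] | German | 33 210 | 780 | RGB | Sentence | 20 | Open | × |
    | 2007 | GSL SD[27] | Greek | 10 290 | 310 | RGB | Sentence | 7 | On request | × |
    | | GSL SI[27] | | 10 290 | 310 | | Sentence | 7 | | × |
    | 2012 | RWTH-PHOENIX-Weather[39] | German | 45 760 | 1 200 | RGB | Sentence | 9 | Open | × |
    | 2015 | CSL-100[9] | Chinese | 25 000 | 100 | RGB-D/skeleton | Sentence | 50 | Open | × |
    | 2016 | LSE-Sign[40] | Spanish | 2 400 | 2 400 | RGB | Sentence | 2 | Registration | |
    | 2016 | MSR[22] | German | 33 210 | 450 | RGB | Sentence | 25 | | × |
    | 2018 | RWTH-PHOENIX-Weather 2014T[41] | German | 67 781 | 1 066 | RGB | Sentence | 9 | Open | |
    | 2019 | GSL[33] | Greek | 10 295 | 310 | RGB-D | Sentence | 7 | On request | × |
    | 2019 | How2Sign[42] | English | 36 773 | 16 000 | RGB-D/skeleton/speech, etc. | Sentence | 11 | Open | Multiple scenes |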

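    Table 3  Isolated-word sign language recognition methods

    | Model category | Method | Model | Dataset | Acc (%) | Notes (focus of the work) |
    |---|---|---|---|---|---|
    | Traditional models | Image processing | Canny edge detection[48] | ASL Alphabet; ASL | 99.00; 84.30 | |
    | | | Bag Of Features[49] | Self-built English alphabet dataset | 85.20 | Thresholding, colour detection, etc. |
    | | Feature extraction | Bag Of Features[49] | Self-built English alphabet dataset | 85.20 | SURF, k-nearest neighbours, etc. |
    | | | HOG-PCA[50] | Arabic alphabet dataset | 99.20 | RGB |
    | | | SURF, SIFT[51] | Kinect Depth Datasets | >80.00 | Compares the two transforms |
    | | Classification | Quadratic SVM[49] | Self-built English alphabet dataset | 85.20 | Compares quadratic and cubic SVMs |
    | | | SVM[50] | Arabic alphabet dataset | 99.20 | |
    | | | DTW-HMM[52] | AUSL dataset | 87.40, 92.40 | |
    | | | CTC[53] | Real-time | 42.00 | Recorded outdoors |
    | Neural networks | Convolutional neural networks | CNN with diffGrad optimizer[54] | Self-built Indian dataset | 99.64 | Combined with data augmentation |
    | | | C3D, 2DCNN[55] | Graffiti dataset; In-house | 92.60; 89.70 | OFMT |
    | | | C3D[56] | Kinect Datasets | 94.20 | Multimodal information |
    | | | I3D[57] | WLASL 2000; MS-ASL100 | 87.47; 96.66 | Multi-feature, multimodal |
    | | | R(2+1)D[58] | CSL-500 | 97.45 | Pre-training, attention |
    | | Recurrent neural networks | LSTM[59] | CSL-500 | 63.30 | |
    | | | LSTM-RNN with k-nearest neighbours[60] | ASL Fingerspelling | 99.44 | |
    | | | BiLSTM[61] | ASL Datasets | 97.98 | Combined with transfer learning |
    | | | Bi-ConvLSTM[62] | ASL Datasets | 98.81 | 90% accuracy with a live camera; combined with transfer learning |
    | | | FFV-Bi-LSTM[63] | ASL Datasets | 98.33 | Motion-sensing system |
    | | Graph neural networks | STGC-Transformer[64] | Self-built Japanese dataset | 12.14 (WER) | CTC combined with cross-entropy |
    | | | MS-G3D AUTSL[65]; MS-G3D LSE[65] | WLASL2000 | 95.24; 93.91 | Transfer learning |
    | | GANs | H-GANs[66] | ASLLVD; RWTH-PHOENIX Weather 2014 | 1.40 (CER); 20.70 (WER) | Fusion of 20 features |
    | | Attention mechanisms | Spatial Temporal 3D CNN[67] | CSL-500; ChaLearn14 | 88.70; 95.30 (Jaccard Index) | 3D CNN extracts spatio-temporal features; combines temporal and spatial attention |
    | | | Hierarchical Temporal HTAN[68] | CSL-500 | 93.10 | Hierarchical temporal attention network |
    | | | Global local Res-C3D[69] | CSL-500; DEVSIGN_D | 89.20; 91.00 | Global: video time series; local: object detection and localization |
    | | Transformer | Transformer[70] | WLASL-100; WLASL-300; LSA64 | 63.20 (TOP1); 43.80 (TOP1); 100.00 | Aims at a low-computation model |
    | | BERT | BERT, 3DCNN, LSTM[71] | RKS-PERSIANSIGN; ASLLVD | 74.60; 68.80 | Feature extraction, weighting of modalities, feature mapping |
    | Transfer learning | Features | I3D[72] | ChaLearn249 | 62.09 | Transfers spatio-temporal features |
    | | Shared parameters | Alexnet, R-CNN[73] | Turkish Sign Language | 99.70 | |
    | | | TensorFlow Object Detection API[74] | Self-built Indian sign language dataset | 85.45 | |
    | Zero-shot learning | Zero-Shot | 3DCNN, LSTM[75] | ASL-Text | 51.40 | |
    | | | LSTM, BERT[71]; C3D, VSD | RKS-PERSIANSIGN; First-Person; ASLVID; isoGD | 74.60; 67.20; 68.80; 60.20 | Multi-modal |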

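    Table 4  Continuous-sentence sign language recognition methods

    | Year | Model | Dataset | WER (%) | Notes (focus of the work) |
    |---|---|---|---|---|
    | 2002 | HMM[89] | 97 German signs | 91.70 (ACC) | Combined with beam search |
    | 2012 | GHMM[90] | SIGNUM | 13.00 | MLP |
    | 2017 | CNN, LSTM[91] | Video teaching materials | 98.43 | Two-stream 2D CNN |
    | 2019 | CNN-Transformer-CTC[92] | PHOENIX-2014; PHOENIX-2014-T | 26.00; 26.10 | Feature extraction, contextual information |
    | 2020 | CNN-Transformer-CTC[93] | PHOENIX-2014-T | 24.59 | Sign language recognition plus spoken-language translation |
    | 2021 | CNN-BiLSTM-CTC[94] | Indian Sign Language | 15.14 | Transfer from isolated words to sentences |
    | 2021 | RNN-Transducer[83] | CSL-100 | 6.10 | H2SNet |
    | 2021 | HST-GNN[95] | PHOENIX-2014-T; CSL-100 | 19.50; 27.60 | Graph convolution, graph self-attentions |
    | 2021 | SLRGAN[96] | PHOENIX-2014; CSL-100 | 23.40; 2.10 | Contextual information |
    | 2021 | CNN-Transformer-CTC[97] | PHOENIX-2014 | 29.78 | Multimodal, attention |
    | 2022 | CNN-Transformer-CTC[98] | PHOENIX-2014-T; PHOENIX-2014 | 22.90; 23.20 | Relative position encoding |
    | 2022 | GoogLeNet-Tconvs-CTC[91] | PHOENIX-2014, CSL-100 | 46.41, 2.41 | Comparative study of convolutional networks |
    | | 3D-ResNet-BLSTM-CTC[91] | PHOENIX-2014, CSL-100 | 50.98, 13.36 | |
    | | I3D-BLSTM-CTC[91] | PHOENIX-2014, CSL-100 | 52.71, 2.72 | |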
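Most results in the table above are reported as word error rate (WER). This page does not reproduce the paper's own definition, so the following is only a minimal sketch that assumes the usual one: the edit distance (substitutions, deletions, insertions) between the reference and predicted gloss sequences, divided by the reference length. The function name `word_error_rate` and the example glosses are illustrative and are not taken from the paper or from any dataset.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent, assuming WER = (S + D + I) / N * 100.

    reference, hypothesis: lists of gloss tokens (strings).
    N is the number of tokens in the reference.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the first i-1 reference
    # tokens and the first j hypothesis tokens (Levenshtein, two rows).
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[m] / n


# Hypothetical gloss sequences, for illustration only.
ref = ["MORNING", "RAIN", "NORTH", "WIND"]
hyp = ["MORNING", "NORTH", "STRONG", "WIND"]
print(word_error_rate(ref, hyp))
```

Because insertions are counted against the reference length, WER under this definition can exceed 100%. That is one reason it is not simply the complement of the accuracy (ACC) figure that appears in the 2002 row.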

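    Table 5  Sign language recognition methods grouped by the expression features the model uses

    | Feature region | Year [ref.] | Method/model | Dataset | Acc (%) | Notes (focus of the work) |
    |---|---|---|---|---|---|
    | Hands | 2018[106] | RBM | NYU; ASL Fingerspelling A | 90.01; 98.13 | Multi-modal |
    | | 2018[44] | CNN | STB; Dexter; EgoDexter | 96.5 (AUC); 64 (AUC); 54 (AUC) | Real-time; pose estimation; hand tracking |
    | | 2019[104] | CNN | RWTH-BOSTON-50; ASLLVD | 89.33; 31.50 | Hand tracking; pre-trained |
    | | 2019[108] | CNN | Kinect and LM data | 97.66 | Multi-modal |
    | | 2020[109] | ASLNN | DGSLR dataset | 96.78 | Hand pose tracking |
    | | 2021[48] | CNN | ASL Alphabet | 99.00 | Image processing |
    | | 2021[105] | HMM, Camshift | ASSLRP dataset | 77.75 | Hand tracking |
    | | 2021[107] | S2VT | USTC-SLR | 95.60 (98.40) | Reduces the number of training parameters |
    | | 2021[71] | LSTM, BERT; C3D, VSD | RKS-PERSIANSIGN; First-Person; ASLVID; isoGD | 74.60; 67.20; 68.80; 60.20 | Zero-shot; Transformer; multi-modal |
    | | 2021[100] | MPH, SVM, GBM | Massey; ASL Alphabet; Finger Spelling A | 99.39; 87.60; 98.45 | Hand pose estimation (webcam) |
    | | 2022[102] | CNN, SVD | ASLVID | 93.00 | Hand keypoints |
    | | 2022[101] | MPH | Thai Finger Spelling schemes | 84.57 (S1, S2); 23.66 (P1) | Hand-keypoint detection |
    | | 2022[73] | Alexnet (pre-trained); R-CNN | Turkish Sign Language | 99.70 (AP) | Transfer learning |
    | | 2022[110] | mRMR-PSO | ASL; ISL dataset; NUS Dataset II; Arabic Dataset | 84.30; 98.70; 92.06; 85.60 | Combination of multiple models; complex backgrounds |
    | | 2022[74] | TensorFlow Object Detection API | Indian Sign Language | 85.45 | Real-time; transfer learning |
    | Hands, mouth shape, facial expression, body pose, etc. | 2015[48] | CNN, HMM | RWTH-PHOENIX-Weather | 55.70 (precision) | Hands, mouth shape |
    | | 2020[111] | 3DCNN | Bosphorus-Sign22k Turkish Isolated SL dataset | 99.78 | Hands, face, and body fused by weight for training |
    | | 2020[15] | SMPL reverse | SURREAL; Human3.6M datasets | 62.3, 40.8 (mm); improvement of 25 (mm) | Pose recovery (RGB-3D); body pose |
    | | 2021[112] | SMPL-X | GSLL Dataset | 94.77 | Hands, face, and body combined; optical flow + RGB; pose recovery |
    | | 2021[66] | H-GANs | RWTH-PHOENIX-Weather 2014; ASLLVD | 20.70 (WER); 1.40 (CER) | 20 features including hands, face shape, head, and eyes; parameter optimization; continuous sign language; dimensionality reduction |
    | | 2020[25] | GLE-Net | NMFs-CSL; SLR500 | 90.50; 96.80 | Contextual relations; discrimination of fine-grained cues |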
  • [1] MURRAY J. World federation of the deaf[EB/OL]. http://wfdeaf.org/our-work/, 2020.
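    [2] 贾湧强. 我国语言障碍康复需求超3000万人[EB/OL]. https://m.btime.com/item/router?gid=40ea0atodav8fk8lcqahek43ilu, 2020.

    JIA Yongqiang. More than 30 million people in China need rehabilitation for language disorders[EB/OL]. https://m.btime.com/item/router?gid=40ea0atodav8fk8lcqahek43ilu, 2020.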
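    [3] 网信郑州. 「健康生活 有你有我」在中国, 2780万人的生活被按下了静音键······[EB/OL]. https://baijiahao.baidu.com/s?id=1678969855122384979&wfr=spider&for=pc, 2020.

    Wangxin Zhengzhou. "Healthy living, with you and me": In China, the lives of 27.8 million people have been put on mute...[EB/OL]. https://baijiahao.baidu.com/s?id=1678969855122384979&wfr=spider&for=pc, 2020.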
    [4] 周宇. 中国手语识别中自适应问题的研究[D]. [博士论文], 哈尔滨工业大学, 2009.

    ZHOU Y. Research on signer adaptation in Chinese sign language recognition[D]. [Ph. D. dissertation], Harbin Institute of Technology, 2009.
    [5] GAO Wen, FANG Gaolin, ZHAO Debin, et al. A Chinese sign language recognition system based on SOFM/SRN/HMM[J]. Pattern Recognition, 2004, 37(12): 2389–2402. doi: 10.1016/S0031-3203(04)00165-7
    [6] MAZUMDAR D, TALUKDAR A K, and SARMA K K. Gloved and free hand tracking based hand gesture recognition[C]. The 1st International Conference on Emerging Trends and Applications in Computer Science, Shillong, India, 2013: 197–202.
    [7] WANG R Y and POPOVIĆ J. Real-time hand-tracking with a color glove[J]. ACM Transactions on Graphics, 2009, 28(3): 63. doi: 10.1145/1531326.1531369
    [8] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using real-sense[C]. 2015 IEEE China Summit and International Conference on Signal and Information Processing, Chengdu, China, 2015: 166–170.
    [9] HUANG Jie, ZHOU Wengang, ZHANG Qilin, et al. Video-based sign language recognition without temporal segmentation[C]. The Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, Louisiana, USA, 2018: 275.
    [10] MITTAL A, KUMAR P, ROY P P, et al. A modified LSTM model for continuous sign language recognition using leap motion[J]. IEEE Sensors Journal, 2019, 19(16): 7056–7063. doi: 10.1109/JSEN.2019.2909837
    [11] WANG Hanjie, CHAI Xiujuan, and CHEN Xilin. A novel sign language recognition framework using hierarchical Grassmann covariance matrix[J]. IEEE Transactions on Multimedia, 2019, 21(11): 2806–2814. doi: 10.1109/TMM.2019.2915032
    [12] 王骐, 陈熙霖, 王春立, 等. 一种可处理数据缺失的视角无关手语识别方法[J]. 计算机学报, 2009, 32(5): 953–961. doi: 10.3724/SP.J.1016.2009.00953

    WANG Qi, CHEN Xilin, WANG Chunli, et al. A data-deficiency-tolerated method for viewpoint independent sign language recognition[J]. Chinese Journal of Computers, 2009, 32(5): 953–961. doi: 10.3724/SP.J.1016.2009.00953
    [13] LIANG R H and OUHYOUNG M. A real-time continuous gesture recognition system for sign language[C]. The 3rd IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998: 558–567.
    [14] YU S H, HUANG C L, HSU S C, et al. Vision-based continuous sign language recognition using product HMM[C]. The 1st Asian Conference on Pattern Recognition, Beijing, China, 2011: 510–514.
    [15] MADADI M, BERTICHE H, and ESCALERA S. SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery[J]. Pattern Recognition, 2020, 106: 107472. doi: 10.1016/j.patcog.2020.107472
    [16] CELEBI S, AYDIN A S, TEMIZ T T, et al. Gesture recognition using skeleton data with weighted dynamic time warping[C]. The International Conference on Computer Vision Theory and Applications, Barcelona, Spain, 2013: 620–625.
    [17] SUN Chao, ZHANG Tianzhu, and XU Changsheng. Latent support vector machine modeling for sign language recognition with Kinect[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 20. doi: 10.1145/2629481
    [18] 张淑军, 张群, 李辉. 基于深度学习的手语识别综述[J]. 电子与信息学报, 2020, 42(4): 1021–1032. doi: 10.11999/JEIT190416

    ZHANG Shujun, ZHANG Qun, and LI Hui. Review of sign language recognition based on deep learning[J]. Journal of Electronics & Information Technology, 2020, 42(4): 1021–1032. doi: 10.11999/JEIT190416
    [19] 米娜瓦尔·阿不拉, 阿里甫·库尔班, 解启娜, 等. 手语识别方法与技术综述[J]. 计算机工程与应用, 2021, 57(18): 1–12. doi: 10.3778/j.issn.1002-8331.2104-0220

    MINAWAER·ABULA, ALIFU·KUERBAN, XIE Qina, et al. Review of sign language recognition methods and techniques[J]. Computer Engineering and Applications, 2021, 57(18): 1–12. doi: 10.3778/j.issn.1002-8331.2104-0220
    [20] 郭丹, 唐申庚, 洪日昌, 等. 手语识别、翻译与生成综述[J]. 计算机科学, 2021, 48(3): 60–70. doi: 10.11896/jsjkx.210100227

    GUO Dan, TANG Shengeng, HONG Richang, et al. Review of sign language recognition, translation and generation[J]. Computer Science, 2021, 48(3): 60–70. doi: 10.11896/jsjkx.210100227
    [21] DE AMORIM C C and ZANCHETTIN C. ASL-Skeleton3D and ASL-phono: Two novel datasets for the American sign language[J]. arXiv: 2201.02065, 2022.
    [22] CHEN Chen, ZHANG Baochang, HOU Zhenjie, et al. Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features[J]. Multimedia Tools and Applications, 2017, 76(3): 4651–4669. doi: 10.1007/s11042-016-3284-7
    [23] CHAI X, WANG H, and CHEN X. The DEVISIGN large vocabulary of Chinese sign language database and baseline evaluations[R]. Technical Report VIPL-TR-14-SLR-001, 2014.
    [24] WANG Fei, DU Yuxuan, WANG Guorui, et al. (2+1) D-SLR: An efficient network for video sign language recognition[J]. Neural Computing and Applications, 2022, 34(3): 2413–2423. doi: 10.1007/s00521-021-06467-9
    [25] HU Hezhen, ZHOU Wengang, PU Junfu, et al. Global-local enhancement network for NMF-aware sign language recognition[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(3): 80. doi: 10.1145/3436754
    [26] EFTHIMIOU E and FOTINEA S E. GSLC: Creation and annotation of a Greek sign language corpus for HCI[C]. The 4th International Conference on Universal Access in Human-Computer Interaction, Beijing, China, 2007: 657–666.
    [27] ADALOGLOU N, CHATZIS T, PAPASTRATIS I, et al. A comprehensive study on sign language recognition methods[J]. arXiv: 2007.12530, 2020.
    [28] NEIDLE C, THANGALI A, and SCLAROFF S. Challenges in development of the American sign language lexicon video dataset (ASLLVD) corpus[C]. The 5th Workshop on the Representation and Processing of Sign Languages: Interactions Between Corpus and Lexicon, Istanbul, Turkey, 2012: 1–8.
    [29] COOPER H, ONG E J, PUGEAULT N, et al. Sign language recognition using sub-units[J]. The Journal of Machine Learning Research, 2012, 13(1): 2205–2231.
    [30] OSZUST M and WYSOCKI M. Polish sign language words recognition with Kinect[C]. The 6th International Conference on Human System Interactions, Sopot, Poland, 2013: 219–226.
    [31] ESCALERA S, BARÓ X, GONZÀLEZ J, et al. ChaLearn looking at people challenge 2014: Dataset and results[C]. The European Conference on Computer Vision, Zurich, Switzerland, 2015: 459–473.
    [32] RONCHETTI F, QUIROGA F, ESTREBOU C A, et al. LSA64: An Argentinian sign language dataset[C]. The XXII Congreso Argentino de Ciencias de la Computación, Córdoba, Argentina, 2016: 794–803.
    [33] ADALOGLOU N, CHATZIS T, PAPASTRATIS I, et al. A comprehensive study on deep learning-based methods for sign language recognition[J]. IEEE Transactions on Multimedia, 2022, 24: 1750–1762. doi: 10.1109/TMM.2021.3070438
    [34] LI Dongxu, OPAZO C R, YU Xin, et al. Word-Level deep sign language recognition from video: A new large-scale dataset and methods comparison[C]. 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, 2020: 1459–1469.
    [35] RASTGOO R, KIANI K, and ESCALERA S. Hand sign language recognition using multi-view hand skeleton[J]. Expert Systems with Applications, 2020, 150: 113336. doi: 10.1016/j.eswa.2020.113336
    [36] YANG S, JUNG S, KANG H, et al. The Korean sign language dataset for action recognition[C]. The 26th International Conference on Multimedia Modeling, Daejeon, South Korea, 2020: 532–542.
    [37] NEIDLE C, OPOKU A, and METAXAS D. ASL video corpora & sign bank: Resources available through the American sign language linguistic research project (ASLLRP)[J]. arXiv: 2201.07899, 2022.
    [38] VON AGRIS U and KRAISS K F. Towards a video corpus for signer-independent continuous sign language recognition[C]. The 7th International Workshop on Gesture in Human-Computer Interaction and Simulation, Lisbon, Portugal, 2007, 11: 2.
    [39] FORSTER J, SCHMIDT C, HOYOUX T, et al. RWTH-PHOENIX-weather: A large vocabulary sign language recognition and translation corpus[C]. The Eighth International Conference on Language Resources and Evaluation, Istanbul, Turkey, 2012: 3785–3789.
    [40] GUTIERREZ-SIGUT E, COSTELLO B, BAUS C, et al. LSE-Sign: A lexical database for Spanish sign language[J]. Behavior Research Methods, 2016, 48(1): 123–137. doi: 10.3758/s13428-014-0560-1
    [41] CAMGOZ N C, HADFIELD S, KOLLER O, et al. Neural sign language translation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7784–7793.
    [42] DUARTE A, PALASKAR S, VENTURA L, et al. How2Sign: A large-scale multimodal dataset for continuous American sign language[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2735–2744.
    [43] PUGEAULT N and BOWDEN R. Spelling it out: Real-time ASL fingerspelling recognition[C]. 2011 IEEE International Conference on Computer Vision workshops (ICCV Workshops), Barcelona, Spain, 2011: 1114–1119.
    [44] MUELLER F, BERNARD F, SOTNYCHENKO O, et al. GANerated hands for real-time 3D hand tracking from monocular RGB[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, United States, 2018: 49–59.
    [45] HAQUE A, PENG Boya, LUO Zelun, et al. Towards viewpoint invariant 3D human pose estimation[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 160–177.
    [46] TANG Ao, LU Ke, WANG Yufei, et al. A real-time hand posture recognition system using deep neural networks[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(2): 21. doi: 10.1145/2735952
    [47] KOLLER O, NEY H, and BOWDEN R. Deep learning of mouth shapes for sign language[C]. 2015 IEEE International Conference on Computer Vision Workshop, Santiago, Chile, 2015: 85–91.
    [48] SARASWATHI S and KUMAR K A. Predicting American sign language from hand gestures using image processing and deep learning[M]. TRIPATHY A K, SARKAR M, SAHOO J P, et al. Advances in Distributed Computing and Machine Learning. Singapore: Springer, 2021: 423–431.
    [49] SRIDEVI P, ISLAM T, DEBNATH U, et al. Sign language recognition for speech and hearing impaired by image processing in MATLAB[C]. 2018 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), Malambe, Sri Lanka, 2018: 1–4.
    [50] HAMED A, BELAL N A, and MAHAR K M. Arabic sign language alphabet recognition based on HOG-PCA using microsoft kinect in complex backgrounds[C]. The 6th International Conference on Advanced Computing, Bhimavaram, India, 2016: 451–458.
    [51] SYKORA P, KAMENCAY P, and HUDEC R. Comparison of SIFT and SURF methods for use on hand gesture recognition based on depth map[J]. AASRI Procedia, 2014, 9: 19–24. doi: 10.1016/j.aasri.2014.09.005
    [52] MA Xiang, YUAN Lin, WEN Ruoshi, et al. Sign language recognition based on concept learning[C]. 2020 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Dubrovnik, Croatia, 2020: 1–6.
    [53] SHI Bowen, DEL RIO A M, KEANE J, et al. American sign language fingerspelling recognition in the wild[C]. 2018 IEEE Spoken Language Technology Workshop, Athens, Greece, 2018: 145–152.
    [54] NANDI U, GHORAI A, SINGH M M, et al. Indian sign language alphabet recognition system using CNN with diffGrad optimizer and stochastic pooling[J]. Multimedia Tools and Applications, 2022, 82: 9627–9648. doi: 10.1007/S11042-021-11595-4
    [55] SARMA D, KAVYASREE V, and BHUYAN M K. Two-stream fusion model for dynamic hand gesture recognition using 3D-CNN and 2D-CNN optical flow guided motion template[J]. arXiv: 2007.08847, 2020.
    [56] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Sign language recognition using 3D convolutional neural networks[C]. 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy, 2015: 1–6.
    [57] MARUYAMA M, GHOSE S, INOUE K, et al. Word-level sign language recognition with multi-stream neural networks focusing on local regions[J]. arXiv: 2106.15989, 2021.
    [58] HAN Xiangzu, LU Fei, YIN Jianqin, et al. Sign language recognition based on R(2+1)D with spatial–temporal–channel attention[J]. IEEE Transactions on Human-Machine Systems, 2022, 52(4): 687–698. doi: 10.1109/THMS.2022.3144000
    [59] LIU Tao, ZHOU Wengang, and LI Houqiang. Sign language recognition with long short-term memory[C]. 2016 IEEE International Conference on Image Processing, Phoenix, USA, 2016: 25–28.
    [60] LEE C K M, NG K K H, CHEN C H, et al. American sign language recognition and training method with recurrent neural network[J]. Expert Systems with Applications, 2021, 167: 114403. doi: 10.1016/j.eswa.2020.114403
    [61] ABDULLAHI S B and CHAMNONGTHAI K. American sign language words recognition of skeletal videos using processed video driven multi-stacked deep LSTM[J]. Sensors, 2022, 22(4): 1406. doi: 10.3390/s22041406
    [62] BENDARKAR D S, SOMASE P A, REBARI P K, et al. Web based recognition and translation of American sign language with CNN and RNN[J]. International Journal of Online and Biomedical Engineering, 2021, 17(1): 34–50. doi: 10.3991/ijoe.v17i01.18585
    [63] ABDULLAHI S B and CHAMNONGTHAI K. American sign language words recognition using spatio-temporal prosodic and angle features: A sequential learning approach[J]. IEEE Access, 2022, 10: 15911–15923. doi: 10.1109/ACCESS.2022.3148132
    [64] TAKAYAMA N, BENITEZ-GARCIA G, and TAKAHASHI H. Sign language recognition based on spatial-temporal graph convolution-transformer[J]. Journal of the Japan Society for Precision Engineering, 2021, 87(12): 1028–1035. doi: 10.2493/jjspe.87.1028
    [65] VÁZQUEZ-ENRÍQUEZ M, ALBA-CASTRO J L, DOCÍO-FERNÁNDEZ L, et al. Isolated sign language recognition with multi-scale spatial-temporal graph convolutional networks[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, USA, 2021: 3457–3466.
    [66] ELAKKIYA R, VIJAYAKUMAR P, and KUMAR N. An optimized Generative Adversarial Network based continuous sign language classification[J]. Expert Systems with Applications, 2021, 182: 115276. doi: 10.1016/J.ESWA.2021.115276
    [67] HUANG Jie, ZHOU Wengang, LI Houqiang, et al. Attention-based 3D-CNNs for large-vocabulary sign language recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(9): 2822–2832. doi: 10.1109/TCSVT.2018.2870740
    [68] 黄杰. 基于深度学习的手语识别技术研究[D]. [博士论文], 中国科学技术大学, 2018.

    HUANG Jie. Deep learning based sign language recognition[D]. [Ph. D. dissertation], University of Science and Technology of China, 2018.
    [69] ZHANG Shujun and ZHANG Qun. Sign language recognition based on global-local attention[J]. Journal of Visual Communication and Image Representation, 2021, 80: 103280. doi: 10.1016/j.jvcir.2021.103280
    [70] BOHÁČEK M and HRÚZ M. Sign pose-based transformer for word-level sign language recognition[C]. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, Waikoloa, USA, 2022: 182–191.
    [71] RASTGOO R, KIANI K, ESCALERA S, et al. Multi-modal zero-shot sign language recognition[J]. arXiv: 2109.00796, 2021.
    [72] SARHAN N and FRINTROP S. Transfer learning for videos: From action recognition to sign language recognition[C]. 2020 IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 1811–1815.
    [73] YIRTICI T and YURTKAN K. Regional-CNN-based enhanced Turkish sign language recognition[J]. Signal, Image and Video Processing, 2022, 16(5): 1305–1311. doi: 10.1007/s11760-021-02082-2
    [74] SRIVASTAVA S, GANGWAR A, MISHRA R, et al. Sign language recognition system using TensorFlow object detection API[C]. The 1st International Conference on Advanced Network Technologies and Intelligent Computing, Varanasi, India, 2021: 634–646.
    [75] BILGE Y C, IKIZLER-CINBIS N, and CINBIS R G. Zero-shot sign language recognition: Can textual data uncover sign languages?[C]. BMVC, Cardiff, UK, 2019: 169–182.
    [76] HAN Mengmeng, CHEN Jiajun, LI Ling, et al. Visual hand gesture recognition with convolution neural network[C]. The 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Shanghai, China, 2016: 287–291.
    [77] LAHIANI H, ELLEUCH M, and KHERALLAH M. Real time hand gesture recognition system for android devices[C]. The 15th International Conference on Intelligent Systems Design and Applications (ISDA), Marrakech, Morocco, 2015: 591–596.
    [78] YAMAGUCHI Y, YOSHITOMI Y, and FUSHIMI H. Recognition of words expressed by sign language using thermal-image processing[J]. Artificial Life and Robotics, 2007, 11(1): 18–22. doi: 10.1007/s10015-006-0391-y
    [79] CUI Runpeng, LIU Hu, and ZHANG Changshui. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7361–7369.
    [80] LIPTON Z C, BERKOWITZ J, and ELKAN C. A critical review of recurrent neural networks for sequence learning[J]. arXiv: 1506.00019, 2015.
    [81] MOLCHANOV P, GUPTA S, KIM K, et al. Hand gesture recognition with 3D convolutional neural networks[C]. The 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, USA, 2015: 1–7.
    [82] QIU Zhaofan, YAO Ting, and MEI Tao. Learning spatio-temporal representation with pseudo-3D residual networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 5533–5541.
    [83] GAO Liqing, LI Haibo, LIU Zhijian, et al. RNN-Transducer based Chinese sign language recognition[J]. Neurocomputing, 2021, 434: 45–54. doi: 10.1016/j.neucom.2020.12.006
    [84] YAN Sijie, XIONG Yuanjun, and LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]. The Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, Louisiana, USA, 2018: 912.
    [85] WEN Shuhuan, TIAN Wenbo, ZHANG Hong, et al. Semantic segmentation using a GAN and a weakly supervised method based on deep transfer learning[J]. IEEE Access, 2020, 8: 176480–176494. doi: 10.1109/ACCESS.2020.3026684
    [86] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4489–4497.
    [87] ZHOU Yizhou, SUN Xiaoyan, ZHA Zhengjun, et al. MiCT: Mixed 3D/2D convolutional tube for human action recognition[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 449–458.
    [88] PINTO R F, BORGES C D B, ALMEIDA A M A, et al. Static hand gesture recognition based on convolutional neural networks[J]. Journal of Electrical and Computer Engineering, 2019, 2019: 4167890. doi: 10.1155/2019/4167890
    [89] BAUER B and HIENZ H. Relevant features for video-based continuous sign language recognition[C]. The Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 2000: 440–445.
    [90] GWETH Y L, PLAHL C, and NEY H. Enhanced continuous sign language recognition using PCA and neural network features[C]. 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, USA, 2012: 55–60.
    [91] YANG Su and ZHU Qing. Continuous Chinese sign language recognition with CNN-LSTM[C]. Proceedings of SPIE 10420, Ninth International Conference on Digital Image Processing (ICDIP 2017), Hong Kong, China, 2017.
    [92] ZHOU Hao, ZHOU Wengang, and LI Houqiang. Dynamic pseudo label decoding for continuous sign language recognition[C]. 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 2019: 1282–1287.
    [93] CAMGÖZ N C, KOLLER O, HADFIELD S, et al. Sign language transformers: Joint end-to-end sign language recognition and translation[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10023–10033.
    [94] SHARMA S, GUPTA R, and KUMAR A. Continuous sign language recognition using isolated signs data and deep transfer learning[J]. Journal of Ambient Intelligence and Humanized Computing, 2021, 14: 1531–1542. doi: 10.1007/s12652-021-03418-z
    [95] KAN Jichao, HU Kun, HAGENBUCHNER M, et al. Sign language translation with hierarchical spatio-temporal graph neural network[C]. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2022: 2131–2140.
    [96] PAPASTRATIS I, DIMITROPOULOS K, and DARAS P. Continuous sign language recognition through a context-aware generative adversarial network[J]. Sensors, 2021, 21(7): 2437. doi: 10.3390/s21072437
    [97] BEN SLIMANE F and BOUGUESSA M. Context matters: Self-attention for sign language recognition[C]. 2020 25th International Conference on Pattern Recognition, Milan, Italy, 2021: 7884–7891.
    [98] XIE Pan, ZHAO Mengyi, and HU Xiaohui. PiSLTRc: Position-informed sign language transformer with content-aware convolution[J]. IEEE Transactions on Multimedia, 2022, 24: 3908–3919. doi: 10.1109/TMM.2021.3109665
    [99] HAN Xiangzu, LU Fei, and TIAN Guohui. Efficient 3D CNNs with knowledge transfer for sign language recognition[J]. Multimedia Tools and Applications, 2022, 81(7): 10071–10090. doi: 10.1007/s11042-022-12051-7
    [100] SHIN J, MATSUOKA A, HASAN A M, et al. American sign language alphabet recognition by extracting feature from hand pose estimation[J]. Sensors, 2021, 21(17): 5856. doi: 10.3390/s21175856
    [101] SANALOHIT J and KATANYUKUL T. Thai finger spelling recognition: Investigating MediaPipe Hands potentials[J]. arXiv: 2201.03170, 2022.
    [102] RASTGOO R, KIANI K, and ESCALERA S. Real-time isolated hand sign language recognition using deep networks and SVD[J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13(1): 591–611. doi: 10.1007/s12652-021-02920-8
    [103] AIOUEZ S, HAMITOUCHE A, BELMADOUI M, et al. Real-time Arabic sign language recognition based on YOLOv5[C/OL]. The 2nd International Conference on Image Processing and Vision Engineering - IMPROVE, 2022: 17–25.
    [104] LIM K M, TAN A W C, LEE C P, et al. Isolated sign language recognition using Convolutional Neural Network hand modelling and Hand Energy Image[J]. Multimedia Tools and Applications, 2019, 78(14): 19917–19944. doi: 10.1007/s11042-019-7263-7
    [105] ROY P P, KUMAR P, and KIM B G. An efficient sign language recognition (SLR) system using camshift tracker and hidden Markov model (HMM)[J]. SN Computer Science, 2021, 2(2): 79. doi: 10.1007/S42979-021-00485-Z
    [106] RASTGOO R, KIANI K, and ESCALERA S. Multi-modal deep hand sign language recognition in still images using restricted Boltzmann machine[J]. Entropy, 2018, 20(11): 809. doi: 10.3390/e20110809
    [107] XU Biao, HUANG Shiliang, and YE Zhongfu. Application of tensor train decomposition in S2VT model for sign language recognition[J]. IEEE Access, 2021, 9: 35646–35653. doi: 10.1109/ACCESS.2021.3059660
    [108] FERREIRA P M, CARDOSO J S, and REBELO A. On the role of multimodal learning in the recognition of sign language[J]. Multimedia Tools and Applications, 2019, 78(8): 10035–10056. doi: 10.1007/s11042-018-6565-5
    [109] KOLIVAND H, JOUDAKI S, SUNAR M S, et al. A new framework for sign language alphabet hand posture recognition using geometrical features through artificial neural network (part 1)[J]. Neural Computing and Applications, 2021, 33(10): 4945–4963. doi: 10.1007/s00521-020-05279-7
    [110] BANSAL S R, WADHAWAN S, and GOEL R. mRMR-PSO: A hybrid feature selection technique with a multiobjective approach for sign language recognition[J]. Arabian Journal for Science and Engineering, 2022, 47(8): 10365–10380. doi: 10.1007/s13369-021-06456-z
    [111] GÖKÇE Ç, ÖZDEMIR O, KINDIROĞLU A A, et al. Score-level multi cue fusion for sign language recognition[C]. Proceedings of the European Conference on Computer Vision, Glasgow, UK, 2020: 294–309.
    [112] KRATIMENOS A, PAVLAKOS G, and MARAGOS P. Independent sign language recognition with 3d body, hands, and face reconstruction[C]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021: 4270–4274.
Publication history
  • Received:  2022-08-10
  • Revised:  2022-10-27
  • Published online:  2022-11-07
  • Issue date:  2023-10-31
