Skeleton Action Recognition Based on Spatio-temporal Feature Enhanced Graph Convolutional Network

CAO Yi, WU Weiguan, LI Ping, XIA Yu, GAO Qingyuan

Citation: CAO Yi, WU Weiguan, LI Ping, XIA Yu, GAO Qingyuan. Skeleton Action Recognition Based on Spatio-temporal Feature Enhanced Graph Convolutional Network[J]. Journal of Electronics & Information Technology, 2023, 45(8): 3022-3031. doi: 10.11999/JEIT220749


doi: 10.11999/JEIT220749
Funds: The National Natural Science Foundation of China (51375209), The Six Talent Peaks Project in Jiangsu Province (ZBZZ-012), The Excellent Technology Innovation Team Foundation of Jiangsu Province (2019SK07), The Programme of Introducing Talents of Discipline to Universities (B18027)
    Author biographies:

    CAO Yi: Male, Professor, Ph.D. His research interests include robot mechanism theory, robot control systems, and deep learning

    WU Weiguan: Male, M.S. candidate. His research interests include deep learning, action recognition, and image processing

    LI Ping: Male, M.S. candidate. His research interests include deep learning and voiceprint recognition

    XIA Yu: Male, M.S. candidate. His research interests include deep learning and action recognition

    GAO Qingyuan: Female, M.S. candidate. Her research interests include deep learning and action recognition

    Corresponding author:

    CAO Yi, caoyi@jiangnan.edu.cn

  • CLC number: TN911.73; TP391.41

  • Abstract: To address the problem that skeleton-based action recognition cannot fully exploit spatio-temporal features, a Spatio-Temporal Feature Enhanced Graph Convolutional Network (STFE-GCN) is proposed. First, the definition of the adjacency matrix that characterizes the topology of the human body and the structure of the two-stream adaptive graph convolutional network are introduced. Second, a graph attention mechanism in the spatial domain assigns different weight coefficients to neighboring joints according to their importance, producing an attention coefficient matrix that fully exploits spatial structural features; combined with the global adjacency matrix generated by a non-local network, a new spatial adaptive adjacency matrix is proposed to enhance the extraction of the spatial structural features of the human body. Then, in the temporal domain, a mixed pooling model extracts key-action features and global contextual features, which are combined with the features extracted by temporal convolution to enhance the extraction of temporal features from the action information. Furthermore, an Efficient Channel Attention network (ECA-Net) is introduced for channel attention enhancement, which further helps the model extract the spatio-temporal features of a sample; combining spatial feature enhancement, temporal feature enhancement, and channel attention, the STFE-GCN model is constructed and trained end-to-end under a multi-stream network so as to fully exploit spatio-temporal features. Finally, skeleton-based action recognition experiments on the two large-scale datasets NTU-RGB+D and NTU-RGB+D 120 show that the model achieves excellent recognition accuracy and generalization ability, further verifying its effectiveness in fully exploiting spatio-temporal features.
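As a rough illustration of the spatial enhancement described in the abstract, the following is a minimal PyTorch sketch, not the authors' released code: it builds GAT-style attention coefficients over the skeleton's physical neighbors and adds an embedded-Gaussian non-local similarity as the global adjacency. All module and parameter names (SpatialAdaptiveAdjacency, embed_channels, adj_mask) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAdaptiveAdjacency(nn.Module):
    """Sketch of a spatial adaptive adjacency matrix: GAT-style attention
    coefficients over physical neighbors, added to a non-local
    (embedded-Gaussian) global adjacency over all joint pairs.
    Names and shapes are assumptions, not the paper's implementation."""

    def __init__(self, in_channels: int, embed_channels: int = 64):
        super().__init__()
        self.W = nn.Linear(in_channels, embed_channels, bias=False)      # shared GAT projection
        self.a = nn.Parameter(torch.randn(2 * embed_channels) * 0.01)    # attention vector
        self.theta = nn.Conv2d(in_channels, embed_channels, kernel_size=1)  # non-local query
        self.phi = nn.Conv2d(in_channels, embed_channels, kernel_size=1)    # non-local key

    def forward(self, x: torch.Tensor, adj_mask: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) skeleton features; adj_mask: (V, V) 0/1 connectivity
        # (self-loops included so every softmax row has at least one entry).
        h = self.W(x.mean(dim=2).transpose(1, 2))           # (N, V, F) per-joint embedding
        F_dim = h.size(-1)
        # GAT scores e_ij = LeakyReLU(a^T [h_i || h_j]), computed by broadcasting.
        src = h @ self.a[:F_dim].unsqueeze(-1)              # (N, V, 1)
        dst = (h @ self.a[F_dim:].unsqueeze(-1)).transpose(1, 2)  # (N, 1, V)
        e = F.leaky_relu(src + dst, negative_slope=0.2)     # (N, V, V)
        e = e.masked_fill(adj_mask == 0, float('-inf'))
        attn = torch.softmax(e, dim=-1)                     # attention coefficient matrix
        # Non-local global adjacency: similarity between every pair of joints.
        q = self.theta(x).mean(dim=2)                       # (N, F, V)
        k = self.phi(x).mean(dim=2)                         # (N, F, V)
        global_adj = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (N, V, V)
        return attn + global_adj                            # fused spatial adaptive adjacency
```

A graph convolution layer would then aggregate joint features with this matrix, e.g. torch.einsum('nctv,nvw->nctw', embedded_features, adj).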
  • Figure 1. Graph structure and adjacency matrix

    Figure 2. Structure of the two-stream adaptive graph convolutional network

    Figure 3. Graph attention mechanism

    Figure 4. Spatial adaptive adjacency matrix

    Figure 5. Temporal feature enhancement structure

    Figure 6. STFE-GCN model based on the multi-stream network
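Figure 5 covers the temporal feature enhancement, and the abstract also mentions ECA-Net channel attention. The sketch below shows one plausible form of both blocks under the same caveats: hypothetical names and shapes, not the published implementation. The ECA layer follows the recipe of Wang et al. (reference [18]); the mixed-pooling gate is an assumption.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ECALayer(nn.Module):
    """Efficient Channel Attention (reference [18]): global average pooling,
    a 1-D convolution with an adaptively sized kernel for local cross-channel
    interaction, then sigmoid gating. No dimensionality reduction."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Kernel size from the ECA paper: k = |log2(C)/gamma + b/gamma|, made odd.
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) -- batch, channels, frames, joints.
        y = x.mean(dim=(2, 3))                      # (N, C) channel descriptor
        y = self.conv(y.unsqueeze(1)).squeeze(1)    # (N, C) cross-channel interaction
        return x * torch.sigmoid(y)[:, :, None, None]

class MixedTemporalPool(nn.Module):
    """Hypothetical mixed pooling over the temporal axis: max pooling keeps
    key-action (salient-frame) responses, average pooling keeps global
    context; a learnable gate blends the two. The pooled map can be
    broadcast back over T and combined with temporal-convolution features."""

    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.tensor(0.0))  # sigmoid(0) = 0.5 at init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V); pool only over the temporal dimension T.
        mx = F.max_pool2d(x, kernel_size=(x.size(2), 1))   # (N, C, 1, V)
        av = F.avg_pool2d(x, kernel_size=(x.size(2), 1))   # (N, C, 1, V)
        g = torch.sigmoid(self.gate)
        return g * mx + (1.0 - g) * av
```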

    Table 1. Comparison of recognition accuracy (%) of STFE-GCN with different numbers of layers

    Stream       7 layers   8 layers   9 layers   10 layers   11 layers
    Joint        93.6       93.9       94.0       94.4        94.1
    Bone         93.2       93.6       93.9       94.3        93.9
    Two-stream   95.1       95.3       95.4       95.6        95.4

    Table 2. Comparison of recognition accuracy (%) with different temporal convolution kernel sizes

    Stream       5×1    7×1    9×1    11×1   13×1
    Joint        94.5   94.3   94.4   94.0   93.8
    Bone         93.4   93.9   94.3   94.1   93.7
    Two-stream   95.4   95.4   95.6   95.4   95.3

    Table 3. Recognition accuracy (%) of the ablation experiments on the NTU-RGB+D X-View benchmark

    Model                            Joint   Bone   Two-stream
    2s-AGCN                          93.7    93.2   95.1
    2s-AGCN + graph attention        94.1    93.4   95.3
    2s-AGCN + mixed pooling          94.0    93.6   95.3
    2s-AGCN + ECA-Net                94.0    94.1   95.4
    STFE-GCN                         94.4    94.3   95.6

    Table 4. Recognition accuracy (%) of each stream of STFE-GCN on the NTU-RGB+D dataset

    Benchmark   Joint   Bone   Joint motion   Bone motion   Two-stream   Multi-stream
    X-View      94.4    94.3   92.8           93.0          95.6         96.0
    X-Sub       87.7    87.4   85.7           85.6          89.3         89.8
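The two-stream and multi-stream columns in Table 4 are obtained by fusing the classification scores of the individual streams. A minimal sketch of the usual late-fusion recipe follows; equal fusion weights and the function name fuse_streams are assumptions, not reported settings.

```python
from typing import List, Optional
import torch

def fuse_streams(scores: List[torch.Tensor],
                 weights: Optional[List[float]] = None) -> torch.Tensor:
    """Weighted late fusion of per-stream class scores.

    scores: tensors of shape (N, num_classes), e.g. [joint, bone] for the
    two-stream result or [joint, bone, joint_motion, bone_motion] for the
    multi-stream result in Table 4.
    """
    if weights is None:
        weights = [1.0] * len(scores)   # equal weights: an assumption
    fused = sum(w * s.softmax(dim=1) for w, s in zip(weights, scores))
    return fused.argmax(dim=1)          # predicted class per sample
```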

    Table 5. Comparison of recognition accuracy (%) of different models on the NTU-RGB+D dataset

    Model                      X-View   X-Sub
    LAGA-Net [4]               93.2     87.1
    DS-LSTM [6]                87.3     77.8
    GAT [8]                    95.2     89.0
    ST-GCN [9]                 88.3     81.5
    2s-AGCN [10]               95.1     88.5
    ST-AGCN [11]               94.3     88.2
    ST-GDNs [12]               95.9     89.7
    PGCN-TCA [13]              93.6     88.0
    SS-GCN [14]                90.3     83.6
    Co-ConvT [15]              94.3     88.1
    SGN [21]                   94.5     89.0
    STFE-GCN (multi-stream)    96.0     89.8

    Table 6. Comparison of recognition accuracy (%) of different models on the NTU-RGB+D 120 dataset

    Model                      X-Sub   X-Setup
    FSNet [22]                 59.9    62.4
    AS-GCN [23]                77.7    78.9
    ST-TR [7]                  81.9    84.1
    GAT [8]                    84.0    86.1
    LAGA-Net [4]               81.0    82.2
    ST-GDNs [12]               80.8    82.3
    ST-GCN [9]                 72.4    71.3
    2s-AGCN [10]               82.9    84.9
    SGN [21]                   79.2    81.5
    STFE-GCN (bone stream)     81.2    83.7
    STFE-GCN (two-stream)      83.1    85.5
    STFE-GCN (multi-stream)    84.1    86.3
  • [1] QIAN Tao. Application of Kinect-based dynamic posture recognition method in medical rehabilitation[D]. [Master dissertation], Zhejiang University of Technology, 2020. (in Chinese)
    [2] ZHOU Fengyu, YIN Jianqin, YANG Yang, et al. Online recognition of human actions based on temporal deep belief neural network[J]. Acta Automatica Sinica, 2016, 42(7): 1030–1039. doi: 10.16383/j.aas.2016.c150629 (in Chinese)
    [3] LIU Zhi, ZHANG Chenyang, and TIAN Yingli. 3D-based deep convolutional neural network for action recognition with depth sequences[J]. Image and Vision Computing, 2016, 55: 93–100. doi: 10.1016/j.imavis.2016.04.004
    [4] XIA Rongjie, LI Yanshan, and LUO Wenhan. LAGA-Net: Local-and-global attention network for skeleton based action recognition[J]. IEEE Transactions on Multimedia, 2022, 24: 2648–2661. doi: 10.1109/TMM.2021.3086758
    [5] ZHANG Pengfei, LAN Cuiling, XING Junliang, et al. View adaptive neural networks for high performance skeleton-based human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1963–1978. doi: 10.1109/TPAMI.2019.2896631
    [6] JIANG Xinghao, XU Ke, and SUN Tanfeng. Action recognition scheme based on skeleton representation with DS-LSTM network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(7): 2129–2140. doi: 10.1109/TCSVT.2019.2914137
    [7] PLIZZARI C, CANNICI M, and MATTEUCCI M. Spatial temporal transformer network for skeleton-based action recognition[C]. International Conference on Pattern Recognition. ICPR International Workshops and Challenges, Milano, Italy, 2021: 694–701.
    [8] ZHANG Jiaxu, XIE Wei, WANG Chao, et al. Graph-aware transformer for skeleton-based action recognition[J]. The Visual Computer, To be published.
    [9] YAN Sijie, XIONG Yuanjun, and LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]. The 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 7444–7452.
    [10] SHI Lei, ZHANG Yifan, CHENG Jian, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 12018–12027.
    [11] CAO Yi, LIU Chen, HUANG Zhilong, et al. Skeleton-based action recognition with temporal action graph and temporal adaptive graph convolution structure[J]. Multimedia Tools and Applications, 2021, 80(19): 29139–29162. doi: 10.1007/s11042-021-11136-z
    [12] PENG Wei, SHI Jingang, and ZHAO Guoying. Spatial temporal graph deconvolutional network for skeleton-based human action recognition[J]. IEEE Signal Processing Letters, 2021, 28: 244–248. doi: 10.1109/LSP.2021.3049691
    [13] YANG Hongye, GU Yuzhang, ZHU Jianchao, et al. PGCN-TCA: Pseudo graph convolutional network with temporal and channel-wise attention for skeleton-based action recognition[J]. IEEE Access, 2020, 8: 10040–10047. doi: 10.1109/ACCESS.2020.2964115
    [14] CHEN Shuo, XU Ke, JIANG Xinghao, et al. Spatiotemporal-spectral graph convolutional networks for skeleton-based action recognition[C]. 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Shenzhen, China, 2021: 1–6.
    [15] SHI Yuexiang and ZHU Maoqing. Collaborative convolutional transformer network for skeleton-based action recognition[J]. Journal of Electronics & Information Technology, 2023, 45(4): 1485–1493. (in Chinese)
    [16] VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[C]. The 6th International Conference on Learning Representations (ICLR), Vancouver, Canada, 2018: 1254–1263.
    [17] WANG Xiaolong, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7794–7803.
    [18] WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020.
    [19] SHAHROUDY A, LIU Jun, NG T T, et al. NTU RGB+D: A large scale dataset for 3D human activity analysis[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 1010–1019.
    [20] LIU Jun, SHAHROUDY A, PEREZ M, et al. NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2684–2701. doi: 10.1109/TPAMI.2019.2916873
    [21] ZHANG Pengfei, LAN Cuiling, ZENG Wenjun, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 1109–1118.
    [22] LIU Jun, SHAHROUDY A, WANG Gang, et al. Skeleton-based online action prediction using scale selection network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(6): 1453–1467. doi: 10.1109/TPAMI.2019.2898954
    [23] LI Maosen, CHEN Siheng, CHEN Xu, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 3590–3598.
Publication history
  • Received: 2022-06-13
  • Revised: 2022-10-31
  • Available online: 2022-11-07
  • Published: 2023-08-21
