Skeleton-based Action Recognition with Selective Multi-scale Graph Convolutional Network

CAO Yi, LI Jie, YE Peitao, WANG Yanwen, LÜ Xianhai

Citation: CAO Yi, LI Jie, YE Peitao, WANG Yanwen, LÜ Xianhai. Skeleton-based Action Recognition with Selective Multi-scale Graph Convolutional Network[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240702

doi: 10.11999/JEIT240702
Funds: The National Natural Science Foundation of China (51375209), The Six Talent Peaks Project in Jiangsu Province (ZBZZ-012), The Programme of Introducing Talents of Discipline to Universities (B18027)
Details
    About the authors:

    CAO Yi: male, professor, Ph.D.; research interests include robot mechanism theory and deep learning

    LI Jie: male, M.S. candidate; research interests include deep learning and action recognition

    YE Peitao: male, M.S. candidate; research interests include robot control systems and path planning

    WANG Yanwen: male, M.S. candidate; research interests include deep learning and voiceprint recognition

    LÜ Xianhai: male, M.S. candidate; research interests include robot mechanism theory and action recognition

    Corresponding author:

    CAO Yi, caoyi@jiangnan.edu.cn

  • CLC number: TN911.73; TP391.41

  • Abstract: To address the problems that current skeleton-based action recognition methods ignore the multi-scale dependencies among skeleton joints and fail to use convolution kernels effectively for temporal modeling, a Selective Multi-Scale Graph Convolutional Network (SMS-GCN) model for action recognition is proposed. First, the construction principle of the human skeleton graph and the structure of the channel-wise topology refinement graph convolutional network are introduced. Second, pairwise-joint and multi-joint adjacency matrices are constructed to generate multi-scale channel-wise topology-refined adjacency matrices, and graph convolution is applied to them, yielding a Multi-Scale Graph Convolution (MS-GC) module that models the multi-scale dependencies among skeleton joints. Then, building on multi-scale temporal convolution and the selective large-kernel network, a Selective Multi-Scale Temporal Convolution (SMS-TC) module is proposed to fully extract useful temporal context features; combining the MS-GC and SMS-TC modules gives the SMS-GCN model, which is trained on multi-stream inputs. Finally, extensive experiments on the NTU-RGB+D and NTU-RGB+D 120 datasets show that the model captures more joint features and learns useful temporal information, achieving excellent accuracy and generalization ability.
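
    For orientation, the sketch below gives a minimal PyTorch illustration of the two building blocks named in the abstract. It is a sketch under stated assumptions, not the authors' released implementation: the matrix-power construction of the multi-joint adjacency, the per-scale 1×1 transforms, the GAP/GMP-driven softmax selection, and all class and argument names are illustrative; only the kernel sizes (9, 9) and dilations (1, 2) follow the best-performing configuration in Table 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleGC(nn.Module):
    """MS-GC-style multi-scale graph convolution over skeleton joints (sketch).

    A k-hop ("multi-joint") adjacency is derived from the 1-hop
    ("pairwise-joint") skeleton adjacency via matrix powers; each scale
    gets its own 1x1 feature transform, and the scales are summed.
    """

    def __init__(self, in_channels, out_channels, adjacency, num_scales=3):
        super().__init__()
        A = adjacency + torch.eye(adjacency.size(0))           # add self-loops
        scales = []
        for k in range(1, num_scales + 1):
            Ak = (torch.matrix_power(A, k) > 0).float()        # k-hop reachability
            scales.append(Ak / Ak.sum(dim=1, keepdim=True))    # row-normalize
        self.register_buffer("A_scales", torch.stack(scales))  # (S, V, V)
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_channels, out_channels, kernel_size=1)
             for _ in range(num_scales)])

    def forward(self, x):                                      # x: (N, C, T, V)
        out = 0.0
        for conv, A in zip(self.convs, self.A_scales):
            out = out + torch.einsum("nctv,vw->nctw", conv(x), A)
        return F.relu(out)


class SelectiveMultiScaleTC(nn.Module):
    """SMS-TC-style selective multi-scale temporal convolution (sketch).

    Parallel temporal convolutions with different kernel sizes and dilations
    are fused with per-channel selection weights computed from pooled
    statistics (GAP + GMP), in the spirit of selective-kernel networks.
    """

    def __init__(self, channels, kernel_sizes=(9, 9), dilations=(1, 2)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=(k, 1),
                       padding=(d * (k - 1) // 2, 0), dilation=(d, 1))
             for k, d in zip(kernel_sizes, dilations)])
        self.num_branches = len(self.branches)
        self.channels = channels
        self.select = nn.Conv2d(2 * channels, self.num_branches * channels,
                                kernel_size=1)

    def forward(self, x):                                      # x: (N, C, T, V)
        feats = torch.stack([b(x) for b in self.branches], 1)  # (N, S, C, T, V)
        gap = x.mean(dim=(2, 3), keepdim=True)                 # global average pool
        gmp = x.amax(dim=(2, 3), keepdim=True)                 # global max pool
        w = self.select(torch.cat([gap, gmp], dim=1))          # (N, S*C, 1, 1)
        w = w.view(-1, self.num_branches, self.channels, 1, 1).softmax(dim=1)
        return (w * feats).sum(dim=1)                          # weighted fusion


# Shape check on a hypothetical 25-joint skeleton with 64 frames:
A = torch.zeros(25, 25)        # 1-hop bone connections would be filled in here
x = torch.randn(8, 3, 64, 25)  # (batch, coordinates, frames, joints)
y = SelectiveMultiScaleTC(16)(MultiScaleGC(3, 16, A)(x))
print(y.shape)                 # torch.Size([8, 16, 64, 25])
```

    The softmax over branch weights lets the network pick, per channel and per sample, which temporal receptive field to emphasize; the Table 4 ablations ("w/o S", "w/o GMP", "w/o GAP") suggest both pooled statistics contribute to that selection.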
  • Figure 1  Structural diagram of the multi-scale graph convolution module

    Figure 2  Structural diagram of the selective multi-scale temporal convolution module

    Figure 3  Structural diagram of SMS-GCN

    Table 1  Accuracy comparison of models with different kernel sizes (%)

    Model            Top-1    Top-5
    SMS-GCN (k=3)    95.09    99.41
    SMS-GCN (k=5)    95.00    99.43
    SMS-GCN (k=7)    94.99    99.40
    SMS-GCN (k=9)    94.93    99.36

    Table 2  Accuracy comparison with different kernel sizes

    Model    (k1, k2)    (d1, d2)    Top-1 (%)    Params (M)    Time (s)
    1        (1, 3)      (1, 1)      94.58        1.77          277
    2        (1, 5)      (1, 1)      94.79        1.80          275
    3        (1, 7)      (1, 1)      94.92        1.83          277
    4        (1, 9)      (1, 1)      94.78        1.87          282
    5        (1, 11)     (1, 1)      95.04        1.90          276
    6        (3, 5)      (1, 1)      94.92        1.83          287
    7        (3, 7)      (1, 1)      94.74        1.87          281
    8        (3, 9)      (1, 1)      94.91        1.90          271
    9        (3, 11)     (1, 1)      94.87        1.93          277
    10       (5, 7)      (1, 1)      94.69        1.90          278
    11       (5, 9)      (1, 1)      94.82        1.93          277
    12       (5, 11)     (1, 1)      94.89        1.97          277
    13       (7, 9)      (1, 1)      94.84        1.97          270
    14       (7, 11)     (1, 1)      94.83        2.00          285
    15       (9, 11)     (1, 1)      95.03        2.03          272

    Table 3  Accuracy comparison with different kernel sizes and dilation rates

    Model    (k1, k2)    (d1, d2)    Top-1 (%)    Params (M)    Time (s)
    1        (1, 1)      (1, 2)      92.95        1.74          747
    2        (3, 3)      (1, 2)      94.69        1.80          886
    3        (5, 5)      (1, 2)      94.89        1.87          865
    4        (7, 7)      (1, 2)      94.66        1.93          854
    5        (9, 9)      (1, 2)      95.09        2.00          735
    6        (11, 11)    (1, 2)      94.97        2.06          767
    7        (9, 9)      (1, 3)      95.07        2.00          761
    8        (9, 9)      (1, 4)      95.09        2.00          1000
    9        (9, 9)      (2, 3)      94.98        2.00          1466
    10       (9, 9)      (2, 4)      94.96        2.00          1490
    11       (9, 9)      (3, 4)      94.85        2.00          1479

    Table 4  Accuracy comparison of models with different structures

    Model                   Params (M)    Top-1 (%)
    SMS-GCN                 2.00          95.09
    SMS-GCN (w/o SMS-TC)    3.76          94.46
    SMS-GCN (w/o S)         1.96          94.97
    SMS-GCN (w/o GMP)       2.00          94.90
    SMS-GCN (w/o GAP)       2.00          94.93

    Table 5  Accuracy comparison of models with different modules added (%)

    Model               Joint stream    Bone stream    Two-stream
    CTR-GCN             94.74           94.70          96.07
    CTR-GCN + MS-GC     94.91           94.90          96.30
    CTR-GCN + SMS-TC    94.86           94.90          96.29
    SMS-GCN             95.09           94.96          96.52

    Table 6  Accuracy comparison of models on the NTU-RGB+D dataset (%)

    Model                      CS      CV
    CNC-LSTM[5]                83.3    91.8
    LAGA-Net[7]                87.1    93.2
    ST-GCN[15]                 81.5    88.3
    2s-AGCN[9]                 88.5    95.1
    CTR-GCN[10]                92.4    96.8
    VN-GAN[22]                 92.0    96.7
    3D-GCN[23]                 89.4    93.3
    ML-STGNet[12]              91.9    96.2
    MADT-GCN[19]               90.4    96.5
    SMS-GCN (single-stream)    89.7    95.1
    SMS-GCN (two-stream)       91.9    96.5
    SMS-GCN (multi-stream)     92.6    96.9

    Table 7  Accuracy comparison of models on the NTU-RGB+D 120 dataset (%)

    Model                      CSub    CSet
    GCA-LSTM[6]                58.3    59.2
    LAGA-Net[7]                81.0    82.2
    ST-GCN[15]                 70.7    73.2
    2s-AGCN[9]                 82.9    84.9
    CTR-GCN[10]                88.9    90.6
    STFE-GCN[11]               84.1    86.3
    ML-STGNet[12]              88.6    90.0
    MADT-GCN[19]               86.5    88.2
    VN-GAN[22]                 87.6    89.4
    SMS-GCN (single-stream)    85.3    86.6
    SMS-GCN (two-stream)       88.8    90.0
    SMS-GCN (multi-stream)     89.3    90.7
  • [1] IODICE F, DE MOMI E, and AJOUDANI A. HRI30: An action recognition dataset for industrial human-robot interaction[C]. Proceedings of the 26th International Conference on Pattern Recognition, Montreal, Canada, 2022: 4941–4947. doi: 10.1109/ICPR56361.2022.9956300.
    [2] SARDARI S, SHARIFZADEH S, DANESHKHAH A, et al. Artificial intelligence for skeleton-based physical rehabilitation action evaluation: A systematic review[J]. Computers in Biology and Medicine, 2023, 158: 106835. doi: 10.1016/j.compbiomed.2023.106835.
    [3] SUN Zehua, KE Qiuhong, RAHMANI H, et al. Human action recognition from various data modalities: A review[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3200–3225. doi: 10.1109/TPAMI.2022.3183112.
    [4] CAO Yi, WU Weiguan, ZHANG Xiaoyong, et al. Action recognition model based on the spatiotemporal sampling graph convolutional network and self-calibration mechanism[J]. Chinese Journal of Engineering, 2024, 46(3): 480–490. doi: 10.13374/j.issn2095-9389.2022.12.25.002.
    [5] SHEN Xiangpei and DING Yanrui. Human skeleton representation for 3D action recognition based on complex network coding and LSTM[J]. Journal of Visual Communication and Image Representation, 2022, 82: 103386. doi: 10.1016/j.jvcir.2021.103386.
    [6] LIU Jun, WANG Gang, HU Ping, et al. Global context-aware attention LSTM networks for 3D action recognition[C]. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 3671–3680. doi: 10.1109/CVPR.2017.391.
    [7] XIA Rongjie, LI Yanshan, and LUO Wenhan. LAGA-Net: Local-and-global attention network for skeleton based action recognition[J]. IEEE Transactions on Multimedia, 2022, 24: 2648–2661. doi: 10.1109/TMM.2021.3086758.
    [8] ZHANG Pengfei, LAN Cuiling, ZENG Wenjun, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 1109–1118. doi: 10.1109/CVPR42600.2020.00119.
    [9] SHI Lei, ZHANG Yifan, CHENG Jian, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 12018–12027. doi: 10.1109/CVPR.2019.01230.
    [10] CHEN Yuxin, ZHANG Ziqi, YUAN Chunfeng, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 13339–13348. doi: 10.1109/ICCV48922.2021.01311.
    [11] CAO Yi, WU Weiguan, LI Ping, et al. Skeleton action recognition based on spatio-temporal feature enhanced graph convolutional network[J]. Journal of Electronics & Information Technology, 2023, 45(8): 3022–3031. doi: 10.11999/JEIT220749.
    [12] ZHU Yisheng, SHUAI Hui, LIU Guangcan, et al. Multilevel spatial-temporal excited graph network for skeleton-based action recognition[J]. IEEE Transactions on Image Processing, 2023, 32: 496–508. doi: 10.1109/TIP.2022.3230249.
    [13] ZHOU Huanyu, LIU Qingjie, and WANG Yunhong. Learning discriminative representations for skeleton based action recognition[C]. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 10608–10617. doi: 10.1109/CVPR52729.2023.01022.
    [14] WANG Kaixuan, DENG Hongmin, and ZHU Qilin. Lightweight channel-topology based adaptive graph convolutional network for skeleton-based action recognition[J]. Neurocomputing, 2023, 560: 126830. doi: 10.1016/j.neucom.2023.126830.
    [15] YAN Sijie, XIONG Yuanjun, and LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 7444–7452. doi: 10.1609/aaai.v32i1.12328.
    [16] GEDAMU K, JI Yanli, GAO Lingling, et al. Relation-mining self-attention network for skeleton-based human action recognition[J]. Pattern Recognition, 2023, 139: 109455. doi: 10.1016/j.patcog.2023.109455.
    [17] LIU Ziyu, ZHANG Hongwen, CHEN Zhenghao, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 140–149. doi: 10.1109/CVPR42600.2020.00022.
    [18] LI Yuxuan, HOU Qibin, ZHENG Zhaohui, et al. Large selective kernel network for remote sensing object detection[C]. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 16748–16759. doi: 10.1109/ICCV51070.2023.01540.
    [19] XIA Yu, GAO Qingyuan, WU Weiguan, et al. Skeleton-based action recognition based on multidimensional adaptive dynamic temporal graph convolutional network[J]. Engineering Applications of Artificial Intelligence, 2024, 127: 107210. doi: 10.1016/j.engappai.2023.107210.
    [20] SHAHROUDY A, LIU Jun, NG T T, et al. NTU RGB+D: A large scale dataset for 3D human activity analysis[C]. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 1010–1019. doi: 10.1109/CVPR.2016.115.
    [21] LIU Jun, SHAHROUDY A, PEREZ M, et al. NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2684–2701. doi: 10.1109/TPAMI.2019.2916873.
    [22] PAN Qingzhe, ZHAO Zhifu, XIE Xuemei, et al. View-normalized and subject-independent skeleton generation for action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(12): 7398–7412. doi: 10.1109/TCSVT.2022.3219864.
    [23] CAO Yi, LIU Chen, SHENG Yongjian, et al. Action recognition model based on 3D graph convolution and attention enhancement[J]. Journal of Electronics & Information Technology, 2021, 43(7): 2071–2078. doi: 10.11999/JEIT200448.
Publication history
  • Received: 2024-08-12
  • Revised: 2025-02-17
  • Available online: 2025-02-24
