DGCN-MFW: A Lightweight Human Action Recognition Network for Millimeter-Wave Radar 3D Point Clouds
Abstract: Millimeter-wave radar 3D point clouds capture the fine spatial variations of human motion and thus provide a highly robust data source for action recognition. However, the inherent disorder and sparsity of point clouds limit feature-extraction efficiency, and conventional methods struggle to model their local and global spatial dependencies, which caps recognition accuracy. To address these problems, this paper proposes a lightweight action recognition network based on dynamic graph convolution and multi-feature fusion. The network comprises three core modules: (1) a dynamic graph convolution module, which dynamically constructs local neighborhood graphs to adaptively learn robust point-cloud features and reduce misclassification during action transitions; (2) a multi-scale feature fusion module, which hierarchically aggregates local details and global context to strengthen spatial representation and behavior understanding; and (3) an adaptive frame-weighting module, which assigns weights to temporal frames according to information entropy and data reliability, focusing on the key temporal segments. Experiments on the public mmWave-3DPCHM-1.0 dataset show that the proposed method achieves average recognition accuracies of 98.32% and 99.48% on the TI and Vayyar subsets, respectively, with only 2.06 M parameters and 4.51 GFLOPs, outperforming existing mainstream methods in both accuracy and model compactness.
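No reference implementation accompanies this abstract, so the following PyTorch sketch is only an illustration of the dynamic graph convolution idea in module (1): an EdgeConv-style operator (in the spirit of DGCNN) that rebuilds a k-nearest-neighbor graph in feature space at every layer, so neighborhoods update as the features evolve. The class names, the choice of k, and the MLP widths are assumptions, not the authors' exact design.

```python
import torch


def knn(x, k):
    """Indices of the k nearest neighbors of each point, computed in the
    current feature space so the graph changes ("is dynamic") per layer.
    x: (B, C, N) batch of point features."""
    inner = -2 * torch.matmul(x.transpose(2, 1), x)        # (B, N, N)
    sq = torch.sum(x ** 2, dim=1, keepdim=True)            # (B, 1, N)
    neg_dist = -sq - inner - sq.transpose(2, 1)            # negative squared distances
    return neg_dist.topk(k=k, dim=-1)[1]                   # (B, N, k)


class EdgeConv(torch.nn.Module):
    """One dynamic graph convolution layer (hypothetical sizes)."""

    def __init__(self, in_ch, out_ch, k=20):
        super().__init__()
        self.k = k
        self.mlp = torch.nn.Sequential(
            torch.nn.Conv2d(2 * in_ch, out_ch, kernel_size=1, bias=False),
            torch.nn.BatchNorm2d(out_ch),
            torch.nn.LeakyReLU(0.2),
        )

    def forward(self, x):                                  # x: (B, C, N)
        B, C, N = x.shape
        idx = knn(x, self.k)                               # (B, N, k)
        base = torch.arange(B, device=x.device).view(-1, 1, 1) * N
        flat = (idx + base).view(-1)
        pts = x.transpose(2, 1).contiguous().view(B * N, C)
        nbr = pts[flat].view(B, N, self.k, C)              # neighbor features
        ctr = x.transpose(2, 1).unsqueeze(2).expand(-1, -1, self.k, -1)
        # Edge feature [x_i, x_j - x_i]: absolute position plus local offset.
        edge = torch.cat([ctr, nbr - ctr], dim=3).permute(0, 3, 1, 2)
        return self.mlp(edge).max(dim=-1)[0]               # (B, out_ch, N)
```

Stacking several such layers and concatenating their outputs would give a simple multi-scale spatial feature in the spirit of module (2), since each layer aggregates over a different effective receptive field.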
Extended Abstract:

Objective: Millimeter-wave radar 3D point clouds offer key spatial cues for human action recognition, but their inherent disorder challenges feature extraction, and actions depend on multi-frame temporal correlations, making single-frame analysis error-prone. This paper proposes a dynamic graph convolutional network, tailored to long 3D point-cloud sequences, that fuses multi-scale features, adaptively weights frames, and uses cross-attention to improve recognition performance and efficiency.

Methods: This paper proposes a dynamic graph convolutional network (DGCN-MFW) with three core components: dynamic graph convolution feature extraction, multi-scale feature fusion, and adaptive temporal frame weighting.
Step 1: Dynamic graph convolution builds spatial geometry automatically via local directed neighborhood graphs and updates the neighborhoods online, avoiding manual graph construction and improving feature robustness (see the EdgeConv sketch above).
Step 2: Multi-scale feature fusion jointly extracts and integrates point-cloud features across space and time, capturing both local details and global semantics.
Step 3: Adaptive frame weighting learns per-frame importance, highlighting discriminative key frames and suppressing noisy or uninformative ones; cross-attention exchanges information between the center frame and its temporal context, compensating for single-frame deficits caused by motion blur, occlusion, or pose ambiguity (a sketch follows this abstract).

Results and Discussions: The proposed network extracts features via dynamic graph convolution, then performs multi-scale feature fusion and adaptive frame weighting, and finally carries out human action recognition. It performs strongly on the public TI and Vayyar millimeter-wave radar point-cloud datasets, with only 2.06 M parameters and 4.51 GFLOPs, outperforming existing methods (Tables 2, 3, and 4). Ablation experiments show that each core module significantly boosts recognition accuracy (Table 1). Confusion matrices show accuracy above 99% for most actions on both datasets (Figs. 10 and 11). Nevertheless, model size, parameter count, and efficiency on large-scale data still leave room for improvement; future work will focus on lightweight model design and architectural optimization.

Conclusions: To address the two major challenges in mmWave radar 3D point-cloud human action recognition, this paper proposes an action recognition algorithm based on a dynamic graph convolutional network and multi-feature fusion. A multi-scale feature fusion module with cross-scale interaction extracts local and global features, improving spatial representation; an adaptive frame-weighting module and a cross-attention mechanism capture the temporal evolution of actions. The method achieves 98.32% and 99.48% accuracy on the two datasets with 2.06 M parameters and 4.51 GFLOPs, outperforming mainstream models. It offers a new solution for high-precision, low-resource mmWave radar action recognition, suited to real-time scenarios such as industrial human-machine interaction, intelligent security, and healthcare.
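Again as a hedged sketch rather than the authors' implementation: Step 3 can be approximated with a learned per-frame importance score (standing in for the paper's entropy-and-reliability weighting, whose exact formula is not given in this abstract) plus a cross-attention layer in which the center frame queries its temporal context. The layer widths, head count, and the use of `torch.nn.MultiheadAttention` are assumptions.

```python
import torch


class AdaptiveFrameWeighting(torch.nn.Module):
    """Learned per-frame weights; a stand-in for the paper's
    entropy/reliability-based weighting (hypothetical design)."""

    def __init__(self, dim):
        super().__init__()
        self.score = torch.nn.Sequential(
            torch.nn.Linear(dim, dim // 2),
            torch.nn.ReLU(),
            torch.nn.Linear(dim // 2, 1),
        )

    def forward(self, frames):                     # frames: (B, T, D)
        # Softmax-normalized weights let discriminative frames dominate
        # while noisy or uninformative frames are damped.
        w = torch.softmax(self.score(frames).squeeze(-1), dim=1)   # (B, T)
        return (frames * w.unsqueeze(-1)).sum(dim=1)               # (B, D)


class CenterFrameCrossAttention(torch.nn.Module):
    """Cross-attention: the center frame queries the whole sequence."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frames):                     # frames: (B, T, D)
        t = frames.shape[1] // 2
        center = frames[:, t : t + 1]              # (B, 1, D) query
        out, _ = self.attn(center, frames, frames) # keys/values = full context
        return out.squeeze(1)                      # (B, D)


# Example: two sequences of 16 frames with 256-D per-frame features
# (e.g., pooled EdgeConv outputs from the earlier sketch).
frames = torch.randn(2, 16, 256)
fused = AdaptiveFrameWeighting(256)(frames) + CenterFrameCrossAttention(256)(frames)
print(fused.shape)                                 # torch.Size([2, 256])
```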
Table 1. Recognition accuracy (%) for different module combinations

Combination      DGCNN  MSFF  AFW  TI dataset (%)  Vayyar dataset (%)
Baseline         √                 93.59           94.54
Baseline-1       √      √          95.33           96.82
Baseline-2       √            √    97.31           98.57
DGCN-MFW (Ours)  √      √     √    98.32           99.48

Table 2. Action recognition accuracy (%) of different network models on the TI dataset
Model            Punch   Fall    Jump   Lean L  Wave L  Open    Lean R  Wave R  Sit    Squat  Stand  Walk   Avg.
PointNet         78.25   99.12   71.36  73.89   64.58   75.63   58.94   65.72   86.83  81.29  87.65  78.92  76.85
PointNet++       87.42   98.88   75.15  76.68   77.31   74.16   58.27   67.84   88.76  87.52  93.69  83.48  80.62
PCT              97.90   99.00   80.74  88.57   84.08   85.84   96.94   91.49   95.54  95.21  98.38  96.63  92.64
P4Transformer    98.92   98.77   82.43  98.13   85.62   98.87   97.84   95.19   99.17  99.12  99.51  96.56  95.84
PSTNet           98.54   99.03   78.30  90.57   90.78   96.09   96.06   94.76   99.21  97.40  99.18  98.49  94.98
DGCN-MFW (Ours)  99.71   99.41   93.57  98.58   95.24   98.82   97.94   99.71   99.17  99.36  99.07  99.31  98.32

(Lean L/R: lean forward to the left/right; Wave L/R: wave the left/right hand; Open: open both arms.)

Table 3. Action recognition accuracy (%) of different network models on the Vayyar dataset
Model            Punch   Fall    Jump   Lean L  Wave L  Open    Lean R  Wave R  Sit     Squat  Stand  Walk   Avg.
PointNet         89.12   98.25   72.18  77.05   66.88   81.43   56.79   68.91   90.36   75.42  81.67  79.30  79.03
PointNet++       95.68   99.80   69.52  65.76   79.88   87.64   70.65   74.92   92.68   85.17  95.46  86.32  83.24
PCT              99.34   100.00  82.59  90.57   86.76   94.26   99.31   93.28   99.94   98.52  98.14  99.76  95.31
P4Transformer    99.85   99.80   92.15  97.45   96.17   97.79   97.10   94.21   99.17   99.17  98.77  97.89  97.46
PSTNet           98.07   99.12   92.86  97.82   92.47   96.74   95.07   95.32   98.17   98.12  98.02  97.48  96.60
DGCN-MFW (Ours)  99.97   100.00  99.83  99.26   98.99   97.79   99.40   99.93   100.00  99.97  99.03  99.56  99.48

Table 4. Computational cost and complexity of different network models
Model            Size (MB)  GFLOPs  Params (M)  Inference latency (ms)
PointNet         8.85       35.60   2.30        32.0727
PointNet++       5.58       58.70   1.45        214.3876
PCT              10.00      180.96  2.62        182.1140
P4Transformer    10.79      142.80  2.80        66.8684
PSTNet           7.35       114.14  1.90        96.1481
DGCN-MFW (Ours)  7.89       4.51    2.06        15.4716