DGCN-MFW: A Lightweight Human Action Recognition Network for Millimeter-Wave Radar 3D Point Clouds
Abstract: Millimeter-wave radar 3D point clouds accurately capture the spatial details of human motion and provide a robust data source for action recognition. However, the inherent disorder and sparsity of point clouds limit feature-extraction efficiency, and conventional methods struggle to model their local and global spatial dependencies, which constrains recognition accuracy. To address these problems, this paper proposes a lightweight action recognition network based on dynamic graph convolution and multi-feature fusion. The network comprises three core modules: (1) a dynamic graph convolution module, which dynamically builds local neighborhood graph structures to adaptively learn robust point-cloud features and reduce misclassification during action transitions; (2) a multi-scale feature fusion module, which hierarchically aggregates local details and global context to strengthen spatial representation and behavior understanding; (3) an adaptive frame-weighting module, which assigns weights to temporal frames according to information entropy and data reliability, focusing on key temporal segments. Experiments on the public dataset mmWave-3DPCHM-1.0 show that the proposed method achieves average recognition accuracies of 98.32% and 99.48% on the TI and Vayyar subsets, respectively, with only 2.06 M parameters and 4.51 GFLOPs, outperforming mainstream methods in both recognition accuracy and model compactness.

Abstract:
Objective: Millimeter-wave radar 3D point clouds provide important spatial cues for human action recognition. However, their inherent disorder complicates feature extraction, and actions depend on temporal correlations across multiple frames, which makes single-frame analysis error-prone. This paper proposes a dynamic graph convolutional network for long 3D point-cloud sequences that improves recognition accuracy and efficiency through multi-scale feature fusion, adaptive frame weighting, and cross-attention.

Methods: The proposed network, DGCN-MFW, has three core components: dynamic graph convolution for feature extraction, multi-scale feature fusion, and adaptive temporal frame weighting. In Step 1, dynamic graph convolution captures spatial geometry by constructing local directed neighborhood graphs whose neighborhoods are updated online; this avoids manual graph construction and improves feature robustness. In Step 2, multi-scale feature fusion jointly extracts and integrates point-cloud features across the spatial and temporal dimensions, capturing both local details and global semantics. In Step 3, adaptive frame weighting learns the importance of each frame, emphasizing discriminative key frames and suppressing noisy or uninformative ones. Cross-attention further exchanges information between the center frame and its temporal context, compensating for the failures of single-frame analysis caused by motion blur, occlusion, or pose ambiguity.

Results and Discussions: The network extracts features through dynamic graph convolution, performs multi-scale feature fusion and adaptive frame weighting, and then classifies human actions. It achieves strong performance on the public TI and Vayyar millimeter-wave radar point-cloud datasets: with only 2.06 M parameters and 4.51 GFLOPs, it outperforms existing methods (Tables 2, 3, and 4).
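The EdgeConv-style dynamic graph update described in Step 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `edge_conv`, the random "shared MLP" weights, and the choice of k are assumptions made only to show the mechanism of rebuilding the k-NN graph and pooling edge features.

```python
import numpy as np

def edge_conv(feats, k=8, rng=None):
    """One EdgeConv-style dynamic graph convolution step: rebuild the
    k-NN graph in feature space, form edge features [x_i, x_j - x_i],
    apply a shared linear map with ReLU, then max-pool over neighbors."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, c = feats.shape
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]     # k nearest neighbors, self excluded
    center = np.repeat(feats[:, None, :], k, axis=1)             # x_i, shape (n, k, c)
    neigh = feats[idx]                                           # x_j, shape (n, k, c)
    edge = np.concatenate([center, neigh - center], axis=-1)     # edge features, (n, k, 2c)
    w = rng.standard_normal((2 * c, c)) / np.sqrt(2 * c)         # toy stand-in for the shared MLP
    return np.maximum(edge @ w, 0.0).max(axis=1)                 # ReLU, then max over the k edges
```

Because the graph is recomputed from the current features on every call, the neighborhoods drift as the representation evolves, which is what distinguishes the dynamic graph from a fixed, manually constructed one.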
Ablation experiments confirm that each added module substantially improves recognition accuracy (Table 1). The confusion matrices show accuracy above 99% for most actions on both datasets, demonstrating superior recognition performance (Figs. 10 and 11). However, scalability, parameter efficiency, and throughput on large-scale data still leave room for improvement, so future work will pursue further lightweight design and architectural optimization.

Conclusions: To address the two main challenges in mmWave radar 3D point-cloud-based human action recognition, an action recognition algorithm based on a dynamic graph convolutional network and multi-feature fusion is proposed. A multi-scale feature fusion module with cross-scale interaction extracts local and global features, improving spatial representation; an adaptive frame-weighting module with a cross-attention mechanism captures the temporal evolution of actions. The method reaches accuracies of 98.32% and 99.48% on the two datasets with 2.06 M parameters and 4.51 GFLOPs, outperforming mainstream models. It offers a high-accuracy, low-resource solution for mmWave radar action recognition, suitable for real-time scenarios such as industrial human-machine interaction, intelligent security, and healthcare.
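The entropy-based adaptive frame weighting described above can be sketched as follows. The function name, the per-frame class-score input, and the exponential-of-negative-entropy weighting form are illustrative assumptions; the paper only states that weights are derived from information entropy and data reliability.

```python
import numpy as np

def frame_weights(frame_scores, tau=1.0):
    """Entropy-based adaptive frame weighting: frames whose class-score
    distribution has low entropy (i.e., are confident and informative)
    receive larger normalized weights; tau controls the sharpness."""
    # frame_scores: (T, C) per-frame class scores for a T-frame sequence
    p = np.exp(frame_scores - frame_scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                 # per-frame softmax
    entropy = -(p * np.log(p + 1e-9)).sum(axis=1)     # (T,) frame entropies
    w = np.exp(-entropy / tau)                        # low entropy -> large weight
    return w / w.sum()                                # weights sum to 1
```

A confident frame such as scores (5, 0, 0) thus outweighs an ambiguous frame such as (1, 1, 1), which is exactly the "emphasize key frames, suppress noisy frames" behavior targeted by the module.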
Table 1 Recognition accuracy (%) for different module combinations

| Module combination | DGCNN | MSFF | AFW | Accuracy on TI | Accuracy on Vayyar |
| --- | --- | --- | --- | --- | --- |
| Baseline | √ | | | 93.59 | 94.54 |
| Baseline-1 | √ | √ | | 95.33 | 96.82 |
| Baseline-2 | √ | | √ | 97.31 | 98.57 |
| DGCN-MFW (ours) | √ | √ | √ | 98.32 | 99.48 |

Table 2 Action recognition accuracy (%) of different network models on the TI dataset

| Model | Punch | Fall | Jump | Lean fwd. left | Wave left | Open arms | Lean fwd. right | Wave right | Sit still | Squat | Stand | Walk | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet | 78.25 | 99.12 | 71.36 | 73.89 | 64.58 | 75.63 | 58.94 | 65.72 | 86.83 | 81.29 | 87.65 | 78.92 | 76.85 |
| PointNet++ | 87.42 | 98.88 | 75.15 | 76.68 | 77.31 | 74.16 | 58.27 | 67.84 | 88.76 | 87.52 | 93.69 | 83.48 | 80.62 |
| PCT | 97.90 | 99.00 | 80.74 | 88.57 | 84.08 | 85.84 | 96.94 | 91.49 | 95.54 | 95.21 | 98.38 | 96.63 | 92.64 |
| P4Transformer | 98.92 | 98.77 | 82.43 | 98.13 | 85.62 | 98.87 | 97.84 | 95.19 | 99.17 | 99.12 | 99.51 | 96.56 | 95.84 |
| PSTNet | 98.54 | 99.03 | 78.30 | 90.57 | 90.78 | 96.09 | 96.06 | 94.76 | 99.21 | 97.40 | 99.18 | 98.49 | 94.98 |
| DGCN-MFW (ours) | 99.71 | 99.41 | 93.57 | 98.58 | 95.24 | 98.82 | 97.94 | 99.71 | 99.17 | 99.36 | 99.07 | 99.31 | 98.32 |

Table 3 Action recognition accuracy (%) of different network models on the Vayyar dataset

| Model | Punch | Fall | Jump | Lean fwd. left | Wave left | Open arms | Lean fwd. right | Wave right | Sit still | Squat | Stand | Walk | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet | 89.12 | 98.25 | 72.18 | 77.05 | 66.88 | 81.43 | 56.79 | 68.91 | 90.36 | 75.42 | 81.67 | 79.30 | 79.03 |
| PointNet++ | 95.68 | 99.80 | 69.52 | 65.76 | 79.88 | 87.64 | 70.65 | 74.92 | 92.68 | 85.17 | 95.46 | 86.32 | 83.24 |
| PCT | 99.34 | 100.00 | 82.59 | 90.57 | 86.76 | 94.26 | 99.31 | 93.28 | 99.94 | 98.52 | 98.14 | 99.76 | 95.31 |
| P4Transformer | 99.85 | 99.80 | 92.15 | 97.45 | 96.17 | 97.79 | 97.10 | 94.21 | 99.17 | 99.17 | 98.77 | 97.89 | 97.46 |
| PSTNet | 98.07 | 99.12 | 92.86 | 97.82 | 92.47 | 96.74 | 95.07 | 95.32 | 98.17 | 98.12 | 98.02 | 97.48 | 96.60 |
| DGCN-MFW (ours) | 99.97 | 100.00 | 99.83 | 99.26 | 98.99 | 97.79 | 99.40 | 99.93 | 100.00 | 99.97 | 99.03 | 99.56 | 99.48 |

Table 4 Computational cost and complexity of different network models

| Model | Size (MB) | GFLOPs | Parameters (M) | Inference latency (ms) |
| --- | --- | --- | --- | --- |
| PointNet | 8.85 | 35.60 | 2.30 | 32.0727 |
| PointNet++ | 5.58 | 58.70 | 1.45 | 214.3876 |
| PCT | 10.00 | 180.96 | 2.62 | 182.1140 |
| P4Transformer | 10.79 | 142.80 | 2.80 | 66.8684 |
| PSTNet | 7.35 | 114.14 | 1.90 | 96.1481 |
| DGCN-MFW (ours) | 7.89 | 4.51 | 2.06 | 15.4716 |
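The center-frame/context cross-attention used for temporal compensation in the Methods section can be sketched as single-head scaled dot-product attention. The function name, the random projection weights, and the residual combination are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cross_attention(center, context, rng=None):
    """Single-head cross-attention: the center-frame feature queries the
    context frames, and the attention-weighted context is added back,
    compensating for a blurred or occluded center frame."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = center.shape[-1]
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = center @ wq                       # (d,) query from the center frame
    k = context @ wk                      # (T, d) keys from the context frames
    v = context @ wv                      # (T, d) values from the context frames
    scores = k @ q / np.sqrt(d)           # (T,) scaled dot-product scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                          # softmax attention weights over context
    return center + a @ v                 # residual add of attended context
```

Frames that resemble the center frame receive larger attention weights, so a degraded center frame is repaired mostly from its most consistent neighbors.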