DGCN-MFW: A Lightweight Human Action Recognition Network for Millimeter-Wave Radar 3D Point Clouds
Abstract: Millimeter-wave radar 3D point clouds accurately capture the spatial details of human motion and provide a robust data source for action recognition. However, the inherent disorder and sparsity of point clouds limit feature-extraction efficiency, and conventional methods struggle to model their local and global spatial dependencies, which constrains recognition accuracy. To address these problems, this paper proposes a lightweight action recognition network based on dynamic graph convolution and multi-feature fusion. The network comprises three core modules: (1) a dynamic graph convolution module, which dynamically builds local neighborhood graph structures to adaptively learn robust point-cloud features and reduce misclassification during action transitions; (2) a multi-scale feature fusion module, which hierarchically aggregates local details and global context to strengthen spatial representation and behavior understanding; (3) an adaptive frame-weighting module, which assigns weights to temporal frames according to information entropy and data reliability, focusing on key temporal segments. Experiments on the public dataset mmWave-3DPCHM-1.0 show that the proposed method achieves average recognition accuracies of 98.32% and 99.48% on the TI and Vayyar subsets, respectively, with only 2.06 M parameters and 4.51 GFLOPs, outperforming mainstream methods in both recognition accuracy and model compactness.

Abstract:
Objective: Millimeter-wave radar 3D point clouds provide important spatial cues for human action recognition. However, their inherent disorder complicates feature extraction, and actions depend on temporal correlations across multiple frames, which makes single-frame analysis error-prone. This paper proposes a dynamic graph convolutional network for long 3D point-cloud sequences that improves recognition accuracy and efficiency through multi-scale feature fusion, adaptive frame weighting, and cross-attention.

Methods: The proposed network, DGCN-MFW, has three core components: dynamic graph convolution for feature extraction, multi-scale feature fusion, and adaptive temporal frame weighting. In Step 1, dynamic graph convolution captures spatial geometry by constructing local directed neighborhood graphs whose neighborhoods are updated online; this avoids manual graph construction and improves feature robustness. In Step 2, multi-scale feature fusion jointly extracts and integrates point-cloud features across the spatial and temporal dimensions, capturing both local details and global semantics. In Step 3, adaptive frame weighting learns the importance of each frame, emphasizing discriminative key frames and suppressing noisy or uninformative ones. Cross-attention further exchanges information between the center frame and its temporal context, compensating for the failures of single-frame analysis caused by motion blur, occlusion, or pose ambiguity.

Results and Discussions: The network extracts features through dynamic graph convolution, performs multi-scale feature fusion and adaptive frame weighting, and then classifies human actions. It achieves strong performance on the public TI and Vayyar millimeter-wave radar point-cloud datasets: with only 2.06 M parameters and 4.51 GFLOPs, it outperforms existing methods (Tables 2, 3, and 4).
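The EdgeConv-style dynamic graph update described in Step 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `edge_conv`, the random "shared MLP" weights, and the choice of k are assumptions made only to show the mechanism of rebuilding the k-NN graph and pooling edge features.

```python
import numpy as np

def edge_conv(feats, k=8, rng=None):
    """One EdgeConv-style dynamic graph convolution step: rebuild the
    k-NN graph in feature space, form edge features [x_i, x_j - x_i],
    apply a shared linear map with ReLU, then max-pool over neighbors."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, c = feats.shape
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]     # k nearest neighbors, self excluded
    center = np.repeat(feats[:, None, :], k, axis=1)             # x_i, shape (n, k, c)
    neigh = feats[idx]                                           # x_j, shape (n, k, c)
    edge = np.concatenate([center, neigh - center], axis=-1)     # edge features, (n, k, 2c)
    w = rng.standard_normal((2 * c, c)) / np.sqrt(2 * c)         # toy stand-in for the shared MLP
    return np.maximum(edge @ w, 0.0).max(axis=1)                 # ReLU, then max over the k edges
```

Because the graph is recomputed from the current features on every call, the neighborhoods drift as the representation evolves, which is what distinguishes the dynamic graph from a fixed, manually constructed one.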
Ablation experiments confirm that each added module substantially improves recognition accuracy (Table 1). The confusion matrices show accuracy above 99% for most actions on both datasets, demonstrating superior recognition performance (Figs. 10 and 11). However, scalability, parameter efficiency, and throughput on large-scale data still leave room for improvement, so future work will pursue further lightweight design and architectural optimization.

Conclusions: To address the two main challenges in mmWave radar 3D point-cloud-based human action recognition, an action recognition algorithm based on a dynamic graph convolutional network and multi-feature fusion is proposed. A multi-scale feature fusion module with cross-scale interaction extracts local and global features, improving spatial representation; an adaptive frame-weighting module with a cross-attention mechanism captures the temporal evolution of actions. The method reaches accuracies of 98.32% and 99.48% on the two datasets with 2.06 M parameters and 4.51 GFLOPs, outperforming mainstream models. It offers a high-accuracy, low-resource solution for mmWave radar action recognition, suitable for real-time scenarios such as industrial human-machine interaction, intelligent security, and healthcare.
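The entropy-based adaptive frame weighting described above can be sketched as follows. The function name, the per-frame class-score input, and the exponential-of-negative-entropy weighting form are illustrative assumptions; the paper only states that weights are derived from information entropy and data reliability.

```python
import numpy as np

def frame_weights(frame_scores, tau=1.0):
    """Entropy-based adaptive frame weighting: frames whose class-score
    distribution has low entropy (i.e., are confident and informative)
    receive larger normalized weights; tau controls the sharpness."""
    # frame_scores: (T, C) per-frame class scores for a T-frame sequence
    p = np.exp(frame_scores - frame_scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                 # per-frame softmax
    entropy = -(p * np.log(p + 1e-9)).sum(axis=1)     # (T,) frame entropies
    w = np.exp(-entropy / tau)                        # low entropy -> large weight
    return w / w.sum()                                # weights sum to 1
```

A confident frame such as scores (5, 0, 0) thus outweighs an ambiguous frame such as (1, 1, 1), which is exactly the "emphasize key frames, suppress noisy frames" behavior targeted by the module.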
Table 1 Recognition accuracy (%) for different module combinations

| Module combination | DGCNN | MSFF | AFW | Accuracy on TI | Accuracy on Vayyar |
| --- | --- | --- | --- | --- | --- |
| Baseline | √ | | | 93.59 | 94.54 |
| Baseline-1 | √ | √ | | 95.33 | 96.82 |
| Baseline-2 | √ | | √ | 97.31 | 98.57 |
| DGCN-MFW (ours) | √ | √ | √ | 98.32 | 99.48 |

Table 2 Action recognition accuracy (%) of different network models on the TI dataset

| Model | Punch | Fall | Jump | Lean fwd. left | Wave left | Open arms | Lean fwd. right | Wave right | Sit still | Squat | Stand | Walk | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet | 78.25 | 99.12 | 71.36 | 73.89 | 64.58 | 75.63 | 58.94 | 65.72 | 86.83 | 81.29 | 87.65 | 78.92 | 76.85 |
| PointNet++ | 87.42 | 98.88 | 75.15 | 76.68 | 77.31 | 74.16 | 58.27 | 67.84 | 88.76 | 87.52 | 93.69 | 83.48 | 80.62 |
| PCT | 97.90 | 99.00 | 80.74 | 88.57 | 84.08 | 85.84 | 96.94 | 91.49 | 95.54 | 95.21 | 98.38 | 96.63 | 92.64 |
| P4Transformer | 98.92 | 98.77 | 82.43 | 98.13 | 85.62 | 98.87 | 97.84 | 95.19 | 99.17 | 99.12 | 99.51 | 96.56 | 95.84 |
| PSTNet | 98.54 | 99.03 | 78.30 | 90.57 | 90.78 | 96.09 | 96.06 | 94.76 | 99.21 | 97.40 | 99.18 | 98.49 | 94.98 |
| DGCN-MFW (ours) | 99.71 | 99.41 | 93.57 | 98.58 | 95.24 | 98.82 | 97.94 | 99.71 | 99.17 | 99.36 | 99.07 | 99.31 | 98.32 |

Table 3 Action recognition accuracy (%) of different network models on the Vayyar dataset

| Model | Punch | Fall | Jump | Lean fwd. left | Wave left | Open arms | Lean fwd. right | Wave right | Sit still | Squat | Stand | Walk | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet | 89.12 | 98.25 | 72.18 | 77.05 | 66.88 | 81.43 | 56.79 | 68.91 | 90.36 | 75.42 | 81.67 | 79.30 | 79.03 |
| PointNet++ | 95.68 | 99.80 | 69.52 | 65.76 | 79.88 | 87.64 | 70.65 | 74.92 | 92.68 | 85.17 | 95.46 | 86.32 | 83.24 |
| PCT | 99.34 | 100.00 | 82.59 | 90.57 | 86.76 | 94.26 | 99.31 | 93.28 | 99.94 | 98.52 | 98.14 | 99.76 | 95.31 |
| P4Transformer | 99.85 | 99.80 | 92.15 | 97.45 | 96.17 | 97.79 | 97.10 | 94.21 | 99.17 | 99.17 | 98.77 | 97.89 | 97.46 |
| PSTNet | 98.07 | 99.12 | 92.86 | 97.82 | 92.47 | 96.74 | 95.07 | 95.32 | 98.17 | 98.12 | 98.02 | 97.48 | 96.60 |
| DGCN-MFW (ours) | 99.97 | 100.00 | 99.83 | 99.26 | 98.99 | 97.79 | 99.40 | 99.93 | 100.00 | 99.97 | 99.03 | 99.56 | 99.48 |

Table 4 Computational cost and complexity of different network models

| Model | Size (MB) | GFLOPs | Parameters (M) | Inference latency (ms) |
| --- | --- | --- | --- | --- |
| PointNet | 8.85 | 35.60 | 2.30 | 32.0727 |
| PointNet++ | 5.58 | 58.70 | 1.45 | 214.3876 |
| PCT | 10.00 | 180.96 | 2.62 | 182.1140 |
| P4Transformer | 10.79 | 142.80 | 2.80 | 66.8684 |
| PSTNet | 7.35 | 114.14 | 1.90 | 96.1481 |
| DGCN-MFW (ours) | 7.89 | 4.51 | 2.06 | 15.4716 |
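The center-frame/context cross-attention used for temporal compensation in the Methods section can be sketched as single-head scaled dot-product attention. The function name, the random projection weights, and the residual combination are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cross_attention(center, context, rng=None):
    """Single-head cross-attention: the center-frame feature queries the
    context frames, and the attention-weighted context is added back,
    compensating for a blurred or occluded center frame."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = center.shape[-1]
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = center @ wq                       # (d,) query from the center frame
    k = context @ wk                      # (T, d) keys from the context frames
    v = context @ wv                      # (T, d) values from the context frames
    scores = k @ q / np.sqrt(d)           # (T,) scaled dot-product scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                          # softmax attention weights over context
    return center + a @ v                 # residual add of attended context
```

Frames that resemble the center frame receive larger attention weights, so a degraded center frame is repaired mostly from its most consistent neighbors.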