Group Activity Recognition under Multi-scale Sub-group Interaction Relationships

ZHU Liping; WU Silin; CHEN Xiaohe; LI Chengyang; ZHU Kaijie

doi:10.11999/JEIT231304

Volume 46 Issue 5

May 2024

ZHU Liping, WU Silin, CHEN Xiaohe, LI Chengyang, ZHU Kaijie. Group Activity Recognition under Multi-scale Sub-group Interaction Relationships[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2228-2236. doi: 10.11999/JEIT231304

ZHU Liping^{1, 2},
WU Silin^{1, 2},
CHEN Xiaohe^{1, 2},
LI Chengyang^{3},
ZHU Kaijie^{1, 2}

1. Beijing Key Laboratory of Petroleum Data Mining, Beijing 102249, China
2. College of Information Science and Engineering/College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249, China
3. School of Computer Science, Peking University, Beijing 100871, China

Funds: Beijing Natural Science Foundation (L233002), The CNPC Innovation Fund (2022DQ02-0609)

Received Date: 2023-11-27
Rev Recd Date: 2024-04-29

Available Online: 2024-05-11

Publish Date: 2024-05-30

Abstract

Group activity recognition aims to identify behaviors involving multiple individuals. In real-world applications, group behavior is often treated as a hierarchical structure consisting of the group, subgroups, and individuals. Previous research has focused on modeling relationships between individuals, without in-depth analysis of the relationships between subgroups. Therefore, a novel hierarchical group activity recognition framework based on Multi-scale Sub-group Interaction Relationships (MSIR) is proposed, together with a method for extracting multi-scale interaction features between subgroups. First, a sub-group division module is implemented. It aggregates individuals with potential correlations based on their appearance features and spatial positions, then dynamically generates subgroups of different scales using semantic information. Second, a sub-group interactive feature extraction module is developed to extract more discriminative subgroup features. It constructs interaction matrices between different subgroups and leverages the relational reasoning capabilities of graph neural networks. Compared with twelve existing methods on benchmark datasets for group activity recognition, including the Volleyball and Collective Activity datasets, the proposed method demonstrates superior performance. This research presents an easily extendable and adaptable group activity recognition framework that exhibits strong generalization across different datasets.
Keywords:
- Activity recognition
- Group activity
- Sub-group division
- Relational reasoning
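
The abstract describes the two modules only at a high level, so the following is a rough illustrative sketch rather than the authors' implementation. It assumes, purely for illustration, that affinity between individuals is cosine similarity of appearance features damped by spatial distance, that subgroups are connected components of a thresholded affinity graph (with different thresholds giving different scales), and that subgroup interaction is a softmax-normalized dot-product matrix followed by one weight-free message-passing step. All function names, the `dist_scale` parameter, the thresholds, and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def affinity(feats, positions, dist_scale):
    """Pairwise affinity: appearance similarity damped by distance (assumed form)."""
    n = len(feats)
    return [[cosine(feats[i], feats[j])
             * math.exp(-math.dist(positions[i], positions[j]) / dist_scale)
             for j in range(n)] for i in range(n)]

def divide_subgroups(aff, threshold):
    """Connected components of the graph whose edges have affinity >= threshold."""
    n = len(aff)
    label = [-1] * n
    groups = []
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = len(groups)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if label[j] == -1 and aff[i][j] >= threshold:
                    label[j] = len(groups)
                    stack.append(j)
        groups.append(sorted(members))
    return groups

def mean_pool(feats, members):
    """Subgroup feature as the mean of its members' features."""
    dim = len(feats[0])
    return [sum(feats[m][d] for m in members) / len(members) for d in range(dim)]

def interaction_matrix(group_feats):
    """Row-wise softmax over dot-product scores between subgroup features."""
    rows = []
    for u in group_feats:
        scores = [sum(a * b for a, b in zip(u, v)) for v in group_feats]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def propagate(matrix, group_feats):
    """One message-passing step without learned weights: mix subgroup features."""
    dim = len(group_feats[0])
    return [[sum(w * g[d] for w, g in zip(row, group_feats)) for d in range(dim)]
            for row in matrix]

# Toy scene (made-up data): two pairs of similar-looking people, far apart.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]

aff = affinity(feats, positions, dist_scale=5.0)
fine = divide_subgroups(aff, threshold=0.5)     # [[0, 1], [2, 3]]
coarse = divide_subgroups(aff, threshold=0.01)  # [[0, 1, 2, 3]]

group_feats = [mean_pool(feats, g) for g in fine]
relations = interaction_matrix(group_feats)
refined = propagate(relations, group_feats)
```

A trained model would replace the fixed thresholds with learned, semantics-driven grouping and the weight-free propagation step with learned graph neural network layers; the sketch only shows how subgroups at two scales and an interaction matrix between them could fit together.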
