Citation: | ZHU Liping, WU Silin, CHEN Xiaohe, LI Chengyang, ZHU Kaijie. Group Activity Recognition under Multi-scale Sub-group Interaction Relationships[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2228-2236. doi: 10.11999/JEIT231304 |
[1] |
IBRAHIM M S and MORI G. Hierarchical relational networks for group activity recognition and retrieval[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 742–758. doi: 10.1007/978-3-030-01219-9_44.
|
[2] |
WU Jianchao, WANG Limin, WANG Li, et al. Learning actor relation graphs for group activity recognition[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Bench, USA, 2019: 9956–9966. doi: 10.1109/Cvpr.2019.01020.
|
[3] |
LAN Tian, SIGAL L, and MORI G. Social roles in hierarchical models for human activity recognition[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 1354–1361. doi: 10.1109/CVPR.2012.6247821.
|
[4] |
YAN Rui, XIE Lingxi, TANG Jinhui, et al. HiGCIN: Hierarchical graph-based cross inference network for group activity recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6955–6968. doi: 10.1109/Tpami.2020.3034233.
|
[5] |
ZHU Xiaolin, WANG Dongli, and ZHOU Yan. Hierarchical spatial-temporal transformer with motion trajectory for individual action and group activity recognition[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece, 2023: 1–5. doi: 10.1109/ICASSP49357.2023.10096109.
|
[6] |
HU Guyue, CUI Bo, HE Yuan, et al. Progressive relation learning for group activity recognition[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 977–986. doi: 10.1109/Cvpr42600.2020.00106.
|
[7] |
PRAMONO R R A, CHEN Y T, and FANG W H. Empowering relational network by self-attention augmented conditional random fields for group activity recognition[C]. The 16th European Conference on Computer Vision, Glasgow, UK, 2020: 71–90. doi: 10.1007/978-3-030-58452-8_5.
|
[8] |
WANG Lukun, FENG Wancheng, TIAN Chunpeng, et al. 3D-unified spatial-temporal graph for group activity recognition[J]. Neurocomputing, 2023, 556: 126646. doi: 10.1016/j.neucom.2023.126646.
|
[9] |
KIPF T N and WELLING M. Semi-supervised classification with graph convolutional networks[C]. The 5th International Conference on Learning Representations, Toulon, France, 2017.
|
[10] |
曹毅, 吴伟官, 李平, 等. 基于时空特征增强图卷积网络的骨架行为识别[J]. 电子与信息学报, 2023, 45(8): 3022–3031. doi: 10.11999/JEIT220749.
CAO Yi, WU Weiguan, LI Ping, et al. Skeleton action recognition based on spatio-temporal feature enhanced graph convolutional network[J]. Journal of Electronics & Information Technology, 2023, 45(8): 3022–3031. doi: 10.11999/JEIT220749.
|
[11] |
DUAN Haodong, ZHAO Yue, CHEN Kai, et al. Revisiting skeleton-based action recognition[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 2959–2968. doi: 10.1109/Cvpr52688.2022.00298.
|
[12] |
FENG Yiqiang, SHAN Shimin, LIU Yu, et al. DRGCN: Deep relation gcn for group activity recognition[C]. 27th International Conference on Neural Information Processing, Bangkok, Thailand, 2020: 361–368. doi: 10.1007/978-3-030-63820-7.
|
[13] |
KUANG Zijian and TIE Xinran. IARG: Improved actor relation graph based group activity recognition[C]. The Third International Conference on Smart Multimedia, Marseille, France, 2020. doi: 10.1007/978-3-031-22061-6_3.
|
[14] |
AMER M R, LEI Peng, and TODOROVIC S. HiRF: Hierarchical random field for collective activity recognition in videos[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014, 8694: 572–585. doi: 10.1007/978-3-319-10599-4_37.
|
[15] |
BAGAUTDINOV T, ALAHI A, FLEURET F, et al. Social scene understanding: End-to-end multi-person action localization and collective activity recognition[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3425–3434. doi: 10.1109/Cvpr.2017.365.
|
[16] |
CHOI W, SHAHID K, and SAVARESE S. What are they doing?: Collective activity classification using spatio-temporal relationship among people[C]. 2009 IEEE 12th International Conference on Computer Vision Workshops, Kyoto, Japan, 2009: 1282–1289. doi: 10.1109/ICCVW.2009.5457461.
|
[17] |
QI Mengshi, QIN Jie, LI Annan, et al. stagNet: An attentive semantic RNN for group activity recognition[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018, 11214: 104–120. doi: 10.1007/978-3-030-01249-6_7.
|
[18] |
DEMIREL B and OZKAN H. DECOMPL: Decompositional learning with attention pooling for group activity recognition from a single volleyball image[EB/OL]. https://arxiv.org/abs/2303.06439 2023.
|
[19] |
DU Zexing, WANG Xue, and WANG Qing. Self-supervised global spatio-temporal interaction pre-training for group activity recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(9): 5076–5088. doi: 10.1109/Tcsvt.2023.3249906.
|
[20] |
LI Wei, YANG Tianzhao, WU Xiao, et al. Learning graph-based residual aggregation network for group activity recognition[C]. The Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 2022: 1102–1108. doi: 10.24963/ijcai.2022/154.
|
[21] |
LIU Tianshan, ZHAO Rui, LAM K M, et al. Visual-semantic graph neural network with pose-position attentive learning for group activity recognition[J]. Neurocomputing, 2022, 491: 217–231. doi: 10.1016/j.neucom.2022.03.066.
|
[22] |
WU Lifang, LANG Xianglong, XIANG Ye, et al. Active spatial positions based hierarchical relation inference for group activity recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(6): 2839–2851. doi: 10.1109/Tcsvt.2022.3228731.
|
[23] |
WU Lifang, LANG Xixanglong, XIANG Ye, et al. Multi-perspective representation to part-based graph for group activity recognition[J]. Sensors, 2022, 22(15): 5521. doi: 10.3390/s22155521.
|
[24] |
YUAN Hangjie, NI Dong, and WANG Mang. Spatio-temporal dynamic inference network for group activity recognition[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 7456–7465. doi: 10.1109/Iccv48922.2021.00738.
|
[25] |
ZHOU Honglu, KADAV A, SHAMSIAN A, et al. COMPOSER: Compositional reasoning of group activity in videos with keypoint-only modality[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 249–266. doi: 10.1007/978-3-031-19833-5_15.
|