Citation: | SHI Yuexiang, ZHU Maoqing. Collaborative Convolutional Transformer Network Based on Skeleton Action Recognition[J]. Journal of Electronics & Information Technology, 2023, 45(4): 1485-1493. doi: 10.11999/JEIT220270 |
[1] |
石跃祥, 周玥. 基于阶梯型特征空间分割与局部注意力机制的行人重识别[J]. 电子与信息学报, 2022, 44(1): 195–202. doi: 10.11999/JEIT201006
SHI Yuexiang and ZHOU Yue. Person re-identification based on stepped feature space segmentation and local attention mechanism[J]. Journal of Electronics &Information Technology, 2022, 44(1): 195–202. doi: 10.11999/JEIT201006
|
[2] |
NIEPERT M, AHMED M, and KUTZKOV K. Learning convolutional neural networks for graphs[C]. The 33rd International Conference on International Conference on Machine Learning, New York, USA, 2016: 2014–2023.
|
[3] |
YAN Sijie, XIONG Yuanjun, and LIN Dahua. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]. The Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, USA, 2018: 912.
|
[4] |
CHENG Ke, ZHANG Yifan, HE Xiangyu, et al. Skeleton-based action recognition with shift graph convolutional network[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 180–189.
|
[5] |
LIU Ziyu, ZHANG Hongwen, CHEN Zhenghao, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 140–149.
|
[6] |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000–6010.
|
[7] |
MEINHARDT T, KIRILLOV A, LEAL-TAIXE L, et al. TrackFormer: Multi-object tracking with transformers[J]. arXiv: 2101.02702, 2021.
|
[8] |
SUN Peize, CAO Jinkun, JIANG Yi, et al. TransTrack: Multiple object tracking with transformer[J]. arXiv: 2012.15460, 2020.
|
[9] |
ZHENG CE, ZHU Sijie, MENDIETA M, et al. 3D human pose estimation with spatial and temporal transformers[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 11636–11645.
|
[10] |
CHU Peng, WANG Jiang, YOU Quanzeng, et al. TransMOT: Spatial-temporal graph transformer for multiple object tracking[J]. arXiv: 2104.00194, 2021.
|
[11] |
FERNANDO B, GAVVES E, ORAMAS J M, et al. Modeling video evolution for action recognition[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 5378–5387.
|
[12] |
LI Shuai, LI Wanqing, COOK C, et al. Independently Recurrent Neural Network (IndRNN): Building a longer and deeper RNN[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5457–5466.
|
[13] |
LI Chao, ZHONG Qiaoyong, XIE Di, et al. Skeleton-based action recognition with convolutional neural networks[C]. 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China, 2017: 597–600.
|
[14] |
ZHANG Pengfei, LAN Cuiling, ZENG Wenjun, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 1109–1118.
|
[15] |
ZHANG Xikun, XU Chang, and TAO Dacheng. Context aware graph convolution for skeleton-based action recognition[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 14321–14330.
|
[16] |
曾胜强, 李琳. 基于姿态校正与姿态融合的2D/3D骨架动作识别方法[J]. 计算机应用研究, 2022, 39(3): 900–905. doi: 10.19734/j.issn.1001-3695.2021.07.0286
ZENG Shengqiang and LI Lin. 2D/3D skeleton action recognition based on posture transformation and posture fusion[J]. Application Research of Computers, 2022, 39(3): 900–905. doi: 10.19734/j.issn.1001-3695.2021.07.0286
|
[17] |
李扬志, 袁家政, 刘宏哲. 基于时空注意力图卷积网络模型的人体骨架动作识别算法[J]. 计算机应用, 2021, 41(7): 1915–1921. doi: 10.11772/j.issn.1001-9081.2020091515
LI Yangzhi, YUAN Jiazheng, and LIU Hongzhe. Human skeleton-based action recognition algorithm based on spatiotemporal attention graph convolutional network model[J]. Journal of Computer Applications, 2021, 41(7): 1915–1921. doi: 10.11772/j.issn.1001-9081.2020091515
|
[18] |
LI Maosen, CHEN Siheng, CHEN Xu, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3590–3598.
|
[19] |
SHI Lei, ZHANG Yifan, CHENG Jian, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 12018–12027.
|
[20] |
DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C/OL]. The 9th International Conference on Learning Representations, 2021.
|
[21] |
TOUVRON H, CORD M, DOUZE M, et al. Training data-efficient image transformers & distillation through attention[L/OL]. The 38th International Conference on Machine Learning, 2021: 10347–10357.
|
[22] |
RAMACHANDRAN P, PARMAR N, VASWANI A, et al. Stand-alone self-attention in vision models[C]. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 7.
|
[23] |
SHARIR G, NOY A, and ZELNIK-MANOR L. An image is worth 16x16 words, what is a video worth?[J]. arXiv: 2103.13915, 2021.
|
[24] |
PLIZZARI C, CANNICI M, and MATTEUCCI M. Spatial temporal transformer network for skeleton-based action recognition[C]. International Conference on Pattern Recognition, Milano, Italy, 2021: 694–701.
|
[25] |
BA J L, KIROS J R, and HINTON G E. Layer normalization[J]. arXiv: 1607.06450, 2016.
|
[26] |
SHAHROUDY A, LIU Jun, NG T T, et al. NTU RGB+D: A large scale dataset for 3D human activity analysis[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 1010–1019.
|
[27] |
KAY W, CARREIRA J, SIMONYAN K, et al. The kinetics human action video dataset[J]. arXiv: 1705.06950, 2017.
|
[28] |
CHO S, MAQBOOL M H, LIU Fei, et al. Self-attention network for skeleton-based human action recognition[C]. 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, 2020: 624–633.
|
[29] |
TANG Yansong, TIAN Yi, LU Jiwen, et al. Deep progressive reinforcement learning for skeleton-based action recognition[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5323–5332.
|
[30] |
LI Chao, ZHONG Qiaoyong, XIE Di, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]. The Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 786–792.
|