Citation: BAI Jing, YANG Zhanyuan, PENG Bin, LI Wenjing. Research on 3D Convolutional Neural Network and Its Application to Video Understanding[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2273-2283. doi: 10.11999/JEIT220596
[1] JI Shuiwang, XU Wei, YANG Ming, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221–231. doi: 10.1109/TPAMI.2012.59
[2] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]. The IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4489–4497.
[3] WANG Pan, QIANG Yan, YANG Xiaotang, et al. Network model for lung nodule segmentation based on double attention 3D-UNet[J]. Computer Engineering, 2021, 47(2): 307–313. doi: 10.19678/j.issn.1000-3428.0057019
[4] YAN Mingjing and SU Xiyou. Hyperspectral image classification based on three-dimensional dilated convolutional residual neural network[J]. Acta Optica Sinica, 2020, 40(16): 1628002. doi: 10.3788/AOS202040.1628002
[5] ALZUBAIDI L, ZHANG Jinglan, HUMAIDI A J, et al. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions[J]. Journal of Big Data, 2021, 8(1): 53. doi: 10.1186/s40537-021-00444-8
[6] KATTENBORN T, LEITLOFF J, SCHIEFER F, et al. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 173: 24–49. doi: 10.1016/j.isprsjprs.2020.12.010
[7] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
[8] WU Peida, CUI Ziguan, GAN Zongliang, et al. Three-dimensional ResNeXt network using feature fusion and label smoothing for hyperspectral image classification[J]. Sensors, 2020, 20(6): 1652. doi: 10.3390/s20061652
[9] HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]. IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2261–2269.
[10] FENG Yu, YI Benshun, WU Chenyue, et al. Pulmonary nodule recognition based on three-dimensional convolution neural network[J]. Acta Optica Sinica, 2019, 39(6): 0615006. doi: 10.3788/AOS201939.0615006
[11] DUAN Yanting, ZHENG Xiaodong, HU Lianlian, et al. Fault detection based on 3D semi-dense convolutional neural network[J]. Progress in Geophysics, 2019, 34(6): 2256–2261. doi: 10.6038/pg2019CC0367
[12] FENG Yan, ZHANG Tiantian, and WANG Chuanxu. Group activity recognition method based on pseudo 3D residual network and interaction modeling[J]. Acta Electronica Sinica, 2020, 48(7): 1269–1275. doi: 10.3969/j.issn.0372-2112.2020.07.004
[13] ZOLFAGHARI M, SINGH K, and BROX T. ECO: Efficient convolutional network for online video understanding[C]. The 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 713–730.
[14] LU Changlei, LIU Bin, ZHOU Wenbo, et al. Deepfake video detection using 3D-attentional inception convolutional neural network[C]. 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, USA, 2021: 3572–3576.
[15] HU Zhengping, DIAO Pengcheng, ZHANG Ruixue, et al. Research on 3D multi-branch aggregated lightweight network video action recognition algorithm[J]. Acta Electronica Sinica, 2020, 48(7): 1261–1268. doi: 10.3969/j.issn.0372-2112.2020.07.003
[16] MOLCHANOV P, YANG Xiaodong, GUPTA S, et al. Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural networks[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 4207–4215.
[17] LIU Liangxin, LIN Mianfen, ZHONG Liangquan, et al. Two-stream inflated 3D CNN for abnormal behavior detection[J]. Computer Systems & Applications, 2021, 30(5): 120–127. doi: 10.15888/j.cnki.csa.007912
[18] HAN Yanling, WEI Cong, ZHOU Ruyan, et al. Combining 3D-CNN and squeeze-and-excitation networks for remote sensing sea ice image classification[J]. Mathematical Problems in Engineering, 2020, 2020: 8065396. doi: 10.1155/2020/8065396
[19] WANG Fei, HU Ronglin, and JIN Ying. Human action recognition based on 3D-CBAM attention mechanism[J]. Journal of Nanjing Normal University: Engineering and Technology Edition, 2021, 21(1): 49–56. doi: 10.3969/j.issn.1672-1292.2021.01.008
[20] XU Xuanang, ZHOU Fugen, and LIU Bo. Automatic bladder segmentation from CT images using deep CNN and 3D fully connected CRF-RNN[J]. International Journal of Computer Assisted Radiology and Surgery, 2018, 13(7): 967–975. doi: 10.1007/s11548-018-1733-7
[21] XIE Saining, SUN Chen, HUANG J, et al. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification[C]. The 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 318–335.
[22] WANG Limin, LI Wei, LI Wen, et al. Appearance-and-relation networks for video classification[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1430–1439.
[23] LI Jiakun, WANG Tian, ZHOU Yi, et al. Using Gabor filter in 3D convolutional neural networks for human action recognition[C]. 2017 36th Chinese Control Conference (CCC), Dalian, China, 2017: 11139–11144.
[24] QIU Zhaofan, YAO Ting, and MEI Tao. Learning spatio-temporal representation with pseudo-3D residual networks[C]. The IEEE International Conference on Computer Vision, Venice, Italy, 2017: 5533–5541.
[25] CARREIRA J and ZISSERMAN A. Quo Vadis, action recognition? A new model and the Kinetics dataset[C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 4724–4733.
[26] YING Xinyi, WANG Longguang, WANG Yingqian, et al. Deformable 3D convolution for video super-resolution[J]. IEEE Signal Processing Letters, 2020, 27: 1500–1504. doi: 10.1109/LSP.2020.3013518
[27] RUAN Hongyang, CHEN Zhilan, CHENG Yingsheng, et al. Detection of pulmonary nodules based on C-3D deformable convolutional neural network model[J]. Laser & Optoelectronics Progress, 2020, 57(4): 041013. doi: 10.3788/LOP57.041013
[28] ZHAO Xin, SHI Delai, and WANG Hongkai. Segmentation of white matter lesions based on 3D full convolutional deep neural network[J]. Computer and Modernization, 2020(10): 44–50. doi: 10.3969/j.issn.1006-2475.2020.10.009
[29] LU Xiaoling, WU Haifeng, ZENG Yu, et al. 3D transfer learning network for classification of Alzheimer's disease[J]. Computer Engineering and Applications, 2021, 57(16): 253–262. doi: 10.3778/j.issn.1002-8331.2005-0141
[30] XIAO Zhiyun, JIANG Jiaxu, and NI Chen. Spectral-spatial classification of hyperspectral image based on self-adaptive deep residual 3D convolutional neural network[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(11): 2017–2029. doi: 10.3724/SP.J.1089.2019.17552
[31] STROUD J C, ROSS D A, SUN Chen, et al. D3D: Distilled 3D networks for video action recognition[C]. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, USA, 2020: 614–623.
[32] SINGH D, KUMAR V, KAUR M, et al. Screening of COVID-19 suspected subjects using multi-crossover genetic algorithm based dense convolutional neural network[J]. IEEE Access, 2021, 9: 142566–142580. doi: 10.1109/ACCESS.2021.3120717
[33] ZHANG Yuxin, WANG Huan, LUO Yang, et al. Three-dimensional convolutional neural network pruning with regularization-based method[C]. 2019 IEEE International Conference on Image Processing (ICIP), Taipei, China, 2019: 4270–4274.
[34] SHI Jixi, CHEN Zhihao, and COUTURIER R. Classification of pathological cases of myocardial infarction using convolutional neural network and random forest[C]. 11th International Workshop on Statistical Atlases and Computational Models of the Heart, Lima, Peru, 2021: 406–413.
[35] SOOMRO K, ZAMIR A R, and SHAH M. UCF101: A dataset of 101 human actions classes from videos in the wild[EB/OL]. https://arxiv.org/abs/1212.0402, 2012.
[36] KUEHNE H, JHUANG H, GARROTE E, et al. HMDB: A large video database for human motion recognition[C]. 2011 International Conference on Computer Vision, Barcelona, Spain, 2011: 2556–2563.
[37] KAY W, CARREIRA J, SIMONYAN K, et al. The Kinetics human action video dataset[EB/OL]. https://arxiv.org/abs/1705.06950, 2017.
[38] KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1725–1732.
[39] TRAN D, RAY J, SHOU Zheng, et al. ConvNet architecture search for spatiotemporal feature learning[EB/OL]. https://arxiv.org/abs/1708.05038, 2017.
[40] ZONG Ming, WANG Ruili, CHEN Zhe, et al. Multi-cue based 3D residual network for action recognition[J]. Neural Computing and Applications, 2021, 33(10): 5167–5181. doi: 10.1007/s00521-020-05313-8
[41] TRAN D, WANG Heng, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6450–6459.
[42] ZHAI Jiecheng, YAO Xunxiang, DONG Guangyuan, et al. 3D dual-stream convolutional neural networks with simple recurrent unit network: A new framework for action recognition[C]. 2022 4th International Conference on Communications, Information System and Computer Engineering (CISCE), Shenzhen, China, 2022: 509–515.
[43] JIANG Guanghao, JIANG Xiaoyan, FANG Zhijun, et al. An efficient attention module for 3D convolutional neural networks in action recognition[J]. Applied Intelligence, 2021, 51(10): 7043–7057. doi: 10.1007/s10489-021-02195-8
[44] KIM D H, ANVAROV F, LEE J M, et al. Metric-based attention feature learning for video action recognition[J]. IEEE Access, 2021, 9: 39218–39228. doi: 10.1109/ACCESS.2021.3064934
[45] WANG Xiaolong and GUPTA A. Unsupervised learning of visual representations using videos[C]. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015: 2794–2802.
[46] YANG Xiangli, SONG Zixing, KING I, et al. A survey on deep semi-supervised learning[EB/OL]. https://arxiv.org/abs/2103.00550, 2021.
[47] WANG Yaqing, YAO Quanming, KWOK J T, et al. Generalizing from a few examples: A survey on few-shot learning[J]. ACM Computing Surveys, 2021, 53(3): 63. doi: 10.1145/3386252
[48] HAN Zongyan, FU Zhenyong, CHEN Shuo, et al. Contrastive embedding for generalized zero-shot learning[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2371–2381.