Volume 43, Issue 7, July 2021
Citation: Hongkun CHEN, Huilan LUO. Multi-scale Semantic Information Fusion for Object Detection[J]. Journal of Electronics & Information Technology, 2021, 43(7): 2087-2095. doi: 10.11999/JEIT200147

Multi-scale Semantic Information Fusion for Object Detection

doi: 10.11999/JEIT200147
Funds:  The National Natural Science Foundation of China (61862031, 61462035), The Science and Technology Research Project of Jiangxi Provincial Department of Education (GJJ200859, GJJ200884), Ganzhou City, Jiangxi Province “Technology Innovation Talent Program” Project
  • Received Date: 2020-03-03
  • Rev Recd Date: 2020-11-27
  • Available Online: 2020-12-07
  • Publish Date: 2021-07-10
Abstract: Current object detection algorithms perform poorly on small targets and on densely packed targets. To address this challenge, a Shallow Enhanced Feature Network (SEFN) is proposed in this paper, which fuses multiple features and enhances the representation capability of shallow features. Firstly, the features extracted from the Conv4_3 and Conv5_3 layers are combined to form basic fusion features. Then the basic fusion features are fed into a small multi-scale semantic information fusion module to obtain semantic features with rich contextual information and spatial detail information. The semantic features are fused back into the basic fusion features by a feature reuse module to obtain shallow enhanced features. Finally, a series of convolutions is applied to the shallow enhanced features to produce multiple features at different scales, detection branches are constructed on these multi-scale features, and non-maximum suppression is used to obtain the final detections. The mean Average Precision (mAP) of the proposed model is 81.2% and 33.7% on the PASCAL VOC2007 and MS COCO2014 datasets respectively, which is 2.7% and 4.9% higher than the classic Single Shot MultiBox Detector (SSD). In addition, for small targets in dense scenes, the detection accuracy and recall of the proposed method are significantly improved. The experimental results show that the feature pyramid structure can enhance the semantic information of shallow features and that the feature reuse module effectively retains shallow detail information for detection, so the proposed method achieves better detection performance on small and dense targets.
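
The abstract describes the shallow-feature pipeline only at a high level. Below is a minimal PyTorch sketch of that pipeline, assuming VGG-16-style Conv4_3/Conv5_3 inputs; the class name ShallowEnhancedFusion, the channel counts, the dilated-convolution branches standing in for the multi-scale semantic information fusion module, and the 1x1 feature-reuse convolution are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowEnhancedFusion(nn.Module):
    """Sketch of SEFN's shallow-feature enhancement (illustrative sizes, not the paper's)."""
    def __init__(self, c4_channels=512, c5_channels=512, fused_channels=256):
        super().__init__()
        # 1x1 convolutions align channel dimensions before concatenation
        self.reduce_c4 = nn.Conv2d(c4_channels, fused_channels, kernel_size=1)
        self.reduce_c5 = nn.Conv2d(c5_channels, fused_channels, kernel_size=1)
        # stand-in for the multi-scale semantic information fusion module:
        # parallel dilated 3x3 convolutions gather context at several receptive fields
        self.semantic_branches = nn.ModuleList([
            nn.Conv2d(2 * fused_channels, fused_channels, kernel_size=3,
                      padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        # feature reuse: fold the semantic features back into the basic fusion features
        self.reuse = nn.Conv2d(2 * fused_channels + 3 * fused_channels,
                               fused_channels, kernel_size=1)

    def forward(self, conv4_3, conv5_3):
        # upsample Conv5_3 to Conv4_3's resolution and concatenate -> basic fusion features
        c5_up = F.interpolate(self.reduce_c5(conv5_3), size=conv4_3.shape[-2:],
                              mode='bilinear', align_corners=False)
        basic = torch.cat([self.reduce_c4(conv4_3), c5_up], dim=1)
        # multi-scale semantic features with richer contextual information
        semantic = torch.cat([F.relu(branch(basic)) for branch in self.semantic_branches], dim=1)
        # shallow enhanced features: semantic context plus retained shallow detail
        return F.relu(self.reuse(torch.cat([basic, semantic], dim=1)))

if __name__ == "__main__":
    # with a 300x300 input, VGG-16's Conv4_3 is roughly 38x38 and Conv5_3 is 19x19
    conv4_3 = torch.randn(1, 512, 38, 38)
    conv5_3 = torch.randn(1, 512, 19, 19)
    print(ShallowEnhancedFusion()(conv4_3, conv5_3).shape)  # torch.Size([1, 256, 38, 38])

Concatenating the basic fusion features with the semantic features, rather than summing them, reflects the abstract's claim that shallow detail is retained alongside contextual information; the full network would continue with further convolutions to build the multi-scale detection branches and apply non-maximum suppression to their outputs.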
References
  • [1]
    LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 21–37.
    [2]
    罗会兰, 卢飞, 孔繁胜. 基于区域与深度残差网络的图像语义分割[J]. 电子与信息学报, 2019, 41(11): 2777–2786. doi: 10.11999/JEIT190056

    LUO Huilan, LU Fei, and KONG Fansheng. Image semantic segmentation based on region and deep residual network[J]. Journal of Electronics &Information Technology, 2019, 41(11): 2777–2786. doi: 10.11999/JEIT190056
    [3]
    LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 936–944.
    [4]
    FU Chengyang, LIU Wei, RANGA A, et al. DSSD: Deconvolutional single shot detector[EB/OL]. http://arxiv.org/abs/1701.06659, 2017.
    [5]
    HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [6]
    LI Zuoxin and ZHOU Fuqiang. FSSD: Feature fusion single shot multibox detector[EB/OL]. https://arxiv.org/abs/1712.00960, 2017.
    [7]
    LIU Songtao, HUANG Di, and WANG Yunhong. Receptive field block net for accurate and fast object detection[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 404–419.
    [8]
    EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. doi: 10.1007/s11263-009-0275-4
    [9]
    LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]. 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 740–755.
    [10]
    LI Hanchao, XIONG Pengfei, AN Jie, et al. Pyramid attention network for semantic segmentation[C]. British Machine Vision Conference, Newcastle, UK, 2018.
    [11]
    罗会兰, 卢飞, 严源. 跨层融合与多模型投票的动作识别[J]. 电子与信息学报, 2019, 41(3): 649–655. doi: 10.11999/JEIT180373

    LUO Huilan, LU Fei, and YAN Yuan. Action recognition based on multi-model voting with cross layer fusion[J]. Journal of Electronics &Information Technology, 2019, 41(3): 649–655. doi: 10.11999/JEIT180373
    [12]
    DAI Jifeng, LI Yi, HE Kaiming, et al. R-FCN: Object detection via region-based fully convolutional networks[C]. The 30th International Conference on Neural Information Processing Systems, Barcelona, SPAIN, 2016: 379–387.
    [13]
    JEONG J, PARK H, and KWAK N. Enhancement of SSD by concatenating feature maps for object detection[C]. British Machine Vision Conference, London, UK, 2017.
    [14]
    REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 779–788.
    [15]
    REDMON J and FARHADI A. YOLO9000: Better, faster, stronger[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6517–6525.
    [16]
    REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
    [17]
    KONG Tao, YAO Anbang, CHEN Yurong, et al. HyperNet: Towards accurate region proposal generation and joint object detection[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 845–853.
    [18]
    SHRIVASTAVA A, GUPTA A, and GIRSHICK R. Training region-based object detectors with online hard example mining[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016.
    [19]
    BELL S, ZITNICK C L, BALA K, et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, 2874–2883.