基于双分支特征融合的无锚框目标检测算法

侯志强; 郭浩; 马素刚; 程环环; 白玉; 范九伦

doi:10.11999/JEIT210344

基于双分支特征融合的无锚框目标检测算法

doi: 10.11999/JEIT210344

侯志强^{1, 2},
郭浩^{1, 2, ,},
马素刚^{1, 2},
程环环^{1, 2},
白玉^{1, 2},
范九伦¹

1.
西安邮电大学计算机学院西安 710121
2.
陕西省网络数据分析与智能处理重点实验室西安 710121

基金项目: 国家自然科学基金(62072370)

详细信息

作者简介:
侯志强：男，1973年生，教授，博士生导师，研究方向为图像处理、计算机视觉

郭浩：男，1997年生，硕士生，研究方向为计算机视觉、目标检测和深度学习

马素刚：男，1982年生，博士，研究方向为计算机视觉、机器学习

通讯作者:
郭浩　HaoGuo@stu.xupt.edu.cn

中图分类号: TN911.73; TP391.4
计量
- 文章访问数: 1179
- HTML全文浏览量: 484
- PDF下载量: 144
- 被引次数: 0
出版历程
- 收稿日期: 2021-04-23
- 修回日期: 2021-12-19
- 录用日期: 2022-01-12
- 网络出版日期: 2022-02-02
- 刊出日期: 2022-06-21

Anchor-free Object Detection Algorithm Based on Double Branch Feature Fusion

HOU Zhiqiang^{1, 2},
GUO Hao^{1, 2
, ,},
MA Sugang^{1, 2},
CHENG Huanhuan^{1, 2},
BAI Yu^{1, 2},
FAN Jiulun¹

1.
Institute of Computer, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
2.
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi’an 710121, China

Funds: The National Natural Science Foundation of China (62072370)

摘要

摘要: 针对无锚框目标检测算法CenterNet中，目标特征利用程度不高、检测结果不够准确的问题，该文提出一种双分支特征融合的改进算法。在算法中，一个分支包含了特征金字塔增强模块和特征融合模块，以对主干网络输出的多层特征进行融合处理。同时，为利用更多的高级语义信息，在另一个分支中仅对主干网络的最后一层特征进行上采样。其次，对主干网络添加了基于频率的通道注意力机制，以增强特征提取能力。最后，采用拼接和卷积操作对两个分支的特征进行融合。实验结果表明，在公开数据集PASCAL VOC上的检测精度为82.3%，比CenterNet算法提高了3.6%，在KITTI数据集上精度领先其6%，检测速度均满足实时性要求。该文提出的双分支特征融合方法将不同层的特征进行处理，更好地利用浅层特征中的空间信息和深层特征中的语义信息，提升了算法的检测性能。
- 目标检测 /
- 多特征融合 /
- 注意力机制 /
- CenterNet
Abstract: Focusing on the problem of low utilization of object features and inaccurate detection results in CenterNet, an improved algorithm of double branch feature fusion is proposed in the paper. One branch of the algorithm includes feature pyramid enhancement module and feature fusion module to fuse the multi-layer features output from the backbone network. At the same time, in order to use more high-level semantic information, only the last layer of the backbone network is upsampled in the other branch. Secondly, a frequency-based channel attention mechanism is added to the backbone network to enhance feature extraction capability. Finally, the features of the two branches are concatenated and convoluted. The experimental results show that the detection accuracy on PASCAL VOC dataset is 82.3%, which is 3.6% higher than CenterNet, and the detection accuracy on KITTI dataset is 6% higher than CenterNet. The detection speed meets the real-time requirements. The double branch feature fusion method is proposed to process the features of different layers, which makes better use of the spatial information of shallow features and the semantic information of deep features, and improves the detection performance of the algorithm.
- Object detection /
- Multi-feature fusion /
- Attention mechanism /
- CenterNet

HTML全文

图 1 CenterNet算法结构图

下载: 全尺寸图片幻灯片

图 2 特征金字塔增强模块FPEM

下载: 全尺寸图片幻灯片

图 3 特征融合模块FFM

下载: 全尺寸图片幻灯片

图 4 基于DCT频率域的通道注意力机制

下载: 全尺寸图片幻灯片

图 5 通道削减模块RCM结构图

下载: 全尺寸图片幻灯片

图 6 DB-CenterNet算法结构图

下载: 全尺寸图片幻灯片

图 7 不同编码网络的检测性能

下载: 全尺寸图片幻灯片

图 8 PASCAL VOC上的实验对比结果

下载: 全尺寸图片幻灯片

图 9 KITTI上的实验对比结果

下载: 全尺寸图片幻灯片

表 1 不同特征增强次数的消融实验结果(%)

增强次数	PASCAL VOC		KITTI
增强次数	ResNet-101	Res101-FcaNet	ResNet-101	Res101-FcaNet
1	81.57	82.02	73.97	76.52
2	81.63	81.93	73.95	76.48
3	81.66	82.28	74.09	76.45
4	81.59	82.08	74.03	76.45
5	81.87	82.31	74.04	76.53
6	81.66	82.22	74.07	76.49
7	81.67	82.27	74.03	76.51

下载: 导出CSV

表 2 PASCAL VOC2007和KITTI数据集的消融实验结果(%)

ResNet-101	上采样分支	多特征融合分支	FcaNet	mAP (VOC)	mAP (KITTI)
$ \surd $	$ \surd $			78.7	70.5
$ \surd $		$ \surd $		80.7	73.7
$ \surd $	$ \surd $	$ \surd $		81.8	74.0
$ \surd $	$ \surd $	$ \surd $	$ \surd $	82.3	76.5

下载: 导出CSV

表 3 PASCAL VOC2007数据集测试结果

Algorithm	Network	Resolution(ppi)	mAP(%)	fps
Faster-RCNN(2015)	ResNet-101	600×1000	76.4	5
SSD(2016)	VGG-16	512×512	76.8	19
R-FCN(2016)	ResNet-101	600×1000	80.5	9
DSSD(2017)	ResNet-101	513×513	81.5	5.5
Yolov3(2018)	DarkNet-53	544×544	79.3	26
ExtremeNet(2019)	Hourglass-104	512×512	79.5	3
FCOS(2019)	ResNet-101	800×800	80.2	16
CenterNet(2019)	ResNet-18	512×512	75.7	100
CenterNet(2019)	ResNet-101	512×512	78.7	30
CenterNet(2019)	DLA-34	512×512	80.7	33
CenterNet(2019)	Hourglass-104	512×512	80.9	6
CenterNet-DHRNet(2020)	DHRNet	512×512	81.9	18
本文	Res101-FcaNet	512×512	82.3	27.6

下载: 导出CSV

表 4 本文算法和其他算法在PASCAL VOC2007数据集上各类的检测结果(%)

Class	本文	CenterNet-DLA	Faster R-CNN	Mask R-CNN	R-FCN	SSD512
aero	88.7	85.0	76.5	73.7	74.5	70.2
bike	87.8	86.0	79.0	84.4	87.2	84.7
bird	85.0	81.4	70.9	78.5	81.5	78.4
boat	73.8	72.8	65.5	70.8	72.0	73.8
bottle	73.9	68.4	52.1	68.5	69.8	53.2
bus	88.5	86.0	83.1	88.0	86.8	86.2
car	88.4	88.4	84.7	85.9	88.5	87.5
cat	88.5	86.5	86.4	87.8	89.8	86.0
chair	66.7	65.0	52.0	60.3	67.0	57.8
cow	87.1	86.3	81.9	85.2	88.1	83.1
table	75.0	77.6	65.7	73.7	74.5	70.2
dog	88.1	85.2	84.8	87.2	89.8	84.9
horse	89.4	87.0	84.6	86.5	90.6	85.2
mbike	85.7	86.1	77.5	85.0	79.9	83.9
person	83.8	85.0	76.7	76.4	81.2	79.7
plant	58.2	58.1	38.8	48.5	53.7	50.3
sheep	88.3	83.4	73.6	76.3	81.8	77.9
sofa	76.5	79.6	73.9	75.5	81.5	73.9
train	88.0	85.0	83.0	85.0	85.9	82.5
tv	84.8	80.3	72.6	81.0	79.9	75.3

下载: 导出CSV

表 5 KITTI数据集上综合的检测结果

Algorithm	Network	Resolution(ppi)	Pedestrian(%)	Car(%)	Cyclist(%)	mAP(%)	fps
SSD(2016)	VGG-16	512×512	48.0	85.1	50.6	61.2	28.9
RFBNet(2018)	VGG-16	512×512	61.7	86.4	72.2	73.4	39
SqueezeDet(2017)	ResNet-50	1242×375	61.5	86.7	80.0	76.1	22.5
Yolov3(2018)	DarkNet-53	544×544	65.8	88.7	73.1	75.8	26
FCOS(2019)	ResNet-101	800×800	69.5	89.3	70.1	76.3	19
CenterNet(2019)	ResNet-18	512×512	50.6	80.8	59.5	63.6	100
CenterNet(2019)	ResNet-101	512×512	60.5	81.3	69.7	70.5	36
CenterNet(2019)	DLA-34	512×512	62.3	85.6	73.8	73.9	33
CenterNet(2019)	Hourglass-104	512×512	65.5	86.1	73.2	74.9	6
本文	Res101-FcaNet	512×512	62.0	88.9	78.7	76.5	27.6

下载: 导出CSV

参考文献(13)

[1]	孙怡峰, 吴疆, 黄严严, 等. 一种视频监控中基于航迹的运动小目标检测算法[J]. 电子与信息学报, 2019, 41(11): 2744–2751. doi: 10.11999/JEIT181110 SUN Yifeng, WU Jiang, HUANG Yanyan, et al. A small moving object detection algorithm based on track in video surveillance[J]. Journal of Electronics &Information Technology, 2019, 41(11): 2744–2751. doi: 10.11999/JEIT181110
[2]	REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
[3]	LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 21–37.
[4]	LAW H and DENG Jie. CornerNet: Detecting objects as paired keypoints[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 765–781.
[5]	ZHOU Xingyi, ZHUO Jiacheng, and KRÄHENBÜHL P. Bottom-up object detection by grouping extreme and center points[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 850–859.
[6]	TIAN Zhi, SHEN Chunhua, CHEN Hao, et al. FCOS: Fully convolutional one-stage object detection[C]. The IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 9626–9635.
[7]	ZHOU Xingyi, WANG Dequan, and KRÄHENBÜHL P. Objects as points[EB/OL]. https://arxiv.org/abs/1904.07850, 2019.
[8]	WANG Wenhai, XIE Enze, SONG Xiaoge, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]. The IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 8439–8448.
[9]	HOWARD A G, ZHU Menglong, CHEN Bo, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. https://arxiv.org/abs/1704.04861, 2021.
[10]	QIN Zequn, ZHANG Pengyi, WU Fei, et al. FcaNet: Frequency channel attention networks[EB/OL]. https://arxiv.org/abs/2012.11879v4, 2020.
[11]	王新, 李喆, 张宏立. 一种迭代聚合的高分辨率网络Anchor-free目标检测方法[J]. 北京航空航天大学学报, 2021, 47(12): 2533–2541. doi: 10.13700/j.bh.1001-5965.2020.0484 WANG Xin, LI Zhe, and ZHANG Hongli. High-resolution network Anchor-free object detection method based on iterative aggregation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(12): 2533–2541. doi: 10.13700/j.bh.1001-5965.2020.0484
[12]	LIU Songtao, HUANG Di, and WANG Yunhong. Receptive field block net for accurate and fast object detection[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 404–419.
[13]	WU Bichen, WAN A, IANDOLA F, et al. Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving[C]. The IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, USA, 2017: 446–454.