Siamese Object Tracking Based on Key Feature Information Perception and Online Adaptive Masking
Abstract: In recent years, the application of Siamese networks to visual object tracking has greatly improved tracker performance, balancing accuracy and real-time speed. However, the accuracy of Siamese trackers remains limited to a great extent. To address this problem, a key feature information perception module based on the channel attention mechanism is proposed to enhance the discriminative ability of the network model, making the network focus on changes in the target's convolutional features. On this basis, an online adaptive masking strategy is proposed, which adaptively masks subsequent frames according to the output state of the cross-correlation layer learned online, so as to highlight the foreground target. Experiments on the OTB100 and GOT-10k datasets show that, without affecting real-time performance, the proposed tracker achieves a significant accuracy improvement over the baseline and tracks robustly in complex scenes such as occlusion, scale variation, and background clutter.
Key words:
- Object tracking
- Siamese network
- Key feature information perception
- Adaptive mask
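The masking strategy described in the abstract is driven by the output state of the cross-correlation layer that SiamFC-style trackers use to score a template against a search region. For orientation, here is a minimal PyTorch sketch of SiamFC-style cross-correlation; the function name and feature shapes are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def cross_correlation(z_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
    """SiamFC-style correlation: slide template features over search features.

    z_feat: (B, C, Hz, Wz) template embedding; x_feat: (B, C, Hx, Wx) search embedding.
    Returns a (B, 1, Hx-Hz+1, Wx-Wz+1) response map.
    """
    b, c, hz, wz = z_feat.shape
    # Fold the batch into the channel axis so each template correlates
    # only with its own search features (grouped-convolution trick).
    x = x_feat.reshape(1, b * c, *x_feat.shape[-2:])
    kernel = z_feat.reshape(b * c, 1, hz, wz)
    resp = F.conv2d(x, kernel, groups=b * c)
    # Sum over channels to get one response map per template-search pair.
    return resp.reshape(b, c, *resp.shape[-2:]).sum(dim=1, keepdim=True)

# e.g. 6x6 template features over 22x22 search features -> a 17x17 response map,
# the response size commonly reported for SiamFC.
```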
Table 1 AUC performance (%) of the trained models on OTB100

| Model | SiamFC | Model1 | Model2 | Model3 | Model4 | Model5 |
|---|---|---|---|---|---|---|
| AUC (%) | 58.4 | 58.7 | 60.9 | 60.1 | 57.4 | 60.3 |
Table 2 Comparison of the two key feature information perception structures

| Algorithm | Parameters (MB) | Computation (Byte) | Accuracy (%) | fps |
|---|---|---|---|---|
| SiamFC | 3.30 | 0.69742 | 58.4 | 131 |
| SiamFC (Structure 1) | 3.32 | 0.69748 | 62.4 | 114 |
| SiamFC (Structure 2) | 3.36 | 0.69759 | 62.8 | 101 |
| SiamFC-DW | 2.45 | 0.80825 | 62.7 | 154 |
| SiamFC-DW (Structure 1) | 2.49 | 0.80841 | 64.3 | 147 |
| SiamFC-DW (Structure 2) | 2.55 | 0.80872 | 65.2 | 132 |
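Both structures in Table 2 are variants of the key feature information perception module, which the abstract describes as built on the channel attention mechanism, in the spirit of squeeze-and-excitation. A minimal PyTorch sketch of such a block follows; the paper's exact Structures 1 and 2 are not specified here, so the layer layout and reduction ratio are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze spatial dims, re-weight channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # emphasize channels that respond to the target
```

Inserted after a backbone stage, a block of this kind adds only a small number of parameters, which is consistent with the modest parameter and computation increases reported in Table 2.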
Table 3 Online adaptive masking

Input: search image $f$ of each subsequent frame ($255 \times 255 \times 3$). Output: masked image $f^*$.

(1) Initialize $\sigma_{x_k}$ and $\sigma_{y_k}$ for the first $i$ search frames via Eq. (5), where $k = 1, 2, \cdots, i$;
(2) Compute the average peak-to-correlation energy (APCE) of the first $i$ frames from the response map;
(3) Obtain $\mathrm{APCE}_1, \mathrm{APCE}_2, \cdots, \mathrm{APCE}_i$ from Eq. (8);
(4) Compute $\varepsilon_{\mathrm{div}}$ from $\mathrm{APCE}_{i+1}$ and the mean APCE of the preceding $i$ frames;
(5) if $\varepsilon_{\mathrm{div}} \ge \tau_1$
(6) /* weaken the Gaussian mask */ $\sigma_{x_{i+1}} = \sigma_{x_i}(1 + \mu)$, $\sigma_{y_{i+1}} = \sigma_{y_i}(1 + \mu)$;
(7) else if $\varepsilon_{\mathrm{div}} \le \tau_2$
(8) /* strengthen the Gaussian mask */ $\sigma_{x_{i+1}} = \sigma_{x_i}(1 - \mu)$, $\sigma_{y_{i+1}} = \sigma_{y_i}(1 - \mu)$;
(9) else
(10) /* keep the Gaussian mask unchanged */ $\sigma_{x_{i+1}} = \sigma_{x_i}$, $\sigma_{y_{i+1}} = \sigma_{y_i}$;
(11) Compute $f_{i+1}^* = f_{i+1} \cdot G(x_{i+1}, y_{i+1})$ to obtain the Gaussian-masked search image of frame $i+1$;
(12) Repeat steps (1)-(11) to obtain the masked search image $f^*$ for every subsequent frame.
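Read as code, the procedure in Table 3 compares the current frame's APCE against the running average over the preceding frames and widens or narrows a Gaussian window over the search image accordingly. Below is a NumPy sketch under stated assumptions: APCE follows the standard peak-to-correlation-energy formula, and `tau1`, `tau2`, `mu` stand in for the paper's thresholds and step size from Eqs. (5)-(8), whose values are not reproduced here.

```python
import numpy as np

def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy of a response map."""
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

def gaussian_mask(h: int, w: int, sigma_x: float, sigma_y: float) -> np.ndarray:
    """2-D Gaussian window centred on the search image, values in (0, 1]."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((xs - cx) ** 2 / (2 * sigma_x ** 2)
                    + (ys - cy) ** 2 / (2 * sigma_y ** 2)))

def update_sigmas(sigma_x, sigma_y, eps_div, tau1, tau2, mu):
    """Steps (5)-(10): adapt the mask width from the APCE divergence eps_div,
    where eps_div compares apce(current response) with the mean APCE of the
    previous i frames."""
    if eps_div >= tau1:          # reliable response: weaken the mask
        return sigma_x * (1 + mu), sigma_y * (1 + mu)
    if eps_div <= tau2:          # unreliable response: strengthen the mask
        return sigma_x * (1 - mu), sigma_y * (1 - mu)
    return sigma_x, sigma_y      # otherwise keep it unchanged

# Step (11): mask the next 255x255x3 search image f before it enters the network.
# f_masked = f * gaussian_mask(255, 255, sigma_x, sigma_y)[..., None]
```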
Table 4 Quantitative AUC comparison of the 10 algorithms on different OTB scenarios
| Scenario | Videos | KCF | Staple | SRDCF | DeepSRDCF | CFNet | SINT | SiamFC-DW | SiamRPN | SiamFC | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BC | 31 | 0.476 | 0.560 | 0.583 | **0.627** | 0.561 | 0.584 | 0.574 | 0.591 | 0.527 | *0.611* |
| IV | 37 | 0.510 | 0.592 | 0.613 | 0.621 | 0.541 | **0.649** | 0.632 | **0.649** | 0.572 | *0.643* |
| SV | 63 | 0.433 | 0.521 | 0.561 | 0.605 | 0.546 | 0.610 | 0.613 | **0.615** | 0.556 | *0.614* |
| OCC | 48 | 0.460 | 0.543 | 0.559 | *0.601* | 0.527 | 0.595 | *0.601* | 0.585 | 0.549 | **0.611** |
| DEF | 43 | 0.389 | 0.551 | 0.544 | 0.566 | 0.526 | 0.572 | 0.560 | **0.617** | 0.512 | *0.597* |
| MB | 29 | 0.427 | 0.541 | 0.594 | 0.625 | 0.540 | 0.634 | **0.654** | 0.622 | 0.554 | *0.642* |
| FM | 39 | 0.417 | 0.540 | 0.597 | *0.628* | 0.554 | 0.616 | **0.630** | 0.599 | 0.571 | 0.621 |
| IPR | 51 | 0.460 | 0.548 | 0.544 | 0.589 | 0.567 | 0.616 | 0.606 | *0.627* | 0.559 | **0.628** |
| OPR | 63 | 0.435 | 0.533 | 0.550 | 0.607 | 0.553 | 0.621 | 0.612 | *0.625* | 0.561 | **0.628** |
| OV | 14 | 0.352 | 0.475 | 0.460 | 0.553 | 0.454 | 0.572 | **0.590** | 0.542 | 0.475 | *0.587* |
| LR | 9 | 0.348 | 0.394 | 0.514 | 0.561 | 0.614 | 0.622 | 0.596 | **0.639** | 0.618 | *0.636* |
| OTB100 | 100 | 0.471 | 0.578 | 0.598 | *0.635* | 0.587 | 0.625 | 0.627 | 0.629 | 0.584 | **0.639** |

Note: bold marks the best result in each row; italics the second best.

Table 5 Performance comparison of the 10 algorithms on GOT-10k
| Algorithm | AO | SR0.5 | SR0.75 |
|---|---|---|---|
| KCF | 0.203 | 0.177 | 0.076 |
| Staple | 0.246 | 0.239 | 0.089 |
| SRDCF | 0.236 | 0.227 | 0.094 |
| DeepSRDCF | 0.286 | 0.275 | 0.096 |
| CFNet | 0.293 | 0.265 | 0.087 |
| SINT | 0.347 | 0.375 | *0.124* |
| SiamFC-DW | *0.384* | 0.401 | 0.118 |
| SiamRPN | 0.367 | *0.424* | 0.102 |
| SiamFC | 0.326 | 0.353 | 0.098 |
| Ours | **0.411** | **0.492** | **0.175** |

Note: bold marks the best result in each column; italics the second best.

Table 6 Ablation results of the proposed algorithm on OTB100
| Algorithm | Innovation 1 (key feature perception) | Innovation 2 (adaptive masking) | AUC | Precision | fps |
|---|---|---|---|---|---|
| SiamFC | ✗ | ✗ | 0.584 | 0.772 | 131 |
| SiamFCv1 | ✓ | ✗ | 0.624 | 0.831 | 114 |
| SiamFCv2 | ✗ | ✓ | 0.606 | 0.818 | 127 |
| Ours | ✓ | ✓ | 0.639 | 0.861 | 111 |