孪生网络框架下融合显著性和干扰在线学习的航拍目标跟踪算法

孙锐; 方林凤; 梁启丽; 张旭东

doi:10.11999/JEIT200140

孪生网络框架下融合显著性和干扰在线学习的航拍目标跟踪算法

doi: 10.11999/JEIT200140 cstr: 32379.14.JEIT200140

1.
合肥工业大学计算机与信息学院合肥 230009
2.
工业安全与应急技术安徽省重点实验室合肥 230009

基金项目: 国家自然科学基金(61471154, 61876057)，安徽省重点研发计划-科技强警专项(202004d07020012)

详细信息

作者简介:
孙锐：男，1976年生，教授，主要研究方向为计算机视觉与机器学习

方林凤：女，1994年生，硕士生，研究方向为图像信息处理和计算机视觉

梁启丽：女，1995年生，硕士生，研究方向为图像信息处理和计算机视觉

张旭东：男，1966年生，教授，主要研究方向为智能信息处理

通讯作者:
方林凤　f_linf@163.com

中图分类号: TN911.73; TP391
计量
- 文章访问数: 1401
- HTML全文浏览量: 581
- PDF下载量: 119
- 被引次数: 0
出版历程
- 收稿日期: 2020-03-03
- 修回日期: 2020-10-21
- 网络出版日期: 2020-11-19
- 刊出日期: 2021-05-18

Siamese Network Combined Learning Saliency and Online Leaning Interference for Aerial Object Tracking Algorithm

1.
School of Computer and Information, Hefei University of Technology, Hefei 230009, China
2.
Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei 230009, China

Funds: The National Natural Science Foundation of China (61471154, 61876057), The Key Research Plan of Anhui Province - Strengthening Police with Science and Technology (202004d07020012)

摘要

摘要: 针对一般跟踪算法不能很好地解决航拍视频下目标分辨率低、视场大、视角变化多等特殊难点，该文提出一种融合目标显著性和在线学习干扰因子的无人机(UAV)跟踪算法。通用模型预训练的深层特征无法有效地识别航拍目标，该文跟踪算法能根据反向传播梯度识别每个卷积滤波器的重要性来更好地选择目标显著性特征，以此凸显航拍目标特性。另外充分利用连续视频丰富的上下文信息，通过引导目标外观模型与当前帧尽可能相似地来在线学习动态目标的干扰因子，从而实现可靠的自适应匹配跟踪。实验证明：该算法在跟踪难点更多的UAV123数据集上跟踪成功率和准确率分别比孪生网络基准算法高5.3%和3.6%，同时速度达到平均28.7帧/s，基本满足航拍目标跟踪准确性和实时性需求。
- 目标跟踪 /
- 无人机航拍场景 /
- 孪生网络 /
- 目标显著性 /
- 在线学习干扰因子
Abstract: In view of the fact that the general tracking algorithm can not solve the special problems such as low resolution, large field of view and many changes of view angle, a Unmanned Aerial Vehicle (UAV) tracking algorithm combining target saliency and online learning interference factor is proposed. The deep feature that the general model pre-trained can not effectively identify the aerial target, the tracking algorithm can better select the salient feature of each convolution filter according to the importance of the back propagation gradient, so as to highlight the aerial target feature. In addition, it makes full use of the rich context information of the continuous video, and learn the interference factor of the dynamic target online by guiding the target appearance model as similar as possible to the current frame, so as to achieve reliable adaptive matching tracking. It is proved that the tracking success rate and accuracy rate of the algorithm are 5.3% and 3.6% higher than that of the siamese network benchmark algorithm on the more difficult UAV123 dataset, respectively, and the speed reaches an average of 28.7 frames per second, which basically meet the aerial target tracking accuracy and real-time requirements.
- Object tracking /
- Unmanned Aerial Vehicle (UAV) scene /
- Siamese network /
- Target saliency /
- Online leaning interference factor

HTML全文

图 1 本文跟踪算法框架

下载: 全尺寸图片幻灯片

图 2 可视化跟踪效果图

下载: 全尺寸图片幻灯片

图 3 目标干扰因子学习

下载: 全尺寸图片幻灯片

图 4 成功率和准确率对比

下载: 全尺寸图片幻灯片

图 5 视频序列测试图——低分辨率

下载: 全尺寸图片幻灯片

图 6 视频序列测试图——部分遮挡

下载: 全尺寸图片幻灯片

图 7 视频序列测试图——视角变化

下载: 全尺寸图片幻灯片

图 8 视频序列测试图——相似目标

下载: 全尺寸图片幻灯片

表 1 本文跟踪算法流程

输入：
(1) 第1帧$ {{Z}}_{1} $：目标位置坐标$ {{L}}_{1} $和包围框${{{b}}_{1}}$
(2) 第$t - {1}$帧${{{X}}_{t{ - 1}}}$：目标位置坐标${{{L}}_{t - {1}}}$和包围框${{{b}}_{t - {1}}}$
(3) 第$t$帧${{{X}}_t}$(当前帧)
输出；
${\rm{res}}$；当前帧每个位置的相似度值
Function $\left( {{{{O}}_t}:{{{O}}_1}:{{{O}}_{t - 1}}} \right)$= pretrain_feature 　　　　　$\left( {{{{X}}_t}:{{{Z}}_1}:{{{X}}_{t - 1}}} \right)$
$\left( {{{{O}}_t}:{{{O}}_1}:{{{O}}_{t - 1}}} \right) = \varphi \left( {{{{X}}_t}:{{{Z}}_1}:{{{X}}_{t - 1}}} \right)$
end
Function ${ {{O} }_t}^{'}$= Target_saliency_feature(${{{O}}_t}$)
${M_i} = {F_{ {\rm{ap} } } }\left(\dfrac{ {\partial J} }{ {\partial {F_i} } }\right)$
${{{O}}_t}^{'} = {{f}}({{{O}}_t};{{M}_i})$
end
Function ${{{S}}_{t - {1}}}$= get_disturbance_factor $\left( {{{{O}}_1}:{{{O}}_{t - 1}}:{\lambda _s}} \right)$ 　　　　　${ {{S} }_{t - {1} } } = { {\cal F}^{ - 1} }\left(\dfrac{ { { {\cal F}^ * }({ {{O} }_1}) \odot {\cal F}({ {{O} }_{t - 1} })} }{ { { {\cal F}^ * }({ {{O} }_1}) \odot {\cal F}({ {{O} }_1}) + {\lambda _s} } }\right)$
end
Function ${\rm{res}}$= detection $ \left({{S}}_{{t}-1};{{O}}_{1};{{O}}_{t}\right) $
${\rm{res}} = {\rm{corr} }({ {{S} }_{t - {1} } } * { {{O} }_{1} },{ {{O} }_t}^\prime )$
end

下载: 导出CSV

表 2 部分视频的跟踪成功率和跟踪准确率比较(%)

序列	Attibutes	Struck	SAMF	MUSTER	KCF	SRDCF	CFNet	本文
bike3	LR POC	6.9/30.0	15.7/22.8	19.4/27.7	12.2/20.5	13.9/35.4	14.1/45.2	17.8/65.5
boat5	VC	16.6/10.6	74.7/61.8	85.1/67.7	23.2/9.5	48.6/89.7	36.0/17.2	38.7/37.6
building5	CM	99.3/99.3	96.9/98.6	97.3/98.9	89.0/99.7	97.5/98.9	21.6/61.6	99.8/99.8
car15	LR POC SOB	42.4/96	3.0/8.5	8.5/11.7	2.3/8.5	44.4/100.0	45.8/82.4	49.1/99.7
person21	LR POC VC SOB	31.2/43.9	0.6/9.4	51.3/79	0.6/5.7	30.8/82.9	18.6/47.2	28.7/73.9
truck2	LR POC	42.9/44.4	86.1/100	48.9/48.8	39.2/44.4	70.5/100.0	88.2/62.3	88.5/99.7
uav4	LR SOB	6.3/14.4	1.9/14.0	3.8/13.3	1.3/14.0	7.6/15.3	2.8/5.7	8.9/19.8
wakeboard2	VC CM	3.1/48.2	3.3/16.0	5.3/21.4	4.9/22.0	4.5/13.1	6.4/12.2	26.1/64.6
car1_s	POC OV VC CM	18.4/18.4	18.4/18.4	18.4/18.4	18.4/18.7	23.2/22.1	10.6/10.3	23.1/21.0
person3_s	POC OV CM	30.2/20.7	46.5/35	46.1/41.7	30.0/31.2	69.5/46.5	25.1/16.8	48.3/55.8

下载: 导出CSV

表 3 算法的速度(fps)比较

算法	Struck	SAMF	MUSTER	KCF	DCF	SRDCF	CFNet	本文
FPS	15.4	6.4	1.0	526.5	470.2	8.4	31.4	28.7

下载: 导出CSV

参考文献(26)

[1]	TRILAKSONO B R, TRIADHITAMA R, ADIPRAWITA W, et al. Hardware-in-the-loop simulation for visual target tracking of octorotor UAV[J]. Aircraft Engineering and Aerospace Technology, 2011, 83(6): 407–419. doi: 10.1108/00022661111173289
[2]	黄静琪, 胡琛, 孙山鹏, 等. 一种基于异步传感器网络的空间目标分布式跟踪方法[J]. 电子与信息学报, 2020, 42(5): 1132–1139. doi: 10.11999/JEIT190460 HUANG Jingqi, HU Chen, SUN Shanpeng, et al. A distributed space target tracking algorithm based on asynchronous multi-sensor networks[J]. Journal of Electronics &Information Technology, 2020, 42(5): 1132–1139. doi: 10.11999/JEIT190460
[3]	KRISTAN M, MATAS J, LEONARDIS A, et al. A novel performance evaluation methodology for single-target trackers[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(11): 2137–2155. doi: 10.1109/TPAMI.2016.2516982
[4]	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583–596. doi: 10.1109/TPAMI.2014.2345390
[5]	SUN Chong, WANG Dong, LU Huchuan, et al. Correlation tracking via joint discrimination and reliability learning[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 489–497. doi: 10.1109/CVPR.2018.00058.
[6]	SUN Chong, WANG Dong, LU Huchuan, et al. Learning spatial-aware regressions for visual tracking[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8962–8970. doi: 10.1109/CVPR.2018.00934.
[7]	QI Yuankai, ZHANG Shengping, QIN Lei, et al. Hedging deep features for visual tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(5): 1116–1130. doi: 10.1109/TPAMI.2018.2828817
[8]	SONG Yibing, MA Chao, WU Xiaohe, et al. VITAL: Visual tracking via adversarial learning[C]. The 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018: 8990–8999. doi: 10.1109/CVPR.2018.00937.
[9]	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]. The 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016: 850–865. doi: 10.1007/978-3-319-48881-3_56.
[10]	KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking vot2018 challenge results[C]. Computer Vision ECCV 2018 Workshops, Munich, Germany, 2018: 3–53. doi: 10.1007/978-3-030-11009-3_1.
[11]	CHEN Kai and TAO Wenbing. Once for all: A two-flow convolutional neural network for visual tracking[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(12): 3377–3386. doi: 10.1109/TCSVT.2017.2757061
[12]	HELD D, THRUN S, SAVARESE S, et al. Learning to track at 100 fps with deep regression networks [C] The 14th European Conference on Computer Vision (ECCV) , Amsterdam, Netherlands, 2016, 9905: 749–765. doi: 10.1007/978-3-319-46448-0_45.
[13]	侯志强, 陈立琳, 余旺盛, 等. 基于双模板Siamese网络的鲁棒视觉跟踪算法[J]. 电子与信息学报, 2019, 41(9): 2247–2255. doi: 10.11999/JEIT181018 HOU Zhiqiang, CHEN Lilin, YU Wangsheng, et al. Robust visual tracking algorithm based on Siamese network with dual templates[J]. Journal of Electronics &Information Technology, 2019, 41(9): 2247–2255. doi: 10.11999/JEIT181018
[14]	HUANG Chen, LUCEY S, RAMANAN D, et al. Learning policies for adaptive tracking with deep feature cascades[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 105–114. doi: 10.1109/ICCV.2017.21.
[15]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 618–626. doi: 10.1109/ICCV.2017.74.
[16]	MUELLER M, SMITH N, and GHANEM B. A benchmark and simulator for UAV tracking[C]. The 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016: 445–461. doi: 10.1007/978-3-319-46448-0_27.
[17]	HENRIQUES J F, CASEIRO R, MARTINS P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]. The 12th European Conference on Computer Vision (ECCV), Florence, Italy, 2012: 702–715. doi: 10.1007/978-3-642-33765-9_50.
[18]	DANELLJAN M, HÄGER G, SHAHBAZ K, et al. Accurate scale estimation for robust visual tracking[C]. The British Machine Vision Conference (BMVC), Nottingham, UK, 2014: 65.1–65.11. doi: 10.5244/C.28.65.
[19]	HARE S, GOLODETZ S, and SAFFARI A. Struck: Structured output tracking with kernels[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2096–2109. doi: 10.1109/TPAMI.2015.2509974
[20]	ZHANG Jianming, MA Shugao, and SCLAROFF S. MEEM: Robust tracking via multiple experts using entropy minimization[C]. The 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 2014: 188–203. doi: 10.1007/978-3-319-10599-4_13.
[21]	HONG Zhibin, CHEN Zhe, WANG Chaohui, et al. Multi-store tracker (MUSTer): A cognitive psychology inspired approach to object tracking[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, USA, 2015: 749–758. doi: 10.1109/CVPR.2015.7298675.
[22]	KRISTAN M, PFLUGFELDER R, LEONARDIS A, et al. The visual object tracking VOT2014 challenge results[C]. The 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 2014: 191–217. doi: 10.1007/978-3-319-16181-5_14.
[23]	JIA Xu, LU Huchuan, and YANG M H. Visual tracking via adaptive structural local sparse appearance model[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, USA, 2012: 1822–1829. doi: 10.1109/CVPR.2012.6247880.
[24]	VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 5000–5008. doi: 10.1109/CVPR.2017.531.
[25]	DANELLJAN M, HÄGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 4310–4318. doi: 10.1109/ICCV.2015.490.
[26]	LI Bo, YAN Junjie, WU Wei, et al. High performance visual tracking with Siamese region proposal network[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8971–8980. doi: 10.1109/CVPR.2018.00935.