长时视觉跟踪中基于双模板Siamese结构的目标漂移判定网络

侯志强; 王卓; 马素刚; 赵佳鑫; 余旺盛; 范九伦

doi:10.11999/JEIT230496

长时视觉跟踪中基于双模板Siamese结构的目标漂移判定网络

doi: 10.11999/JEIT230496 cstr: 32379.14.JEIT230496

1.
西安邮电大学计算机学院西安 710121
2.
空军工程大学信息与导航学院西安 710038

基金项目: 国家自然科学基金(62072370)，陕西省自然科学基金(2023-JC-YB-598)

详细信息

作者简介:
侯志强：男，教授，博士生导师，研究方向为图像处理、计算机视觉

王卓：女，硕士生，研究方向为计算机视觉、目标跟踪

马素刚：男，博士生，研究方向为计算机视觉、机器学习

赵佳鑫：男，硕士生，研究方向为计算机视觉、目标跟踪

余旺盛：男，副教授，硕士生导师，研究方向为图像处理、视觉跟踪

范九伦：男，教授，博士生导师，研究方向为模式识别、信号与信息处理等

通讯作者:
王卓　wz210086@163.com

中图分类号: TN911.73; TP391.4
计量
- 文章访问数: 614
- HTML全文浏览量: 469
- PDF下载量: 33
- 被引次数: 0
出版历程
- 收稿日期: 2023-05-26
- 修回日期: 2023-12-27
- 网络出版日期: 2024-01-02
- 刊出日期: 2024-04-24

Target Drift Discriminative Network Based on Dual-template Siamese Structure in Long-term Tracking

1.
Institute of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
2.
Institute of Information and Navigation, Air Force Engineering University, Xi’an 710038, China

Funds: The National Natural Science Foundation of China (62072370), The Natural Science Foundation of Shaanxi Province (2023-JC-YB-598)

摘要

摘要: 在长时视觉跟踪中，大部分目标丢失判定方法需要人为确定阈值，而最优阈值的选取通常较为困难，造成长时跟踪算法的泛化能力较弱。为此，该文提出一种无需人为选取阈值的目标漂移判定网络(DNet)。该网络采用Siamese结构，利用静态模板和动态模板共同判定跟踪结果是否丢失，其中，引入动态模板有效提高算法对目标外观变化的适应能力。为了对所提目标漂移判定网络进行训练，建立了样本丰富的数据集。为验证所提网络的有效性，将该网络与基础跟踪器和重检测模块相结合，构建了一个完整的长时跟踪算法。在UAV20L, LaSOT, VOT2018-LT和VOT2020-LT等经典的视觉跟踪数据集上进行了测试，实验结果表明，相比于基础跟踪器，在UAV20L数据集上，跟踪精度和成功率分别提升了10.4%和7.5%。
- 长时跟踪 /
- 深度学习 /
- 目标漂移判定网络 /
- Siamese结构 /
- 双模板
Abstract: In long-term visual tracking, most of the target loss discriminative methods require artificially determined thresholds, and the selection of optimal thresholds is usually difficult, resulting in weak generalization ability of long-term tracking algorithms. A target drift Discriminative Network (DNet) that does not require artificially selected thresholds is proposed. The network adopts Siamese structure and uses both static and dynamic templates to determine whether the tracking results are lost or not. Among them, the introduction of dynamic templates effectively improves the algorithm’s ability to adapt to changes in target appearance. In order to train the proposed target drift discriminative network, a sample-rich dataset is established. To verify the effectiveness of the proposed network, a complete long-term tracking algorithm is constructed in this paper by combining this network with the base tracker and the re-detection module. It is tested on classical visual tracking datasets such as UAV20L, LaSOT, VOT2018-LT and VOT2020-LT. The experimental results show that compared with the base tracker, the tracking accuracy and success rate are improved by 10.4% and 7.5% on UAV20L dataset, respectively.
- Long-term tracking /
- Deep learning /
- Target drift Discriminative Network(DNet) /
- Siamese structure /
- Dual template

HTML全文

图 1 整体跟踪算法框架图

下载: 全尺寸图片幻灯片

图 2 基于双模板Siamese结构的目标漂移判定网络(DNet)

下载: 全尺寸图片幻灯片

图 3 获取静态模板T₁和动态模板T₂

下载: 全尺寸图片幻灯片

图 4 部分正负训练样本示例

下载: 全尺寸图片幻灯片

图 5 在VOT2018-LT上不同跟踪算法的定性对比结果

下载: 全尺寸图片幻灯片

图 6 UAV123上不同跟踪算法的成功率和精确图对比结果

下载: 全尺寸图片幻灯片

图 7 UAV20L上不同跟踪算法的成功率和精确图对比结果

下载: 全尺寸图片幻灯片

图 8 UAV20L的12个视频属性上不同跟踪算法的成功率对比结果

下载: 全尺寸图片幻灯片

图 9 LaSOT上不同跟踪算法的对比结果

下载: 全尺寸图片幻灯片

图 10 本文算法在VOT2018-LT数据集上与其他跟踪算法的跟踪速度对比结果图

下载: 全尺寸图片幻灯片

表 1 在UAV20L上选取不同置信度阈值的跟踪结果精确率，最优结果使用粗体

阈值	0.70	0.75	0.80	0.85	0.90
精确率	0.829	0.848	0.842	0.841	0.836

下载: 导出CSV

表 2 训练数据划分

	正样本	负样本
训练集	45352	53836
验证集	11338	13458
共计	56690	67294

下载: 导出CSV

1 长时视觉跟踪中基于双模板Siamese结构的目标漂移判定网络

输入：图像序列：I₁,I₂,···,I_n；初始目标位置：P₀=(x₀, y₀, w, h)
输出：预估目标位置：P_n=( x_0, y_0, w, h)。
For i=2,3,···,n do:
步骤1 跟踪目标
根据上一帧中心位置裁剪出搜索区域Search Region；
提取基准模板和搜索区域的特征；
将经过模型预测器的模板特征与搜索区域特征进行互相关Corr操　作，得到结果响应图，即最高响应点位置是被跟踪目标的位置。
步骤2基于双模板Siamese结构的目标漂移判定网络DNet
输入初始跟踪结果R、静态模板T₁与动态模板T₂；
经过4层卷积(Conv1～Conv4),并将浅层特征与深层特征相结　合，最终提取到特征t,t₁,t₂；
对双模板特征t₁和t₂进行融合之后，得到特征x₁，并与初始跟踪　结果t进行特征相减，得到特征x₂；
特征x₂与初始跟踪结果t进行Addtion操作，得到最终特征x₃；
将特征x₃送入3层全连接(FC1～FC3)中降参数；
随后使用Softmax函数与Argmax函数得到最终结果0或1，即目　标丢失或未丢失；
若判断目标未丢失，同时基础跟踪器中的置信度分数F_max大于阈　值$ \sigma $，则更新动态模板T₂。
步骤3 重检测模块
如果判定网络判断目标丢失同时置信度分数 F_max小于0.3，则启　动重检测模块；
将当前帧的整幅图像送入GlobalTrack算法中；
最后得到目标框，即为被跟踪目标。

下载: 导出CSV

表 3 在不同数据集下不同判定方法的阈值选取

	UAV123	UAV20L	LaSOT	VOT2018-LT	VOT2020-LT
PSR	5	9	5	7	8
APCE	16	20	19	19	20
DNet	–	–	–	–	–

下载: 导出CSV

表 4 在VOT2018-LT上的消融实验结果，最佳结果以加黑突出显示

DiMP50	MDNet	PSR	APCE&Fmax	DNet&Fmax	GlobalTrack	F-score	FPS
√	√				√	0.614	6.5
√		√			√	0.627	21
√			√		√	0.629	22
√				√	√	0.630	20

下载: 导出CSV

表 5 在VOT2018-LT上不同算法的性能评估，第1和第2优的结果用粗体和斜体显示

	FuCoLoT	PTAVplus	DaSiam_LT	MBMD	DiMP50	SPLT	GlobalTrack	TJLGS	ELGLT	本文算法
F-score	0.480	0.481	0.607	0.610	0.588	0.616	0.555	0.586	0.638	0.630
Pr	0.539	0.595	0.627	0.634	0.564	0.633	0.503	0.649	0.669	0.643
Re	0.432	0.404	0.588	0.588	0.614	0.600	0.528	0.535	0.610	0.618

下载: 导出CSV

表 6 在VOT2020-LT上的性能评估，第1和第2优的结果用粗体和斜体显示

	ltMDNet	MBMD	mbdet	SPLT	DiMP50	GlobalTrack	SiamRN	TJLGS	STMTrack	TACT	ELGLT	本文算法
F-score	0.574	0.575	0.567	0.565	0.567	0.520	0.444	0.515	0.550	0.569	0.590	0.599
Pr	0.649	0.623	0.609	0.565	0.606	0.529	0.575	0.628	0.611	0.578	0.637	0.609
Re	0.514	0.534	0.530	0.544	0.533	0.512	0.361	0.437	0.500	0.561	0.550	0.590

下载: 导出CSV

参考文献(22)

[1]	周治国, 荆朝, 王秋伶, 等. 基于时空信息融合的无人艇水面目标检测跟踪[J]. 电子与信息学报, 2021, 43(6): 1698–1705. doi: 10.11999/JEIT200223. ZHOU Zhiguo, JING Zhao, WANG Qiuling, et al. Object detection and tracking of unmanned surface vehicles based on spatial-temporal information fusion[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1698–1705. doi: 10.11999/JEIT200223.
[2]	MUELLER M, SMITH N, and GHANEM B. A benchmark and simulator for UAV tracking[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 445–461. Doi: 10.1007/978-3-319-46448-0_27.
[3]	FAN Heng, LIN Liting, YANG Fan, et al. LaSOT: A high-quality benchmark for large-scale single object tracking[C]. The 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 5369–5378. doi: 10.1109/CVPR.2019.00552.
[4]	LUKEŽIČ A, ZAJC L Č, VOJÍŘ T, et al. Now you see me: Evaluating performance in long-term visual tracking[EB/OL]. https://arxiv.org/abs/1804.07056, 2018.
[5]	KRISTAN M, LEONARDIS A, MATAS J, et al. The eighth visual object tracking VOT2020 challenge results[C]. European Conference on Computer Vision, Glasgow, UK, 2020: 547–601. doi: 10.1007/978-3-030-68238-5_39.
[6]	BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual object tracking using adaptive correlation filters[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 2544–2550. doi: 10.1109/CVPR.2010.5539960.
[7]	WANG Mengmeng, LIU Yong, and HUANG Zeyi. Large margin object tracking with circulant feature maps[C]. The 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4800–4808. doi: 10.1109/CVPR.2017.510.
[8]	LUKEŽIČ A, ZAJC L Č, VOJÍŘ T, et al. Fucolot–a fully-correlational long-term tracker[C]. 14th Asian Conference on Computer Vision, Perth, Australia, 2018: 595–611. doi: 10.1007/978-3-030-20890-5_38.
[9]	YAN Bin, ZHAO Haojie, WANG Dong, et al. 'Skimming-Perusal' tracking: A framework for real-time and robust long-term tracking[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 2385–2393. doi: 10.1109/ICCV.2019.00247.
[10]	KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking vot2018 challenge results[C]. The European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 2018: 3–53. doi: 10.1007/978-3-030-11009-3_1.
[11]	ZHANG Yunhua, WANG Dong, WANG Lijun, et al. Learning regression and verification networks for long-term visual tracking[EB/OL].https://arxiv.org/abs/1809.04320, 2018.
[12]	XUAN Shiyu, LI Shengyang, ZHAO Zifei, et al. Siamese networks with distractor-reduction method for long-term visual object tracking[J]. Pattern Recognition, 2021, 112: 107698. doi: 10.1016/j.patcog.2020.107698.
[13]	DAI Kenan, ZHANG Yunhua, WANG Dong, et al. High-performance long-term tracking with meta-updater[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 6297–6306. doi: 10.1109/CVPR42600.2020.00633.
[14]	WU Yi, LIM J, and YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834–1848. doi: 10.1109/TPAMI.2014.2388226.
[15]	BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 6181–6190. doi: 10.1109/ICCV.2019.00628.
[16]	HUANG Lianghua, ZHAO Xin, and HUANG Kaiqi. Globaltrack: A simple and strong baseline for long-term tracking[C]. The 34th AAAI Conference on Artificial Intelligence, New York, USA, 2020: 11037–11044. doi: 10.1609/aaai.v34i07.6758.
[17]	CHENG Siyuan, ZHONG Bineng, LI Guorong, et al. Learning to filter: Siamese relation network for robust tracking[C]. The 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 4419–4429. doi: 10.1109/CVPR46437.2021.00440.
[18]	CAO Ziang, FU Changhong, YE Junjie, et al. HiFT: Hierarchical feature transformer for aerial tracking[C]. The 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 15457–15466. doi: 10.1109/ICCV48922.2021.01517.
[19]	CAO Ziang, HUANG Ziyuan, PAN Liang, et al. TCTrack: Temporal contexts for aerial tracking[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 14778–14788. doi: 10.1109/CVPR52688.2022.01438.
[20]	FU Zhihong, LIU Qingjie, FU Zehua, et al. STMTrack: Template-free visual tracking with space-time memory networks[C]. The 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 13769–13778. doi: 10.1109/CVPR46437.2021.01356.
[21]	ZHAO Haojie, YAN Bin, WANG Dong, et al. Effective local and global search for fast long-term tracking[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2023, 45(1): 460–474. doi: 10.1109/TPAMI.2022.3153645.
[22]	https://www.votchallenge.net/vot2019/results.html.