Visual Tracking Based on a Perceptive Deep Neural Network

HOU Zhiqiang; DAI Bo; HU Dan; YU Wangsheng; CHEN Chen; FAN Shunyi

doi:10.11999/JEIT151449

Funding:

National Natural Science Foundation of China (61175029, 61473309); Natural Science Foundation of Shaanxi Province (2015JM6269, 2016JM6050)

Publication history
- Received: 2015-12-22
- Revised: 2016-05-04
- Published: 2016-07-19

Robust Visual Tracking via Perceptive Deep Neural Network

Funds:

National Natural Science Foundation of China (61175029, 61473309); Natural Science Foundation of Shaanxi Province (2015JM6269, 2016JM6050)

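Abstract

Abstract: In visual tracking systems, an efficient feature representation is key to tracking robustness, and multi-cue fusion is an effective means of handling complex tracking problems. This paper first proposes a perceptive deep neural network built from multiple parallel networks that are triggered adaptively. It then establishes a fragment-based target model that combines deep learning with multi-cue fusion. Splitting the target into fragments reduces the dimension of the network input severalfold, which greatly lowers the computational complexity of network training. During tracking, the model dynamically adjusts the fragment weights according to the confidence of each sub-block, improving its adaptability to complex conditions such as target pose change, illumination change, and occlusion. Experiments were carried out on a large amount of test data, and qualitative and quantitative analysis of the tracking results shows that the proposed algorithm is highly robust and tracks the target fairly stably.
- Visual tracking
- Feature representation
- Deep learning
- Perceptive deep neural network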
Abstract: In a visual tracking system, an efficient feature description is key to robust tracking, and multi-cue fusion is an effective way to handle tracking under complex conditions. Therefore, a perceptive deep neural network based on multiple parallel networks that can be triggered adaptively is proposed. Then, using multi-cue fusion, a new tracking method based on deep learning is established, in which the target is divided into fragments. Fragmentation decreases the input dimension of the network, thus reducing the computational complexity of training. During tracking, the model dynamically adjusts the weights of the fragments according to their reliability, which improves the tracker's ability to deal with complex circumstances such as target pose change, illumination change, and occlusion by other objects. Qualitative and quantitative analyses on challenging benchmark video sequences show that the proposed method is robust and tracks the moving target stably.
- Visual tracking
- Feature description
- Deep learning
- Perceptive deep neural network
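
The abstract names two mechanisms without giving formulas: fragmenting the target to shrink the network input, and re-weighting fragments by their reliability during tracking. The following Python sketch illustrates both ideas under assumptions that are not taken from the paper: a fixed 2 x 2 grid, per-fragment position estimates, and a simple normalized weighted average as the fusion rule. The names `split_into_fragments` and `fuse_estimates` are hypothetical.

```python
def split_into_fragments(patch, rows, cols):
    """Split a 2-D patch (list of pixel rows) into rows x cols equal fragments."""
    fh, fw = len(patch) // rows, len(patch[0]) // cols
    return [[line[c * fw:(c + 1) * fw] for line in patch[r * fh:(r + 1) * fh]]
            for r in range(rows) for c in range(cols)]


def fuse_estimates(estimates, confidences):
    """Confidence-weighted average of per-fragment (x, y) position estimates.

    Assumed fusion rule: weights are the non-negative confidences
    normalized to sum to 1; the paper's actual rule may differ.
    """
    conf = [max(c, 0.0) for c in confidences]
    total = sum(conf)
    if total == 0.0:
        # No fragment is trusted: fall back to uniform weights.
        weights = [1.0 / len(conf)] * len(conf)
    else:
        weights = [c / total for c in conf]
    x = sum(w * e[0] for w, e in zip(weights, estimates))
    y = sum(w * e[1] for w, e in zip(weights, estimates))
    return (x, y), weights


# A 32 x 32 patch has 1024 inputs; each of its four fragments has only 256.
patch = [[0.0] * 32 for _ in range(32)]
fragments = split_into_fragments(patch, 2, 2)
full_dim = len(patch) * len(patch[0])
frag_dim = len(fragments[0]) * len(fragments[0][0])

# Three fragments agree on the target position; the fourth is occluded,
# reports an outlier, and receives zero confidence.
estimates = [(10.0, 10.0), (10.0, 10.0), (10.0, 10.0), (30.0, 30.0)]
confidences = [0.9, 0.9, 0.9, 0.0]
fused, weights = fuse_estimates(estimates, confidences)
```

With these illustrative numbers the occluded fragment gets zero weight, so the fused position stays at the value the three reliable fragments agree on. In a real tracker the confidences would come from the network outputs for each fragment.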
