Object Classification Method Based on Weakly Supervised E2LSH and Saliency Map Weighting

ZHAO Yongwei; LI Bicheng; KE Shengcai

doi:10.11999/JEIT150337

cstr: 32379.14.JEIT150337

Publication history
- Received: 2015-03-23
- Revised: 2015-09-09
- Published: 2016-01-19

Funds:

The National Natural Science Foundation of China (60872142, 61301232)

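Abstract

Abstract: In object classification, the prevailing methods are based on the bag-of-visual-words model, but their performance is severely limited by low time efficiency, the synonymy and ambiguity of visual words, and the loss of spatial information between words. To address these problems, this paper proposes an object classification method based on weakly supervised Exact Euclidean Locality Sensitive Hashing (E2LSH) and saliency map weighting. First, the E2LSH algorithm is used to cluster the feature points of the training image set into a group of visual dictionaries, and a weakly supervised strategy is proposed to supervise the selection of hash functions in E2LSH, reducing its randomness and improving the discriminability of the visual dictionaries. Next, the Graph-Based Visual Saliency (GBVS) detection algorithm is applied to each image, and each visual word is assigned a weight according to the saliency value of the region in which it lies. Finally, a saliency-map-weighted visual language model performs the object classification. Experimental results on the Caltech-256 and Pascal VOC 2007 datasets show that the proposed method improves dictionary generation efficiency and the discriminative power of the object representation, and that its classification performance is better than that of current mainstream methods.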
Abstract: The most popular approach to object classification is based on the bag-of-visual-words model. However, several fundamental problems restrict the performance of this approach, such as low time efficiency, the synonymy and polysemy of visual words, and the lack of spatial information between visual words. In view of this, an object classification method based on weakly supervised Exact Euclidean Locality Sensitive Hashing (E2LSH) and saliency map weighting is proposed. Firstly, E2LSH is employed to generate a group of visual dictionaries by clustering the SIFT features of the training dataset, and the selection of hash functions is supervised, inspired by the random forest idea, to reduce the randomness of E2LSH. Secondly, the Graph-Based Visual Saliency (GBVS) algorithm is applied to compute the saliency map of each image, and visual words are weighted according to this saliency prior. Finally, a saliency-map-weighted visual language model is used to accomplish object classification. Experimental results on the Caltech-256 and Pascal VOC 2007 datasets indicate that the distinguishability of objects is effectively improved and that the proposed method is superior to state-of-the-art object classification methods.
- Object classification
- Bag of visual words model
- Exact Euclidean Locality Sensitive Hashing (E2LSH)
- Visual saliency map
- Visual language model
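
The first two stages described in the abstract can be illustrated with a minimal sketch. It assumes the standard p-stable LSH hash h(v) = floor((a·v + b) / w) with Gaussian a and uniform b, and it assumes that each feature contributes its raw saliency value as a weight. The abstract does not give the paper's weak supervision rule for choosing hash functions or its exact weighting formula, so neither is reproduced here. All function names and parameter values (`make_hash_family`, `bucket_key`, `saliency_weighted_histogram`, `k`, `w`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash_family(dim, k, w, seed=0):
    """Draw k hash functions of the form h(v) = floor((a . v + b) / w).

    Each a has i.i.d. standard Gaussian entries (a 2-stable distribution,
    which is what makes the hash sensitive to Euclidean distance), and each
    b is uniform in [0, w). Functions are drawn at random here; the paper's
    weakly supervised selection step is NOT implemented.
    """
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(k)]


def bucket_key(v, family, w):
    """Concatenate the k hash values into one bucket key (a visual word)."""
    return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
                 for a, b in family)


def saliency_weighted_histogram(descriptors, saliencies, family, w):
    """Build a normalized word histogram weighted by per-feature saliency.

    Assumption: the weight of a feature is the saliency value (in [0, 1])
    at its keypoint location, e.g. read from a precomputed saliency map.
    """
    hist = defaultdict(float)
    for v, s in zip(descriptors, saliencies):
        hist[bucket_key(v, family, w)] += s
    total = sum(hist.values())
    return {word: mass / total for word, mass in hist.items()} if total else {}


# Toy usage: two identical 4-D descriptors in a salient region, one outlier.
family = make_hash_family(dim=4, k=3, w=4.0, seed=42)
descs = [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4], [9.0, -7.0, 5.0, 2.0]]
sal = [0.9, 0.6, 0.1]
hist = saliency_weighted_histogram(descs, sal, family, w=4.0)
```

Identical descriptors always fall into the same bucket, so the two salient features reinforce one visual word while the low-saliency outlier contributes little. A group of dictionaries, as the abstract describes, would correspond to several independently drawn hash families.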
