Adaptive Construction of Robust Visual Codebooks and Its Application to Natural Scene Categorization

Yang Dan; Li Bo; Zhao Hong

doi:10.3724/SP.J.1146.2009.01323

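Funding:

Supported by the National Natural Science Foundation of China (60975015), the Research Fund for the Doctoral Program of Higher Education of China (20090191110023), and the Chongqing Key Science and Technology Research Program (CSTC2009AC2057).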

Publication history
- Received: 2009-10-12
- Revised: 2010-04-30
- Published: 2010-09-19

An Adaptive Algorithm for Robust Visual Codebook Generation and Its Natural Scene Categorization Application

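Abstract

This paper proposes an optimized construction strategy for visual codebooks. First, the condition number is introduced to quantitatively evaluate the stability of a large set of low-level features: ill-conditioned features are eliminated and stable, robust visual features are retained. Next, by analyzing the intrinsic connection between clustering and dimensionality reduction, an adaptive dimensionality-reduction algorithm that preserves the cluster structure of the visual features is constructed. The neighborhood support derived from this low-dimensional cluster structure is then used to adaptively select the best initial visual words, and the Silhouette (Sil) index is adopted as the objective function. Together these address two shortcomings of the popular LBG codebook generation algorithm: its sensitivity to the random choice of initial points and its convergence only to local optima. The new codebook generation algorithm performs clustering and dimensionality reduction in a single unified computation and offers good robustness and adaptive optimization. Applying the proposed codebook to natural scene categorization with Probabilistic Latent Semantic Analysis (PLSA) yields an average classification rate of 73.46% on a 13-category scene image dataset.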
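The condition-number screening step can be illustrated with a short sketch. This page does not give the paper's exact definition, so the code assumes one common choice: the ratio of the largest to the smallest eigenvalue of the 2x2 gradient structure tensor accumulated over a grayscale patch around each feature. A patch whose gradients vary in only one direction (an edge) or not at all (a flat region) has a near-zero smallest eigenvalue and is treated as ill-conditioned. The threshold `max_cn` and all function names are hypothetical.

```python
import math

def structure_tensor(patch):
    """Sum gradient outer products over the interior of a grayscale patch.

    `patch` is a list of equal-length rows of intensities. Central differences
    are used, so the one-pixel border only contributes as a neighbour.
    Returns the entries (a, b, c) of the symmetric matrix [[a, b], [b, c]].
    """
    a = b = c = 0.0
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            a += gx * gx
            b += gx * gy
            c += gy * gy
    return a, b, c

def condition_number(patch, eps=1e-12):
    """Assumed CN: lambda_max / lambda_min of the patch's structure tensor."""
    a, b, c = structure_tensor(patch)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam_max, lam_min = mean + spread, mean - spread
    if lam_min <= eps:
        return math.inf  # edge-like or flat patch: ill-conditioned
    return lam_max / lam_min

def keep_well_conditioned(patches, max_cn=10.0):
    """Discard patches whose condition number exceeds the (hypothetical) threshold."""
    return [p for p in patches if condition_number(p) <= max_cn]

if __name__ == "__main__":
    blob = [[(x - 2) ** 2 + (y - 2) ** 2 for x in range(5)] for y in range(5)]
    edge = [[0 if x < 2 else 1 for x in range(5)] for y in range(5)]
    print(condition_number(blob))  # 1.0: gradients in all directions
    print(condition_number(edge))  # inf: gradients in one direction only
```

The eigenvalue ratio is used here because it is scale-free: multiplying a patch's contrast by a constant leaves it unchanged, so the threshold measures gradient geometry rather than brightness.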
Abstract: This paper describes a novel optimization framework for visual codebook generation. First, the Condition Number (CN) is applied to evaluate the stability of the initial visual features, and well-conditioned features are preserved while ill-conditioned ones are eliminated. Next, an adaptive algorithm for generating low-dimensional visual words is proposed by studying the relationship between clustering and dimensionality reduction. To overcome the tendency of the popular LBG codebook design algorithm to converge to local optima and its sensitivity to the initial solution, a neighborhood-support parameter is computed for each feature from the clustering structure and used to select the initial visual words adaptively. Finally, a more reasonable distortion function is defined using the Silhouette index. Compared with the traditional algorithm, the proposed algorithm performs clustering and dimensionality reduction simultaneously and offers good robustness and adaptive optimization. Applying the method to 13-scene classification with Probabilistic Latent Semantic Analysis (PLSA) yields a classification rate of 73.46%.
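The codebook step combines three ideas from the abstract: seeding the initial visual words from a per-feature neighborhood support instead of at random, refining them with the generalized Lloyd iterations that LBG is built on, and scoring the result with the Silhouette index. The page does not define neighborhood support, so this sketch assumes it is simply the number of other features within a hypothetical radius `radius`, and that seeds are taken greedily in order of decreasing support while staying more than `radius` apart. It also omits the paper's adaptive dimensionality reduction and uses the Silhouette index only to choose among candidate codebook sizes; both are simplifications, not the authors' algorithm. All function names are hypothetical.

```python
import math

def neighborhood_support(points, radius):
    """Assumed support: how many other features lie within `radius`."""
    return [sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
            for i, p in enumerate(points)]

def seed_codewords(points, k, radius):
    """Greedy seeding: highest-support features first, kept more than `radius` apart."""
    support = neighborhood_support(points, radius)
    order = sorted(range(len(points)), key=lambda i: -support[i])
    chosen = []
    for i in order:
        if all(math.dist(points[i], points[j]) > radius for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return [list(points[i]) for i in chosen]

def assign(points, codebook):
    """Nearest-codeword label for every feature."""
    return [min(range(len(codebook)), key=lambda c: math.dist(p, codebook[c]))
            for p in points]

def lloyd(points, codebook, max_iters=50):
    """Generalized Lloyd refinement (the iterative core of LBG)."""
    for _ in range(max_iters):
        labels = assign(points, codebook)
        updated = []
        for c, word in enumerate(codebook):
            members = [p for p, label in zip(points, labels) if label == c]
            # Move each codeword to its cell centroid; keep it if the cell is empty.
            updated.append([sum(v) / len(members) for v in zip(*members)] if members else word)
        if updated == codebook:
            break
        codebook = updated
    return codebook, assign(points, codebook)

def mean_silhouette(points, labels):
    """Average Silhouette width; needs at least two non-empty clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(ds) / len(ds) if ds else 0.0
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # conventional score for a singleton cluster
            continue
        a = mean_dist(labels[i])  # cohesion: mean distance within own cluster
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def build_codebook(points, candidate_ks, radius):
    """Keep the candidate codebook whose clustering has the highest mean Silhouette."""
    best = None
    for k in candidate_ks:
        codebook, labels = lloyd(points, seed_codewords(points, k, radius))
        if len(set(labels)) < 2:
            continue
        score = mean_silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, codebook, labels)
    return best
```

For example, on two well-separated groups of four 2-D points, `build_codebook(points, [2, 3], radius=2.0)` seeds one codeword per group and returns the two group centroids. Because seeding is deterministic given the support values, repeated runs give the same codebook, which is the property random LBG initialization lacks. `math.dist` requires Python 3.8 or later.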
- Pattern recognition /
- Natural scene categorization /
- Visual codebook /
- Condition Number (CN)
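For the classification stage, the abstract only states that the codebook is used with Probabilistic Latent Semantic Analysis (PLSA). The sketch below shows the standard PLSA expectation-maximization updates on an image-by-visual-word count matrix, modeling P(w|d) = Σ_z P(w|z) P(z|d); the per-image topic mixture P(z|d) is what a downstream scene classifier would typically consume. The number of topics, the iteration count, the random seed, and the function names are hypothetical, and the classifier the paper pairs with PLSA is not shown.

```python
import math
import random

def _normalize(values):
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)  # fall back to uniform
    return [v / total for v in values]

def plsa(counts, n_topics, iters=100, seed=0):
    """Fit PLSA by EM.

    counts[d][w] is how often visual word w occurs in image d.
    Returns (P(z|d), P(w|z), log-likelihood before each iteration).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    history = []
    for _ in range(iters):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                p_w_d = sum(joint)
                if p_w_d == 0:
                    continue
                log_lik += n * math.log(p_w_d)
                for z in range(n_topics):
                    r = n * joint[z] / p_w_d  # expected count n(d,w) P(z|d,w)
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        history.append(log_lik)
        # M-step: renormalize the expected counts into distributions.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z, history
```

Each EM iteration cannot decrease the log-likelihood, so `history` is a convenient convergence check; in a scene-categorization pipeline, each row of `p_z_d` would serve as a compact topic-space descriptor of one image.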
