Volume 32 Issue 11
Dec. 2010
Xu Chao, Zhou Yi-Min, Shen Lei. A Context Tree Kernel Based on Latent Semantic Topic[J]. Journal of Electronics & Information Technology, 2010, 32(11): 2695-2700. doi: 10.3724/SP.J.1146.2009.01493

A Context Tree Kernel Based on Latent Semantic Topic

doi: 10.3724/SP.J.1146.2009.01493
  • Received Date: 2009-11-20
  • Revision Received Date: 2010-03-09
  • Publish Date: 2010-11-19
  • Context tree kernels for text representation suffer from a critical lack of semantic information. This paper proposes a context tree kernel based on latent topics. First, words are mapped to a latent topic space using Latent Dirichlet Allocation (LDA). Next, context tree models are built over the latent topics. Finally, the context tree kernel for text is defined through the mutual information between these models. Because the document generative models are defined over semantic classes instead of words, the approach resolves the problem of sparse statistical data. Clustering experiments on a text data set show that the proposed context tree kernel is a better measure of topic similarity between documents and that it greatly improves text clustering performance.
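The three steps named in the abstract (map words to latent topics, model topic contexts, compare the models) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' method: `WORD_TO_TOPIC` is a hypothetical hard word-to-topic table standing in for the output of an LDA model, the fixed-depth context counts are a simplification of a context tree, and the score `exp(-JSD)` uses the Jensen-Shannon divergence (the mutual information between a sample from the equal-weight mixture of two distributions and the label of the distribution it came from) as a stand-in for the mutual-information-based kernel, whose exact definition the abstract does not give.

```python
import math
from collections import Counter

# Hypothetical word -> topic table; a stand-in for assignments from a trained LDA model.
WORD_TO_TOPIC = {
    "kernel": 0, "svm": 0, "margin": 0,
    "topic": 1, "lda": 1, "dirichlet": 1,
}

def to_topics(words, word_to_topic):
    """Replace each known word by its topic id; unknown words are dropped."""
    return [word_to_topic[w] for w in words if w in word_to_topic]

def context_counts(topics, depth=1):
    """Count (context, next topic) pairs; the context is the previous `depth` topics."""
    counts = Counter()
    for i in range(depth, len(topics)):
        context = tuple(topics[i - depth:i])
        counts[(context, topics[i])] += 1
    return counts

def normalize(counts):
    """Turn raw counts into a probability distribution over (context, topic) pairs."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document too short for the chosen context depth")
    return {key: value / total for key, value in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    div = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            div += 0.5 * pk * math.log(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log(qk / mk)
    return div

def topic_context_similarity(doc_a, doc_b, word_to_topic, depth=1):
    """Similarity in (0, 1] between the topic-context distributions of two documents."""
    p = normalize(context_counts(to_topics(doc_a, word_to_topic), depth))
    q = normalize(context_counts(to_topics(doc_b, word_to_topic), depth))
    return math.exp(-js_divergence(p, q))

doc_a = ["kernel", "lda", "svm", "topic"]   # topic sequence 0, 1, 0, 1
doc_b = ["kernel", "svm", "margin"]         # topic sequence 0, 0, 0
print(topic_context_similarity(doc_a, doc_a, WORD_TO_TOPIC))
print(topic_context_similarity(doc_a, doc_b, WORD_TO_TOPIC))
```

Counting over topic ids rather than words is what reduces sparsity in this sketch: two documents with no words in common can still share topic contexts. A faithful implementation would replace the hard lookup table with inferred LDA topic assignments and the fixed-depth counts with a weighted context tree model.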
