A Context Tree Kernel Based on Latent Semantic Topic

Xu Chao, Zhou Yi-Min, Shen Lei

Citation: Xu Chao, Zhou Yi-Min, Shen Lei. A Context Tree Kernel Based on Latent Semantic Topic[J]. Journal of Electronics & Information Technology, 2010, 32(11): 2695-2700. doi: 10.3724/SP.J.1146.2009.01493


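Abstract: Context tree kernels used for text representation lack semantic information. To address this problem, this paper proposes a method for constructing a context tree kernel based on latent topics. First, Latent Dirichlet Allocation (LDA) maps the words of a text into a latent topic space; next, context tree models are built with latent topics, rather than words, as the basic units; finally, the context tree kernel is constructed from the mutual information between the models. Because the generative model of a text is defined over the semantic categories of its words, the method alleviates the data sparsity problem that arises when texts are modeled word by word. Clustering experiments on text datasets show that the proposed context tree kernel measures the topical similarity between texts more accurately and improves text clustering performance.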
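The three steps in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. It assumes an LDA model trained elsewhere, given as a hypothetical topic-word matrix `phi` and per-document topic proportions `theta`, and assigns each word its most probable topic (a heuristic; sampled topic assignments would be an alternative). It then computes a mutual-information kernel over the resulting topic sequences by mixing Dirichlet-multinomial context models over all context trees of depth at most `depth`, in the spirit of Cuturi and Vert's context-tree kernel for strings. The function names and the parameters `alpha` (symmetric Dirichlet prior) and `eps` (prior probability that a node is a leaf) are hypothetical.

```python
import math
from collections import defaultdict


def words_to_topics(word_ids, phi, theta):
    """Step 1 (simplified, hypothetical): replace each word id by its most probable topic.

    Assumes phi[k][w] = p(word w | topic k) and theta[k] = p(topic k | document),
    both taken from an LDA model trained elsewhere.
    """
    n_topics = len(theta)
    return [max(range(n_topics), key=lambda k: theta[k] * phi[k][w]) for w in word_ids]


def context_counts(seq, depth, n_symbols):
    """Step 2: next-topic counts for every context of length 0..depth.

    A context is the tuple of the d most recent topics; only positions
    i >= depth are counted, so a node's counts equal the sum of its children's.
    """
    counts = defaultdict(lambda: [0] * n_symbols)
    for i in range(depth, len(seq)):
        for d in range(depth + 1):
            counts[tuple(seq[i - d:i])][seq[i]] += 1
    return dict(counts)


def log_dm(counts, alpha):
    """log of the Dirichlet(alpha)-multinomial evidence for one context's counts."""
    k, total = len(counts), sum(counts)
    return (math.lgamma(k * alpha) - math.lgamma(k * alpha + total)
            + sum(math.lgamma(alpha + n) - math.lgamma(alpha) for n in counts))


def log_kernel(cx, cy, depth, n_symbols, alpha=0.5, eps=0.5):
    """Step 3: log k(x, y) = log sum_T pi(T) prod_{leaves s} DM(n_x(s) + n_y(s)).

    pi(T) makes each node shallower than `depth` a leaf with probability eps
    (0 < eps < 1); the sum over trees is evaluated with a CTW-style recursion.
    """
    zero = [0] * n_symbols

    def log_pw(s):
        if s not in cx and s not in cy:
            return 0.0  # unseen subtree: every factor equals 1
        merged = [p + q for p, q in zip(cx.get(s, zero), cy.get(s, zero))]
        log_pe = log_dm(merged, alpha)
        if len(s) == depth:
            return log_pe
        u = math.log(eps) + log_pe  # s is a leaf
        # s splits: child contexts prepend one older topic a
        v = math.log(1.0 - eps) + sum(log_pw((a,) + s) for a in range(n_symbols))
        m = max(u, v)
        return m + math.log(math.exp(u - m) + math.exp(v - m))  # log-sum-exp

    return log_pw(())


def normalized_kernel(x, y, depth, n_symbols, **kw):
    """k(x, y) / sqrt(k(x, x) k(y, y)), computed in the log domain."""
    cx = context_counts(x, depth, n_symbols)
    cy = context_counts(y, depth, n_symbols)
    lxy = log_kernel(cx, cy, depth, n_symbols, **kw)
    lxx = log_kernel(cx, cx, depth, n_symbols, **kw)
    lyy = log_kernel(cy, cy, depth, n_symbols, **kw)
    return math.exp(lxy - 0.5 * (lxx + lyy))


if __name__ == "__main__":
    phi = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # toy 2-topic, 3-word model
    doc_a = words_to_topics([0, 0, 2, 0, 2, 2], phi, [0.5, 0.5])  # -> [0, 0, 1, 0, 1, 1]
    doc_b = [1, 1, 0, 1, 0, 0, 1]  # toy topic sequence of a second document
    print(normalized_kernel(doc_a, doc_b, depth=2, n_symbols=2))
```

Because the kernel is an integral of p(x)p(y) over a prior, it is positive definite, so the normalized value lies in (0, 1] and equals 1 for identical sequences. Whether the paper normalizes its kernel this way is not stated in the abstract; normalization is included here only so that documents of different lengths yield comparable similarities for clustering.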
Publication history
  • Received: 2009-11-20
  • Revised: 2010-03-09
  • Published: 2010-11-19
