Applying Title Category Semantic Recognition for Text Categorization

Wang Qiang, Guan Yi, Wang Xiao-long

Citation: Wang Qiang, Guan Yi, Wang Xiao-long. Applying Title Category Semantic Recognition for Text Categorization[J]. Journal of Electronics & Information Technology, 2007, 29(12): 2885-2890. doi: 10.3724/SP.J.1146.2006.00507

doi: 10.3724/SP.J.1146.2006.00507
Funding: Supported by the National Natural Science Foundation of China (60435020, 60504021)

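  • Abstract: This paper proposes a text categorization algorithm based on title category semantic recognition. The algorithm builds the feature space for classification with a feature selection strategy based on category information, predicts a document's candidate categories by recognizing the category semantics of the feature words in its title, and finally runs a classifier within the candidate category space. Experiments show that the algorithm significantly improves categorization precision while effectively reducing the number of candidate categories. Validation with a category-space representation efficiency measure further shows that the algorithm improves the performance of the text representation space.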
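
The three stages named in the abstract (category-informed feature selection, candidate-category prediction from title words, classification restricted to the candidates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the selection score, the vote threshold, the naive Bayes classifier, the toy corpus, and every function name here are assumptions introduced for the example.

```python
import math
from collections import Counter, defaultdict


def category_semantics(docs, labels, top_k=3):
    """Map each selected feature word to P(category | word).

    Assumed score: P(category | word) * log(1 + count in category).
    The top_k words per category are kept as features.
    """
    counts = defaultdict(Counter)  # word -> Counter(category -> frequency)
    for tokens, label in zip(docs, labels):
        for word in tokens:
            counts[word][label] += 1

    selected = set()
    for cat in set(labels):
        def score(word):
            in_cat = counts[word][cat]
            return in_cat / sum(counts[word].values()) * math.log(1 + in_cat)
        selected.update(sorted(counts, key=score, reverse=True)[:top_k])

    return {
        w: {c: n / sum(counts[w].values()) for c, n in counts[w].items()}
        for w in selected
    }


def candidate_categories(title, semantics, all_categories, threshold=0.3):
    """Let title feature words vote for categories; keep those above threshold."""
    votes = Counter()
    for word in title:
        for cat, p in semantics.get(word, {}).items():
            votes[cat] += p
    total = sum(votes.values())
    if total == 0:
        # The title has no known feature word: fall back to every category.
        return set(all_categories)
    return {cat for cat, v in votes.items() if v / total >= threshold}


def train_naive_bayes(docs, labels):
    """Collect per-category word counts for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, Counter(labels), vocab


def classify(body, candidates, model):
    """Pick the best category, searching only the candidate set."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())

    def log_posterior(cat):
        total = sum(word_counts[cat].values())
        lp = math.log(doc_counts[cat] / n_docs)
        for w in body:  # Laplace-smoothed word likelihoods
            lp += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        return lp

    return max(candidates, key=log_posterior)


# Toy corpus (invented for illustration only).
docs = [
    ["stock", "market", "bank", "shares"],
    ["bank", "loan", "interest", "market"],
    ["match", "goal", "team", "league"],
    ["team", "coach", "season", "goal"],
    ["film", "actor", "director", "award"],
    ["actor", "film", "festival", "season"],
]
labels = ["finance", "finance", "sports", "sports", "culture", "culture"]

semantics = category_semantics(docs, labels)
model = train_naive_bayes(docs, labels)

title = ["team", "film"]
body = ["coach", "goal", "season"]
candidates = candidate_categories(title, semantics, set(labels))
predicted = classify(body, candidates, model)
print(sorted(candidates), predicted)
```

In this sketch the title narrows three categories to two before the classifier runs, which is the effect the abstract describes as reducing the number of classification candidates. A title with no known feature word falls back to the full category set, so the classifier still produces an answer.
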
Publication history
  • Received: 2006-04-17
  • Revised: 2006-09-26
  • Published: 2007-12-19