Automatic Classification of Chinese Documents Based on Rough Set and Improved Quick-Reduce Algorithm

Sheng Xiao-wei, Jiang Ming-hu

Citation: Sheng Xiao-wei, Jiang Ming-hu. Automatic Classification of Chinese Documents Based on Rough Set and Improved Quick-Reduce Algorithm[J]. Journal of Electronics & Information Technology, 2005, 27(7): 1047-1052.

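  • Abstract: Existing automatic text classification relies on constructing document vectors, whose components correspond to the feature terms in a document. Such vectors typically have thousands or even tens of thousands of dimensions, so the computational cost is considerable and the vectors need to be reduced. The traditional frequency-based threshold filtering method, however, often discards useful information and degrades classification accuracy. This paper introduces rough set theory into automatic classification and proposes a new document vector reduction algorithm. Experiments show that the algorithm not only effectively shrinks the document vectors but also loses less information and achieves higher accuracy than the traditional threshold method.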
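To make the idea of rough-set attribute reduction concrete, the following is a minimal sketch of the standard greedy QuickReduce procedure from the rough-set literature. It is not the improved variant proposed in the paper, whose details the abstract does not give. The function names `dependency` and `quick_reduce`, the discretized term-presence encoding, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree of the decision on `attrs`.

    Objects are grouped into equivalence classes by their values on `attrs`;
    the result is the fraction of objects that fall in classes carrying a
    single decision label (the positive region).
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    positive = sum(
        len(members)
        for members in classes.values()
        if len({labels[i] for i in members}) == 1
    )
    return positive / len(rows)


def quick_reduce(rows, labels):
    """Greedy QuickReduce: add attributes until the selected subset has the
    same dependency degree as the full attribute set.

    Assumption: when no single attribute improves the dependency, the first
    best-scoring attribute is still added, so the loop always terminates.
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    current = dependency(rows, labels, reduct)
    while current < target:
        remaining = [a for a in all_attrs if a not in reduct]
        best_attr = max(
            remaining,
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best_attr)
        current = dependency(rows, labels, reduct)
    return reduct


# Hypothetical toy data: each row is a document, each column records whether
# a feature term is present (1) or absent (0).
docs = [
    (1, 0, 1, 0),
    (1, 0, 0, 1),
    (0, 1, 1, 0),
    (0, 1, 0, 1),
    (1, 1, 1, 0),
    (0, 0, 0, 1),
]
categories = ["sports", "sports", "finance", "finance", "sports", "finance"]

print(quick_reduce(docs, categories))  # indices of the retained feature terms
```

In this toy table the first term alone separates the two categories, so the other three columns can be dropped without losing the ability to discern the classes. A frequency threshold, by contrast, ranks terms by how often they occur rather than by how well they separate the categories, which is the kind of information loss the abstract describes.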
Publication history
  • Received: 2004-02-19
  • Revised: 2004-08-05
  • Published: 2005-07-19
