Volume 27 Issue 7
Jul.  2005
Sheng Xiao-wei, Jiang Ming-hu. Automatic Classification of Chinese Documents Based on Rough Set and Improved Quick-Reduce Algorithm[J]. Journal of Electronics & Information Technology, 2005, 27(7): 1047-1052.

Automatic Classification of Chinese Documents Based on Rough Set and Improved Quick-Reduce Algorithm

  • Received Date: 2004-02-19
  • Revised Date: 2004-08-05
  • Publish Date: 2005-07-19
  • Many existing automatic Text Classification (TC) methods depend on constructing document vectors. Because each term corresponds to one component of the vector, documents are mapped into a very high-dimensional space, possibly with tens of thousands of dimensions, which requires a massive amount of computation. Traditional dimension-reduction algorithms based on term frequency and threshold filtering often lose useful information. This paper therefore presents a new TC system that introduces rough set theory, whose attribute reduction algorithm can greatly reduce the dimension of the document vectors. The empirical results show that the approach not only reduces the dimensional space effectively, but also achieves higher accuracy while losing less information than the usual reduction methods.
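To make the idea of rough-set attribute reduction concrete, the following is a minimal sketch of the standard greedy QuickReduct procedure applied to a toy term-presence table. It is not the improved algorithm described in the paper, whose details are not given on this page. The function names (`partition`, `dependency`, `quick_reduct`), the toy terms, and the class labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: share of rows in the positive region,
    i.e. rows whose equivalence class under attrs has a single label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedy QuickReduct sketch (assumed standard variant, not the paper's):
    repeatedly add the attribute that raises the dependency degree the most
    until it matches the dependency degree of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    best = dependency(rows, labels, reduct)
    while best < target:
        candidates = [a for a in all_attrs if a not in reduct]
        # Ties are broken in favour of the lowest attribute index.
        chosen = max(candidates, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(chosen)
        best = dependency(rows, labels, reduct)
    return reduct


# Hypothetical toy data: each row marks which terms occur in a document.
terms = ["stock", "market", "goal", "team"]
rows = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
]
labels = ["finance", "finance", "sports", "sports", "finance"]

reduct = quick_reduct(rows, labels)
selected_terms = [terms[a] for a in reduct]
```

On this toy table a single term is enough to separate the two classes, so the four-dimensional vectors shrink to one dimension without losing any ability to discriminate between the labels. Real document collections would first need the term weights discretized, a step this sketch leaves out.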



