Research on Automatic Text Classification Based on a Hybrid Language Model

Zheng De-quan; Li Sheng; Zhao Tie-jun; Yu Hao

doi:10.3724/SP.J.1146.2005.01015

Volume 29 Issue 3

Jan. 2011



Zheng De-quan, Li Sheng, Zhao Tie-jun, Yu Hao. Research on Automatic Text Classification Based on a Hybrid Language Model[J]. Journal of Electronics & Information Technology, 2007, 29(3): 601-605. doi: 10.3724/SP.J.1146.2005.01015




Received Date: 2005-08-17
Rev Recd Date: 2006-01-11
Publish Date: 2007-03-19

Abstract


As the volume of information available on the Internet and corporate intranets continues to increase, text classification has become one of the key technologies for organizing and processing large amounts of document data. This paper presents a novel method of Chinese text categorization that combines ontology with statistical methods. First, a linguistic ontology knowledge bank is acquired for each class by learning from that class's training corpus. For a given document, an evaluation value is then computed against each class's linguistic ontology knowledge bank, and the document is assigned to the class with the highest evaluation value. The method is compared with Bayes, k-nearest neighbor, and support vector machine classifiers. Preliminary experimental results show that it outperforms these earlier methods.
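
The decision rule described in the abstract (score a document against one knowledge bank per class and pick the class with the highest evaluation value) can be illustrated with a minimal Python sketch. The abstract does not specify how the linguistic ontology knowledge bank is built or how the evaluation value is computed, so everything here is an assumption for illustration only: each bank is approximated by relative term frequencies, and the function names `learn_knowledge_banks`, `evaluate`, and `classify` as well as the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_knowledge_banks(training_docs):
    """Build one term-weight table per class from (tokens, label) pairs.

    Assumption: relative term frequency stands in for the learned
    linguistic ontology knowledge, which the abstract does not specify.
    """
    counts = defaultdict(Counter)
    for tokens, label in training_docs:
        counts[label].update(tokens)
    banks = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        banks[label] = {term: n / total for term, n in counter.items()}
    return banks


def evaluate(tokens, bank):
    """Hypothetical evaluation value: summed weights of terms found in the bank."""
    return sum(bank.get(term, 0.0) for term in tokens)


def classify(tokens, banks):
    """Score the document against every class bank and return the best label."""
    scores = {label: evaluate(tokens, bank) for label, bank in banks.items()}
    return max(scores, key=scores.get), scores


# Toy, pre-tokenized training data (hypothetical; real Chinese text would
# first need word segmentation).
training = [
    (["stock", "market", "bank", "profit"], "finance"),
    (["bank", "loan", "interest"], "finance"),
    (["match", "goal", "team", "score"], "sports"),
    (["team", "coach", "season"], "sports"),
]
banks = learn_knowledge_banks(training)
label, scores = classify(["bank", "interest", "profit"], banks)
print(label)
```

Only the final step, taking the class with the highest score, is taken from the abstract; the scoring function is a stand-in and could be replaced by any per-class evaluation.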
