Volume 23 Issue 11
Nov.  2001
Wu Yingliang, Wei Gang, Li Haizhou. A WORD SEGMENTATION ALGORITHM FOR CHINESE LANGUAGE BASED ON N-GRAM MODELS AND MACHINE LEARNING[J]. Journal of Electronics & Information Technology, 2001, 23(11): 1148-1153.

A WORD SEGMENTATION ALGORITHM FOR CHINESE LANGUAGE BASED ON N-GRAM MODELS AND MACHINE LEARNING

  • Received Date: 1999-09-29
  • Revised Date: 2000-04-06
  • Publish Date: 2001-11-19
  • Automatic word segmentation is a fundamental and difficult problem in Chinese language information processing. This paper presents a new method for segmenting an input Chinese sentence into words, which combines a character-based N-gram model with an efficient Viterbi search algorithm. In addition, two performance evaluation metrics for word segmentation algorithms, Recall and Precision, are discussed. The effectiveness of the method has been confirmed by evaluation experiments on closed-text and open-text corpora.
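The abstract names Recall and Precision as the two evaluation metrics but does not define them here. The sketch below shows one common way such metrics are computed for word segmentation, by comparing the character spans of predicted words against a reference segmentation. It is an illustration under stated assumptions, not the paper's own evaluation procedure: the function names `word_spans` and `precision_recall` and the example sentence are hypothetical.

```python
def word_spans(words):
    """Map a list of words to the set of (start, end) character spans they cover."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted segmentation.

    Assumption: a predicted word counts as correct only when both of its
    boundaries match a word in the reference segmentation.
    """
    if "".join(predicted) != "".join(reference):
        raise ValueError("both segmentations must cover the same text")
    pred_spans = word_spans(predicted)
    ref_spans = word_spans(reference)
    correct = len(pred_spans & ref_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(ref_spans) if ref_spans else 0.0
    return precision, recall


# Hypothetical example: the reference has 4 words, the prediction has 5.
reference = ["研究", "生命", "的", "起源"]
predicted = ["研究", "生", "命", "的", "起源"]
precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example three predicted words match the reference exactly, so precision is 3/5 and recall is 3/4. Over-segmentation lowers precision more than recall, which is one reason to report both figures.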
