Parsing Chinese Based on Lexicalized Model

Cao Hailong (曹海龙); Zhao Tiejun (赵铁军); Li Sheng (李生)

doi:10.3724/SP.J.1146.2006.00119

Funding:

Supported by the National Natural Science Foundation of China (60373101) and the National 863 Program of China (2004AA117010-08)

Publication history
- Received: 2006-01-23
- Revised: 2006-07-20
- Published: 2007-09-19

Parsing Chinese Based on Lexicalized Model

Abstract

Abstract: With the goal of processing large-scale real text, this paper decomposes Chinese parsing into two parts: word segmentation/part-of-speech (POS) tagging, and phrase recognition. First, a unified approach to segmentation and POS tagging is proposed. It adds lexical information to a hidden Markov model (HMM), so it keeps the simplicity and speed of the HMM while improving tagging accuracy. Then the head-driven model is used to recognize phrases. The head-driven model is a well-known lexicalized parsing model for English; here it is combined with the segmentation/POS tagging model to build a Chinese parser that can operate at the character level. The parser is evaluated on a public, standard test set, where it achieves 77.57% precision and 74.96% recall, clearly outperforming the only previous comparable work.
- Keywords: parsing; hidden Markov model; head-driven model; structural pattern recognition
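
To make the idea of unified segmentation and POS tagging concrete, the sketch below runs Viterbi search over a word lattice with a plain tag-bigram HMM, so that word boundaries and tags are chosen together. It is only an illustration under stated assumptions: the lexicon, the probabilities, `MAX_WORD_LEN`, and the function name `joint_segment_tag` are all hypothetical, and the lexical information that the paper adds on top of the HMM is not reproduced here because the abstract does not specify it.

```python
import math

# Hypothetical toy lexicon: word -> {tag: P(word | tag)}. Values are invented.
EMISSION = {
    "研究": {"NN": 0.02, "VV": 0.03},
    "研究生": {"NN": 0.01},
    "生命": {"NN": 0.02},
    "命": {"NN": 0.005},
    "生": {"VV": 0.004},
    "起源": {"NN": 0.01},
}

# Hypothetical tag-bigram model: (previous tag, tag) -> P(tag | previous tag).
TRANSITION = {
    ("<s>", "NN"): 0.5, ("<s>", "VV"): 0.3,
    ("NN", "NN"): 0.3, ("NN", "VV"): 0.3,
    ("VV", "NN"): 0.5, ("VV", "VV"): 0.1,
}

MAX_WORD_LEN = 3  # longest word in the toy lexicon


def joint_segment_tag(sentence):
    """Return the most probable list of (word, tag) pairs under the toy HMM."""
    n = len(sentence)
    # best[j][tag] = (log-probability, (start index, previous tag)) of the best
    # analysis of sentence[:j] whose last word carries `tag`.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - MAX_WORD_LEN), j):
            word = sentence[i:j]
            for tag, emit in EMISSION.get(word, {}).items():
                for prev_tag, (prev_score, _) in best[i].items():
                    trans = TRANSITION.get((prev_tag, tag))
                    if trans is None:
                        continue
                    score = prev_score + math.log(trans) + math.log(emit)
                    if tag not in best[j] or score > best[j][tag][0]:
                        best[j][tag] = (score, (i, prev_tag))
    if not best[n]:
        return []  # no analysis covers the whole sentence
    # Follow back-pointers from the best final state to recover words and tags.
    tag = max(best[n], key=lambda t: best[n][t][0])
    j, result = n, []
    while j > 0:
        i, prev_tag = best[j][tag][1]
        result.append((sentence[i:j], tag))
        j, tag = i, prev_tag
    return result[::-1]


print(joint_segment_tag("研究生命起源"))
```

Because every lattice edge carries both a word and a tag, the competing segmentations 研究/生命/起源 and 研究生/命/起源 are scored by the same model and resolved in a single pass, rather than segmenting first and tagging afterwards.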
