Comparison of Web-Based Unsupervised Translation Disambiguation Word Model and N-gram Model

Liu Peng-yuan; Zhao Tie-jun

doi:10.3724/SP.J.1146.2008.01624

cstr: 32379.14.SP.J.1146.2008.01624

Funding:

Supported by the National Basic Research Program of China (2004CB318102)

Publication history
- Received: 2008-12-05
- Revised: 2009-05-07
- Published: 2009-12-19

Abstract

This paper proposes a word model and an N-gram model for Web-based unsupervised translation disambiguation, and compares the two under conditions kept as similar as possible. Both methods submit different query fragments to a search engine and use the returned Page Counts as the main source of disambiguation evidence. The word model defines a Web Bilingual Relatedness (WBR) between Chinese words and English words, and selects the translation of the target word by maximizing the relatedness between the Chinese context words and each candidate English translation. The N-gram model starts from the hypothesis that the N-gram sequences of a polysemous word behave differently under different senses; it therefore disambiguates by counting and analyzing, for each sense class of the polysemous word, the N-gram sequences formed by the words of that class within the instance. Both models outperform the best comparable unsupervised system on task #5 of the SemEval-2007 semantic evaluation. An experimental comparison shows that the N-gram model outperforms the word model, and also indicates that combining the results of the two kinds of model has the potential to improve disambiguation performance further.
- Key words: Computational linguistics; Unsupervised translation disambiguation; Word model; N-gram model; Page Count; Web Bilingual Relatedness (WBR)
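
The word-model idea in the abstract can be illustrated with a small sketch. The abstract does not give the formula for Web Bilingual Relatedness, so the PMI-style score below is an assumption made purely for illustration, not the authors' definition. The page counts are made-up toy numbers standing in for search-engine results, and the names `PAGE_COUNTS`, `TOTAL_PAGES`, `page_count`, `relatedness`, and `disambiguate` are all hypothetical.

```python
import math

# Toy page counts standing in for search-engine hits (made-up numbers, not real data).
# Keys are query strings: a single word, or "chinese_word english_word" for a joint query.
PAGE_COUNTS = {
    "钱": 5_000_000,
    "时间": 8_000_000,
    "flower": 20_000_000,
    "spend": 10_000_000,
    "钱 flower": 2_000,
    "钱 spend": 60_000,
    "时间 flower": 3_000,
    "时间 spend": 90_000,
}

# Assumed size of the indexed Web; only the relative scores matter here.
TOTAL_PAGES = 1_000_000_000


def page_count(query: str) -> int:
    """Return the (toy) page count for a query; unseen queries count as 0."""
    return PAGE_COUNTS.get(query, 0)


def relatedness(zh_word: str, en_word: str) -> float:
    """PMI-style relatedness between a Chinese word and an English word.

    This scoring function is an assumption, not the paper's WBR formula.
    The joint count is smoothed by +1 so that log() is always defined.
    """
    c_zh = page_count(zh_word)
    c_en = page_count(en_word)
    if c_zh == 0 or c_en == 0:
        return 0.0  # no evidence either way
    joint = page_count(f"{zh_word} {en_word}") + 1
    return math.log(joint * TOTAL_PAGES / (c_zh * c_en))


def disambiguate(context_words: list[str], translations: list[str]) -> str:
    """Pick the translation with the highest total relatedness to the context."""
    return max(
        translations,
        key=lambda en: sum(relatedness(zh, en) for zh in context_words),
    )


if __name__ == "__main__":
    # Context words around an ambiguous Chinese target word such as "花".
    print(disambiguate(["钱", "时间"], ["flower", "spend"]))
```

With these toy counts the context words "钱" and "时间" co-occur far more often with "spend" than with "flower", so "spend" is selected. A real system would replace `page_count` with calls to a search engine and would need to handle noisy or missing counts.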
