A Chinese Phonetic-to-Character Conversion Method That Incorporates Part-of-Speech Information into a Trigram Statistical Model
-
Abstract: This paper presents a combined Chinese language model that incorporates part-of-speech (POS) information into a trigram statistical language model. Theoretical analysis and experiments both show that the model has lower perplexity (PP) than the trigram model and is also less dependent on the domain of the test text.