电话语音识别中基于统计模型的动态通道
Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone
-
摘要: 与桌面环境相比,电话网络环境下的语音识别率仍然还比较低,为了推动电话语音识别在实际中的应用,提高其识别率成了当务之急.先前的研究表明,电话语音识别率明显下降通常是因为测试和训练环境的电话通道不同引起数据失配造成的,因此该文提出基于统计模型的动态通道补偿算法(SMDC)减少它们之间的差异,采用贝叶斯估计算法动态地跟踪电话通道的时变特性.实验结果表明,大词汇量连续语音识别的字误识率(CER)相对降低约27%,孤立词的词误识率(WER)相对降低约30%.同时,算法的结构时延和计算复杂度也比较小.平均时延约200ms.可以很好地嵌入到实际电话语音识别应用中.Abstract: Automatic speech recognition in telecommunications environment still has a lower correct rate compared to its desktop pairs. Improving the performance of telephone-quality speech recognition is an urgent problem for its application in those practical fields. Previous works have shown that the main reason for this performance degradation is the varational mismatch caused by different telephone channels between the testing and train-ing sets. In this paper, they propose an efficient implementation to dynamically compen-sate this mismatch based on a phone-conditioned prior statistic model for the channel bias. This algorithm uses Bayes rule to estimate telephone channels and dynamically follows the time-variations within the channels. In their experiments on mandarin Large Vocabulary Continuous Speech Recognition (LVCSR) over telephone lines, the average Character Error Rate (CER) decreases more than 27% when applying this algorithm; in short utterance test, the Vord-Error-Rate(VER) relatively reduced 30%. At the same time, the structural delay and computational consumptions required by this algorithm are limited. The average delay is about 200 ins. So it could be embedded into practical telephone-based applications.
-
Moreno P J, Siegler M A, Jain U, Stern R. M. Continuous speech recognition of large vocabulary telephone quality speech. Proc. of the Eighth Spoken Language Systems Technology Workshop,Austin, Texas, 1995.[2]Besacier L, Grassi S, Dufaux A, Ansorge M, Pellandini F. GSM speech coding and speaker recognition. Proc. of ICASSP 2000, Istanbul, Turkey, June 2000: 1085-1088.[3]Huerta J M. Speech recognition in mobile environments. [Ph.D. Thesis]: School of Computer Science, Carnegie Mellon University, Apr. 2000.[4]Hermansky H, Morgan N. RASTA processing of speech[J].IEEE Trans. on Speech and Audio Processing.1994, 2(4):578-589[5]Rahim M G, Juang Biing-Hwang. Signal bias removal by maximum likelihood estimation for robust telephone speech recognition[J].IEEE Trans. on Speech and Audio Processing.1996, 4(1):19-30[6]Sankar Ananth, Lee Chin-Hui. A maximum-likelihood approach to stochastic matching for robust speech recognition[J].IEEE Trans. on Speech and Audio Processing.1996, 4(3):190-202[7]Moreno P J. Speech recognition in noisy environments. [Ph.D. Thesis]: School of Computer Science, Carnegie Mellon University, April 22, 1996.[8]Westphal M. The use of cepstral means in conversational speech recognition. Proc. of Eurospeech 97, Greece, 1997: 1143-1146.[9]Chien Jen-Tzung.[J].Wang Hsiao-Chuan, Lee Lee-Min. Estimation of channel bias for telephone speech recognition. In Proc. ICSLP96, Philadelphia USA.1996,:-Veth J D.[J].Boves L. Comparison of channel normalization techniques for automatic speech recognition over the phone. In Proc. ICSLP96, Philadelphia USA.1996,:-
计量
- 文章访问数: 2313
- HTML全文浏览量: 113
- PDF下载量: 619
- 被引次数: 0