Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone

Han Zhao-bing; Zhang Hua-yun; Zhang Shu-wu; Xu Bo

Volume 26 Issue 11

Nov. 2004

Han Zhao-bing, Zhang Hua-yun, Zhang Shu-wu, Xu Bo. Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone[J]. Journal of Electronics & Information Technology, 2004, 26(11): 1714-1720.

Received Date: 2003-06-12
Rev Recd Date: 2004-03-23
Publish Date: 2004-11-19

Abstract

Automatic speech recognition in telecommunications environments still has lower accuracy than its desktop counterparts. Improving the performance of telephone-quality speech recognition is an urgent problem for practical applications. Previous work has shown that the main reason for this performance degradation is the variational mismatch caused by different telephone channels between the testing and training sets. In this paper, the authors propose an efficient implementation that dynamically compensates for this mismatch based on a phone-conditioned prior statistical model of the channel bias. The algorithm uses Bayes' rule to estimate telephone channels and dynamically follows the time variations within the channels. In experiments on Mandarin Large Vocabulary Continuous Speech Recognition (LVCSR) over telephone lines, the average Character Error Rate (CER) decreases by more than 27% when the algorithm is applied; in a short-utterance test, the Word Error Rate (WER) is reduced by 30% relative. At the same time, the structural delay and computational cost required by the algorithm are limited: the average delay is about 200 ms. The algorithm could therefore be embedded in practical telephone-based applications.
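To make the idea of a Bayesian, time-following bias estimate concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single cepstral dimension, an additive channel bias, a Gaussian prior on that bias, and Gaussian phone-conditioned statistics for clean speech; the phone label for each frame is assumed to be supplied from elsewhere (for example a decoder alignment). All names (`BiasEstimate`, `update_bias`, `compensate`, `forget`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BiasEstimate:
    mean: float  # current estimate of the channel bias (one cepstral dimension)
    var: float   # uncertainty (variance) of that estimate


def update_bias(est, observed, phone_mean, phone_var, forget=1.0):
    """One conjugate-Gaussian Bayes update of the bias estimate.

    Assumption: observed = clean + bias, with clean ~ N(phone_mean, phone_var).
    A forget factor below 1.0 inflates the prior variance so that the
    estimate can follow a slowly drifting channel.
    """
    prior_var = est.var / forget
    residual = observed - phone_mean           # noisy observation of the bias
    gain = prior_var / (prior_var + phone_var)
    new_mean = est.mean + gain * (residual - est.mean)
    new_var = (1.0 - gain) * prior_var
    return BiasEstimate(new_mean, new_var)


def compensate(frames, phone_means, phone_vars, prior, forget=0.98):
    """Update the bias frame by frame and subtract it from each frame."""
    est = prior
    cleaned = []
    for y, m, v in zip(frames, phone_means, phone_vars):
        est = update_bias(est, y, m, v, forget)
        cleaned.append(y - est.mean)
    return cleaned, est
```

In this sketch the estimate is refined on every frame, so compensation does not have to wait for the end of the utterance. The choice of `forget` trades stability against how quickly the estimate follows channel changes.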

