Volume 26 Issue 11
Nov.  2004
Han Zhao-bing, Zhang Hua-yun, Zhang Shu-wu, Xu Bo. Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone[J]. Journal of Electronics & Information Technology, 2004, 26(11): 1714-1720.

Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone

  • Received Date: 2003-06-12
  • Rev Recd Date: 2004-03-23
  • Publish Date: 2004-11-19
  • Automatic speech recognition in telecommunications environments still has a lower accuracy than its desktop counterparts. Improving the performance of telephone-quality speech recognition is an urgent problem for practical applications. Previous work has shown that the main cause of this degradation is the mismatch between the testing and training sets caused by variation across telephone channels. This paper proposes an efficient implementation that dynamically compensates for this mismatch using a phone-conditioned prior statistical model of the channel bias. The algorithm uses Bayes' rule to estimate the telephone channel and dynamically tracks time variations within the channel. In experiments on Mandarin Large Vocabulary Continuous Speech Recognition (LVCSR) over telephone lines, the average Character Error Rate (CER) decreases by more than 27% when the algorithm is applied; in the short-utterance test, the Word Error Rate (WER) shows a relative reduction of 30%. The structural delay and computational cost of the algorithm are limited, with an average delay of about 200 ms, so it could be embedded in practical telephone-based applications.
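The abstract does not give the update equations, so the following is only a generic sketch of the idea it describes, not the authors' algorithm. It assumes an additive bias in the cepstral domain (observed frame = clean frame + bias), a Gaussian prior on the bias, diagonal-covariance Gaussian phone models for clean speech, and a forgetting factor so that the estimate can follow slow channel changes. The names `Gaussian`, `ChannelBiasTracker`, `forget`, and the phone label passed to `update` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Gaussian:
    """Diagonal Gaussian: per-dimension mean and variance."""
    mean: List[float]
    var: List[float]


class ChannelBiasTracker:
    """Recursive MAP estimate of an additive cepstral channel bias.

    Assumption (not from the source): y_t = x_t + b, with
    x_t ~ N(mean_phone, var_phone) and b ~ N(prior.mean, prior.var).
    """

    def __init__(self, prior: Gaussian, phone_models: Dict[str, Gaussian],
                 forget: float = 0.98) -> None:
        self.prior = prior
        self.phone_models = phone_models
        self.forget = forget  # 1.0 keeps all history; smaller values track faster
        dim = len(prior.mean)
        self.num = [0.0] * dim  # decayed sum of precision-weighted residuals
        self.den = [0.0] * dim  # decayed sum of precisions

    def update(self, frame: Sequence[float], phone: str) -> List[float]:
        """Fold one observed frame into the statistics and return the bias estimate."""
        model = self.phone_models[phone]
        bias = []
        for d, y in enumerate(frame):
            precision = 1.0 / model.var[d]
            self.num[d] = self.forget * self.num[d] + (y - model.mean[d]) * precision
            self.den[d] = self.forget * self.den[d] + precision
            prior_precision = 1.0 / self.prior.var[d]
            # Posterior mean of a Gaussian prior combined with Gaussian evidence.
            bias.append((self.prior.mean[d] * prior_precision + self.num[d])
                        / (prior_precision + self.den[d]))
        return bias

    def compensate(self, frame: Sequence[float], phone: str) -> List[float]:
        """Update the estimate with this frame, then subtract it from the frame."""
        bias = self.update(frame, phone)
        return [y - b for y, b in zip(frame, bias)]
```

In a recognizer, the phone label for each frame would have to come from somewhere, for example a partial decoding hypothesis; that choice is also an assumption here. With few frames the estimate stays close to the prior mean, and as evidence accumulates it moves toward the average offset between the observed frames and the phone model means.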
