Volume 27, Issue 1
Jan. 2005
Xie Lei, Fu Zhong-hua, Jiang Dong-mei, Zhao Rong-chun, Werner Verhelst, Hichem Sahli, Jan Conlenis. A Robust Dynamic Mouth Feature Based on Visemic LDA for Audio Visual Speech Recognition[J]. Journal of Electronics & Information Technology, 2005, 27(1): 64-68.

A Robust Dynamic Mouth Feature Based on Visemic LDA for Audio Visual Speech Recognition

  • Received Date: 2003-07-11
  • Revision Received Date: 2003-11-25
  • Publish Date: 2005-01-19
  • Abstract: This paper presents a robust visual feature for audio-visual speech recognition based on visemic LDA. The feature captures dynamic lip-contour information and reflects the viseme classes of visual speech. The paper also introduces an automatic method for labeling the LDA training data from speech recognition results, which avoids tedious manual labeling and the errors it introduces. Experimental results show that an audio-visual speech recognition system built on this visual feature greatly increases the recognition rate in noisy conditions. Combined with a multi-stream HMM, the visual feature achieves a recognition rate of over 80% at an SNR of 10 dB.
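
The abstract does not spell out how the audio and visual streams are combined. As a rough illustration only, a multi-stream HMM is commonly scored by weighting each stream's emission log-likelihood for a state and summing the results. The sketch below assumes two streams whose weights sum to one; the function names, the one-dimensional Gaussian emission models, and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian, standing in for one stream's emission model."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Combine per-stream emission log-likelihoods for a single HMM state.

    The state score is a weighted product of the stream likelihoods,
    which becomes a weighted sum in the log domain.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight  # assumption: the two weights sum to one
    return audio_weight * log_b_audio + visual_weight * log_b_visual


# Hypothetical observations and state parameters, for illustration only.
log_b_a = gaussian_log_pdf(0.3, mean=0.0, var=1.0)   # audio stream
log_b_v = gaussian_log_pdf(1.1, mean=1.0, var=0.25)  # visual stream

# A noisier audio channel would typically call for a smaller audio weight.
score_clean = multistream_log_likelihood(log_b_a, log_b_v, audio_weight=0.7)
score_noisy = multistream_log_likelihood(log_b_a, log_b_v, audio_weight=0.3)
```

In practice the stream weight would be tuned on held-out data for each noise condition; the values 0.7 and 0.3 above are placeholders.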
