Xie Lei, Fu Zhong-hua, Jiang Dong-mei, Zhao Rong-chun, Werner Verhelst, Hichem Sahli, Jan Conlenis. A Robust Dynamic Mouth Feature Based on Visemic LDA for Audio Visual Speech Recognition[J]. Journal of Electronics & Information Technology, 2005, 27(1): 64-68.
This paper presents a robust visual feature for audio-visual speech recognition based on visemic LDA. The feature captures dynamic lip contour information and reflects the viseme classes of visual speech. The paper also introduces an automatic method for labeling the LDA training data from speech recognition results, which avoids tedious manual labeling and the errors it introduces. Experimental results show that an audio-visual speech recognition system built on the proposed visual feature greatly increases the recognition rate in noisy conditions. Combined with a multi-stream HMM, the visual feature achieves a recognition rate of over 80% at a signal-to-noise ratio (SNR) of 10 dB.
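As a rough illustration of how a multi-stream HMM can let the visual stream help when the audio is noisy, the sketch below combines per-state audio and visual log-likelihoods with a stream weight. It is a minimal sketch of the common weighted-sum formulation, not the paper's implementation: the function names, the single `audio_weight` parameter, and the example scores are all assumptions made for illustration, since the abstract does not state how the streams are weighted.

```python
def combine_stream_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Weighted sum of the two stream log-likelihoods for one HMM state.

    audio_weight is a hypothetical stream weight in [0, 1]; the visual
    stream receives the remainder (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


def pick_state(audio_lls, visual_lls, audio_weight):
    """Return the index of the state with the highest combined score."""
    scores = [
        combine_stream_log_likelihoods(a, v, audio_weight)
        for a, v in zip(audio_lls, visual_lls)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up scores for two candidate states: the (noisy) audio stream
# slightly prefers state 1, the visual stream clearly prefers state 0.
audio_lls = [-12.0, -11.5]
visual_lls = [-4.0, -9.0]

audio_only = pick_state(audio_lls, visual_lls, audio_weight=1.0)
balanced = pick_state(audio_lls, visual_lls, audio_weight=0.5)
```

With these illustrative numbers, relying on audio alone selects state 1, while giving the visual stream equal weight selects state 0, which is the kind of correction a visual feature can provide when the audio is degraded.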