A Robust Dynamic Mouth-Shape Feature Based on Visemic LDA for Audio-Visual Speech Recognition
-
Abstract: Visual feature extraction is an active topic in audio-visual speech recognition research. This paper presents a robust dynamic mouth-shape feature based on Visemic LDA, which captures the changes in lip contour during articulation and reflects the viseme classes of visual speech. It also introduces a method that uses speech recognition results to label the LDA training data automatically, which removes the tedious manual labeling work and avoids labeling errors. Experiments show that adding the Visemic LDA visual feature to an audio-visual speech recognition system greatly increases the recognition rate under noisy conditions. When the feature is combined with a multi-stream HMM, the recognition rate remains above 80% under strong noise at an SNR of 10 dB.
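The abstract does not spell out how the feature is computed. The following is only a minimal sketch of the general idea, assuming that per-frame lip-contour vectors are stacked over a short window to capture dynamics and then projected by LDA with viseme labels as the classes. The function names `stack_frames` and `fit_lda`, the `context` parameter, and the small ridge term are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def stack_frames(frames, context):
    """Concatenate each frame with `context` neighbours on both sides.

    Assumption: dynamics are captured by a symmetric window; the first and
    last frames are repeated to pad the sequence edges.
    """
    frames = np.asarray(frames, dtype=float)
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    n = len(frames)
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])


def fit_lda(features, labels, n_components):
    """Return an LDA projection matrix of shape (dim, n_components).

    `labels` stands in for per-frame viseme classes (hypothetical input,
    e.g. obtained from an alignment produced by a speech recognizer).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dim = features.shape[1]
    overall_mean = features.mean(axis=0)
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for cls in np.unique(labels):
        x = features[labels == cls]
        mean = x.mean(axis=0)
        centered = x - mean
        s_within += centered.T @ centered
        diff = (mean - overall_mean)[:, None]
        s_between += len(x) * (diff @ diff.T)
    # Small ridge (an assumption) so the Cholesky factorization succeeds.
    s_within += 1e-6 * np.eye(dim)
    # Solve S_b w = lambda S_w w by whitening with S_w = L L^T.
    chol_inv = np.linalg.inv(np.linalg.cholesky(s_within))
    eigvals, eigvecs = np.linalg.eigh(chol_inv @ s_between @ chol_inv.T)
    order = np.argsort(eigvals)[::-1][:n_components]
    return chol_inv.T @ eigvecs[:, order]
```

In such a setup, the stacked features would be multiplied by the returned matrix to obtain a low-dimensional visual stream; how that stream is weighted against the audio stream in a multi-stream HMM is outside this sketch.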