A Robust Dynamic Mouth Feature Based on Visemic LDA for Audio Visual Speech Recognition

Xie Lei; Fu Zhonghua; Jiang Dongmei; Zhao Rongchun; Werner Verhelst; Hichem Sahli; Jan Conlenis

Publication history
- Received: 2003-07-11
- Revised: 2003-11-25
- Published: 2005-01-19

Abstract

Visual feature extraction is a central research problem in audio-visual speech recognition. This paper introduces a robust dynamic mouth feature based on Visemic LDA, which takes full account of how the lip contour changes during articulation and of the division of visual speech into viseme classes. The paper also proposes a method that uses speech recognition results to label the LDA training data automatically, which eliminates tedious manual labeling and avoids labeling errors. Experiments show that adding the Visemic LDA visual feature to audio-visual speech recognition greatly improves the recognition rate under noisy conditions. When the feature is combined with a multi-stream HMM, the recognition rate still exceeds 80% in heavy noise at an SNR of 10 dB.
- Keywords: speech recognition; audio-visual speech recognition; Active Shape Model (ASM); Linear Discriminant Analysis (LDA); Viseme
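The abstract describes the pipeline only at a high level. As a rough illustration, the sketch below shows one generic way the pieces could fit together: frame-level viseme labels derived from a time-aligned recognition result, temporal stacking of lip-contour frames so the features carry dynamics, a textbook Fisher LDA trained on those viseme labels, and the weighted log-likelihood fusion used in multi-stream HMMs. The phoneme-to-viseme table, the context width, the synthetic data, and every function name are hypothetical; the authors' actual ASM parameterization, viseme inventory, and stream weights are not given here.

```python
import numpy as np

# HYPOTHETICAL phoneme-to-viseme table, for illustration only; the paper's
# actual viseme classes are not listed in the abstract.
PHONE_TO_VISEME = {
    "sil": 0,
    "b": 1, "p": 1, "m": 1,  # lips closed
    "f": 2,                   # lower lip to upper teeth
    "a": 3,                   # open vowel
    "o": 4, "u": 4,           # rounded vowels
    "i": 5, "e": 5,           # spread vowels
}


def labels_from_alignment(segments, n_frames):
    """Convert (phone, start_frame, end_frame) segments, e.g. a time-aligned
    recognition result, into one viseme label per video frame."""
    labels = np.zeros(n_frames, dtype=int)
    for phone, start, end in segments:
        labels[start:end] = PHONE_TO_VISEME.get(phone, 0)  # unknown -> silence
    return labels


def stack_frames(feats, context=2):
    """Concatenate each frame with its +/- `context` neighbours (edges repeated)
    so the LDA input reflects how the lip contour moves, not only its shape."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    n = feats.shape[0]
    return np.hstack([padded[k:k + n] for k in range(2 * context + 1)])


def fit_lda(x, y, n_components, ridge=1e-6):
    """Fisher LDA: solve S_b w = lambda S_w w and keep the top directions."""
    d = x.shape[1]
    overall_mean = x.mean(axis=0)
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for c in np.unique(y):
        xc = x[y == c]
        centered = xc - xc.mean(axis=0)
        s_w += centered.T @ centered
        diff = (xc.mean(axis=0) - overall_mean)[:, None]
        s_b += len(xc) * (diff @ diff.T)
    s_w += ridge * np.eye(d)  # keeps S_w positive definite
    # Whiten with the Cholesky factor S_w = L L^T to get a symmetric problem.
    l_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    evals, evecs = np.linalg.eigh(l_inv @ s_b @ l_inv.T)  # ascending order
    top = evecs[:, ::-1][:, :n_components]
    return l_inv.T @ top  # map back: w = L^{-T} v


def fuse_stream_loglik(loglik_audio, loglik_video, audio_weight):
    """Multi-stream HMM state score: stream log-likelihoods weighted by
    exponents that sum to one (audio_weight + video_weight = 1)."""
    return audio_weight * loglik_audio + (1.0 - audio_weight) * loglik_video


# Usage on synthetic data (stand-in for per-frame ASM lip-contour parameters).
rng = np.random.default_rng(0)
segments = [("sil", 0, 10), ("b", 10, 20), ("a", 20, 35), ("u", 35, 50)]
labels = labels_from_alignment(segments, n_frames=50)
feats = rng.normal(size=(50, 6)) + 2.0 * labels[:, None]
x = stack_frames(feats, context=1)
w = fit_lda(x, labels, n_components=3)  # 4 classes present -> at most 3 dims
proj = x @ w
print(proj.shape)  # (50, 3)
```

Frame stacking is only one simple way to expose lip dynamics to the LDA; appending frame-difference (delta) features would be another. In multi-stream audio-visual systems the audio weight is usually tuned per noise condition, lowered as the SNR drops so that the visual stream contributes more.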
