DBN Based Multi-stream Multi-states Model for Continue Audio-Visual Speech Recognition

Lv Guo-Yun; Jiang Dong-Mei; Zhang Yan-Ning; Zhao Rong-Chun; H Sahli; Ilse Ravyse

doi:10.3724/SP.J.1146.2007.00915

Volume 30 Issue 12

Jan. 2011



Lv Guo-Yun, Jiang Dong-Mei, Zhang Yan-Ning, Zhao Rong-Chun, H Sahli, Ilse Ravyse. DBN Based Multi-stream Multi-states Model for Continue Audio-Visual Speech Recognition[J]. Journal of Electronics & Information Technology, 2008, 30(12): 2906-2911. doi: 10.3724/SP.J.1146.2007.00915


doi: 10.3724/SP.J.1146.2007.00915 cstr: 32379.14.SP.J.1146.2007.00915

Received Date: 2007-06-11
Rev Recd Date: 2007-11-27
Publish Date: 2008-12-19

Abstract

Asynchrony between speech and lip motion is a key issue in multi-modal fusion for Audio-Visual Speech Recognition (AVSR). This paper first introduces a Multi-Stream Asynchrony Dynamic Bayesian Network (MS-ADBN) model, which relaxes the synchrony constraint between the audio and visual streams to the word level and uses a word-phone topology in both streams. A Multi-stream Multi-states Asynchrony DBN (MM-ADBN) model, an extension of the MS-ADBN model, is then proposed for large-vocabulary AVSR; it adopts a word-phone-state topology in both the audio stream and the visual stream. In essence, the MS-ADBN model is a word model, whereas the MM-ADBN model is a phone model whose basic recognition units are phones. Experiments are carried out on small-vocabulary and large-vocabulary audio-visual databases. On the large-vocabulary database in a clean speech environment, the MM-ADBN model achieves improvements of 35.91% and 9.97% over the MS-ADBN model and the MSHMM, respectively, which shows that describing asynchrony is important for AVSR systems.
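The word-level asynchrony constraint described in the abstract can be illustrated with a small sketch. This is not the authors' DBN implementation; it only assumes a hypothetical per-frame alignment in which each stream is labelled with a `(word_index, phone_index)` pair, and it checks that the two streams agree on the word at every frame while their phone positions are free to differ. The function names and the tuple encoding are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical encoding: one (word_index, phone_index) pair per frame and stream.
Alignment = List[Tuple[int, int]]


def word_level_sync(audio: Alignment, video: Alignment) -> bool:
    """Return True if both streams are in the same word at every frame."""
    if len(audio) != len(video):
        return False
    return all(a[0] == v[0] for a, v in zip(audio, video))


def phone_asynchrony(audio: Alignment, video: Alignment) -> List[int]:
    """Per-frame difference in phone position (audio minus video)."""
    return [a[1] - v[1] for a, v in zip(audio, video)]


# The audio stream moves to the second phone of word 0 one frame earlier
# than the visual stream, but both enter word 1 at the same frame.
audio = [(0, 0), (0, 1), (0, 1), (1, 0)]
video = [(0, 0), (0, 0), (0, 1), (1, 0)]

# Here the visual stream is still in word 0 when the audio stream has
# already entered word 1, which violates word-level synchrony.
video_late = [(0, 0), (0, 0), (0, 1), (0, 1)]

print(word_level_sync(audio, video))       # True
print(phone_asynchrony(audio, video))      # [0, 1, 0, 0]
print(word_level_sync(audio, video_late))  # False
```

In this toy alignment the streams drift apart by one phone inside word 0 yet remain aligned at the word boundary, which is the kind of behaviour a word-level asynchrony constraint permits.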


