Volume 26 Issue 3
Mar. 2004
Jiang Dong-mei, Xie Lei, Ilse Ravyse, Zhao Rong-chun, Hichem Sahli, Jan Cornelis. The Viseme Based Continuous Speech Recognition System for a Talking Head[J]. Journal of Electronics & Information Technology, 2004, 26(3): 375-381.

The Viseme Based Continuous Speech Recognition System for a Talking Head

  • Received Date: 2002-07-25
  • Rev Recd Date: 2003-03-10
  • Publish Date: 2004-03-19
  • A continuous speech recognition system for a talking head is presented, based on viseme (the basic speech unit in the visual domain) HMMs, which segments speech into mouth-shape sequences with timing boundaries. Trisemes are formalized to take viseme contexts into account. Based on the 3D talking-head images, the viseme similarity weight (VSW) is defined, and 166 visual questions are designed for building the triseme decision trees that tie the states of trisemes with similar contexts so that they can share the same parameters. For system evaluation, besides the recognition rate, an image-related measurement, the viseme-similarity-weighted accuracy, accounts for mismatches between the recognized viseme sequence and its reference (an illustrative sketch of this metric follows below), and jerky points in the lip-rounding and VSW graphs help assess the smoothness of the resulting viseme image sequences. Results show that the viseme-based speech recognition system produces smoother and more plausible mouth shapes.
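The viseme-similarity-weighted accuracy mentioned in the abstract is only described at a high level on this page. The following Python sketch is a hypothetical illustration of how such a metric could be computed, assuming that substitutions between visually similar visemes are discounted by their VSW while insertions and deletions count as full errors; the paper's exact definition may differ. The function name vsw_weighted_accuracy, the toy viseme labels, and the similarity weights are illustrative assumptions, not taken from the paper.

    # Hypothetical sketch of a viseme-similarity-weighted accuracy (VSWA).
    # Assumption: a substitution is penalized by (1 - VSW) between the
    # recognized and reference visemes, while insertions and deletions cost 1,
    # analogous to a standard accuracy (N - D - S - I) / N but with
    # similarity-weighted substitution errors.

    def vsw_weighted_accuracy(reference, recognized, vsw):
        """Align two viseme sequences and return a similarity-weighted accuracy.

        reference, recognized: lists of viseme labels (strings).
        vsw: dict mapping (reference_viseme, recognized_viseme) -> weight in [0, 1].
        """
        n, m = len(reference), len(recognized)
        # dp[i][j] = minimum weighted edit cost aligning reference[:i] with recognized[:j]
        dp = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            dp[i][0] = float(i)          # deletions only
        for j in range(1, m + 1):
            dp[0][j] = float(j)          # insertions only
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                ref_v, rec_v = reference[i - 1], recognized[j - 1]
                # Visually similar visemes (high VSW) incur a small substitution penalty.
                sub_cost = 0.0 if ref_v == rec_v else 1.0 - vsw.get((ref_v, rec_v), 0.0)
                dp[i][j] = min(
                    dp[i - 1][j] + 1.0,          # deletion
                    dp[i][j - 1] + 1.0,          # insertion
                    dp[i - 1][j - 1] + sub_cost  # match or weighted substitution
                )
        return 1.0 - dp[n][m] / max(n, 1)


    if __name__ == "__main__":
        # Toy example with made-up viseme labels and similarity weights.
        ref = ["sil", "p", "aa", "f", "sil"]
        hyp = ["sil", "b", "aa", "v", "sil"]
        weights = {("p", "b"): 0.9, ("f", "v"): 0.8}   # visually near-identical pairs
        print(f"VSW-weighted accuracy: {vsw_weighted_accuracy(ref, hyp, weights):.3f}")

In this toy run, confusing p with b and f with v costs only 0.3 error in total, so the weighted accuracy stays at 0.94, whereas an unweighted accuracy would drop to 0.6 for the same two substitutions.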
