Volume 31 Issue 12
Dec.  2010
Zhao Hui, Tang Chao-jing. Visual Speech Synthesis Algorithm Based on Chinese Visual Triphone[J]. Journal of Electronics & Information Technology, 2009, 31(12): 3010-3014. doi: 10.3724/SP.J.1146.2008.01634

Visual Speech Synthesis Algorithm Based on Chinese Visual Triphone

doi: 10.3724/SP.J.1146.2008.01634
  • Received Date: 2008-12-05
  • Revision Received Date: 2009-06-19
  • Publish Date: 2009-12-19
  • To synthesize realistic video sequences, a visual speech synthesis algorithm based on Chinese visual triphones is proposed. The concept of the visual triphone is introduced according to the principles of Chinese pronunciation and the relationship between phonemes and visemes. Hidden Markov Models (HMMs) are built on the visual triphones. In the training stage, combined features comprising visual and audio features are used. In the synthesis stage, a sentence HMM is constructed by concatenating triphone HMMs, and the feature parameters are extracted from it. Subjective and objective evaluations indicate that the synthesized video is realistic and satisfactory.
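
The concatenation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes left-to-right HMMs with no skip transitions, a hypothetical `left-center+right` label notation, a hypothetical `sil` boundary symbol, and made-up self-loop probabilities. Feature-parameter generation from the sentence HMM is not shown.

```python
def to_triphones(units, boundary="sil"):
    """Label each unit with its left and right neighbours.

    The 'left-center+right' notation and the 'sil' boundary symbol are
    assumptions made for this illustration, not taken from the paper.
    """
    padded = [boundary] + list(units) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def concat_left_to_right(models):
    """Chain left-to-right HMMs into one sentence-level transition matrix.

    Each model is given as a list of per-state self-loop probabilities.
    State i either stays (probability p) or moves to state i + 1
    (probability 1 - p). The last state of one model feeds the first
    state of the next. The final state's remaining mass is the exit
    probability and is not stored in the matrix.
    """
    self_loops = [p for model in models for p in model]
    n = len(self_loops)
    trans = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(self_loops):
        trans[i][i] = p
        if i + 1 < n:
            trans[i][i + 1] = 1.0 - p
    return trans


# Hypothetical unit sequence and hypothetical 3-state models.
labels = to_triphones(["n", "i", "h", "ao"])
model_bank = {label: [0.5, 0.5, 0.5] for label in labels}
sentence_hmm = concat_left_to_right([model_bank[label] for label in labels])
```

Here `sentence_hmm` is a 12 x 12 matrix: four triphone models of three states each, joined end to end so that a single state sequence spans the whole sentence.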

