Visual Speech Synthesis Algorithm Based on Chinese Visual Triphone

Zhao Hui; Tang Chao-jing

doi:10.3724/SP.J.1146.2008.01634

Volume 31 Issue 12

Dec. 2009

Zhao Hui, Tang Chao-jing. Visual Speech Synthesis Algorithm Based on Chinese Visual Triphone[J]. Journal of Electronics & Information Technology, 2009, 31(12): 3010-3014. doi: 10.3724/SP.J.1146.2008.01634

Received Date: 2008-12-05
Revision Received Date: 2009-06-19
Publish Date: 2009-12-19

Abstract

To synthesize realistic video sequences, a visual speech synthesis algorithm based on Chinese visual triphones is proposed. Based on the principles of Chinese pronunciation and the relationship between phonemes and visemes, the concept of the visual triphone is introduced, and Hidden Markov Models (HMMs) are built on visual triphones. In the training stage, combined features comprising both visual and audio features are used. In the synthesis stage, a sentence HMM is constructed by concatenating triphone HMMs, and the feature parameters are extracted from it. Subjective and objective evaluations show that the synthesized video is realistic and satisfactory.
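The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the phoneme-to-viseme table, the HTK-style `left-center+right` labels, the `State` and `TriphoneHMM` structures and all function names are hypothetical, and parameter extraction is simplified to holding each state's mean for its duration. The sketch shows three steps: mapping phonemes to visemes and labelling each viseme with its left and right viseme context (a visual triphone), concatenating triphone HMMs into a sentence HMM, and reading out the visual part of the combined audio-visual feature means.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, illustrative phoneme-to-viseme classes (NOT the paper's table):
# phonemes that look alike on the lips share one viseme label.
PHONEME_TO_VISEME: Dict[str, str] = {
    "b": "V_bpm", "p": "V_bpm", "m": "V_bpm",
    "f": "V_f",
    "d": "V_dtnl", "t": "V_dtnl", "n": "V_dtnl", "l": "V_dtnl",
    "a": "V_a", "o": "V_o", "e": "V_e", "i": "V_i", "u": "V_u",
}


def to_visual_triphones(phonemes: List[str]) -> List[str]:
    """Map phonemes to visemes, then label each viseme with its left and
    right viseme context as 'left-center+right' (silence at both ends)."""
    visemes = ["sil"] + [PHONEME_TO_VISEME[p] for p in phonemes] + ["sil"]
    return [
        f"{visemes[k - 1]}-{visemes[k]}+{visemes[k + 1]}"
        for k in range(1, len(visemes) - 1)
    ]


@dataclass
class State:
    mean: List[float]  # combined feature mean: audio dims first, then visual
    duration: int      # number of frames this state is held for


@dataclass
class TriphoneHMM:
    label: str
    states: List[State]  # left-to-right state sequence


def build_sentence_hmm(labels: List[str],
                       models: Dict[str, TriphoneHMM]) -> List[State]:
    """Concatenate the triphone HMMs in label order into one sentence HMM,
    represented here as the flat left-to-right list of their states.
    Raises KeyError for a triphone with no trained model."""
    sentence: List[State] = []
    for label in labels:
        sentence.extend(models[label].states)
    return sentence


def extract_visual_trajectory(sentence: List[State],
                              n_audio: int) -> List[List[float]]:
    """Simplified parameter read-out: repeat each state's visual mean
    (the dimensions after the first n_audio) for its duration."""
    frames: List[List[float]] = []
    for state in sentence:
        visual = state.mean[n_audio:]
        frames.extend(list(visual) for _ in range(state.duration))
    return frames


if __name__ == "__main__":
    labels = to_visual_triphones(["m", "a"])  # pinyin "ma"
    print(labels)  # ['sil-V_bpm+V_a', 'V_bpm-V_a+sil']
    # Toy models: 1 audio dim + 1 visual dim (e.g. a mouth-opening value).
    models = {
        "sil-V_bpm+V_a": TriphoneHMM(
            "sil-V_bpm+V_a", [State([0.1, 0.0], 2), State([0.2, 0.1], 1)]),
        "V_bpm-V_a+sil": TriphoneHMM("V_bpm-V_a+sil", [State([0.3, 0.8], 3)]),
    }
    sentence = build_sentence_hmm(labels, models)
    print(extract_visual_trajectory(sentence, n_audio=1))
    # [[0.0], [0.0], [0.1], [0.8], [0.8], [0.8]]
```

Because b, p and m fall into the same illustrative viseme class, `to_visual_triphones(["b", "a"])` yields the same labels as `["m", "a"]`, so grouping phonemes into viseme classes reduces the number of distinct contexts to model. Triphones without a trained model raise `KeyError` in this sketch, and the stepwise output is a simplification; smoother trajectories would need a parameter-generation step that accounts for dynamic features, which is not shown here.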
