Abstract:
In a rule-based Chinese speech synthesis system, the fundamental frequency (F0) contour of continuous speech is not a simple concatenation of the F0 contours of the individual synthesis units; it results from the combined action of phonetic functional units. This paper proposes a forward approach to optimizing the F0 contour within Chinese prosodic chunks. The approach incorporates contextual and speaker-specific articulation information, such as stress strength, shape distortion, and articulation rate, into the F0 contour of each chunk in order to improve the naturalness of the synthesized speech. Based on this approach, the optimization parameters of each unit (top line, bottom line, smoothing factor, shape distortion, and stress strength) are extracted by inversion, under the minimum mean square error (MMSE) criterion, from the clustered F0 contours of monosyllabic, disyllabic, and trisyllabic prosodic chunks. A statistical analysis of how syllable position within the chunk and tone affect these parameters indicates that the extracted parameters are reliable and that the F0 contour optimization is reasonable, and it yields concrete rules for applying the optimization control parameters in a rule-based synthesis system. Listening tests show that, before and after optimization of the prosodic-chunk F0 contours, the intelligibility scores of the synthesis system are 3.25 and 3.35, and the naturalness scores are 2.9 and 3.31, respectively.
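To illustrate the forward/inverse idea, the sketch below fits two of the named parameters, the top line and the bottom line, to an observed F0 contour by least squares. The abstract does not give the paper's actual forward model, so everything here is an assumption made for illustration: the linear model `forward_f0`, the normalized tone template, the function names, and the synthetic data are all hypothetical, and the smoothing factor, shape distortion, and stress strength are not modeled.

```python
import numpy as np


def forward_f0(top, bottom, template):
    """Hypothetical forward model (not the paper's): map a normalized tone
    template with values in [0, 1] into the range [bottom, top] in Hz."""
    template = np.asarray(template, dtype=float)
    return bottom + (top - bottom) * template


def fit_top_bottom(observed_f0, template):
    """Inverse step: return the (top, bottom) pair that minimizes the mean
    squared error between the forward model and the observed contour."""
    template = np.asarray(template, dtype=float)
    observed_f0 = np.asarray(observed_f0, dtype=float)
    # F0 = top * s + bottom * (1 - s) is linear in (top, bottom),
    # so ordinary least squares gives the minimum-MSE solution directly.
    design = np.column_stack([template, 1.0 - template])
    (top, bottom), *_ = np.linalg.lstsq(design, observed_f0, rcond=None)
    return float(top), float(bottom)


# Synthetic example: a falling template with assumed top/bottom lines
# of 220 Hz and 120 Hz, plus a little Gaussian noise.
template = np.linspace(1.0, 0.0, 20)
rng = np.random.default_rng(0)
observed = forward_f0(220.0, 120.0, template) + rng.normal(0.0, 1.0, template.size)
top_hat, bottom_hat = fit_top_bottom(observed, template)
```

With a model that is nonlinear in its parameters, the same criterion would need an iterative optimizer rather than a single linear solve.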