Abstract:
In a rule-based speech synthesis system, the fundamental frequency (F0) contour of an utterance is shaped by many phonetic functional units, not merely by concatenating the F0 contours of adjacent syllables. To improve the naturalness of synthesized speech, this paper proposes a new forward approach to F0 contour optimization within Chinese prosodic chunks, which integrates environmental factors (such as stress, syllable distortion, and articulation velocity) into the F0 contour. Based on this optimization approach, the paper inversely extracts the optimization parameters (namely the top-line, the bottom-line, the smoothness, the distortion, and the stress) from clustered F0 contours of monosyllabic, disyllabic, and trisyllabic chunks using the minimum mean square error (MMSE) principle. It further analyzes the influence of position and tone on these parameters. The analysis indicates that the extracted parameters are reliable and that the optimization theory is reasonable on the whole, so rules for the optimization parameters can be derived for the different prosodic chunks in a speech synthesis system. In a listening test, the intelligibility score was 3.25 before optimization and 3.35 after, and the naturalness score rose from 2.9 to 3.31.