An Algorithm for Voice Conversion Based on Mixtures of Linear Transformation

JIAN Zhihua (简志华); YANG Zhen (杨震)

doi:10.3724/SP.J.1146.2006.00787


Funding: Supported by the Qing Lan Project of Jiangsu Province (QL003YZ)

Publication history
- Received: 2006-06-06
- Revised: 2006-10-30
- Published: 2007-07-19


Abstract

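To address voice conversion when no parallel speech corpus is available, this paper proposes a voice conversion algorithm based on mixtures of linear transformations. Under the maximum likelihood criterion, the parameters of the transformation function are computed with the iterative EM algorithm. To reduce the smoothing effect that linear weighting has on the speech spectral envelope, the chirp Z-transform is used to adjust the LPC coefficients of the speech signal. Both objective evaluation and subjective listening experiments show that the algorithm based on mixtures of linear transformations achieves conversion performance comparable to that of conventional voice conversion techniques, while removing their requirement for a parallel speech corpus.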
- Keywords: voice conversion; mixtures of linear transformations; expectation-maximization (EM) algorithm; chirp Z-transform
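The abstract states only that the transformation parameters are estimated with EM under the maximum likelihood criterion, without a parallel corpus. The following is a minimal sketch of one way to do this, under assumptions that are not taken from the paper: a diagonal-covariance GMM is first trained on source-speaker feature vectors; the unpaired target-speaker frames are then modelled by the same GMM with each mean shifted to mu_k + b_k (a bias-only linear transform, with weights and covariances held fixed), and EM estimates the biases b_k. Conversion is the posterior-weighted sum F(x) = sum_k P(k|x)(x + b_k), which is the kind of linear weighting the abstract identifies as the cause of spectral smoothing. All function names are hypothetical, and the paper's actual transforms (for example full matrices A_k) may differ.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """log N(x_t; mean_k, diag(var_k)) for every frame t and component k, shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]                       # (T, K, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)       # (T, K)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)      # (K,)
    return -0.5 * (maha + log_det[None, :])


def posteriors(X, weights, means, variances):
    """Responsibilities gamma[t, k] = P(k | x_t) and the total log-likelihood."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
    return np.exp(log_p - log_norm), float(log_norm.sum())


def fit_gmm_diag(X, K, n_iter=50, seed=0, var_floor=1e-3):
    """Plain EM for a diagonal-covariance GMM on source-speaker features (assumed front end)."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, K):                      # farthest-point initialisation
        d = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(d)))
    means = X[idx].astype(float)
    variances = np.tile(X.var(axis=0) + var_floor, (K, 1))
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        gamma, _ = posteriors(X, weights, means, variances)
        Nk = gamma.sum(axis=0) + 1e-10
        weights = Nk / Nk.sum()
        means = gamma.T @ X / Nk[:, None]
        variances = np.maximum(gamma.T @ X ** 2 / Nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


def fit_bias_transforms(Y, weights, means, variances, n_iter=30):
    """Hypothetical ML/EM estimate of one bias b_k per source component from
    unpaired target frames Y, modelled as N(mu_k + b_k, diag(var_k))."""
    bias = np.zeros_like(means)
    history = []
    for _ in range(n_iter):
        gamma, ll = posteriors(Y, weights, means + bias, variances)    # E-step
        history.append(ll)
        Nk = gamma.sum(axis=0) + 1e-10
        # M-step: b_k = sum_t gamma_tk (y_t - mu_k) / sum_t gamma_tk
        bias = (gamma.T @ Y) / Nk[:, None] - means
    return bias, history


def convert(X, weights, means, variances, bias):
    """F(x) = sum_k P(k | x) (x + b_k): posterior-weighted linear transforms."""
    gamma, _ = posteriors(X, weights, means, variances)
    return X + gamma @ bias
```

With bias-only transforms the M-step has the closed form shown in the comment, so no EM iteration can decrease the target log-likelihood; the returned `history` list can be used to check this. Note that source and target frames never need to be paired: `fit_bias_transforms` sees only the target data and the source GMM.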
Abstract: This paper proposes a voice conversion algorithm based on mixtures of linear transformations that avoids the need for a parallel training corpus inherent in conventional approaches. Within a maximum likelihood framework, the EM algorithm is used to compute the parameters of the transformation function, and the chirp Z-transform is used to enhance the spectral envelope, which is smoothed by the linear weighted averaging. The proposed voice conversion system is evaluated using both objective and subjective measures. The experimental results demonstrate that the proposed approach can effectively transform speaker identity and achieves results comparable to those of conventional methods that require a parallel corpus.
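The abstract does not say exactly how the chirp Z-transform adjusts the LPC coefficients. One common use of the CZT, assumed here, is to evaluate the LPC inverse filter A(z) = 1 + sum_i a_i z^(-i) on a circle of radius r < 1 instead of the unit circle, so that the contour passes closer to the poles and the formant peaks of 1/|A| become sharper. The sketch implements the CZT with Bluestein's algorithm in NumPy; `czt`, `lpc_envelope_db` and the radius values are illustrative choices, not the paper's parameters.

```python
import numpy as np


def czt(x, m, w, a):
    """Chirp Z-transform via Bluestein's algorithm:
    X[k] = sum_n x[n] * z_k**(-n), with z_k = a * w**(-k), k = 0..m-1."""
    x = np.asarray(x, dtype=complex)
    a, w = complex(a), complex(w)
    n_len = len(x)
    L = 1 << int(np.ceil(np.log2(n_len + m - 1)))   # FFT size >= N + M - 1
    n = np.arange(n_len)
    k = np.arange(m)
    # Pre-multiplied input: x[n] * a**(-n) * w**(n**2 / 2)
    y = np.zeros(L, dtype=complex)
    y[:n_len] = x * a ** (-n) * w ** (n ** 2 / 2.0)
    # Chirp filter w**(-r**2 / 2) for r = -(N-1)..(M-1), stored circularly
    v = np.zeros(L, dtype=complex)
    v[:m] = w ** (-(k ** 2) / 2.0)
    r = np.arange(n_len - 1, 0, -1)
    v[L - n_len + 1:] = w ** (-(r ** 2) / 2.0)
    # Linear convolution through FFTs, then post-multiply by w**(k**2 / 2)
    g = np.fft.ifft(np.fft.fft(y) * np.fft.fft(v))
    return w ** (k ** 2 / 2.0) * g[:m]


def lpc_envelope_db(lpc, n_points=256, radius=1.0, gain=1.0):
    """LPC envelope 20*log10(gain / |A(z)|) at n_points angles in [0, pi),
    evaluated on a circle of the given radius. lpc = [1, a1, ..., ap].
    radius < 1 moves the contour toward the poles, sharpening formant peaks."""
    w = np.exp(-1j * np.pi / n_points)        # angular step of pi / n_points
    A = czt(lpc, n_points, w, radius)
    return 20.0 * np.log10(gain / np.abs(A))
```

Because z_k^(-n) = r^(-n) e^(-j pi n k / N) on this contour, the same values are obtained by scaling each a_i by r^(-i) and taking an ordinary 2N-point FFT; that coefficient scaling is one plausible reading of "adjusting the LPC coefficients", not a detail confirmed by the abstract. For a single resonance, moving from r = 1.0 to r = 0.95 increases the peak-to-valley range of the envelope, which counteracts the flattening caused by averaging several converted spectra.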


