Jian Zhi-Hua, Wang Xiang-Wen. An Iterative Training Algorithm Based on Local Nearest Neighbor for Voice Conversion[J]. Journal of Electronics & Information Technology, 2012, 34(9): 2091-2096. doi: 10.3724/SP.J.1146.2012.00398
A novel algorithm, the Iterative combination of a Local nearest Neighbor search step and a Conversion step Alignment (ILNCA), is proposed for training voice conversion systems when only a nonparallel corpus is available. It is a modified version of the Iterative combination of a nearest Neighbor search step and a Conversion step Alignment (INCA). Unlike INCA, ILNCA first uses Gaussian Mixture Models (GMMs) to represent the spectral feature spaces of the source speaker and the target speaker, and then matches the individual Gaussian components of the source speaker's GMM to those of the target speaker's GMM, and vice versa, according to the Kullback-Leibler (KL) distance. Finally, ILNCA aligns phonetically equivalent acoustic vectors from the source and target speakers within their mapped subspaces, rather than in the whole space as INCA does. Both objective and subjective evaluations are conducted. The experimental results demonstrate that the approach achieves better performance than INCA because of its more accurate vector alignment.
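The component matching and local alignment described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the GMM parameters are already estimated, that each frame is hard-assigned to one component, that the KL distance is symmetrised by summing both directions, and that frames are compared by Euclidean distance. The function names (`kl_gaussian`, `match_components`, `local_nearest_neighbors`) are hypothetical, and the iterative conversion step is omitted.

```python
import numpy as np

def kl_gaussian(m0, S0, m1, S1):
    """Closed-form KL divergence KL(N0 || N1) between two Gaussians."""
    k = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - k + logdet1 - logdet0)

def match_components(means_a, covs_a, means_b, covs_b):
    """For each component of GMM A, return the index of the closest
    component of GMM B under a symmetrised KL distance (assumption)."""
    matches = []
    for ma, Sa in zip(means_a, covs_a):
        dists = [kl_gaussian(ma, Sa, mb, Sb) + kl_gaussian(mb, Sb, ma, Sa)
                 for mb, Sb in zip(means_b, covs_b)]
        matches.append(int(np.argmin(dists)))
    return np.array(matches)

def local_nearest_neighbors(src, src_comp, tgt, tgt_comp, matches):
    """For each source frame, find the nearest target frame, searching
    only among target frames assigned to the matched component.
    Returns -1 where the matched component holds no target frames."""
    idx = np.full(len(src), -1, dtype=int)
    for i, (x, c) in enumerate(zip(src, src_comp)):
        candidates = np.flatnonzero(tgt_comp == matches[c])
        if candidates.size == 0:
            continue
        d = np.linalg.norm(tgt[candidates] - x, axis=1)
        idx[i] = candidates[np.argmin(d)]
    return idx
```

Calling `match_components` a second time with the arguments swapped gives the target-to-source matching mentioned in the abstract. Restricting the search to `candidates` is what distinguishes this local search from a nearest-neighbor search over all target frames.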