Speaker Adaptation Method Based on Eigenphone Speaker Subspace for Speech Recognition

Qu Dan, Zhang Wen-lin

Citation: Qu Dan, Zhang Wen-lin. Speaker Adaptation Method Based on Eigenphone Speaker Subspace for Speech Recognition[J]. Journal of Electronics & Information Technology, 2015, 37(6): 1350-1356. doi: 10.11999/JEIT141264


doi: 10.11999/JEIT141264
Funding:

Supported by the National Natural Science Foundation of China (61175017, 61302107, and 61403415)


Abstract: The eigenphone speaker adaptation method achieves good performance when sufficient adaptation data are available, but suffers from severe over-fitting when the adaptation data are limited. To overcome this problem, this paper proposes a speaker adaptation method based on an eigenphone speaker subspace. First, the basic principle of eigenphone speaker adaptation in a Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) based speech recognition system is presented. Next, a speaker subspace is introduced to model the correlation among the eigenphone matrices of different speakers, and a new eigenphone speaker-subspace adaptation method is then obtained by estimating a speaker-dependent coordinate vector. Finally, the proposed method is compared with the conventional speaker-subspace-based adaptation method. Experiments on Mandarin continuous speech recognition with the Microsoft speech corpus show that, compared with the original eigenphone adaptation method, the proposed method greatly improves performance when the adaptation data are very limited and effectively alleviates over-fitting. Compared with the eigenvoice adaptation method, it obtains lower space complexity at the cost of only a slight performance loss, making it more practical.
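To make the adaptation scheme summarized in the abstract concrete, the following is a minimal sketch of the two model equations involved, written as LaTeX with explanatory comments; the symbols (\mu_m, l_m, V^{(s)}, B_k, w^{(s)}) and the number of bases K are assumptions introduced here for illustration and need not match the paper's own notation.

% Eigenphone adaptation: the adapted mean of Gaussian m for speaker s is the
% speaker-independent mean \mu_m shifted by a speaker-dependent eigenphone
% matrix V^{(s)} acting on a speaker-independent phone coordinate vector l_m.
\mu_m^{(s)} = \mu_m + V^{(s)} l_m

% Speaker-subspace constraint (the idea summarized in the abstract): the
% eigenphone matrices of all speakers are assumed to lie in a low-dimensional
% subspace spanned by basis matrices B_1, \dots, B_K around a bias B_0, so only
% the short speaker coordinate vector w^{(s)} = (w_1^{(s)}, \dots, w_K^{(s)})
% has to be estimated from the adaptation data.
V^{(s)} \approx B_0 + \sum_{k=1}^{K} w_k^{(s)} B_k

Estimating w^{(s)} instead of a full eigenphone matrix reduces the number of speaker-specific parameters from the size of V^{(s)} to K, which is why a subspace-constrained variant of this kind resists over-fitting with very little adaptation data, at the price of some modeling flexibility.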
Metrics
  • Article views: 1240
  • Full-text HTML views: 110
  • PDF downloads: 513
  • Cited by: 0
Publication History
  • Received: 2014-09-30
  • Revised: 2014-12-29
  • Published: 2015-06-19
