基于本征音子说话人子空间的说话人自适应算法

屈丹 (Qu Dan), 张文林 (Zhang Wen-lin)

屈丹, 张文林. 基于本征音子说话人子空间的说话人自适应算法[J]. 电子与信息学报, 2015, 37(6): 1350-1356. doi: 10.11999/JEIT141264
Qu Dan, Zhang Wen-lin. Speaker Adaptation Method Based on Eigenphone Speaker Subspace for Speech Recognition[J]. Journal of Electronics & Information Technology, 2015, 37(6): 1350-1356. doi: 10.11999/JEIT141264

doi: 10.11999/JEIT141264
Funding: Supported by the National Natural Science Foundation of China (Grants 61175017, 61302107, and 61403415)

Speaker Adaptation Method Based on Eigenphone Speaker Subspace for Speech Recognition

  • Abstract: Eigenphone-based speaker adaptation performs well when sufficient adaptation data are available, but suffers from severe over-fitting when the adaptation data are limited. To overcome this problem, this paper proposes a speaker adaptation method based on an eigenphone speaker subspace. First, the basic principle of eigenphone speaker adaptation in speech recognition systems based on the Hidden Markov Model - Gaussian Mixture Model (HMM-GMM) is presented. Next, a speaker subspace is introduced to model the correlations among the eigenphone matrices of different speakers, and a new eigenphone speaker-subspace adaptation method is derived by estimating a speaker-dependent coordinate vector. Finally, the proposed method is compared with conventional speaker-subspace adaptation methods. Mandarin continuous speech recognition experiments on the Microsoft corpus show that, compared with the original eigenphone adaptation method, the proposed method improves performance substantially when very little adaptation data are available, effectively alleviating over-fitting. Compared with eigenvoice adaptation, it achieves lower space complexity at the cost of a small performance loss, making it more practical.
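The NumPy sketch below is only meant to illustrate the general idea summarized in the abstract. The notation is assumed for illustration and is not taken from the paper: it supposes an adapted mean of the form mu_m + E_s l_m (with E_s the speaker's eigenphone matrix and l_m a shared phone coordinate vector), a speaker subspace obtained by PCA over the vectorized eigenphone matrices of training speakers, and a ridge-regularized weighted least-squares estimate of the speaker coordinate vector; the paper's actual derivation and estimation formulas may differ.

import numpy as np

def train_speaker_subspace(eigenphone_mats, n_bases):
    # PCA over the vectorized eigenphone matrices of the training speakers.
    # eigenphone_mats: list of S arrays, each of shape (D, K).
    X = np.stack([E.ravel() for E in eigenphone_mats])        # (S, D*K)
    e0 = X.mean(axis=0)                                       # subspace origin
    _, _, Vt = np.linalg.svd(X - e0, full_matrices=False)
    B = Vt[:n_bases].T                                        # (D*K, n_bases) basis
    return e0, B

def estimate_speaker_coords(e0, B, mu, L, var, frames, posts):
    # Weighted least-squares estimate of the speaker coordinate vector w
    # from adaptation data, assuming diagonal covariances.
    #   mu:  (M, D) speaker-independent Gaussian means
    #   L:   (M, K) phone coordinate vectors (speaker independent)
    #   var: (M, D) diagonal covariances
    #   frames: (T, D) adaptation features; posts: (T, M) Gaussian posteriors
    M, D = mu.shape
    K = L.shape[1]
    n_bases = B.shape[1]
    E0 = e0.reshape(D, K)
    A = np.zeros((n_bases, n_bases))
    b = np.zeros(n_bases)
    for x_t, gamma_t in zip(frames, posts):
        for m in np.nonzero(gamma_t > 1e-6)[0]:
            # Jacobian of the adapted mean with respect to w for Gaussian m.
            J = np.stack([B[:, k].reshape(D, K) @ L[m]
                          for k in range(n_bases)], axis=1)   # (D, n_bases)
            r = x_t - mu[m] - E0 @ L[m]                       # residual to explain
            W = J / var[m][:, None]                           # precision weighting
            A += gamma_t[m] * (J.T @ W)
            b += gamma_t[m] * (W.T @ r)
    # A small ridge term keeps the estimate stable with very little data.
    return np.linalg.solve(A + 1e-3 * np.eye(n_bases), b)

def adapt_means(e0, B, w, mu, L):
    # Rebuild the speaker's eigenphone matrix from its subspace coordinates
    # and shift every Gaussian mean: mu_m + E_s @ l_m.
    M, D = mu.shape
    K = L.shape[1]
    E_s = (e0 + B @ w).reshape(D, K)
    return mu + L @ E_s.T                                     # (M, D)

Because w has only n_bases free parameters instead of the D x K entries of a full speaker-dependent eigenphone matrix, its estimate stays well conditioned even with a few seconds of adaptation data, which is the over-fitting remedy the abstract describes; the subspace basis is also far smaller than the mean supervectors used by eigenvoice adaptation, consistent with the lower space complexity mentioned above.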
Metrics
  • Article views:  1259
  • Full-text HTML views:  114
  • PDF downloads:  513
  • Citations: 0
Publication history
  • Received:  2014-09-30
  • Revised:  2014-12-29
  • Published:  2015-06-19
