Low-rank Constraint Eigenphone Speaker Adaptation Method for Speech Recognition

Zhang Wen-Lin; Zhang Lian-Hai; Chen Qi; Li Bi-Cheng

doi:10.3724/SP.J.1146.2013.00848

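Funding: Supported by the National Natural Science Foundation of China (61175017) and the National High Technology Research and Development Program of China (863 Program) (2012AA011603).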

Publication history
- Received: 2013-06-14
- Revised: 2013-12-26
- Published: 2014-04-19

Abstract

A low-rank constraint eigenphone speaker adaptation method is proposed. The original eigenphone speaker adaptation method performs well when adaptation data are sufficient. When adaptation data are insufficient, however, it suffers from severe overfitting, and the adapted system may perform even worse than the unadapted one. First, a simplified estimation algorithm for the eigenphone matrix is derived for hidden Markov model-Gaussian mixture model (HMM-GMM) speech recognition systems with diagonal covariance matrices. Then, a low-rank constraint is imposed on the eigenphone matrix, with the nuclear norm serving as a convex approximation of matrix rank. Adjusting the weight factor of the nuclear norm effectively controls the complexity of the adaptation model. Finally, an accelerated proximal gradient method is given to solve the resulting mathematical optimization problem with a nuclear-norm regularization term. Speaker adaptation experiments on a Mandarin Chinese continuous speech recognition task show that the low-rank constraint markedly improves the performance of the eigenphone method. Under all tested conditions with 5 to 50 s of adaptation data, the new method achieves better recognition performance than maximum likelihood linear regression followed by maximum a posteriori (MLLR+MAP) adaptation.

Keywords:
- Speech recognition
- Speaker adaptation
- Eigenphone
- Low-rank constraint
- Proximal gradient method
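The abstract's last step, solving an optimization problem with a nuclear-norm regularization term by an accelerated proximal gradient method, can be illustrated with a small sketch. The proximal operator of the nuclear norm is singular value thresholding. Each singular value is shrunk by the threshold, and any that reach zero are dropped, which is why a larger nuclear-norm weight yields a lower-rank matrix. The sketch makes several assumptions. It uses a generic smooth term f with an L-Lipschitz gradient, a fixed step size 1/L, and a FISTA-style momentum schedule. The function names and the least-squares stand-in for f are hypothetical. This is neither the paper's eigenphone objective nor necessarily its exact algorithm.

```python
import numpy as np


def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||A||_* (nuclear norm).

    Shrinks every singular value of A by tau and clips at zero,
    so singular values below tau vanish and the rank drops.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt  # equals U @ diag(s_shrunk) @ Vt


def accelerated_proximal_gradient(grad_f, lipschitz, lam, W0, num_iters=200):
    """FISTA-style sketch for min_W f(W) + lam * ||W||_*.

    grad_f    : gradient of the smooth term f (hypothetical stand-in)
    lipschitz : Lipschitz constant L of grad_f; fixed step size 1/L
    lam       : weight of the nuclear-norm term (larger -> lower rank)
    """
    step = 1.0 / lipschitz
    W_prev = W0.copy()
    Y = W0.copy()  # extrapolated (momentum) point
    t = 1.0
    for _ in range(num_iters):
        # Gradient step on f at Y, followed by the nuclear-norm prox.
        W = singular_value_threshold(Y - step * grad_f(Y), lam * step)
        # Nesterov momentum update of the extrapolation weight.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = W + ((t - 1.0) / t_next) * (W - W_prev)
        W_prev, t = W, t_next
    return W_prev


# Hypothetical stand-in: f(W) = 0.5 * ||W - M||_F^2, gradient W - M, L = 1.
M = np.diag([5.0, 3.0, 0.5])
W_hat = accelerated_proximal_gradient(lambda W: W - M, 1.0, 1.0, np.zeros_like(M))
print(np.round(np.linalg.svd(W_hat, compute_uv=False), 6))
```

With this stand-in, the minimizer is simply the singular value thresholding of M. For M = diag(5, 3, 0.5) and a weight of 1, the result is diag(4, 2, 0), a rank-2 matrix: the smallest singular value is removed entirely. In the paper's setting, the smooth term would instead come from the eigenphone matrix estimation problem, and the weight plays the role of the nuclear-norm weight factor.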