Rapid Speaker Adaptation Based on Maximum-likelihood Variable Subspace

Zhang Wen-Lin; Niu Tong; Zhang Lian-Hai; Li Bi-Cheng

doi:10.3724/SP.J.1146.2011.00839

Volume 34 Issue 3

Mar. 2012


Zhang Wen-Lin, Niu Tong, Zhang Lian-Hai, Li Bi-Cheng. Rapid Speaker Adaptation Based on Maximum-likelihood Variable Subspace[J]. Journal of Electronics & Information Technology, 2012, 34(3): 571-575. doi: 10.3724/SP.J.1146.2011.00839

Received Date: 2011-08-15
Revision Received Date: 2011-11-21
Publish Date: 2012-03-19

Abstract

A new rapid speaker adaptation method based on a maximum-likelihood variable subspace is proposed. A set of bases of the speaker space is obtained by performing Principal Component Analysis (PCA) on the Speaker Dependent (SD) model parameters of the training speakers. Unlike conventional subspace-based methods, the proposed method dynamically chooses a subset of these bases for each speaker during adaptation, using the maximum likelihood criterion. The new speaker's model is constrained to the subspace spanned by the chosen bases. Because fewer free parameters are required, the new method can obtain a more robust SD model from a very small amount of adaptation data. Speech recognition experiments show that the new method outperforms both the eigenvoice method and the Maximum Likelihood Linear Regression (MLLR) method, in supervised as well as unsupervised mode.

Keywords:
- Continuous speech recognition
- Speaker adaptation
- Eigenvoice
- Subspace method
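
The idea summarized in the abstract can be illustrated with a small numerical sketch. The abstract does not say how the subset of bases is searched or how the likelihood is computed, so everything below is an assumption made for illustration: the SD models are reduced to mean supervectors, the likelihood is a unit-variance Gaussian weighted by per-dimension occupancy counts, and the subset is grown by greedy forward selection. The function names (`pca_bases`, `fit_weights`, `select_variable_subspace`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def pca_bases(sd_supervectors, num_bases):
    """PCA on training speakers' SD supervectors, shape (speakers, dims)."""
    mean = sd_supervectors.mean(axis=0)
    centered = sd_supervectors - mean
    # Rows of vt are orthonormal principal directions, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_bases]


def fit_weights(bases, residual, counts):
    """ML weights under the assumed unit-variance Gaussian model.

    This is a weighted least-squares fit that minimizes
    sum(counts * (residual - bases.T @ w) ** 2).
    """
    sqrt_c = np.sqrt(counts)
    a = (bases * sqrt_c).T          # shape (dims, k)
    b = residual * sqrt_c           # shape (dims,)
    w, *_ = np.linalg.lstsq(a, b, rcond=None)
    return w


def log_likelihood(bases, w, residual, counts):
    """Log-likelihood up to an additive constant (assumed unit variance)."""
    err = residual - bases.T @ w
    return -0.5 * np.sum(counts * err ** 2)


def select_variable_subspace(mean, bases, obs_mean, counts, k):
    """Greedily pick k bases for one speaker by maximum likelihood.

    Greedy forward selection is an assumption of this sketch, not
    necessarily the search procedure used in the paper.
    """
    residual = obs_mean - mean
    chosen = []
    for _ in range(k):
        best_ll, best_j = None, None
        for j in range(len(bases)):
            if j in chosen:
                continue
            idx = chosen + [j]
            w = fit_weights(bases[idx], residual, counts)
            ll = log_likelihood(bases[idx], w, residual, counts)
            if best_ll is None or ll > best_ll:
                best_ll, best_j = ll, j
        chosen.append(best_j)
    w = fit_weights(bases[chosen], residual, counts)
    adapted = mean + bases[chosen].T @ w
    return chosen, w, adapted
```

In this sketch a fixed eigenvoice-style subspace would always use the leading `k` rows of `bases`, whereas `select_variable_subspace` may pick any `k` of them for a given speaker, which is the distinction the abstract draws. Dimensions with a zero occupancy count simply drop out of the fit, which loosely mirrors having very little adaptation data.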