Acoustic Model Training Based on Spatial Correlation Transformation

Su Teng-rong; Wu Ji; Wang Zuo-ying

doi:10.3724/SP.J.1146.2009.00343

Volume 32 Issue 4

Dec. 2010


Su Teng-rong, Wu Ji, Wang Zuo-ying. Acoustic Model Training Based on Spatial Correlation Transformation[J]. Journal of Electronics & Information Technology, 2010, 32(4): 1003-1007. doi: 10.3724/SP.J.1146.2009.00343


doi: 10.3724/SP.J.1146.2009.00343 cstr: 32379.14.SP.J.1146.2009.00343

Received Date: 2009-03-16
Rev Recd Date: 2009-08-17
Publish Date: 2010-04-19

Abstract

To make better use of the correlation between different acoustic units in speech recognition, this paper proposes a novel model training approach based on the Spatial Correlation Transformation (SCT) framework, in which the speaker-independent model parameters are re-estimated using the spatial correlation information in the training data. The algorithm applies SCT to all training data to reduce the correlation among the training data, which makes the re-estimated model less dependent on the training data and thereby improves its performance. Experiments show that combining SCT-based model training with SCT-based feature transformation achieves an 18% relative reduction in average syllable error rate compared with the baseline system.

