Volume 32 Issue 4
Dec.  2010
Su Teng-rong, Wu Ji, Wang Zuo-ying. Acoustic Model Training Based on Spatial Correlation Transformation[J]. Journal of Electronics & Information Technology, 2010, 32(4): 1003-1007. doi: 10.3724/SP.J.1146.2009.00343

Acoustic Model Training Based on Spatial Correlation Transformation

doi: 10.3724/SP.J.1146.2009.00343
  • Received Date: 2009-03-16
  • Revised Date: 2009-08-17
  • Publish Date: 2010-04-19
  • To make better use of the correlation between different acoustic units in speech recognition, this paper proposes a model training approach based on the Spatial Correlation Transformation (SCT) framework, in which the speaker-independent model parameters are re-estimated using the spatial correlation information in the training data. SCT is applied to all training data to decrease the correlation among them, which makes the re-estimated model less dependent on the training data and thereby improves its performance. Experiments show that combining SCT-based model training with SCT-based feature transformation achieves an 18% relative reduction in average syllable error rate compared to the baseline system.
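
The abstract reports its result as a relative reduction in average syllable error rate. As a minimal sketch of how such a figure is conventionally computed, the snippet below uses a hypothetical helper, `relative_error_reduction`, and hypothetical error rates. The abstract does not give absolute error rates, so the numbers are chosen only to illustrate an 18% relative reduction.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return (baseline - new) / baseline, the relative reduction in error rate."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical syllable error rates (NOT taken from the paper), used only
# to show what an 18% relative reduction looks like in absolute terms.
baseline = 0.250   # assumed baseline system error rate
improved = 0.205   # assumed error rate after SCT-based training and features

print(f"Relative reduction: {relative_error_reduction(baseline, improved):.0%}")
```

Under these assumed values, an 18% relative reduction corresponds to an absolute drop of 4.5 percentage points. The same relative figure would mean a different absolute gain for a different baseline.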
