An Improved Model-Based PCA Transformation Method for Text-Independent Speaker Identification

Yao Zhiqiang; Zhou Xi; Dai Beiqian

doi:10.3724/SP.J.1146.2005.00749

Funding:

Supported by the National Natural Science Foundation of China (60272039)

Publication history
- Received: 2005-06-27
- Revised: 2006-01-03
- Published: 2007-02-19

Improved Model-Based PCA Transformation for GMM in Speaker Identification

Abstract

Abstract: In text-independent speaker identification with Gaussian mixture models (GMMs), the form of the covariance matrix is a basic design choice. Because full covariance matrices require many parameters and heavy computation, diagonal covariance matrices are usually chosen, which implicitly assumes that the elements of the feature vector are uncorrelated. In most applications this assumption does not hold. To make the feature vectors better suited to modeling with diagonal covariances, the features are usually decorrelated in either the feature space or the model space. This paper presents an improved model-based PCA transformation algorithm for decorrelating the elements of the feature vectors. The algorithm applies principal component analysis directly to the covariance of each Gaussian component of the GMM, so that the feature distribution better fits a mixture of diagonal-covariance Gaussians, and it reduces the number of parameters and the computation by tying (sharing) the PCA transformation matrices among Gaussians. Speaker identification experiments on the MSRA Mandarin speech corpus show that the algorithm achieves a relative reduction in identification error of more than 35% compared with the best conventional diagonal-covariance GMM system.
- Keywords: speaker identification; PCA; model-based PCA; decorrelation
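The abstract's two ideas, PCA applied to a Gaussian's covariance and a PCA transformation shared among Gaussians, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `jacobi_eigh`, `shared_pca_transform`, and `rotate_gaussian` are hypothetical, and the tying rule used here (eigen-decomposing the weight-averaged covariance of the tied Gaussians) is an assumption, since the abstract does not say how the shared matrix is estimated. A small Jacobi eigen-solver keeps the sketch free of external libraries; in practice a library routine such as `numpy.linalg.eigh` would normally be used.

```python
import math


def transpose(x):
    return [list(row) for row in zip(*x)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def jacobi_eigh(sym, sweeps=50, tol=1e-20):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, U); the columns of U are eigenvectors, so that
    sym is approximately U * diag(eigenvalues) * U^T.
    """
    n = len(sym)
    a = [list(row) for row in sym]
    u = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                d = a[q][q] - a[p][p]
                if d != 0.0:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / d)
                else:
                    theta = math.copysign(math.pi / 4.0, a[p][q])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # a <- a J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: u <- u J
                    ukp, ukq = u[k][p], u[k][q]
                    u[k][p], u[k][q] = c * ukp - s * ukq, s * ukp + c * ukq
    return [a[i][i] for i in range(n)], u


def shared_pca_transform(weights, covs):
    """One PCA rotation tied across several Gaussians.

    Assumption: the shared rotation comes from the weight-averaged covariance.
    """
    n, total = len(covs[0]), sum(weights)
    pooled = [[sum(w * c[i][j] for w, c in zip(weights, covs)) / total
               for j in range(n)] for i in range(n)]
    _, u = jacobi_eigh(pooled)
    return u


def rotate_gaussian(mean, cov, u):
    """Express one Gaussian in the rotated space y = U^T x.

    Returns the rotated mean, the diagonal of the rotated covariance (what a
    diagonal-covariance GMM would keep), and the full rotated covariance.
    """
    ut = transpose(u)
    new_mean = [sum(ut[i][k] * mean[k] for k in range(len(mean)))
                for i in range(len(mean))]
    new_cov = matmul(matmul(ut, cov), u)
    return new_mean, [new_cov[i][i] for i in range(len(mean))], new_cov


if __name__ == "__main__":
    # Two toy Gaussians with strongly correlated dimensions (made-up values).
    covs = [[[2.0, 1.0], [1.0, 2.0]], [[4.0, 2.0], [2.0, 4.0]]]
    u = shared_pca_transform([0.5, 0.5], covs)
    for cov in covs:
        _, diag, full = rotate_gaussian([1.0, 1.0], cov, u)
        print("diagonal:", diag, "residual off-diagonal:", full[0][1])
```

In this toy case both covariances have the same eigenvectors, so a single shared rotation removes all correlation. In general a tied rotation only reduces the off-diagonal terms of each Gaussian, which is the trade-off against storing one transformation matrix per Gaussian.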
