Speaker Recognition Using Discriminant Neighborhood Embedding
Abstract: This paper introduces the Discriminant Neighborhood Embedding (DNE) algorithm into a speaker recognition system. DNE is a manifold learning approach that captures the local neighborhood structure of the data by constructing an adjacency graph. It also makes full use of between-class discriminant information, which gives it stronger discriminative power. Experimental results on the telephone-telephone core condition of the NIST 2010 Speaker Recognition Evaluation (NIST SRE 2010) dataset indicate the effectiveness of the DNE algorithm.
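To make the adjacency-graph idea concrete, the following is a minimal sketch of one common formulation of DNE, not the implementation evaluated in this paper. It assumes a k-nearest-neighbor graph with edge weight +1 between same-class neighbors and -1 between different-class neighbors, and a projection built from the eigenvectors with the smallest eigenvalues of X^T (S - F) X. The function name `dne_projection` and the parameters `k` and `dim` are hypothetical, chosen for illustration.

```python
import numpy as np

def dne_projection(X, y, k=3, dim=2):
    """Sketch of a DNE-style projection (assumed formulation).

    X: (n_samples, n_features) data matrix, one sample per row.
    y: (n_samples,) class labels (e.g. speaker identities).
    Returns (P, eigvals): P has shape (n_features, dim) with orthonormal columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]

    # Pairwise squared Euclidean distances; exclude self-matches.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)
    knn = np.argsort(d2, axis=1)[:, :k]

    # Signed adjacency matrix F: +1 for same-class neighbors,
    # -1 for different-class neighbors, 0 for non-neighbors.
    F = np.zeros((n, n))
    for i in range(n):
        for j in knn[i]:
            w = 1.0 if y[i] == y[j] else -1.0
            F[i, j] = w
            F[j, i] = w

    # S is diagonal with the row sums of F.
    S = np.diag(F.sum(axis=1))
    M = X.T @ (S - F) @ X
    M = (M + M.T) / 2.0  # enforce symmetry against round-off

    # eigh returns eigenvalues in ascending order; keep the smallest ones.
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, :dim], eigvals[:dim]

# Toy usage with two synthetic classes (illustrative data only).
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y_toy = np.array([0] * 20 + [1] * 20)
P_toy, vals_toy = dne_projection(X_toy, y_toy, k=3, dim=2)
Z_toy = X_toy @ P_toy  # low-dimensional embedding
```

Minimizing this objective pulls same-class neighbors together and pushes different-class neighbors apart, which is the between-class discriminant behavior the abstract describes.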
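Table 1. EER and minDCF of DNE and NPE on the NIST SRE 2010 telephone-telephone test set (no channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE | 5.76 | 0.0575 | 6.98 | 0.0744 |
| DNE | 5.28 | 0.0544 | 6.35 | 0.0683 |

Table 2. EER and minDCF of DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE+LDA | 4.71 | 0.0492 | 6.11 | 0.0633 |
| DNE+LDA | 4.19 | 0.0453 | 5.57 | 0.0604 |

Table 3. EER and minDCF of DNE and NPE on the NIST SRE 2010 telephone-telephone test set (WCCN channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE+WCCN | 5.07 | 0.0512 | 6.49 | 0.0677 |
| DNE+WCCN | 4.59 | 0.0478 | 5.83 | 0.0617 |

Table 4. EER and minDCF of DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA+WCCN channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE+LDA+WCCN | 4.41 | 0.0476 | 5.72 | 0.0584 |
| DNE+LDA+WCCN | 4.15 | 0.0434 | 5.24 | 0.0553 |

Table 5. EER and minDCF of DNE and PLDA on the NIST SRE 2010 telephone-telephone test set

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| DNE+LDA+WCCN | 4.15 | 0.0434 | 5.24 | 0.0553 |
| PLDA | 4.12 | 0.0428 | 5.37 | 0.0532 |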
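The EER (equal error rate) reported in the tables above is the operating point at which the miss rate equals the false-alarm rate. As a rough illustration of the metric, and not the scoring tool used for these experiments, it can be estimated from lists of target and non-target trial scores by sweeping a threshold. The function name `equal_error_rate` and the toy scores are hypothetical.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns the EER as a fraction in [0, 1].
    """
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))

    # Miss rate: fraction of target trials rejected at each threshold.
    p_miss = np.array([(tgt < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials accepted.
    p_fa = np.array([(non >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest
    # and report their average.
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0

# Toy usage: perfectly separated scores give an EER of zero.
eer_separated = equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
# Identical score distributions give an EER of one half.
eer_overlap = equal_error_rate([0.0, 1.0], [0.0, 1.0])
```

Multiplying the returned fraction by 100 gives a percentage comparable in form to the EER (%) columns. The minDCF columns depend on the cost parameters of the evaluation plan, which are not stated here, so they are not sketched.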