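Speaker Recognition Using Discriminant Neighborhood Embedding

LIANG Chunyan (梁春燕); YUAN Wenhao (袁文浩); LI Yanling (李艳玲); XIA Bin (夏斌); SUN Wenzhu (孙文珠)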

doi: 10.11999/JEIT180761 cstr: 32379.14.JEIT180761

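1. College of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China
2. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China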

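Funds: The National Natural Science Foundation of China (11704229, 61701286, 61562068), The Shandong Provincial Natural Science Foundation (ZR2017LA011, ZR2015FL003, ZR2017MF047), The Project of Shandong Province Higher Educational Science and Technology Program (J17KA078), The Natural Science Foundation of Inner Mongolia Autonomous Region of China (2015MS0629)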

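Author biographies:
LIANG Chunyan: female, born in 1986, lecturer. Research interests: speaker recognition and language identification.

YUAN Wenhao: male, born in 1985, lecturer. Research interests: speech signal processing and speech enhancement.

LI Yanling: female, born in 1978, associate professor. Research interests: natural language processing, spoken language understanding, and machine learning.

XIA Bin: male, born in 1973, associate professor. Research interests: deep learning, and signal and information processing.

SUN Wenzhu: male, born in 1983, lecturer. Research interests: multimedia signal transmission.

Corresponding author:
LIANG Chunyan, liangchunyan@sdut.edu.cn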

Chinese Library Classification (CLC) number: TP391.42
Publication history
- Received: 2018-08-03
- Revised: 2019-01-21
- Published online: 2019-02-24
- Issue date: 2019-07-01

Abstract

Abstract: This paper introduces the Discriminant Neighborhood Embedding (DNE) algorithm into speaker recognition. DNE is a manifold learning method that captures the local neighborhood structure of the data by constructing an adjacency graph. It also makes full use of between-class discriminant information, which gives it stronger discriminative power. Experimental results on the telephone-telephone core condition of the National Institute of Standards and Technology 2010 Speaker Recognition Evaluation (NIST SRE 2010) demonstrate the effectiveness of the algorithm.

Keywords:
- Speaker recognition
- Total variability factor analysis
- Neighborhood Preserving Embedding (NPE)
- Discriminant Neighborhood Embedding (DNE)
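
The abstract describes DNE as building an adjacency graph that also encodes between-class information. The paper's exact definition is not reproduced on this page, so the following is only a minimal sketch of one commonly cited formulation: a symmetrized k-nearest-neighbor graph whose edges are +1 between same-class neighbors and -1 between different-class neighbors. The function names `knn_indices` and `signed_adjacency`, the Euclidean distance, and the toy data are assumptions made for illustration. The later step that derives a projection from this graph is omitted.

```python
from math import dist


def knn_indices(points, i, k):
    """Return the indices of the k nearest neighbors of points[i], excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def signed_adjacency(points, labels, k):
    """Build a symmetric signed k-NN adjacency matrix (hypothetical helper).

    F[i][j] = +1 if i and j are neighbors with the same label,
              -1 if i and j are neighbors with different labels,
               0 if neither is among the other's k nearest neighbors.
    """
    n = len(points)
    F = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in knn_indices(points, i, k):
            sign = 1 if labels[i] == labels[j] else -1
            F[i][j] = sign
            F[j][i] = sign  # symmetrize the neighbor relation
    return F


# Toy data (assumed): two "speakers" with two 2-D vectors each.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [0, 0, 1, 1]
F = signed_adjacency(points, labels, k=2)
```

In this toy case each vector is linked with +1 to the other vector of the same speaker and with -1 to the closest vector of the other speaker, which is the kind of local, class-aware structure the abstract refers to.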

Table 1 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (no channel compensation)

System          Male                  Female
                EER(%)    minDCF      EER(%)    minDCF
NPE             5.76      0.0575      6.98      0.0744
DNE             5.28      0.0544      6.35      0.0683

Table 2 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA channel compensation)

System          Male                  Female
                EER(%)    minDCF      EER(%)    minDCF
NPE+LDA         4.71      0.0492      6.11      0.0633
DNE+LDA         4.19      0.0453      5.57      0.0604

Table 3 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (WCCN channel compensation)

System          Male                  Female
                EER(%)    minDCF      EER(%)    minDCF
NPE+WCCN        5.07      0.0512      6.49      0.0677
DNE+WCCN        4.59      0.0478      5.83      0.0617

Table 4 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA+WCCN channel compensation)

System          Male                  Female
                EER(%)    minDCF      EER(%)    minDCF
NPE+LDA+WCCN    4.41      0.0476      5.72      0.0584
DNE+LDA+WCCN    4.15      0.0434      5.24      0.0553

Table 5 Comparison of EER and minDCF for DNE and PLDA on the NIST SRE 2010 telephone-telephone test set

System          Male                  Female
                EER(%)    minDCF      EER(%)    minDCF
DNE+LDA+WCCN    4.15      0.0434      5.24      0.0553
PLDA            4.12      0.0428      5.37      0.0532
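
The tables report the equal error rate (EER), the operating point at which the miss rate and the false-alarm rate coincide. As a rough illustration of how such a figure can be obtained from trial scores, here is a minimal sketch that sweeps a threshold over the observed scores. The function name `equal_error_rate`, the accept-if-score-is-at-least-threshold rule, and the toy scores are assumptions for illustration. This is not the evaluation tooling used in the paper, and minDCF is not covered.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when its score is >= threshold. The function returns
    the mean of the miss and false-alarm rates at the threshold where the
    two rates are closest to each other.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a target (same-speaker) trial that is rejected.
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        # False alarm: a non-target (different-speaker) trial that is accepted.
        false_alarm = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Toy scores (assumed): higher means "more likely the same speaker".
target = [0.9, 0.8, 0.7, 0.3]
nontarget = [0.6, 0.4, 0.2, 0.1]
eer = equal_error_rate(target, nontarget)
print(f"EER = {eer:.2%}")
```

With these toy scores, a threshold of 0.6 rejects one of four target trials and accepts one of four non-target trials, so both error rates meet at 25%.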
