Speaker Recognition Using Discriminant Neighborhood Embedding

Chunyan LIANG, Wenhao YUAN, Yanling LI, Bin XIA, Wenzhu SUN

Citation: Chunyan LIANG, Wenhao YUAN, Yanling LI, Bin XIA, Wenzhu SUN. Speaker Recognition Using Discriminant Neighborhood Embedding[J]. Journal of Electronics & Information Technology, 2019, 41(7): 1774-1778. doi: 10.11999/JEIT180761

doi: 10.11999/JEIT180761
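Funds: The National Natural Science Foundation of China (11704229, 61701286, 61562068), The Shandong Provincial Natural Science Foundation (ZR2017LA011, ZR2015FL003, ZR2017MF047), The Project of Shandong Province Higher Educational Science and Technology Program (J17KA078), The Natural Science Foundation of Inner Mongolia Autonomous Region of China (2015MS0629)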
    Author biographies:

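    Chunyan LIANG: female, born 1986, lecturer; research interests: speaker recognition and language identification

    Wenhao YUAN: male, born 1985, lecturer; research interests: speech signal processing and speech enhancement

    Yanling LI: female, born 1978, associate professor; research interests: natural language processing, spoken language understanding, and machine learning

    Bin XIA: male, born 1973, associate professor; research interests: deep learning, and signal and information processing

    Wenzhu SUN: male, born 1983, lecturer; research interests: multimedia signal transmission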

    Corresponding author:

    Chunyan LIANG liangchunyan@sdut.edu.cn

  • CLC number: TP391.42

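  • Abstract: This paper proposes a speaker recognition method based on the Discriminant Neighborhood Embedding (DNE) algorithm. As a manifold learning method, DNE captures the local neighborhood structure of the data by constructing an adjacency graph; it also makes full use of between-class discriminant information, which gives it stronger discriminative power. Experimental results on the telephone-telephone core test set of the U.S. National Institute of Standards and Technology 2010 Speaker Recognition Evaluation (NIST SRE 2010) demonstrate the effectiveness of the algorithm.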
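The abstract describes DNE only at a high level. The sketch below follows the classical single-graph DNE formulation, which is an assumption: the article's exact neighbor rule and edge weights may differ. It builds a k-nearest-neighbor adjacency graph, weights an edge +1 when the two neighbors share a class (speaker) label and -1 when they do not, and takes as the projection the eigenvectors of X^T(S - F)X with the smallest eigenvalues, so same-speaker neighbors are pulled together and different-speaker neighbors pushed apart. The function `dne_projection` and its parameters `k` and `dim` are hypothetical; the input rows could be utterance-level vectors such as i-vectors, which is also an assumption since the abstract does not name the front-end features.

```python
import numpy as np


def dne_projection(X, labels, k=5, dim=2):
    """Sketch of classical DNE (hypothetical helper, not the article's code).

    X: (n_samples, n_features) array, one vector per row.
    labels: (n_samples,) class (speaker) labels.
    Returns a (n_features, dim) projection matrix P; embed with X @ P.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n_samples")

    # Pairwise squared Euclidean distances; a point is never its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)
    knn = np.argsort(dist, axis=1)[:, :k]

    # Symmetric adjacency: i ~ j if either is among the other's k nearest neighbors.
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), knn.ravel()] = True
    adj = adj | adj.T

    # Edge weights: +1 for same-class neighbors, -1 for different-class neighbors.
    same = labels[:, None] == labels[None, :]
    F = np.where(adj & same, 1.0, 0.0) - np.where(adj & ~same, 1.0, 0.0)
    S = np.diag(F.sum(axis=1))

    # Minimizing sum_ij F_ij ||P^T (x_i - x_j)||^2 subject to P^T P = I reduces to
    # the eigenvectors of X^T (S - F) X with the smallest eigenvalues.
    M = X.T @ (S - F) @ X
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    return eigvecs[:, :dim]


if __name__ == "__main__":
    # Toy data: two classes that differ only along the first axis.
    X = np.array([[x, y] for x in (0.0, 0.5) for y in range(5)], dtype=float)
    labels = np.repeat([0, 1], 5)
    P = dne_projection(X, labels, k=3, dim=1)
    print(P)  # the first axis, up to sign
```

On this toy set the matrix X^T(S - F)X is diagonal with a negative entry only for the first axis, so the single retained direction is the axis that separates the classes, while the second axis, along which same-class neighbors are spread, is discarded.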
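  • Table 1  Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (no channel compensation)

    System          Male                Female
                    EER(%)   minDCF     EER(%)   minDCF
    NPE             5.76     0.0575     6.98     0.0744
    DNE             5.28     0.0544     6.35     0.0683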
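The tables report the equal error rate (EER) and the minimum detection cost function (minDCF). A minimal sketch of how both can be computed from target (same-speaker) and non-target trial scores follows. The cost parameters are assumptions, since the article does not state them here: `c_miss`, `c_fa` and `p_target` are exposed as arguments with defaults 10, 1 and 0.01 (the setting of earlier NIST SRE evaluations), and NIST SRE 2010 additionally defined a cost with C_miss = 1, C_fa = 1 and P_target = 0.001. Whether the tables report raw or normalized cost is likewise not stated, so normalization is optional. All function names are hypothetical.

```python
import numpy as np


def error_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates at every candidate threshold.

    A trial is accepted when score >= threshold. Candidate thresholds are all
    observed scores plus +inf (reject everything).
    """
    tar = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    thresholds = np.append(np.unique(np.concatenate([tar, non])), np.inf)
    # Misses: target scores strictly below the threshold.
    p_miss = np.searchsorted(tar, thresholds, side="left") / tar.size
    # False alarms: non-target scores at or above the threshold.
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    return thresholds, p_miss, p_fa


def eer(target_scores, nontarget_scores):
    """Approximate EER: mean of P_miss and P_fa where the two are closest."""
    _, p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0


def min_dcf(target_scores, nontarget_scores, c_miss=10.0, c_fa=1.0,
            p_target=0.01, normalize=False):
    """Minimum over thresholds of C_miss*P_miss*P_tar + C_fa*P_fa*(1 - P_tar).

    Default cost parameters are an assumption, not taken from the article.
    """
    _, p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    best = cost.min()
    if normalize:
        # Divide by the cost of the better trivial system (accept all or reject all).
        best /= min(c_miss * p_target, c_fa * (1.0 - p_target))
    return best


if __name__ == "__main__":
    tar = [0.9, 0.8, 0.7, 0.3]
    non = [0.6, 0.4, 0.2, 0.1]
    print(eer(tar, non))      # 0.25
    print(min_dcf(tar, non))  # about 0.025
```

The EER here is read off the discrete curve rather than interpolated, which is adequate for large trial lists such as the SRE core set but coarse for tiny toy examples.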

    Table 2  Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA channel compensation)

    System          Male                Female
                    EER(%)   minDCF     EER(%)   minDCF
    NPE+LDA         4.71     0.0492     6.11     0.0633
    DNE+LDA         4.19     0.0453     5.57     0.0604
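Table 2 combines NPE or DNE with linear discriminant analysis (LDA) for channel compensation. The article gives no LDA details here, so the following is a standard LDA sketch, an assumption rather than the authors' code: it solves the generalized eigenproblem S_b v = λ S_w v by whitening the within-class scatter S_w and then diagonalizing the whitened between-class scatter. It assumes S_w is positive definite (enough vectors per speaker relative to the dimension); `lda_projection` is a hypothetical name.

```python
import numpy as np


def lda_projection(X, labels, dim):
    """Top-`dim` LDA directions as a (n_features, dim) matrix; embed with X @ V.

    Hypothetical helper. Solves S_b v = lambda S_w v by whitening S_w, so S_w
    must be positive definite. Useful dim is at most (number of classes - 1).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))  # between-class scatter
    Sw = np.zeros((d, d))  # within-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
        Xc0 = Xc - mc
        Sw += Xc0.T @ Xc0
    # Whitening transform Sw^{-1/2}, then an ordinary symmetric eigenproblem.
    w_vals, w_vecs = np.linalg.eigh(Sw)
    Sw_inv_sqrt = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T
    vals, vecs = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    order = np.argsort(vals)[::-1][:dim]  # largest generalized eigenvalues first
    V = Sw_inv_sqrt @ vecs[:, order]
    return V / np.linalg.norm(V, axis=0)  # unit-length columns


if __name__ == "__main__":
    # Two toy classes separated along x, with large within-class spread along y.
    X = np.array([[-1, -2], [-1, 2], [-1.1, 0], [-0.9, 0],
                  [1, -2], [1, 2], [1.1, 0], [0.9, 0]], dtype=float)
    labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    print(lda_projection(X, labels, dim=1))  # the x axis, up to sign
```

Whitening with an eigendecomposition keeps the sketch to NumPy alone; a library routine for the generalized symmetric eigenproblem would serve equally well.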

    Table 3  Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (WCCN channel compensation)

    System          Male                Female
                    EER(%)   minDCF     EER(%)   minDCF
    NPE+WCCN        5.07     0.0512     6.49     0.0677
    DNE+WCCN        4.59     0.0478     5.83     0.0617
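Table 3 uses within-class covariance normalization (WCCN). In its usual form, assumed here because the article does not spell it out, WCCN estimates the average within-class covariance W over training speakers and maps each vector x to B^T x, where B B^T = W^{-1} and B comes from a Cholesky factorization. After this map the within-class covariance of the training data is the identity. The helpers `within_class_cov` and `wccn_matrix` are hypothetical names.

```python
import numpy as np


def within_class_cov(X, labels):
    """Average over classes of each class's covariance (1/n_c normalization)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in classes:
        Xc = X[labels == c]
        Xc0 = Xc - Xc.mean(axis=0)
        W += Xc0.T @ Xc0 / len(Xc)
    return W / len(classes)


def wccn_matrix(X, labels):
    """Lower-triangular B with B @ B.T = inv(W); transform row vectors with X @ B."""
    W = within_class_cov(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(6), 10)  # 6 toy "speakers", 10 vectors each
    mix = np.array([[2.0, 0.0, 0.0],
                    [0.5, 1.0, 0.0],
                    [0.0, 0.3, 0.2]])
    X = rng.normal(size=(60, 3)) @ mix
    B = wccn_matrix(X, labels)
    print(np.round(within_class_cov(X @ B, labels), 6))  # approximately the identity
```

Because rows are transformed as X @ B, the new within-class covariance is B^T W B, which equals the identity whenever B B^T = W^{-1}.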

    Table 4  Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA+WCCN channel compensation)

    System          Male                Female
                    EER(%)   minDCF     EER(%)   minDCF
    NPE+LDA+WCCN    4.41     0.0476     5.72     0.0584
    DNE+LDA+WCCN    4.15     0.0434     5.24     0.0553
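The size of the DNE advantage over NPE in Tables 1-4 can be expressed as a relative reduction, 100 × (NPE - DNE) / NPE, for each metric (lower is better for both EER and minDCF). The values below are copied from the tables and the script only does the arithmetic; for example, without channel compensation the male EER falls from 5.76% to 5.28%, a relative reduction of 0.48 / 5.76 ≈ 8.3%. The names `RESULTS` and `reduction_table` are illustrative.

```python
# Values copied from Tables 1-4:
# (male EER %, male minDCF, female EER %, female minDCF)
RESULTS = {
    "none":     {"NPE": (5.76, 0.0575, 6.98, 0.0744), "DNE": (5.28, 0.0544, 6.35, 0.0683)},
    "LDA":      {"NPE": (4.71, 0.0492, 6.11, 0.0633), "DNE": (4.19, 0.0453, 5.57, 0.0604)},
    "WCCN":     {"NPE": (5.07, 0.0512, 6.49, 0.0677), "DNE": (4.59, 0.0478, 5.83, 0.0617)},
    "LDA+WCCN": {"NPE": (4.41, 0.0476, 5.72, 0.0584), "DNE": (4.15, 0.0434, 5.24, 0.0553)},
}
METRICS = ("male EER", "male minDCF", "female EER", "female minDCF")


def relative_reduction(baseline, system):
    """Percentage by which `system` improves on `baseline` (lower is better)."""
    return 100.0 * (baseline - system) / baseline


def reduction_table(results=RESULTS):
    """Map each compensation setting to the DNE-vs-NPE relative reductions."""
    return {
        setting: dict(zip(METRICS, (relative_reduction(n, d)
                                    for n, d in zip(pair["NPE"], pair["DNE"]))))
        for setting, pair in results.items()
    }


if __name__ == "__main__":
    for setting, row in reduction_table().items():
        cells = ", ".join(f"{m}: {v:.1f}%" for m, v in row.items())
        print(f"{setting:9s} {cells}")
```

Every entry comes out positive, which is the tabular form of the claim that DNE outperforms NPE under each channel-compensation setting.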

    Table 5  Comparison of EER and minDCF between DNE and PLDA on the NIST SRE 2010 telephone-telephone test set

    System          Male                Female
                    EER(%)   minDCF     EER(%)   minDCF
    DNE+LDA+WCCN    4.15     0.0434     5.24     0.0553
    PLDA            4.12     0.0428     5.37     0.0532
Publication history
  • Received:  2018-08-03
  • Revised:  2019-01-21
  • Published online:  2019-02-24
  • Published in issue:  2019-07-01
