
Adaptively Reserved Likelihood Ratio-based Robust Voice Activity Detection with Sub-band Double Features

HE Weijun, HE Qianhua, WU Junfeng, YANG Jichen

Citation: HE Weijun, HE Qianhua, WU Junfeng, YANG Jichen. Adaptively Reserved Likelihood Ratio-based Robust Voice Activity Detection with Sub-band Double Features[J]. Journal of Electronics & Information Technology, 2016, 38(11): 2879-2886. doi: 10.11999/JEIT160157



Funds: 

The National Natural Science Foundation of China (61571192), The Science and Technology Foundation of Guangdong Province (2015A010103003), The Fundamental Research Funds for the Central Universities, SCUT (2015ZM143)

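  • Abstract: To further improve the accuracy of voice activity detection (VAD) at low signal-to-noise ratios (SNRs), this paper proposes an adaptively reserved likelihood ratio-based robust VAD algorithm with sub-band double features. The algorithm uses two features, the sub-band normalized maximum autocorrelation function and the sub-band normalized average zero-crossing rate, to set the reservation weights of the frequency-component likelihood ratios. It also estimates the reservation threshold of the likelihood ratio adaptively from the VAD decisions, and the corresponding sub-band feature parameters, over a past window of fixed duration. Experimental results show that, compared with the original reserved likelihood ratio algorithm, the proposed algorithm improves VAD accuracy by 1.2%, 7.2%, and 8.1% under stationary white noise at 10 dB, 0 dB, and -10 dB, respectively, and by 1.6% and 3.4% under non-stationary babble noise at 10 dB and 0 dB, respectively. When the algorithm is used in a 2.4 kbps low-rate vocoder system, the Perceptual Evaluation of Speech Quality (PESQ) score of the synthesized speech improves over the original vocoder system by 0.098 to 0.153 under white noise and by 0.157 to 0.186 under babble noise.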
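The abstract names two sub-band features and frequency-component likelihood ratios but gives no formulas for them. The following minimal Python sketch assumes common textbook definitions. The zero-crossing rate is taken as the fraction of adjacent sample pairs that change sign. The normalized maximum autocorrelation is max r(k)/r(0) over a pitch-lag range. The per-bin likelihood ratio is the Gaussian statistical-model form of Sohn, Kim and Sung (1999), log L_k = gamma_k * xi_k / (1 + xi_k) - ln(1 + xi_k). Several choices here are illustrative assumptions and are not taken from the paper: the brick-wall DFT sub-band split, the maximum-likelihood a priori SNR estimate, the lag range, and all function names. Whether the paper uses exactly this likelihood ratio is also an assumption. The paper's reservation weights and adaptive threshold are not reproduced.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT returning the real part (input assumed Hermitian-symmetric)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def band_signal(x, fs, f_lo, f_hi):
    """Hypothetical sub-band split: keep DFT bins with f_lo <= f < f_hi (and their
    mirror bins). Brick-wall masking is an assumption; the paper's filter bank may differ."""
    n = len(x)
    masked = []
    for k, value in enumerate(dft(x)):
        freq = min(k, n - k) * fs / n  # bin k and bin n-k share the same frequency
        masked.append(value if f_lo <= freq < f_hi else 0j)
    return idft_real(masked)


def zero_crossing_rate(x):
    """Average zero-crossing rate normalized to [0, 1]: sign changes per sample pair."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def normalized_max_autocorr(x, min_lag, max_lag):
    """max over min_lag <= k <= max_lag of r(k) / r(0); close to 1 for periodic frames."""
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return 0.0
    best = -1.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[t] * x[t + lag] for t in range(len(x) - lag))
        best = max(best, r / r0)
    return best


def sohn_log_likelihood_ratios(frame, noise_psd):
    """Per-bin log likelihood ratios of the Gaussian statistical model.
    xi_k uses the ML estimate max(gamma_k - 1, 0), an assumption made for brevity
    (decision-directed estimation is more common in practice)."""
    spec = dft(frame)
    llrs = []
    for k in range(len(frame) // 2 + 1):
        gamma = abs(spec[k]) ** 2 / noise_psd[k]  # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)                # a priori SNR estimate
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return llrs


if __name__ == "__main__":
    fs, n, sigma = 8000, 256, 0.1
    rng = random.Random(0)
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    voiced = [math.sin(2 * math.pi * 200 * t / fs) + v for t, v in enumerate(noise)]
    # For white noise of variance sigma^2, E|X_k|^2 = n * sigma^2 with an unnormalized DFT.
    noise_psd = [n * sigma ** 2] * (n // 2 + 1)
    for name, frame in (("noise", noise), ("voiced", voiced)):
        low = band_signal(frame, fs, 0, 1000)        # hypothetical 0-1000 Hz sub-band
        llr = sohn_log_likelihood_ratios(frame, noise_psd)
        print(name,
              round(zero_crossing_rate(low), 3),
              round(normalized_max_autocorr(low, 20, 160), 3),  # 50-400 Hz pitch at 8 kHz
              round(sum(llr) / len(llr), 3))
```

In this toy run, the 200 Hz voiced frame should show a higher normalized autocorrelation, a lower zero-crossing rate, and a much larger mean log likelihood ratio than the noise-only frame. That contrast illustrates why such features can help decide which frequency-component ratios are worth keeping.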
Publication history
  • Received: 2016-02-04
  • Revised: 2016-06-27
  • Published: 2016-11-19
