A Voice Activity Detection Algorithm Based on the Teager Energy Operator and Empirical Mode Decomposition

SHEN Xizhong; ZHENG Xiaoxiu

doi:10.11999/JEIT171014

cstr: 32379.14.JEIT171014

(School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai 201418, China)

Funding:

Foundation of the Science and Technology Commission of Shanghai Municipality (15ZR1440700)

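About the authors:
SHEN Xizhong: male, born in 1968, professor; research interest: signal processing. ZHENG Xiaoxiu: male, born in 1989, master's student; research interest: signal detection technology.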

CLC number: TP391.42
Publication history
- Received: 2017-10-30
- Revised: 2018-04-11
- Published: 2018-07-19

Teager Energy Operator and Empirical Mode Decomposition Based Voice Activity Detection Method

SHEN Xizhong, ZHENG Xiaoxiu

Funds:

Foundation of the Science and Technology Commission of Shanghai Municipality (15ZR1440700)

Abstract

Abstract: The Teager Energy Operator (TEO) is a nonlinear method proposed in recent years that is able to track time-varying signals. This paper combines the operator with Empirical Mode Decomposition (EMD) and proposes a new voice activity detection algorithm for locating reasonable start and end points of speech. The algorithm applies EMD and introduces validity screening conditions for the Intrinsic Mode Functions (IMFs); screening the IMFs allows the algorithm to handle noisy speech. The single-mode character of each decomposed component also meets the TEO's requirement of tracking the energy of a single component. Finally, the Hilbert transform is used to resolve the mode mixing that EMD may produce. With these steps, the algorithm can mark the endpoints of unvoiced segments in speech and performs better than the direct TEO method and the double-threshold method. Extensive experiments verify the effectiveness of the algorithm.

Keywords:
- Voice Activity Detection (VAD)
- Teager Energy Operator (TEO)
- Empirical Mode Decomposition (EMD)
- Intrinsic Mode Function (IMF)
- Hilbert transform
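
The abstract notes that the TEO needs a single-component input, which is why it is applied to individual IMFs. The sketch below illustrates only that property, using Kaiser's standard discrete form of the operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not the authors' implementation: the function name `teager_energy`, the edge handling, and the test-tone values are assumptions chosen for illustration, and the EMD, IMF screening, and Hilbert steps are not shown.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration; expects at least 3 samples.
    """
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # The first and last samples lack a two-sided neighbourhood,
    # so copy the nearest interior value (an assumed convention).
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi


# Single-component test tone (assumed values, not taken from the paper).
fs = 8000.0                       # sampling rate in Hz
amplitude, freq = 0.5, 200.0      # tone amplitude and frequency in Hz
n = np.arange(400)
omega = 2 * np.pi * freq / fs     # digital frequency in rad/sample
tone = amplitude * np.cos(omega * n)

psi = teager_energy(tone)

# For a pure tone A*cos(omega*n) the operator is constant: (A*sin(omega))^2.
expected = (amplitude * np.sin(omega)) ** 2
print(np.allclose(psi, expected))
```

For a single sinusoid the output is constant and depends on both amplitude and frequency. When several components are mixed, cross terms appear in the output, which is consistent with the abstract's point that decomposing the signal first suits the operator.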

