Voice Activity Detection Based on Fisher Linear Discriminant Analysis

Wang Ming-he; Zhang Er-hua; Tang Zhen-min; Xu Hao

doi:10.11999/JEIT141122

Volume 37 Issue 6

Jun. 2015

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2015 > 37(6): 1343-1349

Wang Ming-he, Zhang Er-hua, Tang Zhen-min, Xu Hao. Voice Activity Detection Based on Fisher Linear Discriminant Analysis[J]. Journal of Electronics & Information Technology, 2015, 37(6): 1343-1349. doi: 10.11999/JEIT141122

Citation:

Wang Ming-he, Zhang Er-hua, Tang Zhen-min, Xu Hao. Voice Activity Detection Based on Fisher Linear Discriminant Analysis[J]. Journal of Electronics & Information Technology, 2015, 37(6): 1343-1349. doi: 10.11999/JEIT141122

Citation:

PDF( 818 KB)

Voice Activity Detection Based on Fisher Linear Discriminant Analysis

doi: 10.11999/JEIT141122 cstr: 32379.14.JEIT141122

Received Date: 2014-08-29
Rev Recd Date: 2014-12-19
Publish Date: 2015-06-19

Abstract

Abstract

Traditional Voice Activity Detection (VAD) approaches can not effectively detect consonant as well as noisy unvoiced consonant. To address this problem, this paper proposes a VAD approach Mel Frequency Cepstrum Coefficient (F-MFCC) based on Fisher linear discriminant analysis, in consideration of two-class issue regarding to consonant and background noise. Fisher criterion rule is used to solve the optimal projection vector, building upon which we can minimize the within-class scatter can be minimized and the between-class scatter can be maximized, as a result to enhance separability between consonant and background noise. Extensive experiments are conducted to evaluate the F-MFCC performance. The results demonstrate that, under different SNR and noise conditions, the proposed approach achieves higher VAD accuracy.
- Speech processing,
- Voice Activity Detection (VAD),
- Mel Frequency Cepstrum Coefficient (MFCC),
- Fisher linear discriminant analysis

FullText(HTML)

References(20)

References

Junqua J C. Robustness and cooperative multi-model man-machine communication applications[C]. The Structure of Multimodal Dialogue, Maratea, Italy, 1991: 101-112.

ETSI. Universal Mobile Telecommunication Systems (UMTS); Mandatory Speech Codec speech processing functions, AMR speech codec; Voice Activity Detector VAD[S]. ETSI TS 126 094 v11.0.0(2012-10): 1-26.

Wan Yu-long, Wang Xian-liang, Zhou Ruo-hua, et al.. Enhanced voice activity detection based on automatic segmentation and event classification[J]. Journal of Computational Information Systems, 2014, 10(10): 4169-4177.

宫朝辉, 刁麓弘. 改进共振峰提取的语音端点检测[J]. 计算机辅助设计与图形学学报, 2013, 25(8): 1230-1236.

Gong Zhao-hui and Diao Lu-hong. Improved speech endpoint detection based on formant[J]. Journal of Computer Aided Design Computer Graphics, 2013, 25(8): 1230-1236.

李晔, 张仁志, 崔慧娟, 等. 低信噪比下基于谱熵的语音端点检测算法[J]. 清华大学学报(自然科学版), 2005, 45(10): 1397-1440.

Li Ye, Zhang Ren-zhi, Cui Hui-juan, et al.. Voice activity detection algorithm with low signal-to-noise ratios based on the spectrum entropy[J]. Journal of Tsinghua University (Science and Technology), 2005, 45(10): 1397-1440.

Chen Shi-huang and Wang Jhing-fa. A wavelet-based voice activity detection algorithm in noisy environments[C]. Proceedings of the 9th IEEE International Conference on Electmnics, Circuits and Systems, Dubrovnik, Croatia, 2002: 995-998.

Ghosh P K, Tsiartas A, and Narayanan S. Robust voice activity detection using long-term signal variability[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(3): 600-613.

王宏志, 徐玉超, 李美静. 基于Mel频率倒谱参数相似度的语音端点检测算法[J]. 吉林大学学报(工学版), 2012, 42(5): 1331-1335.

Wang Hong-zhi, Xu Yu-chao, and Li Mei-jing. Voice activity detection algorithm based on Mel-frequency cepstrum coefficient (MFCC) similarity[J]. Journal of Jilin University (Engineering and Technology Edition), 2012, 42(5): 1331-1335.

Oh Sang-yeob and Chung Kyung-yong. Improvement of speech detection using ERB feature extraction[J]. Wireless Personal Communications, 2014, 79(4): 2439-2451.

卢志茂, 金辉, 张春祥, 等. 基于HHT和OSF的复杂环境语音端点检测[J]. 电子与信息学报, 2012, 34(1): 213-217.

Lu Zhi-mao, Jin Hui, Zhang Chun-xiang, et al.. Voice activity detection in complex environment based on Hilbert-Huang transform and order statistics filter[J]. Journal of Electronics Information Technology, 2012, 34(1): 213-217.

Deng Shi-wen and Han Ji-qing. Statistical voice activity detection based on sparse representation over learned dictionary[J]. Digital Signal Processing, 2013, 23(4): 1228-1232.

Zhang Yan, Tang Zhen-min, Li Yan-ping, et al.. A hierarchical framework approach for voice activity detection and speech enhancement[J]. The Scientific World Journal, 2014, Vol. 2014: Article ID 723643, 8 pages.

Choi Jae-hun and Chang Joon-hyuk. Dual-microphone voice activity detection technique based on two-step power level difference ratio[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2014, 22(6): 1069-1081.

Ryant N, Liberman M, and Yuan Jia-hong. Speech activity detection on YouTube using deep neural networks[C]. Interspeech: 14th Annual Conference of the International Speech Communication Association, Lyon, France, 2013: 728-731.

Fisher R A. The use of multiple measures in taxonomic problems[J]. Annals of Eugenics, 1936, 7(2): 179-188.

Mak M W and Yu H B. A study of voice activity detection techniques for NIST speaker recognition evaluations[J]. Computer Speech Language, 2014, 28(1): 295-313.

Relative Articles

Supplements(0)

Cited By

Proportional views