Adapted Stopping Residue Error Based Sparse Representation for Speech Denoising

ZHOU Weili; HE Qianhua; WANG Yalou; PANG Wenfeng

doi: 10.11999/JEIT160369 cstr: 32379.14.JEIT160369

Funds:

The National Natural Science Foundation of China (61571192), The Science and Technology Foundation of Guangdong Province (2015A010103003)

Publication history
- Received: 2016-04-18
- Revised: 2016-08-25
- Published: 2017-02-19

Abstract

A sparse representation speech denoising method based on an adaptive stopping residue error is proposed. In the dictionary learning stage, an overcomplete dictionary of the clean speech power spectrum is learned with the K-Singular Value Decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error is continually and adaptively updated from the estimated cross terms and the noise spectrum scaled by a weighting factor, and Orthogonal Matching Pursuit (OMP) is applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech is resynthesized via the inverse Fourier transform from the reconstructed speech spectrum and the noisy speech phase. Experimental results show that, under different noise types and signal-to-noise ratios, the proposed method outperforms standard spectral subtraction, a sparse representation based speech denoising algorithm, and the AutoRegressive Hidden Markov Model (AR-HMM) based speech denoising method in terms of subjective and objective measures.

Keywords: Speech denoising; Sparse representation; K-Singular Value Decomposition (K-SVD); Orthogonal Matching Pursuit (OMP)
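
To illustrate the role the stopping residue error plays in OMP, the following is a minimal NumPy sketch of OMP that stops once the squared residual falls below a threshold. It is not the authors' implementation: the function name `omp_residual_stop`, the parameter `eps`, the random dictionary, and the way `eps` is chosen in the demo are all assumptions made for illustration. In the proposed method the threshold would instead be derived per frame from the weighted noise spectrum and the estimated cross terms, and the dictionary would come from K-SVD training on clean speech spectra.

```python
import numpy as np

def omp_residual_stop(D, y, eps, max_atoms=None):
    """Orthogonal Matching Pursuit with a residual-energy stopping rule.

    D   : (n_features, n_atoms) dictionary with unit-norm columns
    y   : (n_features,) observed vector, e.g. one noisy spectral frame
    eps : stop once ||y - D x||_2^2 <= eps (hypothetical threshold)
    """
    n_features, n_atoms = D.shape
    if max_atoms is None:
        max_atoms = min(n_features, n_atoms)
    support = []                      # indices of selected atoms
    coef = np.zeros(0)                # coefficients on the support
    residual = np.asarray(y, dtype=float).copy()
    while residual @ residual > eps and len(support) < max_atoms:
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0       # never reselect an atom
        k = int(np.argmax(corr))
        if corr[k] <= 1e-12:          # residual orthogonal to all atoms
            break
        support.append(k)
        # Least-squares refit of y on all atoms selected so far
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(n_atoms)
    x[np.array(support, dtype=int)] = coef
    return x

# Demo on synthetic data (a random dictionary stands in for a K-SVD one)
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)        # normalize atoms
x_true = np.zeros(64)
x_true[[5, 20, 41]] = [1.5, -2.0, 1.0]
noise = 0.05 * rng.standard_normal(32)
y = D @ x_true + noise
eps = 1.1 * float(noise @ noise)      # stand-in for the adaptive threshold
x_hat = omp_residual_stop(D, y, eps)
print("atoms used:", int(np.count_nonzero(x_hat)))
print("residual energy:", float(np.sum((y - D @ x_hat) ** 2)))
```

The choice of `eps` controls the trade-off: a threshold that is too small makes OMP keep adding atoms to fit the noise, while one that is too large discards speech detail, which is why tying it to an estimate of the noise energy in each frame is attractive.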
