Citation: SUN Linhui, WANG Can, LIANG Wenqing, LI Ping’an. Monaural Speech Separation Method Based on Deep Learning Feature Fusion and Joint Constraints[J]. Journal of Electronics & Information Technology, 2022, 44(9): 3266-3276. doi: 10.11999/JEIT210606