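Parkinson's Disease Detection Method Based on Masked Self-supervised Speech Feature Extraction

JI Wei, YANG Mingqi, LI Yun, ZHENG Huifen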

Citation: JI Wei, YANG Mingqi, LI Yun, ZHENG Huifen. Parkinson's Disease Detection Method Based on Masked Self-supervised Speech Feature Extraction[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT221041

doi: 10.11999/JEIT221041
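Funds: The Basic Scientific (Natural Science) Major Program of the Higher Education Institutions of Jiangsu Province, China (21KJA520003), The Postgraduate Practice and Innovation Program of Jiangsu Province (SJCX21_0257)
Details
    Author biographies:

    JI Wei: female, Ph.D., professor, master's supervisor. Research interests: the intersection of machine learning and signal processing, wireless communications, and communication signal processing

    YANG Mingqi: female, master's student. Research interests: the intersection of machine learning and signal processing

    LI Yun: male, Ph.D., professor, doctoral supervisor. Research interests: machine learning, feature selection, and information security

    ZHENG Huifen: female, Ph.D., chief physician. Research interests: Parkinson's disease and related movement disorders

    Corresponding author:

    LI Yun liyun@njupt.edu.cn

  • CLC number: TN911.7; TP391.4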

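  • Abstract: Parkinson's disease is a common chronic neurological disorder, and dysarthria is one of its early symptoms. Speech-based assisted diagnosis and treatment of Parkinson's disease helps detect the disease earlier and track its progression. Traditional methods usually assess the disease by computing parameters of speech features such as jitter (frequency perturbation) and shimmer (amplitude perturbation). These features may not fully reflect all pathological phenomena, which limits the accuracy of detection and assessment. To better extract pathological information from the speech of Parkinson's disease patients and to improve the accuracy of detection and assessment, this paper proposes a Parkinson's disease detection method based on masked self-supervised speech feature extraction. First, Mel spectrogram features are extracted from the raw speech of Parkinson's disease patients, giving a global, time-sequenced representation rich in pathological characteristics. Then, part of the Mel spectrogram features is masked, and a masked self-supervised model reconstructs the masked part, thereby learning a higher-level representation of the patients' speech features. To address the scarcity of Parkinson's disease speech data, the masked self-supervised model is first pre-trained on the public LibriSpeech dataset. Following the idea of transfer learning, the pre-trained model then undergoes fine-tuning and weighted summation using Parkinson's disease speech data, to improve the model's feature representation learning. Finally, random forest and support vector machine classifiers are each used to classify the extracted speech features and so detect Parkinson's disease. The effectiveness of the proposed method is verified by ten-fold cross-validation on the public MaxLittle dataset and on a dataset collected by the authors' research group. The results show that, compared with the conventional Mel spectrogram feature detection method and other classic self-supervised feature extraction methods, the proposed method clearly improves accuracy, sensitivity, and specificity.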
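  • Figure 1. Parkinson's disease detection model based on masked self-supervised speech feature extraction, and its training process

    Figure 2. Structure of the masked self-supervised speech feature extraction model

    Figure 3. Random masking rules, illustrated with a Mel spectrogram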
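
The masking-and-reconstruction step described in the abstract and illustrated in Figure 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 15% mask ratio, the span of 3 consecutive frames, zero-filling of masked frames, and the L1 reconstruction loss are all assumptions, and `mask_frames` and `masked_l1_loss` are hypothetical helper names. The spectrogram is a plain list of frames so that the sketch runs without third-party libraries.

```python
import random

def mask_frames(spec, mask_ratio=0.15, span=3, seed=0):
    """Zero out random spans of consecutive time frames (assumed masking rule).

    spec: non-empty list of frames, each a list of Mel-bin values.
    Returns (masked copy, sorted indices of the masked frames).
    """
    rng = random.Random(seed)
    n_frames = len(spec)
    n_target = max(1, int(n_frames * mask_ratio))
    masked = set()
    # Keep adding spans until at least n_target frames are masked.
    while len(masked) < n_target:
        start = rng.randrange(n_frames)
        for t in range(start, min(start + span, n_frames)):
            masked.add(t)
    out = [[0.0] * len(f) if t in masked else list(f)
           for t, f in enumerate(spec)]
    return out, sorted(masked)

def masked_l1_loss(pred, target, masked_idx):
    """Mean absolute error computed over the masked frames only."""
    total, count = 0.0, 0
    for t in masked_idx:
        for p, y in zip(pred[t], target[t]):
            total += abs(p - y)
            count += 1
    return total / count

# Toy "spectrogram": 20 frames x 4 Mel bins (made-up values).
spec = [[float(t + b) for b in range(4)] for t in range(20)]
masked_spec, idx = mask_frames(spec)
# The masked input stands in for a model output here; a trained model
# would be expected to drive this loss toward zero.
loss = masked_l1_loss(masked_spec, spec, idx)
```

Restricting the loss to masked frames means the model is scored only on what it could not see, which is the usual motivation for this kind of objective.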

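    Table 1. Statistics of the self-collected Parkinson's disease speech dataset

                                      PD                     HC
    Subjects [female/male]            25/31                  10/12
    Speech samples [female/male]      25/31                  25/34
    Age [female/male]                 69.2(7.4)/68.9(8.8)    72.6(5.4)/66(9.3)
    Disease duration [female/male]    6.7(4.3)/7.8(4.1)
    HY stage [female/male]            2.6(0.4)/2.2(0.6)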

    Table 2. Experimental results of Parkinson's disease detection with the support vector machine classifier (%)

    Method                Gender    ACC     TPR     TNR
    Fbank                           79.5    76.2    79.6
                                    74.2    70.8    75.0
                          All       78.8    80.7    78.6
    Fbank-unsupervised              92.5    89.8    95.0
                                    83.5    81.6    86.2
                          All       90.5    89.0    90.5
    MFCC                            78.5    63.7    78.3
                                    75.0    67.5    70.1
                          All       77.0    73.4    75.6
    MFCC-unsupervised               90.1    81.8    85.0
                                    88.5    75.1    78.3
                          All       89.8    77.9    79.1
    Mel                             80.5    75.8    82.5
                                    75.1    60.8    70.1
                          All       79.1    74.9    81.0
    Mel-unsupervised                93.5    94.1    92.5
                                    85.8    82.8    87.5
                          All       91.5    89.8    91.3
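
Tables 2 to 5 report ACC, TPR, and TNR, which the abstract refers to as accuracy, sensitivity, and specificity. A minimal sketch of how these three values follow from binary predictions is given below. Treating PD as the positive class (label 1) and the function name `detection_metrics` are assumptions made for illustration.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Return ACC, TPR (sensitivity), and TNR (specificity) in percent.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1      # patient correctly detected
            else:
                fn += 1      # patient missed
        else:
            if p == positive:
                fp += 1      # control flagged as patient
            else:
                tn += 1      # control correctly cleared
    return {
        "ACC": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "TPR": 100.0 * tp / (tp + fn),
        "TNR": 100.0 * tn / (tn + fp),
    }

# Made-up labels: 4 patients (1) and 4 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = detection_metrics(y_true, y_pred)
# -> {"ACC": 62.5, "TPR": 75.0, "TNR": 50.0}
```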

    Table 3. Experimental results of Parkinson's disease detection with the random forest classifier (%)

    Method                Gender    ACC     TPR     TNR
    Fbank                           78.5    69.0    66.3
                                    69.1    68.3    70.0
                          All       71.5    66.4    65.8
    Fbank-unsupervised              83.5    81.6    86.3
                                    73.9    66.3    68.4
                          All       79.7    68.1    63.3
    MFCC                            78.5    70.7    70.0
                                    62.5    63.7    65.4
                          All       72.6    63.2    63.5
    MFCC-unsupervised               87.0    73.7    74.1
                                    77.5    75.8    77.0
                          All       78.3    74.4    70.3
    Mel                             83.8    81.7    80.3
                                    70.8    70.0    72.5
                          All       78.5    62.5    68.0
    Mel-unsupervised                85.0    79.1    87.9
                                    75.8    70.8    77.0
                          All       82.3    77.5    71.0

    Table 4. Comparative experimental results on the MaxLittle dataset (%)

    Method          ACC     TPR     TNR
    Mel+SVM         80.2    81.7    79.4
    CPC             86.5    85.2    83.4
    APC             88.7    87.3    87.0
    Proposed        96.2    96.1    95.9
    Proposed-ft2    98.5    97.7    98.2
    Proposed-ws     96.9    96.6    95.4

    Table 5. Comparative experimental results on the self-collected dataset (%)

    Method          ACC     TPR     TNR
    Mel+SVM         79.1    74.9    81.0
    CPC             83.7    85.0    82.5
    APC             87.2    84.7    86.9
    Proposed        91.5    89.0    91.3
    Proposed-ft2    94.2    93.1    92.3
    Proposed-ws     93.9    93.3    91.5
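
The abstract states that these results were obtained with ten-fold cross-validation. The sketch below shows one way such folds could be built. Stratifying by class and splitting at the sample level (rather than by subject) are assumptions, since the page does not say how the folds were formed. The label counts (56 PD and 59 HC samples) come from Table 1; the majority-class `fit_majority` / `predict_majority` pair is a hypothetical stand-in for the SVM or random forest classifier.

```python
import random
from collections import Counter

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Deal indices round-robin, continuing where the last class stopped.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds

def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over the non-empty folds."""
    accs = []
    for held in stratified_folds(labels, k, seed):
        if not held:
            continue
        held_set = set(held)
        train = [i for i in range(len(labels)) if i not in held_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = predict(model, [features[i] for i in held])
        correct = sum(p == labels[i] for p, i in zip(preds, held))
        accs.append(correct / len(held))
    return sum(accs) / len(accs)

# Placeholder classifier: always predicts the most frequent training label.
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]

def predict_majority(model, test_x):
    return [model] * len(test_x)

labels = [1] * 56 + [0] * 59          # sample counts from Table 1
features = [[0.0] for _ in labels]    # dummy features for the sketch
folds = stratified_folds(labels)
mean_acc = cross_validate(features, labels, fit_majority, predict_majority)
```

If several recordings come from the same speaker, grouping folds by subject would avoid the same voice appearing in both training and test data.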
Publication history
  • Received: 2022-08-09
  • Accepted: 2023-01-13
  • Revised: 2023-01-13
  • Published online: 2023-01-17
