基于掩蔽自监督语音特征提取的帕金森病检测方法

季薇; 杨茗淇; 李云; 郑慧芬

doi:10.11999/JEIT221041

基于掩蔽自监督语音特征提取的帕金森病检测方法

doi: 10.11999/JEIT221041 cstr: 32379.14.JEIT221041

季薇¹,
杨茗淇¹,
李云^2, ,,
郑慧芬³

1.
南京邮电大学通信与信息工程学院南京 210003
2.
南京邮电大学计算机学院南京 210023
3.
南京医科大学附属老年医院南京 210024

基金项目: 江苏省高校基础科学(自然科学)重大项目(21KJA520003)，江苏省研究生实践创新计划项目(SJCX21_0257)

详细信息

作者简介:
季薇：女，博士，教授，硕士生导师，研究方向为机器学习与信号处理的交叉研究、无线通信与通信信号处理等

杨茗淇：女，硕士生，研究方向为机器学习与信号处理的交叉研究

李云：男，博士，教授，博士生导师，研究方向为机器学习、特征选择、信息安全等

郑慧芬：女，博士，主任医师，研究方向帕金森病及相关运动障碍性疾病

通讯作者:
李云　liyun@njupt.edu.cn

中图分类号: TN911.7; TP391.4
计量
- 文章访问数: 1625
- HTML全文浏览量: 1047
- PDF下载量: 139
- 被引次数: 0
出版历程
- 收稿日期: 2022-08-09
- 修回日期: 2023-01-13
- 录用日期: 2023-01-13
- 网络出版日期: 2023-01-17
- 刊出日期: 2023-10-31

Parkinson's Disease Detection Method Based on Masked Self-supervised Speech Feature Extraction

1.
School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2.
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
3.
Affiliated Geriatric Hospital of Nanjing Medical University, Nanjing 210024, China

Funds: The Basic Scientific (Natural Science) Major Program of the Higher Education Institutions of Jiangsu Province, China (21KJA520003), The Postgraduate Practice and Innovation Program of Jiangsu Province (SJCX21_0257)

摘要

摘要: 帕金森病是一种常见的慢性神经系统疾病，构音障碍是帕金森病的早期症状之一。基于语音进行帕金森病的辅助诊疗有助于更早发现病情和观测病情的发展。传统方法常通过对语音特征(如频率微扰、振幅微扰等)的参数计算来进行疾病评估，然而这些特征可能无法全面反映所有的病理现象，从而影响了检测和评估的准确率。为更好地提取帕金森病患者语音中的病理信息，提升检测和评估的准确率，该文提出一种基于掩蔽自监督语音特征提取的帕金森病检测方法。首先，从帕金森病患者的原始语音中提取Mel语谱图特征，得到患者富含病理特征的全局时序化表示；然后，对部分Mel语谱图特征进行掩蔽，并通过掩蔽自监督模型对掩蔽部分进行重构，从而学习到帕金森病患者语音特征的更高级表示。为解决帕金森病语音数据稀缺的问题，该文先在LibriSpeech公开数据集上进行掩蔽自监督模型的预训练，然后基于迁移学习的思想，利用帕金森病语音数据对预训练好的掩蔽自监督模型进行微调和加权求和，以提升该模型特征表示学习的性能。最终，使用随机森林和支持向量机分类器分别对提取好的语音特征进行分类，以实现帕金森病的检测。该文在MaxLittle公开数据集和课题组自采数据集上，采用10折交叉验证的方法验证了所提方法的有效性。结果表明，与传统的Mel语谱图特征检测方法和其他经典的自监督特征提取方法相比，所提方法在准确率、敏感度、特异度性能方面均有明显提升。
- 帕金森病 /
- 自监督学习 /
- 迁移学习 /
- 特征提取
Abstract: Parkinson’s disease is a common chronic neurological disease, and dysarthria is one of the early symptoms of this disease. The auxiliary diagnosis and treatment of Parkinson’s disease based on speech is helpful for early detection and observation of the development of this disease. Traditional methods evaluate often Parkinson’s disease by calculating the parameters of speech features (such as Jitter, Shimmer, etc.). However, these features may not fully reflect all pathological phenomena, which affects the accuracy of detection and evaluation. In order to extract better the pathological information from speech of patients with Parkinson’s disease and improve the accuracy of detection and evaluation, a Parkinson’s disease detection method based on masking self-supervised speech feature extraction is proposed. First, Mel spectrogram features are extracted from the original speech of Parkinson’s disease patients, and the global temporal representation with rich pathological features is obtained. Then, partial Mel spectrogram features are masked, and the masked parts are reconstructed by masking self-supervised model, so as to learn a higher-level representation of speech features of Parkinson’s disease patients. In order to solve the problem of the scarcity of Parkinson’s disease speech data, the masking self-supervised model will first be pre-trained on LibriSpeech public data set, and then based on the idea of transfer learning, the pre-trained model will be fine-tuned and weighted summed on Parkinson’s disease speech data. Thus, the feature representation learning performance of the proposed masking self-supervised model can be improved. Finally, random forest classifier and support vector machine classifier are used to classify the extracted speech features to achieve the detection of Parkinson’s disease. The effectiveness of the masking self-supervised model is verified on MaxLittle public data set and our self-collected data set by ten-fold cross-validation. The results show that, compared with the traditional Mel spectrogram feature detection method and other classical self-supervised feature extraction methods, the proposed method has significantly improved the Accuracy, True Positive Rate and True Negative Rate performance.
- Parkinson’s disease /
- Self-supervised learning /
- Transfer learning /
- Feature extraction

HTML全文

图 1 基于掩蔽自监督语音特征提取的帕金森病检测模型及其训练过程

下载: 全尺寸图片幻灯片

图 2 掩蔽自监督语音特征提取模型结构

下载: 全尺寸图片幻灯片

图 3 以Mel语谱图为例的随机掩蔽规则

下载: 全尺寸图片幻灯片

表 1 自采帕金森病语音数据集信息统计

	PD	HC
受试者 [女/男]	25/31	10/12
语音样本数 [女/男]	25/31	25/34
年龄 [女/男]	69.2(7.4)/68.9(8.8)	72.6(5.4)/66(9.3)
患病时间 [女/男]	6.7(4.3)/7.8(4.1)	–
HY分期 [女/男]	2.6(0.4)/2.2(0.6)	–

下载: 导出CSV

表 2 结合支持向量机分类器进行帕金森病检测的实验结果(%)

方法	性别	ACC	TPR	TNR
Fbank	男	79.5	76.2	79.6
	女	74.2	70.8	75.0
	全部	78.8	80.7	78.6
Fbank-unsupervised	男	92.5	89.8	95.0
	女	83.5	81.6	86.2
	全部	90.5	89.0	90.5
MFCC	男	78.5	63.7	78.3
	女	75.0	67.5	70.1
	全部	77.0	73.4	75.6
MFCC-unsupervised	男	90.1	81.8	85.0
	女	88.5	75.1	78.3
	全部	89.8	77.9	79.1
Mel	男	80.5	75.8	82.5
	女	75.1	60.8	70.1
	全部	79.1	74.9	81.0
Mel-unsupervised	男	93.5	94.1	92.5
	女	85.8	82.8	87.5
	全部	91.5	89.8	91.3

下载: 导出CSV

表 3 结合随机森林分类器进行帕金森病检测的实验结果(%)

方法	性别	ACC	TPR	TNR
Fbank	男	78.5	69.0	66.3
	女	69.1	68.3	70.0
	全部	71.5	66.4	65.8
Fbank-unsupervised	男	83.5	81.6	86.3
	女	73.9	66.3	68.4
	全部	79.7	68.1	63.3
MFCC	男	78.5	70.7	70.0
	女	62.5	63.7	65.4
	全部	72.6	63.2	63.5
MFCC-unsupervised	男	87.0	73.7	74.1
	女	77.5	75.8	77.0
	全部	78.3	74.4	70.3
Mel	男	83.8	81.7	80.3
	女	70.8	70.0	72.5
	全部	78.5	62.5	68.0
Mel-unsupervised	男	85.0	79.1	87.9
	女	75.8	70.8	77.0
	全部	82.3	77.5	71.0

下载: 导出CSV

表 4 MaxLittle数据集上的对比实验结果(%)

方法	ACC	TPR	TNR
Mel+SVM	80.2	81.7	79.4
CPC	86.5	85.2	83.4
APC	88.7	87.3	87.0
本文	96.2	96.1	95.9
本文-ft2	98.5	97.7	98.2
本文-ws	96.9	96.6	95.4

下载: 导出CSV

表 5 自采数据集上的对比实验结果(%)

方法	ACC	TPR	TNR
Mel+SVM	79.1	74.9	81.0
CPC	83.7	85.0	82.5
APC	87.2	84.7	86.9
本文	91.5	89.0	91.3
本文-ft2	94.2	93.1	92.3
本文-ws	93.9	93.3	91.5

下载: 导出CSV

参考文献(28)

[1]	BENBA A, JILBAB A, SANDABAD S, et al. Voice signal processing for detecting possible early signs of Parkinson’s disease in patients with rapid eye movement sleep behavior disorder[J]. International Journal of Speech Technology, 2019, 22(1): 121–129. doi: 10.1007/s10772-018-09588-0
[2]	SUPHINNAPONG P, PHOKAEWVARANGKUL O, THUBTHONG N, et al. Objective vowel sound characteristics and their relationship with motor dysfunction in Asian Parkinson's disease patients[J]. Journal of the Neurological Sciences, 2021, 426: 117487. doi: 10.1016/j.jns.2021.117487
[3]	KING N O, ANDERSON C J, and DORVAL A D. Deep brain stimulation exacerbates hypokinetic dysarthria in a rat model of Parkinson’s disease[J]. Journal of Neuroscience Research, 2016, 94(2): 128–138. doi: 10.1002/jnr.23679
[4]	沈珺, 张天宇, 黄菲菲, 等. 帕金森病构音障碍声学特点的初步探索[J]. 中华神经科杂志, 2019, 52(8): 613–619. doi: 10.3760/cma.j.issn.1006-7876.2019.08.003 SHEN Jun, ZHANG Tianyu, HUANG Feifei, et al. Study of voice disorder based on acoustic assessment in Parkinson's disease[J]. Chinese Journal of Neurology, 2019, 52(8): 613–619. doi: 10.3760/cma.j.issn.1006-7876.2019.08.003
[5]	SCHALLING E, JOHANSSON K, and HARTELIUS L. Speech and communication changes reported by people with Parkinson’s disease[J]. Folia Phoniatrica et Logopaedica, 2017, 69(3): 131–141. doi: 10.1159/000479927
[6]	LITTLE M A, MCSHARRY P E, HUNTER E J, et al. Suitability of dysphonia measurements for telemonitoring of Parkinson's disease[J]. IEEE Transactions on Biomedical Engineering, 2009, 56(4): 1015–1022. doi: 10.1109/TBME.2008.2005954
[7]	TSANAS A, LITTLE M A, MCSHARRY P E, et al. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease[J]. IEEE Transactions on Biomedical Engineering, 2012, 59(5): 1264–1271. doi: 10.1109/TBME.2012.2183367
[8]	MORO-VELAZQUEZ L, GOMEZ-GARCIA J A, GODINO-LLORENTE J I, et al. A forced Gaussians based methodology for the differential evaluation of Parkinson’s Disease by means of speech processing[J]. Biomedical Signal Processing and Control, 2019, 48: 205–220. doi: 10.1016/j.bspc.2018.10.020
[9]	KANINIKA and TAYAL A. Determination of Parkinson’s disease utilizing machine learning methods[C]. 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 2018: 170–173.
[10]	KARAMAN O, ÇAKIN H, ALHUDHAIF A, et al. Robust automated Parkinson disease detection based on voice signals with transfer learning[J]. Expert Systems with Applications, 2021, 178: 115013. doi: 10.1016/j.eswa.2021.115013
[11]	FRID A, SAFRA E J, HAZAN H, et al. Computational diagnosis of Parkinson's disease directly from natural speech using machine learning techniques[C]. 2014 IEEE International Conference on Software Science, Technology and Engineering, Ramat Gan, Israel, 2014: 50–53.
[12]	RAHMAN A, RIZVI S S, KHAN A, et al. Parkinson's disease diagnosis in cepstral domain using MFCC and dimensionality reduction with SVM classifier[J]. Mobile Information Systems, 2021, 2021: 8822069. doi: 10.1155/2021/8822069
[13]	LI Yongming, ZHANG Xinyue, WANG Pin, et al. Insight into an unsupervised two-step sparse transfer learning algorithm for speech diagnosis of Parkinson's disease[J]. Neural Computing and Applications, 2021, 33(15): 9733–9750. doi: 10.1007/s00521-021-05741-0
[14]	JIANG Dongwei, LI Wubo, CAO Miao, et al. Speech SimCLR: Combining contrastive and reconstruction objective for self-supervised speech representation learning[C]. The Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 2021: 1544–1548.
[15]	VAN DEN OORD A, LI Yazhe, and VINYALS O. Representation learning with contrastive predictive coding[J]. arXiv: 1807.03748, 2018. doi: 10.48550/arXiv.1807.03748.
[16]	CHUNG Y A, HSU W N, TANG Hao, et al. An unsupervised autoregressive model for speech representation learning[C]. The Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 2019: 146–150.
[17]	DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]. The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, America, 2019: 4171–4186.
[18]	JIANG Dongwei, LI Wubo, ZHANG Ruixiong, et al. A further study of unsupervised pretraining for transformer based speech recognition[C]. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, 2021: 6538–6542.
[19]	LIU A H, CHUNG Y A, and GLASS J R. Non-autoregressive predictive coding for learning speech representations from local dependencies[C]. The Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 2021: 3730–3734.
[20]	LIU A T, LI Shangwei, and LEE H Y. TERA: Self-supervised learning of transformer encoder representation for speech[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 2351–2366. doi: 10.1109/TASLP.2021.3095662
[21]	赵力. 语音信号处理[M]. 3版. 北京: 机械工业出版社, 2016. ZHAO Li. Speech Signal Processing[M]. 3rd ed. Beijing: China Machine Press, 2016.
[22]	张涛, 蒋培培, 张亚娟, 等. 基于时频混合域局部统计的帕金森病语音障碍分析方法研究[J]. 生物医学工程学杂志, 2021, 38(1): 21–29. doi: 10.7507/1001-5515.202001024 ZHANG Tao, JIANG Peipei, ZHANG Yajuan, et al. Parkinson's disease diagnosis based on local statistics of speech signal in time-frequency domain[J]. Journal of Biomedical Engineering, 2021, 38(1): 21–29. doi: 10.7507/1001-5515.202001024
[23]	PANAYOTOV V, CHEN Guoguo, POVEY D, et al. Librispeech: An ASR corpus based on public domain audio books[C]. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia, 2015: 5206–5210.
[24]	LIU A T, YANG Shuwen, CHI P H, et al. Mockingjay: Unsupervised speech representation learning with deep bidirectional transformer encoders[C]. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020: 6419–6423.
[25]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000–6010.
[26]	PHAM N Q, NGUYEN T S, NIEHUES J, et al. Very deep self-attention networks for end-to-end speech recognition[C]. The Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 2019: 66–70.
[27]	LITTLE M A, MCSHARRY P E, ROBERTS S J, et al. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection[J]. BioMedical Engineering OnLine, 2007, 6: 23. doi: 10.1186/1475-925X-6-23
[28]	KINGMA D and BA J. Adam: A method for stochastic optimization[C]. The 3rd International Conference for Learning Representations, San Diego, USA, 2015.