Parkinson's Disease Detection Method Based on Masked Self-supervised Speech Feature Extraction

JI Wei; YANG Mingqi; LI Yun; ZHENG Huifen

doi:10.11999/JEIT221041

Volume 45 Issue 10

Oct. 2023




JI Wei, YANG Mingqi, LI Yun, ZHENG Huifen. Parkinson's Disease Detection Method Based on Masked Self-supervised Speech Feature Extraction[J]. Journal of Electronics & Information Technology, 2023, 45(10): 3502-3510. doi: 10.11999/JEIT221041


doi: 10.11999/JEIT221041 cstr: 32379.14.JEIT221041

1. School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
3. Affiliated Geriatric Hospital of Nanjing Medical University, Nanjing 210024, China

Funds: The Basic Scientific (Natural Science) Major Program of the Higher Education Institutions of Jiangsu Province, China (21KJA520003), The Postgraduate Practice and Innovation Program of Jiangsu Province (SJCX21_0257)

Received Date: 2022-08-09
Accepted Date: 2023-01-13
Revised Date: 2023-01-13

Available Online: 2023-01-17

Publish Date: 2023-10-31

Abstract


Parkinson’s disease is a common chronic neurological disease, and dysarthria is one of its early symptoms. Speech-based auxiliary diagnosis and treatment of Parkinson’s disease helps with early detection and with monitoring how the disease progresses. Traditional methods often evaluate Parkinson’s disease by computing speech feature parameters such as Jitter and Shimmer. However, these features may not capture all pathological phenomena, which limits the accuracy of detection and evaluation. To better extract pathological information from the speech of patients with Parkinson’s disease and to improve detection and evaluation accuracy, a Parkinson’s disease detection method based on masked self-supervised speech feature extraction is proposed. First, Mel spectrogram features are extracted from the original speech of Parkinson’s disease patients, yielding a global temporal representation rich in pathological features. Then, parts of the Mel spectrogram are masked, and a masked self-supervised model reconstructs the masked parts, thereby learning a higher-level representation of the speech of Parkinson’s disease patients. To address the scarcity of Parkinson’s disease speech data, the masked self-supervised model is first pre-trained on the public LibriSpeech dataset; then, following the idea of transfer learning, the pre-trained model is fine-tuned on the Parkinson’s disease speech data and its representations are combined by weighted summation, which improves the feature representation learning performance of the proposed model. Finally, random forest and support vector machine classifiers are applied to the extracted speech features to detect Parkinson’s disease. The effectiveness of the masked self-supervised model is verified by ten-fold cross-validation on the public MaxLittle dataset and on a self-collected dataset.
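The masking-and-reconstruction idea can be illustrated with a short sketch. The abstract does not state the mask ratio, mask shape, reconstruction loss, or exactly how the weighted sum is formed, so everything below is an assumption: a contiguous time-span zero mask over Mel frames, an L1 loss computed only on the masked frames (a common choice in masked acoustic models), and softmax-normalised weights over per-layer outputs. The names `mask_frames`, `masked_l1_loss` and `weighted_sum` are hypothetical, the encoder that would actually predict the masked frames is omitted, and a spectrogram is represented as a plain list of frames.

```python
import math
import random

Frame = list[float]  # one Mel-spectrogram frame (n_mels values)


def mask_frames(mel: list[Frame], mask_ratio: float = 0.15, span: int = 7,
                seed: int = 0) -> tuple[list[Frame], set[int]]:
    """Assumed scheme: zero out random contiguous spans of frames until at
    least round(mask_ratio * n) frames are hidden. Returns the masked copy
    and the indices of the masked frames."""
    if not mel:
        return [], set()
    rng = random.Random(seed)
    n = len(mel)
    span = max(1, min(span, n))                          # keep span in [1, n]
    target = min(n, max(1, round(mask_ratio * n)))       # frames to hide
    masked_idx: set[int] = set()
    while len(masked_idx) < target:
        start = rng.randrange(n - span + 1)              # span fits inside mel
        masked_idx.update(range(start, start + span))
    masked = [[0.0] * len(f) if t in masked_idx else list(f)
              for t, f in enumerate(mel)]
    return masked, masked_idx


def masked_l1_loss(pred: list[Frame], target: list[Frame],
                   masked_idx: set[int]) -> float:
    """Mean absolute error over the masked frames only (assumed loss)."""
    total, count = 0.0, 0
    for t in masked_idx:
        for p, y in zip(pred[t], target[t]):
            total += abs(p - y)
            count += 1
    return total / count if count else 0.0


def weighted_sum(layers: list[list[Frame]], logits: list[float]) -> list[Frame]:
    """Combine per-layer outputs with softmax-normalised weights (assumed)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]             # numerically stable softmax
    s = sum(exps)
    w = [e / s for e in exps]
    n_frames, dim = len(layers[0]), len(layers[0][0])
    return [[sum(w[k] * layers[k][t][d] for k in range(len(layers)))
             for d in range(dim)] for t in range(n_frames)]


mel = [[float(t + d) for d in range(4)] for t in range(20)]  # toy 20-frame, 4-bin input
masked, idx = mask_frames(mel, mask_ratio=0.2, span=3, seed=1)
print(masked_l1_loss(mel, mel, idx))          # 0.0: a perfect reconstruction
print(masked_l1_loss(masked, mel, idx) > 0)   # True: the zeroed input is penalised
print(weighted_sum([[[1.0, 3.0]], [[3.0, 5.0]]], [0.0, 0.0]))  # [[2.0, 4.0]]
```

Masking whole spans rather than isolated frames is a design choice assumed here; the usual motivation is that neighbouring Mel frames are strongly correlated, so single-frame gaps could be filled by simple interpolation instead of by learning higher-level structure.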
The results show that, compared with the traditional Mel spectrogram feature detection method and other classical self-supervised feature extraction methods, the proposed method significantly improves Accuracy, True Positive Rate (TPR), and True Negative Rate (TNR).
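The evaluation protocol, ten-fold cross-validation reporting Accuracy, TPR and TNR, can be sketched as follows. The abstract does not say whether the folds are stratified or whether metrics are averaged per fold or pooled over all folds, so this sketch assumes a plain shuffled split and treats label 1 as Parkinson’s disease and 0 as a healthy control. `kfold_indices` and `detection_metrics` are hypothetical helper names; the random forest or SVM itself would be trained on each training split (for example with scikit-learn’s `RandomForestClassifier` or `SVC`) and is not shown.

```python
import random


def kfold_indices(n: int, k: int = 10,
                  seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Shuffle sample indices 0..n-1, deal them into k folds, and return
    one (train_indices, test_indices) pair per fold (unstratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]


def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, TPR (sensitivity) and TNR (specificity); 1 = Parkinson's."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "tnr": tn / (tn + fp) if tn + fp else 0.0,
    }


print([len(test) for _, test in kfold_indices(25)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
print(detection_metrics(y_true, y_pred))  # {'accuracy': 0.875, 'tpr': 0.75, 'tnr': 1.0}
```

TPR is the fraction of patients correctly detected and TNR the fraction of healthy speakers correctly rejected; reporting both alongside Accuracy exposes a classifier that scores well merely by favouring the majority class.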
Keywords:
- Parkinson’s disease
- Self-supervised learning
- Transfer learning
- Feature extraction

