2023, 45(10): 3502-3510.
doi: 10.11999/JEIT221041
Abstract:
Parkinson’s disease is a common chronic neurological disease, and dysarthria is one of its early symptoms. Speech-based auxiliary diagnosis and treatment of Parkinson’s disease supports early detection and monitoring of disease progression. Traditional methods often evaluate Parkinson’s disease by computing speech feature parameters such as Jitter and Shimmer. However, these features may not fully capture all pathological phenomena, which limits the accuracy of detection and evaluation. To better extract pathological information from the speech of patients with Parkinson’s disease and improve detection and evaluation accuracy, a Parkinson’s disease detection method based on masking self-supervised speech feature extraction is proposed. First, Mel spectrogram features are extracted from the original speech of Parkinson’s disease patients, yielding a global temporal representation rich in pathological features. Then, parts of the Mel spectrogram features are masked, and a masking self-supervised model reconstructs the masked parts, thereby learning a higher-level representation of the speech features of Parkinson’s disease patients. To address the scarcity of Parkinson’s disease speech data, the masking self-supervised model is first pre-trained on the public LibriSpeech data set; then, following the idea of transfer learning, the pre-trained model is fine-tuned on the Parkinson’s disease speech data and its representations are combined by weighted summation. This improves the feature representation learning performance of the proposed masking self-supervised model. Finally, random forest and support vector machine classifiers are applied to the extracted speech features to detect Parkinson’s disease. The effectiveness of the masking self-supervised model is verified by ten-fold cross-validation on the public MaxLittle data set and on our self-collected data set. 
The results show that, compared with the traditional Mel spectrogram feature detection method and other classical self-supervised feature extraction methods, the proposed method significantly improves Accuracy, True Positive Rate, and True Negative Rate.
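The masking and reconstruction idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the mask ratio, span length, loss function, or the exact form of the weighted summation, so the following are all assumptions: the function names (`mask_frames`, `masked_l1_loss`, `weighted_layer_sum`), the 15% ratio with 7-frame spans, the L1 loss restricted to masked frames, and the softmax-weighted sum over per-layer outputs. A real model would use a trained encoder (for example a Transformer) to produce the reconstruction instead of the placeholder arrays used here.

```python
import numpy as np


def mask_frames(mel, mask_prob=0.15, span=7, rng=None):
    """Zero out random contiguous spans of time frames in a (T, n_mels) Mel spectrogram.

    Returns the masked copy and a boolean vector marking the masked frames.
    mask_prob and span are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = mel.shape[0]
    masked = mel.copy()
    is_masked = np.zeros(T, dtype=bool)
    n_starts = max(1, T - span + 1)                      # valid span start positions
    n_spans = min(n_starts, max(1, int(round(mask_prob * T / span))))
    starts = rng.choice(n_starts, size=n_spans, replace=False)
    for s in starts:
        is_masked[s:s + span] = True
    masked[is_masked] = 0.0                              # hide whole frames (all Mel bins)
    return masked, is_masked


def masked_l1_loss(reconstruction, target, is_masked):
    """Mean absolute reconstruction error, computed only on the masked frames."""
    diff = np.abs(reconstruction[is_masked] - target[is_masked])
    return float(diff.mean())


def weighted_layer_sum(layer_outputs, logits):
    """Combine per-layer outputs of shape (L, T, D) with softmax-normalized weights (L,).

    Assumed interpretation of the abstract's "weighted summation" step.
    """
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return np.tensordot(w, layer_outputs, axes=1)        # result shape (T, D)


# Usage with a random stand-in for a log-Mel spectrogram (200 frames x 80 bins).
rng = np.random.default_rng(0)
mel = rng.standard_normal((200, 80))
masked, is_masked = mask_frames(mel, rng=rng)
print("masked frames:", int(is_masked.sum()))
print("loss if model returns its input:", masked_l1_loss(masked, mel, is_masked))
```

Computing the loss only on masked frames forces the model to infer hidden frames from their context rather than copy visible input. In the pipeline the abstract describes, the resulting frame-level representations would then be passed to the random forest and SVM classifiers.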