New Speech-Based Model Detects Early Neurological Disorders with High Accuracy

Jun 27, 2025 | By YANG Lizhuang; ZHAO Weiwei

A research team led by Prof. LI Hai at the Institute of Health and Medical Technology, Hefei Institutes of Physical Science of the Chinese Academy of Sciences, has developed a novel deep learning framework that significantly improves the accuracy and interpretability of detecting neurological disorders through speech.

"A slight change in the way we speak might be more than just a slip of the tongue; it could be a warning sign from the brain," said Prof. LI Hai, who led the team. "Our new model can detect early symptoms of neurological diseases like Parkinson's, Huntington's, and Wilson disease by analyzing voice recordings."

The study was recently published in Neurocomputing.

Dysarthria, a motor speech disorder, is a common early symptom of various neurological disorders. Because these speech abnormalities often reflect underlying neurodegenerative processes, voice signals have emerged as promising non-invasive biomarkers for early screening and continuous monitoring of such conditions. Automated speech analysis is efficient, inexpensive, and non-invasive. However, current mainstream methods often suffer from over-reliance on handcrafted features, limited capacity to model interactions between time steps and feature variables, and poor interpretability.

To address these challenges, the team proposed the Cross-Time and Cross-Axis Interactive Transformer (CTCAIT) for multivariate time series analysis. The framework first employs a large-scale audio model to extract high-dimensional temporal features from speech, representing them as multidimensional embeddings along the time and feature axes. It then uses the InceptionTime network to capture multi-scale and multi-level patterns within the time series. By integrating cross-time and cross-channel multi-head attention mechanisms, CTCAIT captures pathological speech signatures embedded across both dimensions.
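To make the idea of attending across both axes concrete, the following is a minimal sketch and not the team's implementation. It assumes a single attention head with identity projections (no learned weights) and a simple additive fusion of the two outputs; the function names and the toy embedding are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(matrix):
    """Swap the two axes of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so there are no trainable parameters.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    output = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(dim)]
        for row in weights
    ]
    return output, weights


def cross_time_cross_channel(embedding):
    """Attend across time steps and across feature channels, then fuse.

    embedding: list of T time steps, each a list of C channel values.
    Assumption: the two attention outputs are fused by element-wise sum.
    """
    # Cross-time: each time step is a token of C features.
    time_out, time_weights = self_attention(embedding)
    # Cross-channel: each channel is a token of T time values.
    chan_out_t, chan_weights = self_attention(transpose(embedding))
    chan_out = transpose(chan_out_t)
    fused = [
        [a + b for a, b in zip(row_t, row_c)]
        for row_t, row_c in zip(time_out, chan_out)
    ]
    return fused, time_weights, chan_weights


# Toy embedding: 4 time steps, 3 feature channels (made-up values).
embedding = [
    [0.1, 0.5, -0.2],
    [0.4, 0.0, 0.3],
    [-0.3, 0.2, 0.6],
    [0.2, -0.1, 0.1],
]
fused, time_weights, chan_weights = cross_time_cross_channel(embedding)
```

Here `time_weights` is a 4 x 4 matrix showing how much each time step attends to every other, and `chan_weights` is a 3 x 3 matrix doing the same for feature channels. Attention maps of this kind are one common starting point for interpretability analyses, although the article does not specify which technique the team used.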

The method achieved a detection accuracy of 92.06% on a Mandarin Chinese dataset and 87.73% on an external English dataset, demonstrating strong cross-linguistic generalizability.

Furthermore, the team conducted interpretability analyses of the model's internal decision-making processes and systematically compared the effectiveness of different speech tasks. These analyses offer guidance for the method's potential clinical use in the early diagnosis and monitoring of neurological disorders.

The study was supported by the National Natural Science Foundation of China, the Natural Science Foundation of Anhui Province, and the Anhui Provincial Key Research and Development Program.

Reference

Multivariate time series approach integrating cross-temporal and cross-channel attention for dysarthria detection from speech
