Can Voice Analysis Detect Depression and Track Recovery?
A study found that analyzing vocal patterns can help identify depression and monitor recovery, highlighting potential for new screening tools.
Source: Hansen, L., Zhang, Y. P., Wolf, D., Sechidis, K., Ladegaard, N., & Fusaroli, R. (2021). A Generalizable Speech Emotion Recognition Model Reveals Depression and Remission. bioRxiv.
What you need to know
- Researchers developed a computer model that can detect signs of depression by analyzing vocal patterns
- The model could distinguish between depressed patients and healthy controls with 71% accuracy
- Patients who recovered from depression had vocal patterns similar to healthy controls
- Background noise greatly impacts the model’s performance, highlighting the need for controlled recording conditions
How Voice Analysis Could Help Detect Depression
Depression affects over 160 million people worldwide and can severely impact quality of life. However, current methods for screening and monitoring depression rely heavily on self-reported symptoms, which can be unreliable. Researchers are exploring new ways to detect depression more objectively, including through analysis of vocal patterns.
A team of scientists recently investigated whether a computer model trained to recognize emotions in voices could also detect signs of depression. Their study, published as a preprint on bioRxiv, found promising results that highlight the potential for voice analysis as a tool for mental health screening and monitoring.
Training a Model to Recognize Emotions in Speech
The researchers started by training a computer model to recognize emotions like happiness and sadness in recorded speech. They used existing databases of actors speaking sentences with different emotional tones in English and German.
This emotion recognition model was then tested on recordings of interviews with Danish-speaking patients diagnosed with depression and healthy control participants. The goal was to see if the model trained on acted emotional speech could detect real differences in the vocal patterns of depressed versus non-depressed individuals speaking naturally.
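To make the approach more concrete, the sketch below trains a simple speech emotion classifier on labeled acted recordings and then applies it to an unseen recording. The MFCC features, the logistic-regression classifier, and the file paths are illustrative assumptions, not the model or data used in the study.

```python
# Minimal sketch of the train-then-transfer workflow described above:
# train an emotion classifier on acted emotional speech, then apply it
# to unseen recordings. Features and classifier are illustrative choices.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def embed(path, sr=16000):
    """Summarize a recording as the mean and std of its MFCCs."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training data: acted sentences labeled with their emotion.
train_paths = ["acted/happy_001.wav", "acted/sad_001.wav"]   # placeholder paths
train_labels = ["happy", "sad"]

X_train = np.stack([embed(p) for p in train_paths])
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, train_labels)

# Apply the trained model to a new (e.g., interview) recording.
print(clf.predict(embed("interview/patient_001.wav").reshape(1, -1)))
```

In the study itself, the model was trained on English and German acted speech and then applied, without retraining, to Danish clinical interviews; the sketch only illustrates that general train-then-transfer workflow.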
Distinguishing Depression Through Voice
When applied to the Danish interview recordings, the emotion recognition model was able to distinguish between depressed patients and healthy controls with 71% accuracy. This means it correctly classified participants as either depressed or non-depressed 71% of the time based solely on their voice patterns.
The model tended to classify the speech of depressed patients as sounding “sadder” compared to the healthy controls. Specifically, about 70% of the speech samples from depressed patients were classified as sad-sounding, compared to only 22-25% for healthy controls.
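As a small illustration of how such group-level proportions are derived, the sketch below counts the share of segment-level predictions labeled “sad” within each group. The predictions shown are made-up placeholders, not study data.

```python
# Illustrative calculation of the kind of summary statistic reported above:
# the share of speech segments per group that the model labels "sad".
from collections import Counter

predictions = {                       # placeholder segment-level predictions
    "depressed": ["sad", "sad", "neutral", "sad", "happy"],
    "control":   ["neutral", "happy", "sad", "neutral", "happy"],
}

for group, labels in predictions.items():
    share_sad = Counter(labels)["sad"] / len(labels)
    print(f"{group}: {share_sad:.0%} of segments classified as sad")
```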
Tracking Recovery Through Changes in Voice
Notably, patients who had recovered from depression showed vocal patterns in follow-up interviews that were very similar to those of the healthy control group. This suggests that the vocal markers of depression detected by the model tend to normalize as patients recover.
Dr. Riccardo Fusaroli, one of the study authors, explained: “We found that the emotional tone of voice for patients in remission was indistinguishable from that of the control group. This indicates that voice-based symptoms of depression decrease following successful treatment.”
This ability to track changes in vocal patterns could potentially be used to monitor treatment progress over time. However, the researchers note that more studies are needed to confirm how reliably these vocal changes correlate with clinical improvement.
Consistency of Vocal Patterns
The researchers found that the model’s predictions were quite stable over the course of each interview, which typically lasted 20-50 minutes. Because a short stretch of speech appears to be representative of the interview as a whole, even voice samples of 20-30 seconds may be sufficient to screen for potential signs of depression.
However, Dr. Fusaroli cautions that “While clear trends were visible at the group level, the extent to which each participant’s voice changed between visits differed markedly. This highlights the need to perform multiple recordings over multiple days in any practical application, to increase the robustness of the method.”
Important Factors for Accurate Analysis
The study identified several key factors that impact the accuracy of voice-based depression detection:
Background noise: The presence of background noise in recordings significantly reduced the model’s performance. This underscores the need for quiet recording conditions when using voice analysis tools.
Speaker separation: Removing the interviewers’ speech from the recordings slightly improved the model’s accuracy. Identifying who is speaking when, so that the patient’s voice can be isolated from other speakers, is known as “speaker diarization.”
Recording length: Analyzing longer voice samples of at least 20-30 seconds provided the best results. Samples shorter than this were less reliable for detecting signs of depression.
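A minimal preprocessing sketch reflecting these factors is shown below, assuming the patient’s speech has already been isolated from the interviewer’s: trim obvious silence and cut the recording into 30-second windows for analysis. The silence threshold, window length, and file name are assumptions for illustration, not parameters reported in the study.

```python
# Minimal preprocessing sketch: load a (patient-only) recording, trim
# leading/trailing silence, and cut it into 30-second windows so that each
# analyzed sample meets the 20-30 s minimum discussed above.
import librosa

SR = 16000
SEGMENT_SECONDS = 30

# Placeholder path; in practice this would be the patient's channel only,
# i.e. after interviewer speech has been removed.
y, _ = librosa.load("interview/patient_001.wav", sr=SR)

# Crude quality step: trim leading and trailing silence.
y, _ = librosa.effects.trim(y, top_db=30)

# Non-overlapping 30-second windows; a trailing remainder shorter than the
# window is dropped rather than analyzed as an unreliably short sample.
window = SEGMENT_SECONDS * SR
segments = [y[i:i + window] for i in range(0, len(y) - window + 1, window)]

print(f"{len(segments)} windows of {SEGMENT_SECONDS}s ready for emotion scoring")
```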
Dr. Lasse Hansen, the study’s lead author, emphasized: “Our findings show that data collection settings and data cleaning are crucial when considering automated voice analysis for clinical purposes. Background noise removal, in particular, is essential for making meaningful inferences.”
Potential Applications and Limitations
The ability to detect signs of depression through automated voice analysis could have several valuable applications:
Screening: Voice analysis tools could potentially be used to screen for depression risk, helping to identify individuals who may benefit from further evaluation.
Treatment monitoring: Tracking changes in vocal patterns over time could help clinicians assess how well a patient is responding to treatment.
Remote assessment: Voice-based tools could enable remote monitoring of mental health status, which is particularly relevant in the era of telemedicine.
However, the researchers emphasize that their model is not intended to replace clinical diagnosis. Dr. Hansen notes, “The primary area of application for such systems should be screening and disease monitoring, not diagnosis. Depression is a complex disorder, and no single measure can capture all its aspects.”
The study also has some limitations to consider:
- It only included patients with major depressive disorder, so it’s unclear how the model would perform with other mental health conditions.
- The research was conducted in Danish, and while the model showed promise in generalizing across languages, its performance in non-Germanic languages is unknown.
- The study did not include patients who did not recover from depression, so the researchers couldn’t assess voice changes in that group.
Future Directions
This research opens up several avenues for future study:
- Testing the model on larger and more diverse patient populations
- Investigating how well the approach works across a broader range of languages
- Exploring how voice analysis might be combined with other objective measures to improve depression screening and monitoring
- Studying how specific symptoms of depression correlate with changes in vocal patterns
Dr. Fusaroli concludes: “Voice-based systems have the advantage of being less prone to biases related to self-reports and human ratings, and can be used remotely, cheaply, and non-invasively. Successful implementation of voice-based depression screening and monitoring has potential for providing earlier diagnosis and a more granular view of treatment effect, thereby facilitating improved prognosis of major depressive disorder.”
Conclusions
- A computer model trained to recognize emotions in speech can detect signs of depression with promising accuracy
- Vocal patterns appear to normalize as patients recover from depression
- Voice analysis could potentially be used as a tool for depression screening and treatment monitoring
- Controlled recording conditions are crucial for accurate voice-based assessment
- More research is needed to validate these findings and explore practical applications
While this research shows promise, it’s important to remember that depression is a complex disorder that cannot be fully captured by any single measure. Voice analysis tools, if developed further, would likely serve as just one part of a comprehensive approach to mental health assessment and treatment.