Ananya Muralidhar
Motivation
The challenges my aunt faced while grappling with depression deeply resonated with me, illuminating the pressing need for tools that aid in early detection. Knowing that many silently battle their emotional tumult, I felt compelled to merge my strengths in speech signal processing, natural language processing, and fog computing to devise an automated detection tool.
Summary
The project proposed a method to detect depression using deep Convolutional Neural Networks (CNNs) operating on speech signals represented as spectrograms. The approach harnesses the detailed time-frequency information in spectrograms and the pattern-recognition power of deep learning to discern markers indicative of depression. To ensure privacy and efficient transmission of healthcare data, fog computing was introduced. The DAIC-WOZ dataset was employed for this purpose. The dataset underwent preprocessing to enhance its quality, followed by data augmentation to address the scarcity of voice samples and the class imbalance. A six-layer CNN was then trained and validated to predict depressive tendencies from the user's voice.
Through its meticulous analysis of prosodic features in speech, this tool discreetly allows individuals to gauge their emotional well-being. Furthermore, with its user-centric design, it serves as an accessible bridge to potential therapeutic interventions.
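The spectrogram pipeline described above can be sketched with a plain numpy STFT. The window length, hop size, and the 16 kHz sample rate are illustrative choices, not parameters documented in the project:

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Compute a log-magnitude spectrogram via a Hann-windowed STFT."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    # One-sided FFT magnitude per frame; shape (n_frames, n_fft//2 + 1)
    mag = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return 20 * np.log10(mag + 1e-10)  # dB scale, floored to avoid log(0)

# Example: 1 s of a 440 Hz tone at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

The resulting 2-D array is what a CNN consumes as a single-channel "image" of the utterance.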
Keywords: data augmentation, CNN, fog computing, spectrograms, depression.


CNN Model
System Architecture
Tools and Technologies: PyCharm, Oracle VM VirtualBox, Docker, Amazon EC2, SoX, and FFmpeg
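As a rough illustration of how a small six-stage CNN maps a spectrogram to a depression probability, here is a toy numpy forward pass. The kernel sizes, layer widths, and random weights are placeholders, since the project's exact architecture details are not listed here:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid 2-D cross-correlation of a single-channel input."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s×s max pooling (trailing rows/cols dropped)."""
    h, w = x.shape[0] // s * s, x.shape[1] // s * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

def forward(spec, k1, k2, w, b):
    """Six-stage pass: conv → pool → conv → pool → dense → sigmoid."""
    a = max_pool(np.maximum(conv2d(spec, k1), 0))
    a = max_pool(np.maximum(conv2d(a, k2), 0))
    z = a.ravel() @ w + b
    return 1 / (1 + np.exp(-z))  # probability of the "depressed" class

spec = rng.normal(size=(64, 64))            # stand-in spectrogram patch
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
# Flattened size after two conv+pool stages: ((64-2)//2 - 2)//2 = 14 → 196
w, b = rng.normal(size=196) * 0.01, 0.0
p = forward(spec, k1, k2, w, b)
```

A real implementation would learn the kernels and weights by backpropagation over spectrogram patches; this sketch only shows the shape of the data flow.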
Features:
- Deep-Learned Feature Extraction: Used deep convolutional neural networks to interpret raw speech waveforms and spectrograms, contributing to informed mental-health predictions.
- Advanced Noise Reduction: Employed FFmpeg and SoX processing to enhance speech quality, ensuring clear and accurate analysis.
- Optimal Model Performance: Evaluated a range of sound-processing techniques and transformations and selected those that maximized model performance.
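The FFmpeg/SoX cleanup step can be sketched as assembled command lines. The specific filter settings (80 Hz high-pass, SoX `noisered` sensitivity 0.21, a `noise.prof` profile file) are illustrative defaults, not values taken from the project:

```python
def build_cleanup_commands(raw_wav, clean_wav, noise_profile="noise.prof"):
    """Assemble FFmpeg and SoX invocations to clean one recording."""
    # FFmpeg: downmix to mono, resample to 16 kHz, cut low-frequency rumble
    ffmpeg_cmd = ["ffmpeg", "-i", raw_wav, "-ac", "1", "-ar", "16000",
                  "-af", "highpass=f=80", "resampled.wav"]
    # SoX: subtract a pre-computed noise profile from the resampled audio
    sox_cmd = ["sox", "resampled.wav", clean_wav,
               "noisered", noise_profile, "0.21"]
    return ffmpeg_cmd, sox_cmd

ffmpeg_cmd, sox_cmd = build_cleanup_commands("session.wav", "session_clean.wav")
# With ffmpeg and sox installed, run each via subprocess.run(cmd, check=True).
```

Separating command construction from execution keeps the pipeline easy to test and to batch over all sessions.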
Implementation:
- Database Analysis: Analyzed 189 sessions of interactions from USC's DAIC-WOZ database, spanning over 3,000 minutes, as the foundation for model development.
- Speech Analysis:
  - Data Augmentation: Applied extensive data augmentation techniques to ensure balanced and diverse training data, contributing to the model's overall accuracy.
  - Precision and Accuracy: Achieved a prediction accuracy of 71%, focusing on precise psychological screening through improved speech quality and advanced neural networks.
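The augmentation step above can be sketched with waveform-level perturbations. Time-shifting, additive noise, and gain jitter are common speech augmentations; the project does not specify its exact set, so these serve as illustrative stand-ins, with magnitudes chosen arbitrarily:

```python
import numpy as np

def augment(signal, rng):
    """Return a randomly perturbed copy of a 16 kHz waveform."""
    out = np.roll(signal, rng.integers(-1600, 1600))  # shift up to ±0.1 s
    out = out + rng.normal(0, 0.005, size=out.shape)  # low-level Gaussian noise
    return out * rng.uniform(0.8, 1.2)                # random gain

rng = np.random.default_rng(0)

# Oversample the minority (depressed) class to counter class imbalance
minority = [np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)]
augmented = [augment(x, rng) for x in minority for _ in range(4)]
```

Each original minority-class clip yields several distinct variants, which both enlarges the training set and evens out the class distribution.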



Spectrograms of a non-depressed individual
Spectrograms of a depressed individual
Reflection:
This project was a convergence of technology and mental health, reflecting my passion for applying technology to critical issues in this space. The journey was filled with learning experiences, from understanding acoustic features of speech to optimizing neural networks for accurate predictions. It reinforced my commitment to blending technological advances with healthcare for the betterment of society.
Relevance & Application:
The developed tool holds significant relevance in today’s context, where mental health awareness is crucial. It can be applied in various settings like healthcare institutions and counseling centers for early detection and intervention, potentially preventing severe mental health conditions.
Further Research & Questions:
- Exploring multi-modal fusion techniques, combining speech signal analysis with physiological data, to create a comprehensive depression detection tool that considers both verbal and non-verbal cues.
- Conducting a longitudinal study to assess the tool's effectiveness in early intervention and its long-term impact on individuals' mental well-being.