Depression has become one of the major global health concerns. Technology like AI and ML can be used to analyze depression data to provide better treatments to people suffering from different types of depressive disorders. We’ll discuss depression and the ML Python code used to analyze data.
The changing lifestyle and social scenarios have brought many changes to our lives. We have access to too much information. We are way too connected with the virtual world, and the lines between real and virtual are blurring rapidly. While it sounds like a good thing to stay up to date and informed about anything under the sun, it also has severe side effects.
The fast-paced world has resulted in a lot of anxiety and stress, leading to different psychological issues in people. Depression and poor emotional health are now among the major concerns across the globe. Thankfully, technology is coming to the rescue yet again. Machine learning engineers and researchers are working on analyzing depression in people to detect the symptoms at earlier stages and provide better ways to cope with mental health issues.
Artificial intelligence and machine learning algorithms can be used to analyze datasets with depression-related data to deliver accurate and in-depth insights. Let’s understand what depression actually is and how ML can provide a feasible solution to help people with depression and make their lives happier.
Depression is a serious mental illness that makes you feel sad, lonely, tired, or anxious. It makes you lose interest in things you previously enjoyed. Depression is a psychological disorder that increases negative thoughts and emotions, leading to other health conditions. It also reduces your productivity, alertness, and ability to think coherently. It affects how you think, feel, and act.
Depression is a common condition, and many people don’t realize that they are depressed. Statistics show that around 3.8% of the global population suffers from depression. This includes 5% of adults overall and 5.7% of adults older than sixty.
To put it in figures, 280-310 million people have depression. What’s alarming is that roughly 800,000 people die by suicide every year, and depression is a major contributing factor. Children and teenagers are by no means safe from depression. The US is among the countries with the highest depression rates in the world.
Depression (Major Depressive Disorder, MDD) is commonly known as clinical depression. An MDE (Major Depressive Episode) is a period during which a person exhibits the symptoms of depression. Note that mood swings and short bursts of anger or irritation are not considered depression.
Depression is an umbrella term that covers more than one type of mental illness/ disorder. It can be classified into the following types:
Anxiety is when you feel stressed and tense throughout the day. It brings negative thoughts about how things can go wrong or that something really bad will happen to you or your loved ones. So much worry takes over your mind and your thoughts. It also leads to anxiety and panic attacks.
Agitation is when you feel uneasy and uncomfortable no matter what. You cannot relax and calm down. An agitated person has jerky movements and is constantly fidgeting or in motion. You cannot sit in one position for more than a few seconds. Some people also tend to talk a lot when agitated. It doesn’t make sense, but you can’t control it either.
Melancholy is intense sadness or emotional pain. It fills your mind to an extent where even good things don’t cheer you up. Activities you usually enjoy also fail to make you happy. Melancholy results in loss of appetite, sad thoughts, feeling down/ low in the mornings, disturbed and irregular sleep patterns, and suicidal thoughts.
Persistent Depressive Disorder (PDD) is diagnosed when a person has had depression for two years or more. It is a chronic condition in which the person is highly vulnerable and prone to making harmful decisions. The term covers both chronic major depression and dysthymia (low-grade persistent depression). The symptoms of this disorder are:
Bipolar disorder is also called manic depression, as it causes extreme mood swings in a person. You might experience random bursts of energy where you feel fantastic and at the top of the world. You work and overdo things until you’re exhausted. Meanwhile, on the other end of the spectrum, you’ll feel miserable and horrible about anything and everything. You feel fatigued, tired, and worthless.
This is a vicious cycle where you alternate between two contrasting moods with no middle ground. Doctors recommend mood stabilizers like lithium and calming activities like meditation to bring some balance and stability to your mood.
Depression has many symptoms, some of which overlap with an ordinary low mood or exhaustion after a long day of work. Naturally, all of us feel low at one point or another. But when the feelings persist and take over our lives, it is a sign of depression.
Depression isn’t general sadness or pain of loss. It is more intense and can wreak havoc in your life by gradually robbing your happiness and ability to assert yourself. You can no longer feel, think, work, enjoy, and act the way you used to do. Some people term it as ‘living in a black hole’, where the void sucks out even the last bit of energy and happiness from you.
Some feel apathetic to their surroundings. Nothing matters to them anymore. Others have a constant sense of impending doom and cannot consider a positive alternative. Men tend to exhibit signs of anger and restlessness, while women more often report excessive feelings of guilt, sleepiness, hunger, etc. Of course, this varies from person to person.
All the symptoms listed above are warning signs of depression. A person who exhibits such signs needs medical intervention as soon as possible.
Using the following datasets, it is possible to build an AI and ML model for depression analysis.
The depression dataset available on Kaggle can help developers and researchers build systems that automatically detect a person’s depressive state from sensor data. This dataset can be used in the following ways (and more):
Kaggle is a Google subsidiary and an online community of machine learning practitioners and data scientists.
This dataset is the main source for the adult depression indicator on the Let’s Get Healthy California site, https://letsgethealthy.ca.gov/. The data is geographically limited to California and comes from the California Behavioral Risk Factor Surveillance Survey (BRFSS). The table reports the proportion of adults who were ever told they had a depressive disorder.
The BRFSS survey is conducted by the Public Health Survey Research Program of California State University in Sacramento, under contract from CDPH (California Department of Public Health). It is an annual telephone survey conducted cross-sectionally to understand health-related issues in Californians. Aspects like chronic health conditions, risk behavior, and use of preventive services are measured.
The US Census Bureau, along with five federal agencies, started a survey to gather data about COVID-19 and its impact on American households. The Household Pulse Survey aimed to measure the extent of the pandemic’s impact on aspects like food security, employment, housing, education, disruptions, consumer spending abilities, and the physical and mental wellness of the citizens.
The survey took place online, where participants received invitations via emails and text messages. The email addresses and phone numbers were randomly selected, making sure that only one person from a household participated in the survey. The sample data was selected from the Census Bureau Master Address File Data. The estimates were adjusted to accommodate the lack of response from some participants. It also had to match the estimates of the Census Bureau in terms of age, gender, educational qualification, race, and ethnicity. The estimates also met the NCHS Data Presentation Standards for Proportions.
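The adjustment described above can be illustrated with a simplified post-stratification sketch. This is not the Census Bureau's actual weighting procedure; the age groups, population shares, and responses are hypothetical numbers chosen only to show how reweighting a sample toward known population shares shifts an estimate.

```python
from collections import Counter

# Hypothetical population shares by age group (assumed, not Census figures)
population_share = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

# Hypothetical respondents: (age group, 1 if symptoms were reported, else 0)
respondents = [
    ("18-39", 1), ("18-39", 1), ("18-39", 1),
    ("18-39", 0), ("18-39", 0), ("18-39", 0),
    ("40-59", 1), ("40-59", 0), ("40-59", 0),
    ("60+", 0),
]


def poststratified_rate(rows, pop_share):
    """Weight each respondent by (population share / sample share) of their group."""
    n = len(rows)
    group_counts = Counter(group for group, _ in rows)
    weights = [pop_share[group] / (group_counts[group] / n) for group, _ in rows]
    weighted_hits = sum(w * flag for w, (_, flag) in zip(weights, rows))
    return weighted_hits / sum(weights)


unweighted = sum(flag for _, flag in respondents) / len(respondents)
weighted = poststratified_rate(respondents, population_share)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy sample the youngest group is over-represented, so the weighted estimate is pulled toward the rates seen in the under-represented older groups.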
Camden is a borough in northwest London, UK. This dataset provides information about the patterns and trends of diagnosed depression and anxiety among residents aged 18 and above.
S.No. | Dataset Name | Provided By | Dataset Size | Download links |
---|---|---|---|---|
1 | The depression dataset | Kaggle | 53.12 MB | https://www.kaggle.com/datasets/arashnic/the-depression-dataset |
2 | Adult Depression (LGHC Indicator) | Let’s Get Healthy California | 54.71 KB | https://data.world/chhs/5a281abf-1730-43b0-b17b-ac6a35db5760 |
3 | Indicators of Anxiety or Depression Based on Reported Frequency of Symptoms During Last 7 Days | U.S. Census Bureau | 1.6 MB | https://catalog.data.gov/dataset/indicators-of-anxiety-or-depression-based-on-reported-frequency-of-symptoms-during-last-7- |
4 | Camden Depression and Anxiety Profile | Public Health Intelligence | 1.8 MB | https://data.world/datagov-uk/fdf83747-0aeb-4cb9-840d-75f16c8d3105 |
# Install every library imported below (the leading "!" runs a shell command from a notebook cell)
!pip install nltk pandas numpy seaborn matplotlib wordcloud scikit-learn imbalanced-learn
import re
import nltk
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.stem import WordNetLemmatizer
from wordcloud import WordCloud
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
from imblearn.over_sampling import SMOTE
# WordNetLemmatizer needs the WordNet corpus; download it once
nltk.download('wordnet')
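# Load the labelled messages (label 1 = depressive, label 0 = positive)
df = pd.read_csv("depression_data.csv")
df.head(5)
df['message'].iloc[:1]
df.columns
# Drop the redundant index column that came along with the CSV
df = df.drop('Unnamed: 0', axis=1)
df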
index | message | label |
---|---|---|
0 | just had a real good moment. i missssssssss hi… | 0 |
1 | is reading manga http://plurk.com/p/mzp1e | 0 |
2 | @comeagainjen http://twitpic.com/2y2lx – http:… | 0 |
3 | @lapcat Need to send ’em to my accountant tomo… | 0 |
4 | ADD ME ON MYSPACE!!! myspace.com/LookThunder | 0 |
… | … | … |
10309 | No Depression by G Herbo is my mood from now o… | 1 |
10310 | What do you do when depression succumbs the br… | 1 |
10311 | Ketamine Nasal Spray Shows Promise Against Dep… | 1 |
10312 | dont mistake a bad day with depression! everyo… | 1 |
10313 | 0 | 1 |
10314 rows × 2 columns
# Plot the class balance; passing the column by keyword avoids seaborn's FutureWarning
sns.countplot(x='label', data=df)
<AxesSubplot:xlabel='label', ylabel='count'>
wo = WordNetLemmatizer()
corpus = []
for text in df['message']:
    # Keep letters only, lowercase, split into words, then lemmatize each word
    message = re.sub('[^a-zA-Z]', ' ', str(text))
    message = message.lower()
    message = message.split()
    message = [wo.lemmatize(word) for word in message]
    message = ' '.join(message)
    corpus.append(message)
corpus[2]
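One side effect of the letters-only regex is that URLs and @mentions, which are common in this dataset, are broken into word fragments rather than removed. The sketch below uses one of the sample messages shown earlier to compare the cleaning above with a variant that strips URLs and mentions first. The `clean_tweet` helper and its two patterns are illustrative assumptions, not part of the original notebook, and lemmatization is left out to keep the example dependency-free.

```python
import re

# Assumed patterns for this sketch: http(s)/www links and @handles
URL_RE = re.compile(r'https?://\S+|www\.\S+')
MENTION_RE = re.compile(r'@\w+')


def letters_only(text):
    """The cleaning used above, minus lemmatization."""
    return ' '.join(re.sub('[^a-zA-Z]', ' ', text).lower().split())


def clean_tweet(text):
    """Drop URLs and @mentions before applying the letters-only cleaning."""
    text = URL_RE.sub(' ', text)
    text = MENTION_RE.sub(' ', text)
    return letters_only(text)


sample = "is reading manga http://plurk.com/p/mzp1e"
print(letters_only(sample))  # is reading manga http plurk com p mzp e
print(clean_tweet(sample))   # is reading manga
```

Whether the leftover fragments help or hurt the classifier is an empirical question; the variant simply keeps them out of the vocabulary.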
depressive_words = ' '.join(list(df[df['label'] == 1]['message']))
depressive_wc = WordCloud(width = 512,height = 512, collocations=False, colormap="Blues").generate(depressive_words)
plt.figure(figsize = (10, 8), facecolor = 'k')
plt.imshow(depressive_wc)
plt.axis('off')
plt.tight_layout(pad = 0)
plt.show()
positive_words = ' '.join(list(df[df['label'] == 0]['message']))
positive_wc = WordCloud(width = 512,height = 512, collocations=False, colormap="Blues").generate(positive_words)
plt.figure(figsize = (10, 8), facecolor = 'k')
plt.imshow(positive_wc)
plt.axis('off')
plt.tight_layout(pad = 0)
plt.show()
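# Hold out 25% of the messages for testing
X_train, X_test, y_train, y_test = train_test_split(corpus, df['label'], test_size=0.25, random_state=42)
# TF-IDF over unigrams to trigrams, fitted on the training split only
vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words='english', max_features=15000)
X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)
X_train_vect.shape
# Balance the classes in the training split with SMOTE
x_resample, y_resample = SMOTE().fit_resample(X_train_vect, y_train)
# The test split is oversampled as well, which is why the reports further down show 2011 samples per class.
# Scores computed on the untouched X_test_vect and y_test would reflect the real class mix more closely.
x_test_resample, y_test_resample = SMOTE().fit_resample(X_test_vect, y_test)
# Print the shape of x and y after resampling
print(x_resample.shape)
print(y_resample.shape)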
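To make the vectorizer settings more concrete, here is a small dependency-free sketch of two ideas behind them: what `ngram_range=(1, 3)` produces for a short token list, and the smoothed inverse document frequency that scikit-learn applies by default. The helper names `word_ngrams` and `smoothed_idf` are made up for this illustration, and the token list stands in for a message after stop-word removal.

```python
import math


def word_ngrams(tokens, min_n=1, max_n=3):
    """Return every word n-gram with min_n <= n <= max_n, shortest first."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(' '.join(tokens[i:i + n]))
    return grams


def smoothed_idf(n_docs, doc_freq):
    """idf(t) = ln((1 + n) / (1 + df(t))) + 1, the smooth_idf=True formula."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1


tokens = ["feel", "tired", "today"]  # stand-in for a cleaned message
print(word_ngrams(tokens))

# A term found in every document gets the minimum weight of 1.0;
# a rarer term is weighted more heavily.
print(smoothed_idf(n_docs=4, doc_freq=4))
print(round(smoothed_idf(n_docs=4, doc_freq=1), 4))
```

With trigrams included, the number of candidate features grows quickly, which is one reason to cap the vocabulary with `max_features`.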
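The idea of balancing only the training data can be shown without any external library. The sketch below uses plain random oversampling (duplicating minority-class rows) as a simpler stand-in for SMOTE, which instead synthesizes new minority samples by interpolating between neighbours. The `random_oversample` helper and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=42):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Toy training split: 8 positive messages, 2 depressive ones
train_x = [f"msg{i}" for i in range(10)]
train_y = [0] * 8 + [1] * 2

bal_x, bal_y = random_oversample(train_x, train_y)
print(Counter(train_y), "->", Counter(bal_y))
```

Only the training split is passed through the helper here; a held-out test split would normally be left as it is so that the evaluation reflects the class mix seen in practice.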
clf = LogisticRegression(solver='lbfgs')
clf.fit(x_resample,y_resample)
y_pred = clf.predict(x_test_resample)
accuracy_score(y_test_resample,y_pred)
print(classification_report(y_test_resample,y_pred))
class | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.96 | 0.99 | 0.97 | 2011 |
1 | 0.99 | 0.95 | 0.97 | 2011 |
accuracy | | | 0.97 | 4022 |
macro avg | 0.97 | 0.97 | 0.97 | 4022 |
weighted avg | 0.97 | 0.97 | 0.97 | 4022 |
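For readers unfamiliar with the columns in the report, the following sketch computes precision, recall, and F1 for one class from plain Python lists. The `per_class_metrics` helper and the six toy labels are illustrative assumptions; they are not taken from the dataset.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: 1 = depressive, 0 = positive
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

for cls in (0, 1):
    p, r, f = per_class_metrics(y_true, y_pred, positive=cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}")
```

Recall for class 1 is the figure to watch in a screening setting, since it measures how many depressive messages the model actually catches.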
mnb = MultinomialNB()
mnb.fit(x_resample,y_resample)
y_pred = mnb.predict(x_test_resample)
accuracy_score(y_test_resample,y_pred)
print(classification_report(y_test_resample,y_pred))
class | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.96 | 0.95 | 0.96 | 2011 |
1 | 0.95 | 0.96 | 0.96 | 2011 |
accuracy | | | 0.96 | 4022 |
macro avg | 0.96 | 0.96 | 0.96 | 4022 |
weighted avg | 0.96 | 0.96 | 0.96 | 4022 |
def preprocess(data):
    # Same cleaning as the training corpus: letters only, lowercase, lemmatize
    a = re.sub('[^a-zA-Z]', ' ', data)
    a = a.lower()
    a = a.split()
    a = [wo.lemmatize(word) for word in a]
    a = ' '.join(a)
    return a
strr = input('Enter Your Message: ')
print("-------------------------------")
examples = strr
a = preprocess(examples)
# Reuse the fitted vectorizer so the features match the training vocabulary
example_counts = vectorizer.transform([a])
prediction = mnb.predict(example_counts)
prediction[0]
if prediction[0] == 0:
    print('Positive')
elif prediction[0] == 1:
    print('Depressive')
Enter Your Message: happy birthday
-------------------------------
Positive
Machine learning models can be used to analyze depression and its types, anxiety, PTSD, bipolar disorder, and a variety of other mental disorders that affect people in all parts of the world. As more data about depression patterns and symptoms becomes available, models trained on it can become more accurate and deliver better predictions.
This will help identify depression in its early stages and enable medical practitioners to help people recognize their condition and opt for appropriate treatment to control depression and lead happier lives.