Detecting Early Onset of Depression from Social Media Text using Learned Confidence Scores
Related papers
Towards Measuring the Severity of Depression in Social Media via Text Classification
XXV Congreso Argentino de Ciencias de la Computación (CACIC) (Universidad Nacional de Río Cuarto, Córdoba, 14 al 18 de octubre de 2019), 2019
Psychologists have used tests or carefully designed survey questions, such as Beck's Depression Inventory (BDI), to identify the presence of depression and to assess its severity level. On the other hand, methods for automatic depression detection have gained increasing interest, since the information available in social media, such as Twitter and Facebook, enables novel measurement based on language use. These methods learn to characterize depression through natural language use and have shown that language usage can provide strong evidence for detecting depressed people. However, not much attention has been paid to measuring finer-grained relationships between both aspects, such as how language usage is connected with the severity level of depression. The present study is a first step in that direction. First, we train a binary text classifier to detect "depressed" users, and then we use its confidence values to estimate the user's clinical depression level. To do that, our system has to fill in the standard BDI depression questionnaire on users' behalf, based only on the text of their postings. Our proposal, publicly tested in the eRisk 2019 T3 task, obtained promising results. This offers evidence of the potential of our method to estimate the level of depression directly from users' posts on social media.
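The abstract above describes turning a binary classifier's confidence into a clinical depression level, but it does not say how. The sketch below shows one possible reading, not the authors' method: it assumes a simple linear scaling of the confidence onto the BDI total (21 questions, each scored 0 to 3) and uses one commonly cited set of score bands (0-9, 10-18, 19-29, 30-63). The function names and the linear mapping are hypothetical.

```python
# Hypothetical mapping from classifier confidence to a BDI severity category.
# Assumption: the score bands below are one commonly cited BDI banding; the
# abstract itself does not state which bands or which mapping were used.
BDI_BANDS = [
    (0, 9, "minimal"),
    (10, 18, "mild"),
    (19, 29, "moderate"),
    (30, 63, "severe"),
]
MAX_BDI_SCORE = 63  # 21 questions, each answered on a 0..3 scale


def confidence_to_bdi_score(confidence: float) -> int:
    """Linearly scale a confidence in [0, 1] to a BDI total in [0, 63]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return round(confidence * MAX_BDI_SCORE)


def bdi_category(score: int) -> str:
    """Return the severity band that contains the given BDI total."""
    for low, high, name in BDI_BANDS:
        if low <= score <= high:
            return name
    raise ValueError("score must be in [0, 63]")
```

For example, `bdi_category(confidence_to_bdi_score(0.2))` places a low-confidence user in the "mild" band under these assumptions. A real system would also have to distribute the total across the 21 individual answers, which this sketch leaves out.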
Using Text Classification to Estimate the Depression Level of Reddit Users
Journal of Computer Science and Technology, 2021
Psychologists have used tests and carefully designed survey questions, such as Beck's Depression Inventory (BDI), to identify the presence of depression and to assess its severity level. On the other hand, methods for automatic depression detection have gained increasing interest, since the information available in social media, such as Twitter and Facebook, enables novel measurement based on language use. These methods learn to characterize depression through natural language use and have shown that language usage can provide strong evidence for detecting depressed people. However, not much attention has been paid to measuring finer-grained relationships between both aspects, such as how language usage is connected with the severity level of depression. The present study is a first step in that direction. We train a binary text classifier to detect "depressed" users and then we use its confidence value to estimate the user's clinical depression leve...
International Journal of Electrical and Computer Engineering (IJECE), 2024
Language provides significant insights into an individual's emotional state, social status, and personality traits. This research aims to enhance depression detection through the analysis of linguistic features and various dataset attributes. The dataset, sourced from the social networking platform Reddit, comprises posts and comments from individuals diagnosed with depression. Logistic regression with term frequency-inverse document frequency (TF-IDF) is employed as the primary model for text classification. To improve model performance, a novel feature (the average time interval between consecutive posts or comments) is introduced, contributing to a marginal but noteworthy improvement in accuracy. The proposed model demonstrates superior F1 scores compared to other models applied to the same dataset. Given the increasing recognition of mental health's significance, accurately diagnosing mental disorders is of paramount importance. This study underscores the potential of leveraging linguistic analysis and advanced machine learning techniques to identify depressive symptoms, thereby contributing to more effective mental health diagnostics and interventions.
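The temporal feature mentioned in this abstract, the average time interval between a user's consecutive posts or comments, is simple to compute. The sketch below is one plausible implementation, not the paper's code: the function name, the choice of seconds as the unit, and the convention of returning 0.0 for users with fewer than two posts are all assumptions.

```python
from datetime import datetime
from typing import Sequence


def average_post_interval_seconds(timestamps: Sequence[datetime]) -> float:
    """Mean gap, in seconds, between consecutive posts of a single user.

    Returns 0.0 when fewer than two timestamps are given (a convention
    chosen for this sketch, not taken from the paper).
    """
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)  # posts may arrive in any order
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return sum(gaps) / len(gaps)
```

The resulting number could then be appended to each user's TF-IDF vector as one extra column before fitting the classifier. Scaling it to a range comparable with the TF-IDF weights would likely matter in practice.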
Early Identification of Depression Severity Levels on Reddit Using Ordinal Classification
Proceedings of the ACM Web Conference 2022
User-generated text on social media is a promising avenue for public health surveillance and has been actively explored for its feasibility in the early identification of depression. Existing methods for identifying depression have shown promising results; however, these methods all treat the identification as a binary classification problem. To date, there has been little effort towards identifying users' depression severity level, and existing approaches disregard the inherent ordinal nature across these fine-grained levels. This paper aims at early identification of depression severity levels on social media data. To accomplish this, we built a new dataset based on the inherent ordinal nature of depression severity levels using clinical depression standards on Reddit posts. The posts were classified into 4 depression severity levels covering the clinical depression standards on social media. Accordingly, we reformulate the early identification of depression as an ordinal classification task over clinical depression standards such as Beck's Depression Inventory and the Depressive Disorder Annotation scheme to identify depression severity levels. With these, we propose a hierarchical attention method optimized to factor in the increasing depression severity levels through a soft probability distribution. We experimented with two datasets (a public dataset having more than one post from each user and our own dataset with a single post per user) of real-world Reddit posts that have been classified according to questionnaires built by clinical experts, and demonstrated that our method outperforms state-of-the-art models. Finally, we conclude by analyzing the minimum number of posts required to identify depression severity level, followed by a discussion of empirical and practical considerations of our study. CCS Concepts: Applied computing → Health informatics; Computing methodologies → Natural language processing.
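The "soft probability distribution" over ordered severity levels is not spelled out in this abstract. One common way to build such targets for ordinal classification is to replace the one-hot label with a softmax over negative distances to the true level, so that neighbouring levels receive more probability mass than distant ones. The sketch below illustrates that general idea only. The function name, the absolute-distance penalty, and the `alpha` sharpness parameter are assumptions, not details confirmed by the paper.

```python
import math
from typing import List


def soft_ordinal_labels(true_level: int, num_levels: int, alpha: float = 1.0) -> List[float]:
    """Soft target distribution over ordered levels 0..num_levels-1.

    Each level gets weight exp(-alpha * |true_level - level|), normalised to
    sum to 1. A larger alpha gives a distribution closer to one-hot.
    """
    if not 0 <= true_level < num_levels:
        raise ValueError("true_level must be in [0, num_levels)")
    weights = [
        math.exp(-alpha * abs(true_level - level))
        for level in range(num_levels)
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

With four severity levels and a true level of 1, the peak sits at index 1, levels 0 and 2 share equal mass, and level 3 receives the least. Training against such targets (for example with a cross-entropy or KL-divergence loss) penalises a prediction of "severe" for a "mild" post more than a prediction of "moderate".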
Expert Systems with Applications
With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people's lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models (such as SVM, MNB, Neural Networks, etc.) are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale.
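The early risk detection setting described above requires a system to read a user's posts as a stream and decide when it has seen enough to classify. The sketch below illustrates that general idea with a plain cumulative-evidence threshold. It is not SS3: the per-post scores, the threshold rule, and the function name are all hypothetical and chosen only to make the streaming decision concrete.

```python
from typing import Iterable, Optional, Tuple


def early_decision(post_scores: Iterable[float], threshold: float) -> Tuple[bool, Optional[int]]:
    """Read per-post evidence scores in order and stop as soon as the
    running total reaches the threshold.

    Returns (True, posts_read) when the user is flagged, or (False, None)
    if the stream ends without reaching the threshold.
    """
    total = 0.0
    for posts_read, score in enumerate(post_scores, start=1):
        total += score  # evidence accumulates incrementally, post by post
        if total >= threshold:
            return True, posts_read
    return False, None
```

Because the running total is a plain sum of per-post contributions, the posts that pushed it over the threshold can be listed afterwards, which gives a rough form of the justification requirement the abstract raises.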
Detection of Depression-Related Posts in Reddit Social Media Forum
IEEE Access, 2019
Depression is viewed as the largest contributor to global disability and a major reason for suicide. It has an impact on the language usage reflected in written text. The key objective of our study is to examine Reddit users' posts to detect any factors that may reveal the depression attitudes of relevant online users. For this purpose, we employ Natural Language Processing (NLP) techniques and machine learning approaches to train on the data and evaluate the efficiency of our proposed method. We identify a lexicon of terms that are more common among depressed accounts. The results show that our proposed method can significantly improve performance accuracy. The best single feature is the bigram, which with the Support Vector Machine (SVM) classifier detects depression with 80% accuracy and an F1 score of 0.80. The strength and effectiveness of the combined features (LIWC+LDA+bigram) are most successfully demonstrated with the Multilayer Perceptron (MLP) classifier, which gives the top performance for depression detection, reaching 91% accuracy and an F1 score of 0.93. According to our study, better performance can be achieved by proper feature selection and by combining multiple features. Index terms: natural language processing, machine learning, Reddit, social networks, depression.
CLPsych 2019 Shared Task: Predicting the Degree of Suicide Risk in Reddit Posts
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, 2019
The shared task for the 2019 Workshop on Computational Linguistics and Clinical Psychology (CLPsych’19) introduced an assessment of suicide risk based on social media postings, using data from Reddit to identify users at no, low, moderate, or severe risk. Two variations of the task focused on users whose posts to the r/SuicideWatch subreddit indicated they might be at risk; a third task looked at screening users based only on their more everyday (non-SuicideWatch) posts. We received submissions from 15 different teams, and the results provide progress and insight into the value of language signal in helping to predict risk level.
DEPTWEET: A typology for social media texts to detect depression severities
Computers in Human Behavior
Mental health research through data-driven methods has been hindered by a lack of standard typology and scarcity of adequate data. In this study, we leverage the clinical articulation of depression to build a typology for social media texts for detecting the severity of depression. It emulates the standard clinical assessment procedures, the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and the Patient Health Questionnaire (PHQ-9), to encompass subtle indications of depressive disorders from tweets. Along with the typology, we present a new dataset of 40191 tweets labeled by expert annotators. Each tweet is labeled as 'non-depressed' or 'depressed'. Moreover, three severity levels are considered for 'depressed' tweets: (1) mild, (2) moderate, and (3) severe. An associated confidence score is provided with each label to validate the quality of annotation. We examine the quality of the dataset by presenting summary statistics while setting strong baseline results using attention-based models like BERT and DistilBERT. Finally, we extensively address the limitations of the study to provide directions for further research.
Making a Case for Social Media Corpus for Detecting Depression
2019
Social media platforms provide an opportunity to gain valuable insights into user behaviour. Users express their internal feelings and emotions in a disinhibited fashion using natural language. Techniques in Natural Language Processing have helped researchers decipher standard documents and draw inferences from massive amounts of data. A representative corpus is a prerequisite for NLP, and one of the challenges we face today is the non-standard and noisy language that exists on the internet. Our work focuses on building a corpus from social media that is focused on detecting mental illness. We use depression as a case study and demonstrate the effectiveness of using such a corpus for helping practitioners detect such cases. Our results show a high correlation between our Social Media Corpus and the standard corpus for depression.