
Artificial intelligence and Twitter are set to be key in creating anxiety and depression prediction models that could give vital signs of these conditions before clinical diagnosis.
The study, from the University of São Paulo (USP) in Brazil, is inspired by previous research indicating that mental health problems are often reflected in the language used by those experiencing them. It uses social media as a window into whether opportunities for intervention can be identified at an earlier stage.
Construction of a database, called SetembroBR, was the first step in the study. The name is a reference to Yellow September, an annual suicide awareness and prevention campaign, and also to the fact that data collection for the study began in September.
The second step is still in progress but has provided some preliminary findings, such as the possibility of detecting whether a person is likely to develop depression solely on the basis of their social media friends and followers, without taking their own posts into account.
The database compiled by the group contains information relating to a corpus of texts and the network of connections involving 3,900 Twitter users who reported having been diagnosed with or treated for mental health problems before the survey.
The corpus includes all 47 million public tweets posted by these users individually (without retweets). The study also collected tweets from friends and followers, in accordance with the observation that people with mental health problems tend to follow certain accounts, such as discussion forums, influencers and celebrities who publicly acknowledge their depression.
“First, we collected timelines manually, analysing tweets by some 19,000 users, equivalent to the population of a village or small town,” said Ivandre Paraboni, last author of the article and a professor at USP’s School of Arts, Sciences and Humanities (EACH).
“We then used two datasets, one for users who reported being diagnosed with a mental health problem and another selected at random for control purposes. We wanted to distinguish between people with depression and the general population.”
Mental health disturbances, including depression and anxiety, are a growing global concern. The World Health Organization (WHO) estimated on the basis of 2021 data that 3.8 per cent of the world population, or some 280 million people, were affected by depression.
WHO also estimated an increase of 25 per cent in global prevalence of these mental health problems during the COVID-19 pandemic. The tweets were collected for the study during this period.
The researchers pre-processed the corpus to remove hashtags, URLs, emoticons and non-standard characters while maintaining the original texts.
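The kind of clean-up described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual pipeline: the function name `clean_tweet`, the regular expressions and the set of characters treated as "standard" are all assumptions made for the example. It removes emoji pictographs but leaves typed emoticons such as ":)" in place.

```python
import re

# Hypothetical patterns chosen for this sketch; the study's exact rules are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Keep letters (including accented ones), digits, whitespace and basic punctuation.
NON_STANDARD_RE = re.compile(r"[^\w\s.,;:!?()@'-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return a cleaned copy of a tweet, leaving the input string untouched."""
    text = URL_RE.sub(" ", text)           # strip URLs first so their fragments go too
    text = HASHTAG_RE.sub(" ", text)       # strip hashtags such as #tristeza
    text = NON_STANDARD_RE.sub(" ", text)  # strip emoji and other unusual symbols
    return WHITESPACE_RE.sub(" ", text).strip()


# Store the cleaned version next to the raw tweet so the original text is preserved.
raw = "Hoje não estou bem 😔 #tristeza https://t.co/abc"
record = {"raw": raw, "clean": clean_tweet(raw)}
print(record["clean"])
```

Keeping the raw and cleaned strings side by side is one way to filter noise without discarding the original wording.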
They then deployed deep learning, an AI technique that teaches computers to process data in a way inspired by the human brain, to create four text classifiers and word embeddings (context-dependent mathematical representations of relations between words). To do so, they used models based on bidirectional encoder representations from transformers (BERT), a machine learning model for natural language processing (NLP).
These models correspond to a neural network that learns contexts and meanings by tracking relationships in sequential data, such as words in a sentence.
The training input consisted of a sample of 200 tweets selected at random from each user. The parameters were defined by executing cross-validation of the training data five times and calculating the average result.
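The sampling and validation procedure can be illustrated with a short Python sketch. It rests on stated assumptions: the article's "cross-validation five times" is read here as five-fold cross-validation, splitting is done per user, and the names `sample_tweets`, `k_fold_indices`, `cross_validate` and the `fit_and_score` callback are hypothetical helpers invented for the example rather than anything taken from the study.

```python
import random
from statistics import mean


def sample_tweets(tweets, k=200, seed=0):
    """Pick up to k tweets at random from one user's timeline."""
    if len(tweets) <= k:
        return list(tweets)
    return random.Random(seed).sample(tweets, k)


def k_fold_indices(n_items, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs that partition the items into folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(users, labels, fit_and_score, n_folds=5, seed=0):
    """Average the score returned by fit_and_score over all folds.

    fit_and_score(train_x, train_y, test_x, test_y) is a caller-supplied
    function that trains a model and returns one number, such as F1.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(users), n_folds, seed):
        scores.append(fit_and_score(
            [users[i] for i in train_idx], [labels[i] for i in train_idx],
            [users[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    return mean(scores)
```

Splitting by user rather than by tweet keeps all of one person's tweets on the same side of each split, so a model cannot score well simply by recognising an individual's writing style.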
The conclusion was that BERT performed best in terms of predicting depression and anxiety, with a statistically significant difference between it and logistic regression (LogReg), the next best option.
Because the models analysed sequences of words and complete sentences, it was possible to observe that people with depression, for example, tended to write about subjects connected to themselves, using verbs and phrases in the first person, as well as topics such as death, crisis and psychology.
“The signs of depression that can be detected during a visit to the doctor aren’t necessarily the same as the ones that appear on social media,” Paraboni said.
“For example, use of the first-person singular pronouns I and me was very evident, and in psychology this is considered a classic sign of depression.
“We also observed frequent use of the heart emoji by depressive users. This is widely felt to be a symbol of affection and love, but maybe psychologists haven’t yet characterised it as such.”
The researchers are now extending the database, refining their computational techniques and upgrading the models in order to see if they can produce a tool for future use in screening for mental health problems and helping families and friends of young people at risk from depression and anxiety.
