Study | Language | N | Diagnosis | Source of text | Feature extraction method | Outcome | Type of classification model | Validation | Accuracy | F1 | AUC | Specificity | Precision | Recall/Sensitivity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abilkaiyrkyzy 202429 | English | 20 (+275 trained on E-DAIC) | Mildly depressed: 8, Moderately depressed: 6, Not depressed: 4 | Open-ended questions | Transformer (BERT tokenizer and LanguageModelFeaturizer) | PHQ-9 | Fine-tuned BERT sequence classifier for multi-class depression severity (softmax output) | Trained on E-DAIC, tested on a sample of 20 university students | 0.65 | |||||
Aloshban 202130 | Italian | 59 | Depressed: 29, Not depressed: 30 | Interview about everyday life aspects (e.g., weekend activities or interactions with family members) | Embedding (Wikipedia2Vec) | Professional psychiatrists’ diagnosis | BiLSTM | 5-fold cross-validation | 0.729 | 0.619 | 1 | 0.448 ||
Antoniou 202231 | English | 773 (270) | Depression/stress: 356 sessions of 184 patients; Other problems: 417 sessions of 86 patients | Interaction with therapist (text-based counseling) | LIWC | Patient-reported presenting problem before the first interaction | Quadratic discriminant analysis | 5-fold cross-validation | 0.778 | 0.71 | 0.76 |||
Banerjee 202132 | English | 1999 | Unclear (71.4% in the data before cleaning) | Open-ended questions | Embedding (doc2vec); Affective features; Word polarity; Linguistic tags (e.g., Proper Noun Tag, Singular Noun Tag) | PHQ-9 | CNN-Dynamic Attention | 60-20-20 random train-validation-test split | 0.644 |||||
Boian 202533 | Romanian | 3955 (861) | Per-item classification: Not at all (NO): 457, Several days (SD): 1,063, More than half the days (HA): 446, Nearly every day (EV): 442, or Irrelevant (IR): 227 | Clinical interview (conducted by the aiCARE chatbot) | TF-IDF | PHQ-9 | Logistic regression | Train-test split: Train: 1,320; Test: 2,635 | 0.840 | 0.80 | 0.80 | 0.80 | 0.80 |
Burkhardt 202234 | English | 13327 (6551) | Unclear | Interaction with therapist | LIWC; Embedding (a BERT-based model for GoEmotions was used to extract emotion features) | PHQ-9 | Random forest | Training set (80%): 4,913 patients, 10,006 observations; Test set (20%): 1,638 patients, 3,321 observations | 0.520 | 0.67 | 0.612 | 0.453 ||
Cao 202535 | Chinese | 50 (the full sample included 100 patients; the best model included 50) | Very severe: 37, Mild: 27, Severe: 19, Normal: 13, Moderate: 13 | Clinical interview | LLM (Qwen2.5-7B-Instruct) | HAMD-17 | LLM: Qwen2.5-7B-Instruct fine-tuned with LoRA | Leave-one-out cross-validation | 0.61 | 0.61 | 0.61 |||
Chen 202436 (CMDC) (Same as Zou 2023) | Chinese | 78 | Depressed: 26, Not depressed: 52 | Clinical interview | Transformer (Chinese BERT); Xmnlp: Word-level features: ratios of adjectives, adverbs, exclamations, verbs, auxiliary words, modal particles, and total word count; Sentence-level features: number of sentences, ratio of positive and negative sentences, and overall sentiment score; Lexical-emotion feature: proportion of modal words | MINI | IIFDD | 5-fold cross-validation | 0.87* | 0.8 | 0.82 | 0.79 ||
Chen 202436 (EATD) (Same as Shen 2022)* | Chinese | 162 | Depressed: 30, Not depressed: 132 | General interview | Same as Chen 2024a (above) | Self-rating Depression Scale (SDS) | IIFDD | 3-fold cross-validation | 0.45 | 0.36 | 0.70 |||
Cohen 202337 | English | 73 (68) | Depressed: 15, Control: 58 | Interaction with an online agent, Tina | TF-IDF | PHQ-9 | SVM | Leave-one-subject-out | 0.54 |||||
Cook 201638 | Spanish | 1458 | Depressed: 662, Not depressed: 796 | Free-text responses to the question: "how do you feel today?" | n-gram | GHQ-12 | Logistic regression | 50/50 train-test split | 0.53 | 0.42 | 0.79 | 0.64 | 0.31 |
deHond 202439 | English | 4070 | Depressed: 127, Not depressed: 3943 | Emails generated by cancer patients to their health care teams | Transformer (BERT) | ICD-9 and ICD-10 codes obtained from electronic health record data | LASSO logistic regression | Train (67%): 2,713; Test (33%): 1,357 | 0.925 | 0.091 | 0.54 | 0.95 | 0.925 | 0.13 |
Demiroglu 202040 | Turkish | 77 (70) | Depressed: 50 records, Not depressed: 27 records | Interview, with 3 types of questions: neutral, positive, and negative | Average length of the utterances and subjects in negative, positive, and neutral answers, computed separately (three-dimensional features); rate of speech for negative, positive, and neutral answers; sentiments of the question-answer pairs | BDI | SVM | Leave-one-out | 0.65* | 0.68 | 0.76 | 0.67 ||
Demiroglu 202040 | German | 100 (84) | Depressed: 44 records, Not depressed: 56 records | Interview, general questions (e.g., “What is your favorite dish?”) | Same as above | BDI | SVM | Leave-one-out | 0.85* | 0.77 | 0.89 | 0.75 ||
Gao 2024a41 | Chinese | 156 | Depressed: 77, Not depressed: 79 | Responses to four questions about recent events, sleep, mood, and suicidal tendencies | Transformer (BERT and an improved TextCNN) | Medical records | Dual-branch BERT + improved TextCNN model | Train: 94 (60%), Validation: 31 (20%), Test: 31 (20%) | 0.942 | 0.947 | 0.931 | 0.964 ||
Guo 202442 | Chinese | 524 | Depressed: 59, Not depressed: 465 | Clinical interview | Transformer (EmoLLM + GraphRAG) | HAMD, HAMA | EmoLLM | N/A | 0.84 | 0.49 | 0.38 | 0.68 ||
Hayati 202243 | Dialectal Malay | 53 | Depressed: 11, Not depressed: 42 | Clinical interview | Transformer (GPT-3) | BDI | GPT-3 (performance compared using 2–10 examples) | N/A | 0.71 | 0.67 ||||
He 2022 (same data as Yuan 2021)44 | Chinese | 108 | Depressed: 54, Not depressed: 54 | Picture description and question-answering tasks | Embedding (GloVe) | BDI-II and PHQ-9 | GRU-based RNN | 8:1:1 random train-validation-test split | 0.659 | 0.631 | 0.688 | 0.583 ||
Howes 201445 | English | 882 (167) | Unclear | Interaction with therapist (text-based counseling) | n-gram | PHQ-9 | Logistic regression | 10-fold cross-validation | 0.686 | |||||
Iyortsuun 202446 (Same as Shen 2022) | Chinese | 162 | Depressed: 30, Not depressed: 132 | General interview | Transformer (Transformer-based, USE-large) | SDS | BiLSTM + Attention | 3-fold cross-validation | 0.606 | 0.66 | 0.79 | 0.58 ||
Joharee 202347 | Bahasa Malaysia | 511 (172) | Unclear (in the test set: 28 depressed and 23 not depressed) | 3 open-ended questions | TF-IDF | BDI-II and PHQ-9 | Extra Tree Classifier | 70/30 train-test split | 0.73 | 0.63 ||||
Krishnamurti 202248 | English | 1007 (666) | Not depressed: 48.2%, Mild: 38.3%, Moderate: 10.5%, Severe: 3.0% | Open-ended questions documenting their pregnancy journey | LIWC; Embedding (Word2Vec); Latent Dirichlet Allocation (LDA); SentiWordNet (SWN) | Edinburgh Postnatal Depression Scale (EPDS) | LASSO regression model | 70% training, 15% for prediction (the remaining 15% were not used) | 0.87 |||||
Li 202349 | Chinese | 387 (329) | Euthymia: 46, Mild: 102, Moderate: 160, Severe: 79 | Clinical interview | Transformer (BERT) | HAMD-17 | BiLSTM + Self-Attention + Multilayer Perceptron (MLP) + Softmax | Training: 273 recordings, Test: 114 recordings | 0.86 | 0.911 | 0.696 | 0.921 | 0.901 |
Liu 202250 | English | 219 | Depressed: 64, Not depressed: 155 | Text message | LIWC | PHQ-8 | Logistic Regression with L2 regularization | Leave-one-out | 0.72 |||||
Munthuli 202351 | Thai | 80 | Depressed: 40, Healthy control: 40 | Clinical interview | Fine-tuned transformer encoder (XLM-RoBERTa) | PHQ-9 and HAM-D | Transformer-based binary classifier (XLM-RoBERTa) | K×L-fold stratified and nested cross-validation | 0.9 | 0.898 | 0.925 | 0.921 | 0.875 |
Nobles 201852 | English | 1213 (33) | Suicidality day: 685, Depression day: 528 | Text message | TF-IDF | Depression: periods where the individual had no suicidal ideation or attempt | DNN | 10-fold cross-validation | 0.7 | 0.75 | 0.56 | 0.71 | 0.81 |
Oh 202453 | Korean | 166 (77) | Depressed: 60, Other psychiatric illnesses: 17 | Clinical interview | Emotional Analysis Module patented by Acryl Inc. | Clinical diagnosis (DSM-5), provided by a psychiatrist | XGBoost | Train: 136, Test: 30 | 0.794 | 0.877 | 0.85 | 0.25 | 0.962 |
Ohse 202454 | German | 84 | Depressed: 25, Not depressed: 59 | Clinical interview | Fine-tuned GPT-3.5 | PHQ-8 | Fine-tuned GPT-3.5 | N/A | 0.910 | 0.820 | 0.850 | 0.840 ||
Orhan 201955 | Turkish | 60 | Depressed: 30, Healthy control: 30 | 10-minute free verbal samples of the subjects | Turkish version of the Harvard-III Psychological Dictionary | Structured clinical diagnosis | Bayesian Logistic Regression | Train: 42 (21 per category); Test: 18 (9 per category) | 0.89 |||||
Parkeaw 202556 | Thai | 373 | Low risk: 261, High risk: 112 | SCT of 34 items covering four key depression-related domains: 1) family, 2) society, 3) health, and 4) self-concept | LLM (Llama 3.1) used to extract sentiment scores | PHQ-9 | Random forest | 5-fold cross-validation | 0.786 | 0.782 ||||
Pérez-Toro 202257 | Spanish | 60 | Depressed Parkinson's disease patients (D-PD): 25, Non-depressed Parkinson's disease patients (ND-PD): 35 | Free response prompt (asked to talk about their daily routines) | Transformer (BERT) | Depression item from the MDS-UPDRS | Gaussian Mixture Model-Universal Background Model | Nested leave-one-out cross-validation | 0.67* | 0.7 | 0.7 | 0.8 | 0.56 |
Podina 202558 | Romanian | 765 | Depressed: 397, Not depressed: 367 | Clinical interview (with the aiCARE chatbot) | TF-IDF | PHQ-9 | Logistic regression | Test set for the algorithm built in Boian 202533 | 0.84 | 0.85 | 0.78 | 0.76 | 0.93 |
Qin 202515 | English | 37 | Depressed: 17, Control: 20 | 3 phases: 1) small talk, 2) semi-structured interview, 3) demographic questions | LLM (qCammel-13B-GPTQ) | MINI | LLM (qCammel-13B-GPTQ) | N/A | 0.81 | 0.87 | 0.88 | 0.80 ||
Ren 202459 | English | 1070 (94) | Depressed: 570, Not depressed: 500 | Interaction with therapist (message-based online therapy) | LIWC; Transformer (BERT) | PHQ-9 | Neural network (classification head, unspecified) | Training: 870, Test: 200; each of the 94 participants contributed 3 observations for training and one for testing | 0.60 | 0.59 | 0.64 |||
Resnik 201360 | English | 124 | Depressed: 12, Not depressed: 112 | Students were asked to “describe your deepest thoughts and feelings about being in college”. | LIWC; Topic modeling (LDA) | BDI | Logistic Regression | Train (94) / test (30) split | 0.80* | 0.50 | 0.50 | 0.50 ||
Rutowski 202061 | English | 15,950 (11,000) | Depressed: 4,259, Not depressed: 11,691 | Participants interacted with an app that presented questions on different topics, such as “work” or “home”. | Transfer learning, implemented via ULMFiT | PHQ-8 | LSTM | 80/20 train-test split | 0.75 | 0.82 | 0.75 | 0.75 ||
Shen 202262 | Chinese | 162 | Depressed: 30, Not depressed: 132 | Interviews | Embedding (ELMo) | PHQ-8 and SDS | BiLSTM with Attention | 3-fold cross-validation | 0.65 | 0.65 | 0.66 |||
Shin 202263 | Korean | 166 | Depressed: 83, Healthy control: 83 | Clinical interview | LIWC; Bag-of-words | MINI | Naïve Bayes | 80/20 split | 0.83 | 0.91 | 0.96 | 0.70 ||
Shin 202464 | Korean | 428 (91) | Depressed: 73, Not depressed: 357 | Daily diary | Transformer (Gpt3.5_ft_CoT: fine-tuned GPT-3.5 with chain-of-thought) | PHQ-9 and Beck Scale for Suicide Ideation (BSS) | Gpt3.5_ft_CoT (fine-tuned model, chain-of-thought) | N/A | 0.90 | 0.69 | 0.95 | 0.75 | 0.64 |
Smirnova 201865 | Russian | 201 | Depressed: 124, Healthy control: 77 | Free response prompt (participants wrote narratives on the topic “The current state of life and future expectations”) | Lexico-semantic features: metaphors, similes, informal words, repetitions; Syntactic features: sentence types, word order, ellipses; Lexico-grammatical features: pronouns, verb tenses/forms | Clinical psychiatric interviews coded using ICD-10 diagnostic criteria | Linear discriminant analysis | Cross-validation mentioned, but type unclear | 0.99 |||||
Smirnova 201966 | Russian | 201 | Same as above | Same as above | Component lexis analysis | HDRS-21 | Linear discriminant analysis | Cross-validation mentioned, but type unclear | 0.96 |||||
Sood 202367 | English | 626 | Depressed: 152, Not depressed: 474 | Clinical interview [combination of 3 datasets: DAIC, E-DAIC, and the EATD corpus (originally Chinese but translated to English)] | TF-IDF | PHQ-8 and SDS | SVM | Training set: 399; Development set: 108; Test set: 119 (34 depressed) | 0.90* | 0.82 | 0.83 | 0.83 ||
Tao 202368 | Chinese | 139 | Depressed: 64, Anxious: 75 | Interaction with chatbot asking about daily activities | Transformer (ChatGPT) | Psychiatrist diagnosis | ChatGPT | N/A | 0.68 | 0.71 | 0.69 | 0.72 ||
Tlachac 202069 | English | 162 | Depressed: 55, Not depressed: 107 | Text message | Lexical category features via Empath; POS tag frequencies; Sentiment scores (polarity and subjectivity); Volume features: number of messages, words, characters | PHQ-9 | Logistic Regression | 5-fold cross-validation | 0.804* | 0.806 | 0.742 | 0.728 | 0.925 |
Tlachac 2022a70 | English | 302 | Depressed: 142 (47.0%), Not depressed: 160 (53.0%) | Free response prompt | Transformer (BERT); Part-of-speech (POS) tagging; Lexical category features via Empath | PHQ-9 | BERT-LSTM (a variation of BERT incorporating a Long Short-Term Memory layer) | Training set: 218; Test set: 84 (27.8%) | 0.55 | 0.67 | 0.17 | 0.51 | 0.97 |
Tlachac 2022b71 | English | 3,000 (number of participants unclear) | Unclear | Text message | Transformer (BERT) | PHQ-9 | Fine-tuned BERT classifier | Training: 2,400 (1,200 messages per class); Testing: 600 (300 messages per class) | 0.711 |||||
Tlachac 202272 | English | 88 | Depressed: 53, Not depressed: 35 | Text message | Lexical category; Frequency features; BoW | PHQ-9 | Logistic Regression | Leave-group-out cross-validation | 0.71 | 0.79 | 0.4 | 0.93 ||
Weber 202573 | German | 126 (65 from 44 participants, 61 synthetic) | N/A | Clinical interview | Transformer (BERT-base-German-cased) | MADRS | Linear regression | 5-fold cross-validation | 0.83 |||||
Wright-Berryman 202374 | English | 2416 (1433) | Depressed: 863, Not depressed: 1553 | Clinical interview | TF-IDF | PHQ-9 | SVM | Leave-one-subject-out cross-validation | 0.69 | 0.77 | 0.04 | 0.68 | 0.55 |
Xue 202475 (Same as Shen 2022) | Chinese | 162 | Depressed: 30, Not depressed: 132 | EATD-Corpus (general interview) | Transformer (BERT) | SDS | Fine-tuned BERT model with fully connected layers | Validation type not specified | 0.72 | 0.66 | 0.80 |||
Ye 202176 | Chinese | 160 | Depressed: 80, Not depressed: 80 | Clinical interview | Embedding (Word2vec) | HAMD | One-hot Transformer | 5-fold cross-validation | 0.882 | 0.874 ||||
Yuan 202177 | Chinese | 108 | Depressed: 54, Not depressed: 54 | Picture descriptions and responses to 30 questions | Embedding | BDI-II and PHQ-9 | Text Recurrent Encoder (TRE) | 8:1:1 random train-validation-test split | 0.659 | 0.651 | 0.688 | 0.583 ||
Zhang 2024a78 | Chinese | 240 | Depressed: 120, Not depressed: 120 | Clinical interview | Chinese BERT pre-trained model with Multi-Head Attention (MHA) module | PHQ-9 | Fully connected deep learning classifier | Training set: 168, Test set: 72 | 0.64 | 0.64 | 0.64 | 0.64 ||
Zou 202379 | Chinese | 78 | Depressed: 26, Not depressed: 52 | Clinical interview | Transformer (Chinese BERT) | MINI | Logistic Regression | 5-fold cross-validation | 0.92* | 0.93 | 0.99 | 0.87 | 0.93 |
Studies Reporting Continuous Outcomes

Study | Language | N | Source of text | Feature extraction method | Outcome | Type of classification model | Validation | MAE | RMSE | R² |
|---|---|---|---|---|---|---|---|---|---|---|
Morales 201680 | German | 138 (84) | Interview on everyday life aspects | LIWC; n-gram; Part-of-Speech (POS); Text-based speech rate features | BDI-II | SVM | Leave-one-out cross-validation | 7.56 | 9.21 | 0.526 |
Ozkanca 201881 | Turkish | 70 | Open-ended questions (neutral, positive, and negative questions) | Manual sentiment tagging (positive/negative/neutral); number of responses per sentiment; average utterance length; speech rate; features computed separately for positive/negative/neutral questions (15 total features) | BDI-II | SVR | Leave-one-out | 10.3 | | |

Note: The number in the “N” column represents the total number of text observations, with the value in parentheses indicating the number of participants from whom these observations were collected. Accuracy results marked with * were computed by us. BiLSTM: Bidirectional Long Short-Term Memory; LIWC: Linguistic Inquiry and Word Count; CNN: Convolutional Neural Network; TF-IDF: Term Frequency–Inverse Document Frequency; PHQ-9: Patient Health Questionnaire–9; BERT: Bidirectional Encoder Representations from Transformers; IIFDD: Intra- and Inter-modal Fusion Model for Depression Detection; SVM: Support Vector Machine; LASSO: Least Absolute Shrinkage and Selection Operator; BDI-II: Beck Depression Inventory–II; GRU: Gated Recurrent Unit; RNN: Recurrent Neural Network; LDA: Latent Dirichlet Allocation; POS: Part-of-Speech; HAMD: Hamilton Depression Rating Scale; DNN: Deep Neural Network; XGBoost: Extreme Gradient Boosting; MDS-UPDRS: Movement Disorder Society Unified Parkinson's Disease Rating Scale; ULMFiT: Universal Language Model Fine-tuning; SDS: Self-rating Depression Scale; HDRS-21: Hamilton Depression Rating Scale-21; SVR: Support Vector Regression; SCT: Sentence Completion Test; MADRS: Montgomery-Åsberg Depression Rating Scale.
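
Many studies in the table above pair sparse lexical features (TF-IDF, n-grams, LIWC counts) with a classical classifier such as logistic regression or an SVM, evaluated with k-fold cross-validation. As a minimal sketch of that generic setup only, assuming hypothetical placeholder data rather than any study's corpus, the following Python snippet chains a TF-IDF vectorizer and a logistic regression inside stratified 5-fold cross-validation and reports the metric columns used in the table:

```python
# Minimal sketch of the common TF-IDF + logistic regression setup; not the
# implementation of any study above. `texts` and `labels` are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline

# Hypothetical data: replace with real transcripts and binary labels
# (e.g., 1 = depressed by a PHQ-9 cutoff, 0 = not depressed).
texts = [f"placeholder transcript {i} about sleep, mood, and daily life" for i in range(40)]
labels = [i % 2 for i in range(40)]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),  # unigrams + bigrams, IDF-weighted
    ("clf", LogisticRegression(max_iter=1000)),                # L2-regularized linear classifier
])

# Stratified 5-fold CV, scoring the metrics reported in the table columns.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(pipeline, texts, labels, cv=cv,
                        scoring=["accuracy", "f1", "roc_auc", "precision", "recall"])
for name, values in scores.items():
    if name.startswith("test_"):
        print(f"{name}: {values.mean():.3f}")
```

Keeping the vectorizer inside the pipeline ensures the TF-IDF vocabulary is refit on each training fold, so no information from the held-out fold leaks into the features.
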
First author | Feature extraction method | Outcome type | Type of classification model | Accuracy | F1 | Precision | Recall/Sensitivity | MAE | RMSE |
|---|---|---|---|---|---|---|---|---|---|
Agarwal 202282 | Embedding (GloVe) | Binary | MV-IA-Mean | 0.72 | 0.73 | 0.74 | UAR: 0.72 | ||
Agarwal 202483 | Embedding (Sentence embeddings from all-mpnet-base-v2; graph built using cosine similarity between embeddings) | Binary | GCN + Transformer multi-head attention | 0.83 | 0.81 | 0.80 | UAR: 0.82 | ||
Al-Hanai 201884 | Embedding (Word2Vec) | Binary | LSTM | 0.67 | 0.57 | 0.8 | 5.18 | 6.38 | |
Ansari 202385 | Count vectorization | Binary | LR and LSTM | LR: 0.748, LSTM: 0.73 | LR: 0.67, LSTM: 0.61 | ||||
Burdisso 2023*86 | TF-IDF; PMI (Pointwise Mutual Information); PageRank | Binary | Node-weighted GCN | 0.84 |
Cao 202287 | Transformer (BERT) | Binary | BERT | 0.91 | |||||
Chen 202488 | TF-IDF; PMI (Pointwise Mutual Information); PageRank | Binary | GCN | 0.84 |
Correia 201689 | Embedding (GloVe) | Binary | SVM | Per sentence: 0.533; Per interview: 1.00 |
Dang 201790 | SALAT; siNLP; TAALES; SÉANCE; ANEW; EmoLex; SenticNet; Lasswell | Cont. | SVR | 4.98 | 6.02 |
Danner 202391 | Transformer (BERT) | Binary | BERT | 0.82 | 0.83 | 0.82 | |||
Fang 202392 | Transformer (USE) | Cont. | Bi-LSTM with an attention mechanism | 3.61 | 4.76 | ||||
Firoz 2023a93 | BoW; TF-IDF; Embedding (Word2Vec, FastText) | Binary | Ensemble model of CNN-LSTM and Bi-LSTM | 0.80 |
Firoz 2023b94 | Transformer (BERT); Counts of absolutist language (e.g., always, never, completely) | Cont. | LSTM | 5.65 | 9.45 | ||||
Flores 202324 | Transformer (BERT) | Binary | LSTM | 0.72 | |||||
Guo 202495 | Transformer (BERT) | Binary | PTDD | 0.69 | 0.60 | 0.48 | 0.73 | ||
Hadzic 202496 | GPT-4 | Binary | GPT-4 | 0.71 | 0.81 | 0.70 |
Hong 202297 | Embedding (GRL using Schema Encoders) | Cont. | Schema-Based Graph Neural Network | 3.76 | |||||
Iyortsuun 202446 | Transformer (Transformer-based, USE-large) | Binary and cont. | BiLSTM + Attention | 0.727 | 0.78 | 0.80 | 0.76 | 3.96 |
Jo 2022*98 | Embedding (exact type unclear) | Binary | CNN | 0.8171 | 0.8101 | 0.80 | 0.8205 |
Kokkera 202399 | Word frequencies; POS tags; Sentiment scores | Binary | RF | 0.40 | 0.40 | 0.44 | 0.43 |
Lam 2019100 | Manual topic modelling + augmentation + embedding + Transformer | Binary | Transformer architecture | 0.78 | 0.91 | 0.83 | |||
Lau 2021101 | Transformer (BERT) | Binary and cont. | BiLSTM + attention | 0.83 | 0.83 | 0.83 | 4.23 | 5.32 |
Lau 2023102 | Transformer (BERT and RoBERTa) | Cont. | BiLSTM + attention | 4.17 | 0.02 | ||||
Li 2022a103 | Embedding (trained from scratch) | Binary | BiLSTM + RNN network | 0.745 | 0.706 | 0.701 | 0.715 |
Li 2022b104 | Transformer (BERT) (utterance-based) | Binary | BiLSTM + attention with an MLP-Softmax classifier | 0.78 | UAR: 0.79 | ||||
Li 2023105 | Part-of-Speech (POS); Named Entity Recognition (NER); Embedding (GloVe) | Binary | BiLSTM | 0.79 | 0.69 | 0.80 | |||
Lin 2020106 | Embedding (ELMo) | Binary | BiLSTM + Attention | 0.83 | 0.83 | 0.83 |
Lopez-Otero 2017107 | Embedding (GloVe) | Binary | SVM | 0.857 | 0.730 | ||||
Lorenc 2022108 | Embedding and transformer (USE5, DAN, sBERT) | Binary | Chunk-based BiLSTM model | UAR: 0.803 |
Lu 2023109 | Transformer (BERT) | Binary | BERT | 0.76 | |||||
Rodrigues Makiuchi 2019*110 | Transformer (BERT) | Cont. | 8 CNN blocks-LSTM | 4.22 |
Mallol-Ragolta 2019111 | Embedding (GloVe) | Binary | HCAN | 0.63 | UAR: 0.66 | ||||
Mao 2023112 | Embedding (GloVe) | 5-level classification | BiLSTM | 0.968 | 0.971 |
Milintsevich 2023113 | Transformer (RoBERTa) | Binary, 5-level classification, and cont. | BiLSTM + Attention | Binary: micro-F1 = 0.766, macro-F1 = 0.739; 5-class: micro-F1 = 0.426, macro-F1 = 0.270 | 3.78 |
Niu 2021114 | Embedding (GloVe) | Binary and cont. | Hierarchical context-aware graph attention model | 0.77 | 0.70 | 0.82 | 3.73 | 4.8 | |
Pampouchidou 2016115 | LIWC; Total number of words and sentences; Average sentence length; Laughter-to-word ratio; Depression-related word ratio; ANEW; Mean and SD of pleasure, arousal, dominance ratings; Word frequency | Binary | Decision Tree | Depressed: 0.23; Not depressed: 0.79 | 8.99 | 10.75 |
Prabhu 2022116 | Embedding (Word2vec pretrained) | Binary | LSTM | 0.823 | |||||
Qureshi 2019117 | Embedding (from scratch, feature learning via an LSTM encoder) | Cont. and 5-level classification | DNN | 0.67 | 0.53 | 3.90 | 4.96 |
Qureshi 2020118 | Transformer (USE) | Cont. and 5-level classification | LSTM | 0.667 | 0.62 | Class: 0.66; Cont: 3.81 | Class: 1.23; Cont: 4.70 |
Qureshi 2021119 | Transformer (USE) | Cont. | LSTM | 3.78 | 4.88 | ||||
Rasipuram 2022120 | Transformer (GPT2) | Cont. | BiLSTM | 3.21 | 4.25 | ||||
Ray 2019*121 | Transformer (USE) | Cont. | Stacked BiLSTM + feedforward network | 4.02 | 4.73 |
Rinaldi 2020122 | Embedding (GloVe) | Binary | Joint Latent Prompt Categorization (JLPC) | 0.604 | |||||
Rohanian 2019123 | Embedding (GloVe) | Binary and cont. | LSTM | 0.69 | 0.68 | 4.98 | 6.05 | ||
Sadeghi 2023*124 | Transformer (GPT-3.5-Turbo and DepRoBERTa) | Cont. | SVR with a polynomial kernel | 4.26 | 5.36 |
Sadeghi 2024*125 | Transformer (GPT-3.5-Turbo prompted to describe the interview, plus DepRoBERTa and GPT-3.5-Turbo responses to 11 questions on the interview) | Cont. | SVR | 3.86 | 4.66 |
Samareh 2018126 | Basic linguistic stats (e.g., word count); Dictionary-based depression-related word ratio; Sentiment features (AFINN) | Cont. | RF regression with confidence-based decision-level fusion | 4.78 | 5.59 |
Senn 2022127 | Transformer (BERT and RoBERTa) | Binary | Ensemble of BERT, RoBERTa, DistilBERT | 0.62 | 0.64 | ||||
Shen 202262 (also used the EATD corpus) | Embedding (ELMo) | Binary | BiLSTM with Attention | 0.83 | 0.83 | 0.83 |
Stasak 2017128 | Word affect features: single affect word-rating reference, such as the General Inquirer | Binary | Decision tree classification | 0.82 |
Stepanov 2018129 | BoW | Cont. | SVR | 4.88 | 5.83 |
Sun 2017130 | Selected key phrases related to symptoms | Cont. | RF | 0.55 | 0.40 | 0.89 | 3.87 | 4.98 | |
Tlachac 202270 | Transformer (BERT) | Binary | Fine-tuned BERT classifier | 0.48 |
Toto 2021131 | Transformer (BERT) | Binary | LSTM | 0.67 | |||||
Marriwala 2023132 | Embedding (Word2vec) | Binary | CNN | 0.8 | 0.6 | 0.63 | 0.68 | ||
Van Steijn 2022*133 | LIWC; Transformer (BERT); Sentiment; Speech rate; Repetition rate; Confidence score | Cont. | KELM | 6.06 |
Villatoro-Tello 2021*134 | Lexical Availability | Binary | MLP (trained on E-DAIC, tested on DAIC-WOZ) | 0.83 | 0.87 | 0.81 |
Williamson 2016135 | Embedding (GloVe); Topics | Binary and cont. | SVR | 0.84 | 3.34 | 4.46 | |||
Xezonaki 2020136 | LIWC; TF-IDF; Embedding (GloVe); Affective lexica (AFINN, Bing Liu, MPQA, EmoLex, SemEval15) | Binary | Hierarchical Attention Network with Lexicon and Summary Integration | 0.70 | 0.70 |
Xia 2024137 | Embedding (Word2vec) | Binary | BiLSTM-GNN | 0.64 | 0.60 | 0.585 | 0.584 | ||
Xiao 2021138 | Transformer (BERT) | Binary | BERT | 0.70 | |||||
Xu 2023*139 | Transformer (BERT) | Binary | Two-layer LSTM network | 0.82 | 0.81 | 0.83 |
Xue 202475 | Transformer (BERT) | Binary | Fine-tuned BERT model with fully connected (FC) layers | 0.85 | 0.79 | 0.92 |
Yadav 2023140 | Embedding (Word2Vec, ELMo); Transformer (BERT) | 5-level classification | BGRU model with two Fully Connected (FC) networks as output layers | 0.923 | 0.929 | 0.928 |
Yang 2017a141 | Embedding (PV); Global structural and behavioral text features (e.g., number of words) | Binary | SVM | Depressed: 0.667; Not depressed: 0.885 | Depressed: 1.000; Not depressed: 0.793 | Depressed: 0.50; Not depressed: 1.00 |
Yang 2017b142 | Embedding (PV) | Cont. | DCNN and DNN | Female: 3.750; Male: 3.525 | Female: 4.361; Male: 4.406 |
Yang 2018143 | Embedding (PV) | Binary | SVM | 0.75 | |||||
Yang 2019144 | Embedding (Doc2vec) and Text Convolutional Neural Network | Binary | SVM | 0.72 | |||||
Zhang 2020a*145 | Embedding (PV or doc2vec) | Binary and cont. | Multitask Deep Neural Network (DNN) | 0.839 | 0.907 | 4.66 |
Zhang 2020b146 | Transformer (BERT); Key phrase matching | Binary | bidirectional variable-length LSTM model | 0.81 | 0.82 | 0.8 | |||
Zhang 2024b147 | Transformer [Sentence-BERT (nli-bert-large)] | Binary | BiLSTM | 0.87 | |||||
Zhang 2024c148 | Transformer (T5-Encoder and BERT) | Binary, 3-level, and 5-level classification | T5 + BERT dual-branch fusion | Binary: 0.8913; 3-level: 0.6739; 5-level: 0.5435 | Binary: 0.8276; 3-level: 0.6677; 5-level: 0.5259 | Binary: 0.80 | Binary: 0.857 | 5.283 |
Zhao 2022149 | n-gram | Cont. | Transformer-based architecture with self-attention and feed-forward layers | 5.03 | 5.95 | ||||
Note: * indicates studies using the E-DAIC dataset; all others are based on DAIC-WOZ. Since only two studies reported AUC and three studies reported specificity, these metrics were removed from the table. MV-IA-Mean: Multi-view model with inter-view attention coupled with the mean function; GCN: Graph Convolutional Network; SALAT: Suite of Linguistic Analysis Tools, an open-source toolkit used to extract various linguistic and word affect features from transcripts; siNLP: Simple Natural Language Processing Tool; TAALES: Tool for Automatic Analysis of Lexical Sophistication; SÉANCE: Sentiment Analysis and Cognition Engine; ANEW: Affective Norms for English Words; EmoLex: provides features based on token words related to eight emotion types (anger, anticipation, disgust, fear, joy, sadness, surprise, trust); SenticNet: provides features based on nearly 13,000 token words, evaluating perceptual polarity norms for aptitude, attention, pleasantness, and sensitivity; Lasswell: provides 146 features from 63 different word lists categorized by eight semantic characterizations, with particular interest in the well-being category; BERT: Bidirectional Encoder Representations from Transformers; SVR: Support Vector Regression; Bi-LSTM: Bidirectional LSTM; SVM: Support Vector Machine; NB: Naïve Bayes; LR: Logistic Regression; LSTM: Long Short-Term Memory; USE: Universal Sentence Encoder; BoW: Bag of Words; TF-IDF: Term Frequency–Inverse Document Frequency; PTDD: Prompt-based Topic-modeling method for Depression Detection; RF: Random Forest; UAR: Unweighted Average Recall; USE5: Transformer-based USE; DAN: Deep Averaging Network, a simpler sentence embedding model; sBERT: Sentence-BERT, a Transformer fine-tuned for sentence similarity; HCAN: Hierarchical Contextual Attention Network; POS: Part-of-Speech; DepRoBERTa: a fine-tuned RoBERTa language model specifically designed for depression detection; CNN: Convolutional Neural Network; KELM: Kernel Extreme Learning Machine; BGRU: Bidirectional Gated Recurrent Unit; DCNN: Deep Convolutional Neural Network; DNN: Deep Neural Network; MLP: Multi-Layer Perceptron; MDSD-T5: the T5-based (Google encoder-decoder Transformer) branch of the MDSD-FGPL system; PV: Paragraph Vector, an extension of Word2Vec.
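
For reference, the metrics reported across both tables can be recovered from a model's raw predictions with standard formulas. The sketch below uses made-up placeholder arrays, not outputs from any listed study; it computes the binary metrics (including specificity and UAR, defined in the note above) and the MAE/RMSE used for continuous score prediction:

```python
# Illustrative metric computation from hypothetical model outputs; all
# arrays below are placeholders, not results from any study in the table.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, recall_score)

# Binary case: 1 = depressed, 0 = not depressed.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

print("Accuracy   :", accuracy_score(y_true, y_pred))
print("F1         :", f1_score(y_true, y_pred))
print("Precision  :", precision_score(y_true, y_pred))
print("Recall     :", recall_score(y_true, y_pred))               # sensitivity
print("Specificity:", recall_score(y_true, y_pred, pos_label=0))  # recall of the negative class
# UAR (unweighted average recall) is macro-averaged recall, i.e., the mean
# of sensitivity and specificity in the binary case.
print("UAR        :", recall_score(y_true, y_pred, average="macro"))

# Continuous case: predicted vs. observed questionnaire totals (e.g., PHQ-8).
scores_true = np.array([4.0, 12.0, 7.0, 18.0])
scores_pred = np.array([6.0, 10.0, 9.0, 14.0])
print("MAE :", mean_absolute_error(scores_true, scores_pred))
print("RMSE:", np.sqrt(mean_squared_error(scores_true, scores_pred)))
```
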