As mental health researchers with a digital interest, we are increasingly approached by researchers and industry partners to recommend self-report instruments that can be offered to the general public for monitoring mental health digitally. A barrier to providing an evidence-based answer to this apparently simple question is the lack of an overview of instruments that have already been used for this or related purposes. The overarching aim of this review, consequently, was to scope existing instruments that could fulfill this role, as demonstrated by their use in published empirical research. Broadly, we sought to answer the question: which instruments have been most frequently used, and what are their characteristics? First, we briefly review evidence that digital technologies render self-monitoring more feasible than ever before, and introduce contemporary approaches to test construction. We then introduce the dual continua model of mental health that was used to guide the systematic search strategy and results synthesis of this study.
Self-Monitoring of Mental HealthSelf-monitoring has proven effective and economical in managing many chronic illnesses [-], and is potentially a useful component of population-based approaches to managing mental health [-]. In the realm of mental health, self-monitoring via self-report inventories can raise emotional self-awareness [] and promote positive health behavioral change over time []. Self-monitoring mental health can detect early symptoms of mental illness [] while also providing a better understanding of factors that influence positive mental health [-], aligning with the goals of mental illness prevention and mental health promotion programs []. The focus here was on the full range of mental health constructs that have been measured by self-report, including general mental health measures (eg, the Kessler Psychological Distress Scale [K10] []), and instruments designed to screen for (eg, the Patient Health Questionnaire 9 Items [PHQ-9] []), or monitor severity of diagnosable disorders (eg, the Beck Depression Inventory II []).
Ubiquitous digital platforms have enormous potential to support self-monitoring of mental health in the general population. These opportunities are increasingly recognized in fields including digital monitoring [], digital assessments [], digital phenotyping [], and self-management []. Remarkably, it is estimated that more than 10,000 mobile apps are available in the digital mental health market [,], and common mental illnesses such as anxiety and depression are measured in numerous associated apps [-]. Importantly, these apps typically incorporate existing validated self-report instruments in a digital format for self-tracking purposes []. For example, the PHQ-9 is often used as a measure of depressive symptoms, despite being developed in the predigital era.
A shift toward digital modalities of data collection also supports a transition in survey development from classical test theory (CTT) to item response theory (IRT). CTT [] has been the dominant approach to survey development for traditional paper-and-pen surveys. However, advances in measurement theories such as IRT [,] highlight several shortcomings of CTT and are starting to positively influence psychological test development through improved measurement precision [,,]. One notable application of IRT, originating from educational assessments, is the Computerized Adaptive Test (CAT) [], which harnesses the computational power of digital technologies to provide dynamic survey administration. CAT can maintain the precision of mental health measurement while significantly reducing the total number of questions and thereby alleviating the response burden [,]. A prominent example of this contemporary approach to measurement is the PROMIS (Patient Reported Outcome Measures Information System), a program initiated by the US National Institutes of Health []. PROMIS includes suites of mental health outcome measures designed with digital administration in mind, available in various formats such as fixed-length (eg, comprehensive long-form or equivalent short-form self-report assessments) and computerized adaptive testing assessments.
Dual Continuum Model of Mental HealthAlthough originally linked to mental illness diagnoses [-], contemporary theorizing [,-] conceptualizes mental health as multifaceted, subsuming deficits-based constructs related to mental illness, but also strengths-based constructs such as subjective well-being [], the mental health continuum [], and flourishing (in positive psychology) [].
The influential dual continua model of mental health [] recognizes 2 correlated but separable dimensions, typically named “mental illness” and “mental well-being” (for clarity, we use the terms “deficit” and “strength” here [], see ). This conceptual model explains more variance in diverse populations than an alternative model in which mental illness and mental well-being are placed at opposite ends of a single continuum []. As described in the protocol paper for this study, the dual continua model directly informed the present search strategy and the synthesis of findings ( []).
Figure 1. Dual continua model of mental health [] (adapted from Teng et al [] with permission). This ReviewThe present review is unique because it intends to advance understanding of the range of instruments that could function as mental health self-monitoring tools in the digital era. Specifically, we systematically searched for empirical studies that used instruments measuring mental health in (1) the general population, (2) digital format, and (3) longitudinal designs. While there is a growing body of research investigating measures of mental health in the nonclinical adult population [-], there is a lack of reviews that focus specifically on studies that meet the 3 aforementioned criteria. Consequently, this review has a narrow focus on instruments that have been applied in a manner comparable to digital self-monitoring of mental health. To the best of our knowledge, no review shares the specific focus of the present review. We anticipate that the review findings will be useful to the burgeoning number of researchers who share our interest in valid digital self-report measurements of mental health in the general population [,].
The aim of this scoping review was therefore to systematically identify empirical studies that used instruments measuring mental health (broadly defined) in (1) the general population, (2) digital format, and (3) longitudinal designs. This aim was addressed with the following research question (RQ):
RQ: Which self-report mental health instruments have been most frequently used in empirical studies via digital delivery to measure the mental health of the general adult population at more than 1 time point?
In this review, usage of the instrument is operationalized as the frequency of the instrument being administered in identified empirical studies over the number of years since the release of the instrument, bounded by the time frame of the review (ie, papers published between January 1, 2010, and December 31, 2021, inclusively).
The peer review process has led to some changes to study reporting that represent deviations from the original study protocol []. Most importantly, the study was originally designed as a systematic review, but through peer review, the authors learnt that it is easier to understand the study as a scoping review with a systematic literature review step. Second, we have reorganized the presentation of this study’s RQs as described in the protocol. The primary RQ above is a more succinct wording of the erstwhile questions RQ1 and RQ4. To avoid misunderstanding about their status in the study design, ancillary questions about the characteristics of the most popular instruments (erstwhile RQ2, RQ3, and RQ5) have been removed from the main text (see ), and the findings now appear only in Section S2 in [,,,,,,-].
This scoping review adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Review and Meta-Analysis Extension for Scoping Reviews) 2018 guidelines []. Methods such as eligibility criteria, information sources, search strategy, selection process, data collection process, data extraction, risk of bias (quality) assessment, and data synthesis plan are described in our published protocol []. The following section elaborates on deviations from the original protocol and the execution of these methods. Section S1 in presents the full search terms and strategies used for all databases in this review.
Selection ProcessEligibility CriteriaIn the original protocol [], eligibility criteria were defined in the Population, Intervention, Comparison, Outcome, and Time (PICOT) [] format. For this scoping review, a mapping from the PICOT format to the population, concept, and context [] format is detailed below. The population is identical to the protocol. The concept represents the types of empirical studies included and the scope of the mental health constructs considered, which were associated with the section “Intervention of Interest” of the protocol []. The context in this review represents the timeframe of the included published papers, which is between January 1, 2010 and December 31, 2021 (inclusive).
Preliminary Screening of Titles and AbstractsIn a preliminary scoping of the literature, a large number of search results (over 90,000 records) were returned, requiring the application of various tools to help screen records. One reviewer (ZHK) screened the titles and abstracts of the search records using text-mining techniques []. Subsequently, the screening effort was validated using a semiautomated screening tool (ASReview []). The details of this preliminary screening process are described elsewhere [].
Full-Text ScreeningA large number of eligible records (N=8460) were identified for full-text review after preliminary screening. This large record count is a result of our broad interest in the most frequently used mental health instruments being administered by past empirical studies. Furthermore, our three inclusion criteria (instruments being used in (1) the general population, (2) digital format, and (3) across multiple time points) are often not mentioned in the records’ title or abstract, as they relate more to methods than the substantive questions of the study.
Considering that a typical review takes, on average, 5 members and approximately 67.3 weeks to conduct and publish [,], full-text screening of 8460 records was not feasible with 2 reviewers and limited resources. This feasibility challenge has been acknowledged by COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) when reviews are conducted on patient-reported outcome measures []. The large return of eligible studies, therefore, forced us to develop a pragmatic approach to full-text screening. Previous systematic reviews facing similar challenges, for example, these studies [-] have screened a subset of the eligible records using either a random sampling strategy (eg, stratified sampling by time period) or a data saturation strategy to ensure the representativeness of the variables of interest.
Here, we also decided to screen a subset of eligible records, but adopted a statistical strategy to test whether or not the subset was representative of the entire population of eligible records. We randomly selected a subset of 1000 records from the 8460 records eligible for full-text screening in batches of 250 (no repeats) over 4 iterations of sampling. As the primary focus of the present review was instruments’ usage frequency (above), usage frequency was the variable used to statistically investigate the representativeness of the subset of n=1000 records. It was hypothesized that if the distribution of the usage frequency of instruments in each of the 4 batches was not statistically different, given the 4 batches were randomly selected from the same population of 8460 records, it could be inferred that the distribution of the usage frequency of instruments in each batch was likely to reflect in the full set of 8460 records [-]. The hypothesis was tested using statistical techniques (chi-square test of homogeneity [] and Fisher exact test [], significance level α of .05) once the 1000 records were screened in full-text and the data were extracted from the included records.
Informed by these approaches, 2 reviewers (ZHK and DS) screened the selected 1000 records in 4 iterative batches of 250 (). To help manage the screening process, we developed a tool [] to facilitate full-text screening and data extraction (hereafter referred to as “the custom tool”). Both reviewers pilot-tested the custom tool with 25 randomly selected records before commencing the full-text screening and data extraction. In each iteration, both reviewers independently screened 250 records in full text. If the record was still eligible, relevant data were extracted using the custom tool. If further information was required, ZHK contacted the researchers of the studies. After each iteration, both reviewers reconciled the review results and resolved any disagreements. The interrater agreement between reviewers was calculated using Cohen κ [] (). Suggestions to improve the custom tool were considered and implemented in subsequent iterations.
Figure 2. Full-text screening process of 1000 randomly sampled records in 4 iterations. After 2 iterations, it became apparent that one of the main disagreements between reviewers was determining whether the instruments used in these records were in English, as not all records explicitly declared the language of the instruments. A decision flowchart was created () to facilitate the decision-making between reviewers after iteration 2. Since then, the interrater agreements were improved in iteration 3 and were maintained in iteration 4.
Data Extraction ProcessBoth reviewers (ZHK and DS) independently extracted data from each record using a separate copy of the custom tool. After completing each iteration, ZHK reconciled the instruments extracted from both reviewers. Then, both reviewers discussed any discrepancies and resolved disagreements before commencing the next iteration.
Quality AssessmentWhile the usage frequency of instruments is the main focus of the present scoping review, to ensure this review is useful to future researchers, similar to past reviews [-], we briefly commented on the quality assessment at the instrument level instead of individual studies, with a provisional evaluation of their core psychometric properties using studies from secondary searches (Section S2 in ).
Data SynthesisFindings are summarized using a narrative synthesis approach [] guided by the Synthesis Without Meta-Analysis (SWiM) reporting guideline [] and framed by Keyes dual continua model (above). As described in our published protocol [], the included studies were grouped by instruments and synthesized in tabular format, summarizing the properties of all studies associated with each instrument. The usage of instruments among the included studies is presented descriptively as figures.
Ancillary InformationThis study was not designed to rank identified instruments in terms of psychometric properties. It is expected that the present frequency-of-use ranking and synthesis will provide the foundation for future researchers to ask questions about the optimal approach to digital, prospective self-report of mental health (eg, selecting the best instrument for a particular population or building new adaptive instruments from item content in established instruments). For each instrument identified, ZHK manually searched the literature to elicit the ancillary information, such as the instrument’s structure, format, and psychometric properties. The ancillary information was then synthesized in tabular format (Section S2 in ). The psychometric properties of each instrument were provisionally rated based on the quality criteria defined by Terwee et al [].
Searches on 5 databases (Scopus, Web of Science, PubMed, PsycINFO, and Psychology & Behavioral Sciences Collection [via EBSCOhost]) returned 191,815 records, of which 95,849 were unique records after deduplication (). Preliminary screening of titles and abstracts resulted in 8460 eligible records for full-text screening. As noted above, 1000 of 8460 eligible records were randomly selected in batches of 250 for full-text screening over 4 iterations. Statistical tests were conducted to evaluate the representativeness of the instruments’ usage distribution (the key variable of interest) within each iteration across the 1000 records, as described in the results below. Among these 1000 records, 770 were deemed ineligible, and 7 records were excluded through author nonresponse to queries from ZHK. A representative group of 223 records was included for data extraction and synthesis.
Figure 3. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart showing how records were identified from the initial search to the final decision. *: for pragmatic reasons, 1000 records were randomly sampled from 8460 records over 4 iterations for full-text screening. Statistical testing of the validity of this pragmatic strategy is described above and reported below. Among the 770 records deemed ineligible during full-text screening, the most common reasons for excluding records were that the instrument was non-English [-], records were reviews, meta-analyses, or book chapters [,], the instrument had not been administered repeatedly [,], was not self-reported [,], or was nondigital [,].
The 223 included records (empirical studies) reported the usage of 278 eligible self-report mental health instruments, of which 68 were used in more than 1 record. To ensure our synthesis was manageable, we selected instruments that were most commonly used across studies, resulting in 30 of 68 instruments, which accounted for approximately 78.4% (308/393) of the total instruments’ usage across studies (see for details about the selection).
Six instruments in were excluded from the 30 instruments to narrow the measurement construct more precisely onto states of mental health. Specifically, instruments were excluded for the following reasons: first, the Penn State Worry Questionnaire, Brief Resilience Scale, and Rosenberg Self-Esteem Scale were excluded because they measure traits, which are not the focus of this review. These instruments were not excluded during screening and data extraction because it was unclear that they measured trait-like constructs based on their repeated usage in the included empirical studies. It became apparent that subsequent secondary searches that obtained the original publication of these instruments have illuminated the ineligibility of these instruments in this review (n=3). Second, the MOS Social Support Survey measures social support rather than mental states (n=1). Third, custom questions on mood or affect and stress were excluded because they were nonstandardized instruments, with question items varying from study to study (n=2). The synthesis from here on will focus only on the remaining 24 instruments () among 147 studies.
The statistical investigation of our pragmatic strategy for reducing the number of records for full-text review involved, first, a chi-square test of the distribution of instruments’ usage across the 4 iterations of sampling. As expected, the test was not significant (χ269=60.98, P=.74). However, one of the assumptions (80% of the expected values >5) of the chi-square test was not met. Hence, a Fisher Exact Test was subsequently conducted, which was also not significant (P=.74). Therefore, we could not reject the null hypothesis of no difference, indicating that the usage distribution of instruments across batches was not significantly different. Given that each batch of records was randomly sampled from a population of 8460 records, this finding suggests that the usage distribution of instruments presented in these batches most likely approximates the overall instrument usage of all eligible records.
Figure 4. Cumulative percentage of the 68 instruments’ usage across studies. Thirty instruments were selected for synthesis (left of the red dotted line) and accounted for 78.4% (308/393) of the total instruments’ usage across studies. Instruments with names prefixed with “custom:” are groups of custom questions created by researchers to measure similar mental health constructs commonly used in ecological momentary assessment studies. For example, the instrument named “custom: mood or affect” includes custom questions measuring mood or affect. Table 1. The 24 instruments selected for synthesis, sorted by descending usage frequency.InstrumentMental health constructs measuredDeficits- or strengths-focusedPatient Health Questionnaire 9 Items (PHQ-9) []DepressionDeficitsGeneralized Anxiety Disorder 7 Items (GAD-7) []Anxiety (generalized anxiety disorder)DeficitsDepression, Anxiety, Stress Scale 21 Items (DASS-21) []Depression, anxiety, and stressDeficitsKessler Psychological Distress Scale (K10)a []Psychological distressDeficitsPerceived Stress Scale 10 Items (PSS-10) [,]StressDeficitsHospital Anxiety and Depression Scale (HADS) []Anxiety and depressionDeficitsBeck Depression Inventory II (BDI-II) []DepressionDeficitsSatisfaction With Life Scale (SWL) []Life satisfactionStrengthsCenter for Epidemiologic Studies-Depression Scale (CES-D) []DepressionDeficitsWarwick-Edinburgh Mental Well-Being Scale (WEMWBS) []Mental well-beingStrengthsPatient Health Questionnaire 8 Items (PHQ-8) []DepressionDeficitsPositive and Negative Affect Schedule (PANAS) []Positive affect and negative affectDeficits and strengthsQuick Inventory of Depressive Symptomatology 16 Items (QIDS-SR)a []DepressionDeficitsAlcohol Use Disorder Identification Test (AUDIT) []Alcohol useDeficitsEating Disorder Examination Questionnaire (EDE-Q) []Eating disorderDeficitsImpact of Event Scale-Revised (IES-R) [] (first edition) []Posttraumatic stress disorderDeficitsMedical Outcome Survey Short Form 36 Items (SF-36)b [,,]Vitality, social functioning, role-emotional, and mental healthDeficits and strengthsPanic Disorder Severity Scale-Self Report 7 Items (PDSS-SR) [,]Panic disorderDeficitsPerceived Stress Scale 14 Items (PSS-14) [,]StressDeficitsPerceived Stress Scale 4 Items (PSS-4) []StressDeficitsWHOQOL-BREFb,c []Psychological (positive feelings, cognitions, self-esteem, body image, negative feelings, and spirituality)Deficits and strengthsEQ-5D (also known as EQ-5D-3L)b []Anxiety or depressionDeficitsMental Health Continuum Short-Form (MHC-SF) [,]Mental well-being (emotional well-being, psychological well-being, and social well-being)StrengthsSpielberger State-Trait Anxiety Inventory (STAI) []Anxiety (state and trait)DeficitsaThese instruments were developed or psychometrically evaluated using item response theory during the first publication of the instruments.
bThese are instruments measuring quality-of-life domains; only relevant mental health constructs are listed.
cWHOQOL-BREF: The World Health Organization Quality of Life (abbreviated version).
Instrument SynthesisSome instruments in are shorter versions of the same instruments, which are listed separately due to usage differences. For example, Patient Health Questionnaire 8 Items [] is identical to PHQ-9 [] except without the item measuring suicidal thoughts or self-harm. Likewise, the Perceived Stress Scale 10 Items (PSS-10) and Perceived Stress Scale 4 Items are shorter versions of the Perceived Stress Scale 14 Items [,]. The 24 instruments that were included for synthesis in can be loosely categorized into 3 groups: deficits-focused (18 instruments, 75%), strengths-focused (3 instruments, 12.5%), and “holistic” (deficit and strength, 3 instruments, 12.5%). The top 7 most commonly used instruments were deficits-focused, measuring perceived stress (eg, Perceived Stress Scale 14 Items), psychological distress (eg, K10), and early symptoms of anxiety (eg, Generalized Anxiety Disorder 7 Items [GAD-7]) and depression (eg, PHQ-9). In contrast, among the 3 strengths-focused instruments, the Satisfaction With Life Scale (SWL) measures life satisfaction while the other 2 (Warwick-Edinburgh Mental Well-Being Scale [WEMWBS] and Mental Health Continuum Short-Form [MHC-SF]) measure broad mental well-being constructs. SWL has the highest usage compared to the other 2 strengths-focused instruments. Among the 3 instruments measuring deficits- and strengths-based constructs, 2 measure health-related quality-of-life (HRQoL) instruments (Medical Outcome Survey Short Form 36 Items [SF-36] and The World Health Organization Quality of Life [abbreviated version, WHOQOL-BREF]), which typically subsume multiple constructs in different domains apart from mental health, such as physical health and functioning. EQ-5D is an HRQoL instrument, but was categorized as a deficits-focused instrument here because it measures only symptoms of anxiety or depression within the mental health domain. The final “holistic” measure was the Positive and Negative Affect Schedule, which measures positive and negative affect.
It is noteworthy that the original versions of all 24 instruments were developed and published before the year 2008 ( and RQ2 under Section S2 in ). Indeed, the Center for Epidemiologic Studies-Depression Scale (CES-D), PSS (14-item, 10-item, and 4-item), Hospital Anxiety and Depression Scale, Spielberger State-Trait Anxiety Inventory, SWL, and Positive and Negative Affect Schedule were published before the year 1990. Between the years 1990 and 2000, instruments measuring symptoms of mental illness other than anxiety and depression emerged such as Alcohol Use Disorder Identification Test (measuring alcohol use), Eating Disorder Examination Questionnaire (measuring symptoms of eating disorders), Impact of Event Scale-Revised (measuring symptoms of posttraumatic stress disorder) and Panic Disorder Severity Scale-Self Report 7 Items (measuring symptoms of panic disorder). During this period, SF-36, an HRQoL instrument, was also being developed. From the year 2000 onward, instruments screening symptoms of anxiety and depression, and measuring multifaceted mental well-being began to arise, such as PHQ-9, GAD-7, K10, MHC-SF, and WEMWBS. The HRQoL instrument WHOQOL-BREF was also developed after the year 2000.
Only 2 instruments (K10 and QIDS-SR) used IRT in the original publication (annotated a in and RQ2 under Section S2 in ). The remaining instruments were typically developed using the factor analytical approach of CTT. It is noteworthy that the emerging PROMIS instruments, which were developed with digital administration in mind using contemporary test construction approaches such as IRT and CAT, have lower usage (less than 4/393, 1% of the total usage in ) compared to other instruments that were originally developed in paper-and-pen format.
Instrument Usagespresents the 24 instruments’ usage from different perspectives across 147 studies. A shows the distribution of the usage percentage of instruments. The top 3 instruments, which are deficits-focused, collectively contributed 46.2% (121/262) of the total usage of the 24 instruments (N=262): PHQ-9 (53/262, 20.2%), GAD-7 (51/262, 19.5%), and Depression, Anxiety, Stress Scale 21 Items (DASS-21; 17/262, 6.5%). B breaks down the instrument’s usage by the record’s publication year. For each instrument, the usage pattern each year appeared to mimic its total usage pattern (A), with the top 2 instruments (PHQ-9 and GAD-7) showing consistently high usage patterns across years since 2010. All included instruments were developed and released before 2010, so there is no need to consider the creation year when calculating usage frequency, as defined in our protocol []. C presents a different perspective on each instrument. This figure shows the usage frequency by instrument (between 2010-2021), in which most instruments have higher usage toward the years 2020 and 2021, most likely influenced by the number of published records in 2020 and 2021 (). Nonetheless, some top instruments (eg, PHQ-9, GAD-7, DASS-21, K10, and PSS-10) have been used almost consistently since 2010 among the included studies. In contrast, instruments measuring mental well-being, such as the SWL, WEMWBS, and MHC-SF, have only been used since 2015.
Figure 5. (A) Total usage of each instrument. (B) The usage percentage of each instrument per publication year of the record. (C) The usage percentage per instrument across years. AUDIT: Alcohol Use Disorder Identification Test; BDI-II: Beck Depression Inventory II; CES-D: Center for Epidemiologic Studies-Depression Scale; DASS-21: Depression, Anxiety, Stress Scale 21 Items; EDE-Q: Eating Disorder Examination Questionnaire; GAD-7: Generalized Anxiety Disorder 7 Items; HADS: Hospital Anxiety and Depression Scale; IES-R: Impact of Event Scale-Revised; K10: Kessler Psychological Distress Scale; MHC-SF: Mental Health Continuum Short-Form; PANAS: Positive and Negative Affect Schedule; PDSS-SR: Panic Disorder Severity Scale-Self Report 7 Items; PHQ-8: Patient Health Questionnaire 8 Items; PHQ-9: Patient Health Questionnaire 9 Items; PSS-4: Perceived Stress Scale 4 Items; PSS-10: Perceived Stress Scale 10 Items; PSS-14: Perceived Stress Scale 14 Items; QID-SR: Quick Inventory of Depressive Symptomatology 16 Items; SF-36: Medical Outcome Survey Short Form 36 Items; STAI: Spielberger State-Trait Anxiety Inventory; SWL: Satisfaction With Life Scale; WEMWBS: Warwick-Edinburgh Mental Well-Being Scale; WHOQOL-BREF: The World Health Organization Quality of Life (abbreviated version). Study Characteristicssummarizes how the 24 instruments were administered in the studies identified (n=147). Among the included studies (grouped by instruments) in , each instrument was administered in various digital modalities and frequencies to the general population. The most common digital modalities for administering these instruments were through web applications accessible by any internet-connected device and mobile apps. The frequency of administration of the included instruments varies from multiple times a day [,] to yearly [,]. Studies that administered the same instrument multiple times per day typically used an experience sampling design such as ecological momentary assessment (EMA) []. The response time frame of instruments was typically adapted by including studies despite the original scales having a default recommended duration (eg, past 2 weeks and past 4 weeks).
Table 2. The attributes of the included studies are grouped by selected instruments.Instrument and associated studiesAdministration modeAdministration frequencyResponse time frameStudy designCountryPHQ-9a [,,-]Laptop or computer, mobile app, web application, or tablet or phone1 day, 1 week, 12 months, 6 months, >1 month, >1 year, every 6 days, fortnightly, monthly, on-demand, or weeklyPast 2 weeks, past 4 weeks, past 6 days, or past weekEMAb, RCTc, cohort study, internet survey, or pre-postUK, Ireland, USA, Canada, Australia, or New ZealandGAD-7d [,,,-,, ,,,,,-,, ,,,,-, -]Laptop or computer; mobile app; web application; SMS, text or email; or tablet or phone1 day, 1 month, 1 week, 11 months, 3 weeks, >1 month, >1 year, every 6 days, fortnightly, monthly, on-demand, or weeklyPast 2 weeks, past 4 weeks, past 6 days, past 7 daysEMA, RCT, cohort study, internet survey, pre-post, or repeated cross-sectionalUK, Ireland, USA, Canada, Australia, New Zealand, or SingaporeDASS-21e [,,,-]Mobile app and web application1 month, 2 weeks, >1 month, >1 year, or weeklyPast 6 months or past weekEMA, RCT, case series, cohort study, or pre-postCanada, Australia, USA, New Zealand, or UKK10f [,,,,, ,,,,,, ,]Web application1 month, 1 week, 3 weeks, 4 times over 90 days, >1 month, 6 times over 10 weeks, or yearlyPast 2 weeks, past 28 days, past 30 days, past 4 weeks, or past month or last monthEMA, RCT, cohort study, internet survey, or pre-postCanada, Australia, UK, Ireland, USA, New Zealand, or UKPSS-10g [,,,,-]Mobile app and web application2 weeks, 3 weeks, >1 month, or yearlyPast month or last month, or past week or last weekEMA, RCT, cohort study, internet survey, or pre-postUSA, Australia, UK, or CanadaHADSh [,-]ACASIi, mobile app, or web application1 month, 1 year, 6 months, >1 month, every 6 months, or on-demandPast week or last weekRCT or cohort studyAustralia, UK, USA, Singapore, or CanadaBDI-IIj [,,,,,-]kWeb application1 month or >1 monthPast 2 weeksRCT, cohort study, or pre-postAustralia, Ireland, USA, or Hong KongSWLl [,,,,,,-]Mobile app or web application1 month, 1 week, 30 days, >1 month, multiple times per day, on-demand, or same dayNowEMA, RCT, cohort study, or pre-postUK, Australia, New Zealand, or USACES-Dm [,,,,-]Laptop or computer, or web application1 month, 1 week, 3 weeks, 6 months, or >1 monthPast week or last weekRCT, cohort study, or repeated cross-sectionalCanada, USA, Australia, UK, or nonspecified countries from Asia and EuropeWEMWBSn [,,,,,-]Mobile app or web application1 month, 4 weeks, >1 month, multiple times per day, or yearlyNow or past 2 weeksEMA, RCT, cohort study, or internet surveyUK or USAPHQ-8o [,,,-]Mobile app, web application, or tablet or phone1 year, 2 weeks, 4 weeks, 6 months, >1 month, or on-demandPast 2 weeksEMA, RCT, cohort study, or pre-postUK, USA, or AustraliaPANASp [,,,,,,]Web application1 week, 2 weeks, 4 weeks, 6 months, >1 month, or dailyGeneral, moment, past few days, past few weeks, today, week, or yearEMA, RCT, cohort study, or pre-postUSA, Canada, UK, or AustraliaQIDS-SRq [,,-]Mobile app; web application; SMS, text, or email; or tablet or phone1 year, 2 weeks, >1 month, daily, or weeklyPast 7 daysEMA, RCT, or cohort studyCanada, UK, or USAAUDITr [,,,,]ACASI or web application1 year, >1 month, >1 year, or every 6 monthsPast year or last yearRCT, cohort study, or pre-postUK, Australia, or CanadaEDE-Qs [,,,,]Web application1 month, 6 months, >1 month, >1 year, or yearlyPast 28 daysRCT, cohort study, or pre-postUK, Australia, Hong Kong, or USAIES-Rt [,,,,]Web application1 year, 6 months, or >1 monthPast 7 daysRCT, cohort study, or internet surveyIreland, Australia, or USASF-36u [,,,,]Web application1 month, 2 weeks, >1 month, or biannuallyPast 4 weeksRCT, cohort study, or pre-postAustralia, New Zealand, USA, or Hong KongPDSS-SRv [,,,,]Web application6 months, >1 month, monthly, or weeklyPast weekRCT, cohort study, or pre-postCanada or AustraliaPSS-4w [,,,,]Mobile app or web application12 months, 6 months, >1 month, monthly, or weeklyLast 30 days, or past month or last monthRCT or cohort studyUSA, Singapore, or UKPSS-14x [,,,,]Web application1 month, >1 month, or dailyPast month or last month, or todayRCT, cohort study, or pre-postUSA, Canada, or UKWHOQOL-BREFy [,,,,]Web application1 year, 2 weeks, or >1 monthPast 4 weeksRCT or pre-postUK, Canada, or USAEQ-5D [,,,]Web application1 month, 12 months, or >1 monthTodayRCT or pre-postAustralia, UK, or CanadaMHC-SFz [,,,]Web application; or SMS, text, or email1 month, >1 month, or multiple times per daySince the last prompt or beepEMA, RCT, or pre-postAustralia or USASTAIaa [,,,]Web application2 weeks, >1 month, or weeklyNowCohort study or pre-postUSAaPHQ-9: Patient Health Questionnaire 9 Items.
bEMA: ecological momentary assessment.
cRCT: randomized controlled trial.
dGAD-7: Generalized Anxiety Disorder 7 Items.
eDASS-21: Depression, Anxiety, Stress Scale 21 Items.
fK10: Kessler Psychological Distress Scale.
gPSS-10: Perceived Stress Scale 10 Items.
hHADS: Hospital Anxiety and Depression Scale.
iACASI: audio computer-assisted self-interview.
jBDI-II: Beck Depression Inventory II.
kThis was administered only to the general public who were at risk or diagnosed with a physical or mental illness, not to the healthy general population.
lSWL: Satisfaction With Life Scale.
mCES-D: Center for Epidemiologic Studies-Depression Scale.
nWEMWBS: Warwick-Edinburgh Mental Well-Being Scale.
oPHQ-8: Patient Health Questionnaire 8 Items.
pPANAS: Positive and Negative Affect Schedule.
qQID-SR: Quick Inventory of Depressive Symptomatology 16 Items.
rAUDIT: Alcohol Use Disorder Identification Test.
sEDE-Q: Eating Disorder Examination Questionnaire.
tIES-R: Impact of Event Scale-Revised.
uSF-36: Medical Outcome Survey Short Form 36 Items.
vPDSS-SR: Panic Disorder Severity Scale-Self Report 7 Items.
wPSS-4: Perceived Stress Scale 4 Items.
xPSS-14: Perceived Stress Scale 14 Items.
yWHOQOL-BREF: The World Health Organization Quality of Life (abbreviated version).
zMHC-SF: Mental Health Continuum Short-Form.
aaSTAI: Spielberger State-Trait Anxiety Inventory.
Comparison With Existing LiteratureThe set of frequently used instruments identified here broadly aligns with findings of past reviews examining outcome measurements. Points of difference are likely due to the unique focus of this review on digital and longitudinal applications of the instruments. The commonly used measurement instruments that Breedvelt et al [] found were included in the initial 68 instruments in the present review, except for the Edinburgh Postnatal Depression Scale [] and General Health Questionnaire-12 []. The Edinburgh Postnatal Depression Scale was excluded because it is outside the scope of the present review. It is noteworthy that Breedvelt et al [] found that the CES-D was the most commonly used instrument in measuring depression, followed by the Beck Depression Inventory II and the PHQ-9, whereas in the present review, we found that the usage of the PHQ-9 is substantially higher than that of any other instrument. This difference may be due to the present focus on digital, repeated measures application, especially given that the PHQ-9 is commonly used in mobile apps for EMA studies [,].
Most instruments measuring mental well-being and HRQoL found in the present review were also identified in an earlier review by Lindert et al [] of well-being measurement scales. An exception, however, is the MHC-SF. The difference in review timeframes between Lindert et al (2007-2012) [] and the present review (2010-2021) may explain this discrepancy. Another plausible explanation is that Lindert et al [] focused on subjective well-being, whereas MHC-SF is conceptually broader than subjective well-being. In contrast, a recent scoping review focusing on the dual-continua model of mental health [] found SWL and MHC-SF to be the most used measurements for positive mental health—both instruments appear in here. Beidas et al [] found 29 validated mental health screening instruments for the general adult population that are free and brief. Some instruments found in the present review, such as the DASS-21, CES-D, K10, and PSS-10, were not included in their review, most likely due to differences in search scopes and review aims.
The present scoping review aimed to advance our understanding of the range of instruments that could function as mental health self-monitoring tools in the digital era. This was achieved by identifying empirical studies that administered instruments in (1) the general population, (2) digital format, and (3) longitudinal designs. To the best of our knowledge, this is the first review that focuses on studies that administered instruments meeting these 3 criteria. The primary outcome was an ordered list of commonly used instruments that measure mental health longitudinally using digital technologies in the general population.
Principal FindingsThe literature review and synthesis generated 2 major findings. First, among the 24 most commonly used instruments, the majority (18/24, 75%) were deficits-focused. PHQ-9 and GAD-7 were by far the most used instruments, with consistent yearly usage between the years 2010 and 2021, measuring symptoms of depression and generalized anxiety, correspondingly. In the present review, the considerably higher usage of PHQ-9 and GAD-7 () was unsurprising because they have been used extensively as longitudinal measures for mental health, from web-based applications to mobile apps for EMA studies [,]. These 2 instruments are also nominated for use in a recent global effort to standardize mental health measures [].
The list of commonly used instruments also included strengths-focused instruments (eg, SWL, WEMWBS, and MHC-SF). Instruments measuring HRQoL (eg, SF-36 and WHOQOL-BREF) are a different type of commonly used instrument—from Keyes’ framework, they can be understood as measuring aspects of deficits-based (eg, anxiety or depression) and strengths-based (eg, positive effects or energy level) mental health constructs in conjunction with ratings of physical health domains and functioning. Among the strengths-focused instruments, SWL, WEMWBS, and MHC-SF, measuring life satisfaction and mental well-being, were found to be the most used instruments.
The predominance of deficits-focused instruments found here is at least partly due to the traditional elevation of illness-focused constructs in psychology and mental health research and practice (not to mention medicine and psychiatry []). The granularity of the deficits-based constructs measured by these instruments may also contribute to their disproportionate usage. For example, deficits-focused instruments are typically unidimensional and high-fidelity [,] as these instruments emphasize the accuracy and specificity of screening for or diagnosing specific mental disorders. In contrast, strengths-focused instruments and HRQoL instruments are typically multidimensional, containing broad constructs of measurement (also known as high bandwidth), addressing the complexity and variability of the domain of mental health or general health [,].
It is important to note that, used in isolation, the population deficits-focused instruments identified here may have limited utility as longitudinal mental health self-monitoring tools in the general population. The prevalence of mental disorders in the general population is relatively low, so deficits-focused instruments measuring symptoms of mental illnesses tend to exhibit floor effects [,,]. While these screening instruments play a role in potentially detecting early symptoms of mental illnesses, they may not be so helpful in providing a general picture of individuals’ mental health over time. The dual continua model of mental health suggests that measuring strengths-based constructs could complement deficits-focused measurement and consequently provide more ecologically valid and statistically sensitive assessments for the general population.
Second, while the 24 instruments synthesized in this review have been frequently administered in digital modalities (1 of the review’s 3 inclusion criteria), they were all developed at least 2 decades ago, and most followed CTT and were designed for paper-and-pen format. Since then, we have seen substantial work using IRT to revalidate the psychometric properties of these instruments [-], but (except for the PROMIS project) less attention has been devoted to new instrument development.
Next-generation mental health assessments could harness the computational power offered by digital technologies, coupled with contemporary psychometric practices such as IRT and computerized adaptive testing, to advance evidence-based assessments. Undoubtedly, a holistic instrument that measures both deficits- and strengths-based constructs may contain a larger number of self-report items compared to deficits- or strengths-based instruments alone. However, the combination of IRT and computerized adaptive testing can significantly reduce the number of questions asked in a survey [], which reduces respondent fatigue burden [], a common phenomenon where participants lose motivation when completing long or repeated surveys, affecting the data quality. This phenomenon is particularly prominent in self-monitoring instruments, which are typically administered multiple times over a period []. Therefore, there is a golden opportunity to lift the capability of evidence-based assessments through digital technologies, which provide an excellent launching platform for a digital tool that encourages people to monitor their mental health.
LimitationsDue to the sheer volume of eligible records for full-text screening, for pragmatic reasons, we randomly selected 4 batches of 250 records (a total of 1000 records) of the 8460 eligible records for full-text screening and synthesis after the preliminary screening. Although we demonstrated that the distributions of instruments’ usages were statistically indicative of the full records, we cannot definitively exclude the possibility that this pragmatic strategy biased our findings. However, given the prominence of a few commonly used instruments at the top of the list (eg, PHQ-9 and GAD-7) and previous studies in the literature, it seems unlikely that we missed any instruments that would have warranted inclusion in the top 30 (). Our primary objective is to develop a novel ranking of the most frequently used instruments that are potentially suitable for mental health self-monitoring in the general population. Future studies can elaborate on and update the results presented in the present study by considering records that were not selected for full-text screening. Additionally, the restriction on English-only instruments in this review may limit the generalizability of the results.
ImplicationsThis study has several implications for future research. First, the finding that deficits-focused instruments predominate the ranking of usage frequency despite their known distributional problems in the general population (floor effects) calls for serious consideration of more holistic instruments. As positive psychology, recovery, and mental health promotion and prevention gain traction [], future research studies should work toward the development of more holistic measures, potentially informed by the dual continua model [-]. Such measures for general population use would adequately capture both deficits (constructs relevant to a significant minority of the population, and of great interest clinically) and strengths (constructs that have the potential to support the majority of the population to advance their positive mental health).
Second, the 24 instruments included in the final synthesis of measures, while commonly used, were originally designed more than a decade ago for use in paper-and-pen format without taking advantage of the great possibilities of digital measurement, such as computerized adaptive testing. Although these instruments have been translated into digital format, we observed that the implications of such di
Comments (0)