The systematic review of organisational-level mental health promotion interventions delivered in workplace settings followed a comprehensive study selection process, as outlined in Fig. 1. Initially, 11,236 records were identified across four databases: Scopus (3337), PubMed Medline (2939), Web of Science (1922), and EBSCOHost Medline (3008). A further 17 records were identified through citation chaining of records included in the full-text review. After removing 3567 duplicates identified by Covidence and 8 duplicates identified by the reviewers, 7648 studies were screened at the title and abstract stage. Of these, 7367 studies were excluded based on relevance and eligibility criteria, and 281 studies were sought for full-text retrieval and assessed for eligibility. During the eligibility assessment, 275 studies were excluded for the following reasons: review (1), protocol (2), wrong setting (7), not in English (8), wrong outcomes (13), no intervention (8), qualitative study (1), wrong intervention (171), wrong study design (30), wrong population (25), and studies already covered in one of our two previous systematic reviews (Greiner et al. 2022; Aust et al. 2024), which were excluded to avoid duplication (9). Ultimately, after applying all exclusion criteria, 6 studies met the inclusion criteria and were included in the systematic review. We identified four RCTs (Falk et al. 2022; Shan et al. 2024; Kiser et al. 2024; Wang et al. 2024) and two non-randomised controlled trials (Cedstrand et al. 2022; Micek et al. 2022).
Fig. 1
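As a transparency aid, the short tally below reproduces the full-text exclusion arithmetic reported above (the reason counts sum to 275 of the 281 records assessed at full text, leaving the 6 included studies). It is a minimal sketch using only the counts stated in the text; the dictionary keys are shorthand labels for the exclusion reasons, not terms from the review protocol.

```python
# Tally of full-text exclusion reasons as reported in the text above.
exclusion_reasons = {
    "Review": 1, "Protocols": 2, "Wrong setting": 7, "Not in English": 8,
    "Wrong outcomes": 13, "No intervention": 8, "Qualitative study": 1,
    "Wrong intervention": 171, "Wrong study design": 30, "Wrong population": 25,
    "Covered in previous reviews": 9,
}

full_text_assessed = 281
excluded = sum(exclusion_reasons.values())   # 275 records excluded at full text
included = full_text_assessed - excluded     # 6 studies retained in the review

print(f"Excluded at full text: {excluded}; included: {included}")
```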
The six included studies covered three sectors: four in healthcare, one in construction, and one in the TICTM setting. Most were conducted in high-income countries (United States and Sweden), with two studies from China representing middle-income contexts. Notably, three studies (Falk et al. 2022; Kiser et al. 2024; Wang et al. 2024) explicitly referenced the COVID-19 pandemic, either by situating the intervention during remote or crisis conditions or by addressing its psychosocial consequences. Across studies, sample sizes ranged from fewer than 50 to over 300 participants, with follow-up durations varying from 4 weeks to 24 months.
In the healthcare sector, Kiser et al. (2024) implemented an RCT of peer coaching for U.S. physicians, with 67 participants randomised to the intervention arm, of whom 52 received the coaching intervention, and 71 participants allocated to the control group. The second study, by Micek et al. (2022), was a non-randomised trial of a remote-scribe support programme in US primary care, with 37 scribe users and 68 controls. The third study, by Shan et al. (2024), was an RCT of Balint group sessions among Chinese hospital nurses, with 33 participants in the intervention group and 40 in the control group. The fourth study, by Wang et al. (2024), was an RCT of Psychological First Aid training for Chinese frontline staff (n = 43 in the intervention group and 46 in the control group).
In the construction sector, we identified one study: Cedstrand et al. (2022) conducted a non-randomised trial of a co-designed stress-reduction intervention among Swedish construction workers, with 203 participants in the intervention group and 124 in the control group. In the TICTM setting, we also identified one study: Falk et al. (2022) conducted an RCT comparing sit–stand desks, online well-being modules, their combination, and a control condition among US university staff. Although the Falk et al. (2022) study primarily measured affective outcomes (positive and negative affect) rather than clinical mental health indicators, it was included because it assessed psychological well-being in a TICTM (telework) setting consistent with the scope of organisational-level mental health promotion. The intervention arms comprised desk only (n = 24), programme only (n = 21), and desk + programme (n = 21), compared with one control arm (n = 23). Table 2 provides an overview of each study's country, design, sample sizes, intervention, and follow-up timing. Detailed demographics (group sizes, gender, age) appear in Supplement C1, and full intervention protocols (session counts, components, timing) are reported in Supplement C2.
Table 2 Study and sample characteristics
Study quality
The methodological quality of the six included studies was appraised using the Quality Assessment Tool for Quantitative Studies (QATQS) developed by the Effective Public Health Practice Project (EPHPP) (Thomas et al. 2004). This tool assesses six domains: selection bias, study design, confounders, blinding, data collection methods, and withdrawals/dropouts, each rated as strong, moderate, or weak according to standardised EPHPP criteria. The six domain ratings are synthesised into a global quality rating for each study (strong, moderate, or weak). Among the six studies, one was rated strong (Wang et al. 2024), three moderate (Falk et al. 2022; Kiser et al. 2024; Shan et al. 2024), and two weak (Cedstrand et al. 2022; Micek et al. 2022).
Overall, the quality of evidence ranged from weak to moderate, reflecting common methodological challenges in organisational-level intervention research. Study design emerged as a key differentiator: the randomised controlled trials (RCTs), such as those by Wang et al. (2024) and Kiser et al. (2024), scored higher for methodological rigour and transparent reporting, while the two non-randomised trials (Cedstrand et al. 2022; Micek et al. 2022) were downgraded for selection bias and lack of random allocation, increasing the likelihood of confounding. Blinding represented the weakest domain across studies: as participants and facilitators could not be blinded to group allocation, an inherent limitation in behavioural and organisational research, all studies scored moderate or weak on this criterion.
In contrast, data collection methods were consistently rated strong, as all studies employed validated and reliable psychometric instruments. Handling of withdrawals and drop-outs was inconsistent. Falk et al. (2022) and Wang et al. (2024) provided complete reporting and low attrition (strong), whereas others lacked sufficient detail (moderate). In summary, although methodological quality varied, the overall evidence base can be considered moderate, with the highest rigour observed in RCTs employing validated tools and transparent reporting procedures.
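To make the synthesis of domain ratings more transparent, the sketch below illustrates the commonly applied EPHPP rule for deriving the global rating (strong if no domain is rated weak, moderate if exactly one domain is weak, weak if two or more domains are weak). The domain ratings shown are hypothetical placeholders, not the appraisal data for any included study (those are reported in Supplementary B).

```python
def global_rating(domain_ratings: dict) -> str:
    """Derive an EPHPP-style global quality rating from six domain ratings.

    Assumed rule: no weak domains -> strong; one weak -> moderate;
    two or more weak -> weak.
    """
    weak_count = sum(1 for rating in domain_ratings.values() if rating == "weak")
    if weak_count == 0:
        return "strong"
    if weak_count == 1:
        return "moderate"
    return "weak"

# Hypothetical ratings for a single study (illustration only):
example = {
    "selection bias": "moderate",
    "study design": "strong",
    "confounders": "strong",
    "blinding": "weak",  # participants/facilitators could not be blinded
    "data collection methods": "strong",
    "withdrawals/dropouts": "strong",
}
print(global_rating(example))  # -> "moderate" (exactly one weak domain)
```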
Study outcomes
Tables 3 and 4 summarise each trial's primary-outcome effect estimates (MD, OR, or d with 95% CIs and p-values) and follow-up timing. Across all sectors, outcomes included burnout, depression, anxiety, perceived stress, and overall well-being. In the healthcare sector, all four studies assessed at least one primary outcome. Kiser et al. (2024; n = 67/71; 6 months) demonstrated significant reductions in overall burnout (− 0.79; p = 0.001) and in the burnout subscale interpersonal disengagement (− 0.94; p = 0.001), a component of the Stanford Professional Fulfilment Index (PFI) that captures emotional detachment from work and colleagues. As secondary outcomes, the study also reported improvements in professional fulfilment (Δ = + 0.59; p = 0.046) and work engagement (Δ = + 0.33; p = 0.04). Micek et al. (2022; n = 37/68; 12 months) reported that remote-scribe support reduced burnout (OR = 0.15; p = 0.02) and lowered perceived EHR-related stress by 1.46 (p = 0.01). As secondary outcomes, the intervention improved ratings of a supportive work environment (+ 1.55; p = 0.02) and Joyful Workplace scores (+ 2.83; p = 0.01). Shan et al. (2024; n = 33/40; 3 months) found no significant effects on the primary burnout dimensions of emotional exhaustion and depersonalisation but did observe an increase in the secondary outcome personal accomplishment (+ 9.7; p = 0.003). Wang et al. (2024; n = 43/46; 3 months) reported significant reductions in the primary outcomes of depression (F = 2.87; p = 0.046) and burnout (F = 3.73; p = 0.018) compared with psychoeducation, with no significant changes in anxiety or secondary traumatic stress.
Table 3 Results of the effectiveness of health promotion interventions
Table 4 Results of the effectiveness of mental health promotion interventions
In the TICTM setting, Falk et al. (2022; n ≈ 19–24/23; 4 weeks) did not assess any of the primary mental health outcomes but measured psychological well-being via affective indicators. The combined sit–stand desk and behavioural-prompt intervention improved positive affect (d = 1.11) and reduced fatigue (d = − 0.65), while the desk-only and module-only arms showed smaller changes. However, the authors noted that the pilot design was not suitable for formal significance testing, and p-values were not reported. Consequently, this study was rated of moderate quality due to unreported statistical testing and potential bias (see Supplementary B, Table B2). In the construction sector, Cedstrand et al. (2022; n = 203/124; 12 & 24 months) found that a co-created organisational intervention limited the 12-month increase in stress (+ 1.4 vs + 6.1; p = 0.015), but effects were not sustained at 24 months.
Across the six trials, considerable heterogeneity was observed in study design, outcome measures, and follow-up length, which precluded meta-analysis. Healthcare interventions, typically randomised and institution-based, showed consistent short-term reductions in burnout, stress, or depression, while the construction and TICTM trials demonstrated more limited or short-lived effects. Follow-up periods ranged from four weeks (Falk et al. 2022) to 24 months (Cedstrand et al. 2022), highlighting sectoral differences in implementation scope and sustainability. Detailed tabulated data and extended synthesis of these sectoral trends are provided in Supplementary Material C3.
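For context on the standardised effect sizes cited above (e.g., Cohen's d for the affective outcomes in Falk et al. 2022), the sketch below shows one conventional way to compute d and an approximate 95% confidence interval from group summary statistics. This is a minimal illustration: the means, standard deviations, and group sizes are hypothetical values, not data extracted from the included trials, and the large-sample variance formula for d is an approximation.

```python
import math

def cohens_d_with_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Cohen's d from two group summaries, with an approximate 95% CI.

    Uses the pooled standard deviation and the common large-sample
    variance approximation for d.
    """
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

# Hypothetical post-intervention positive-affect scores (illustration only):
d, (lo, hi) = cohens_d_with_ci(m1=32.5, sd1=5.0, n1=21, m2=27.8, sd2=5.4, n2=23)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```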