Background:
Working memory impairment is one of the core cognitive phenotypes in bipolar disorder. However, whether the task-based functional magnetic resonance imaging mechanisms underlying working memory processing are consistent between bipolar disorder type I (BD-I) and type II (BD-II) remains unclear. This systematic review summarizes task-based fMRI evidence, with a focus on characterizing the neural correlates of working memory in BD-II and comparing them with BD-I.
Methods:
Following the PRISMA 2020 guidelines, PubMed, Embase, and Web of Science were systematically searched for studies published between January 2011 and June 2025. Eligible studies included task-based fMRI investigations in adults using working memory paradigms that separately reported results for BD-I and BD-II. A total of 22 studies were included for qualitative synthesis.
Results:
Across 22 task-based fMRI studies, the qualitative evidence suggests potentially different patterns of working-memory network engagement in BD-I and BD-II; however, subtype-specific inference is constrained by the very limited BD-II literature, with only three studies including BD-II samples (one BD-II-only study and two direct BD-I vs BD-II comparisons). In BD-I, studies more often reported altered or less efficient recruitment of executive-control regions/networks and reduced task-related suppression of default-mode regions under higher cognitive load and/or emotional distraction. In BD-II (based on sparse evidence), euthymic samples were generally reported to show relatively preserved recruitment and performance, whereas depressive-state or higher-demand conditions were associated with attenuated load-related upregulation. In a single emotional-interference paradigm, BD-II showed stronger inverse DLPFC–amygdala coupling, which may reflect context-specific modulation of fronto-limbic interactions.
Conclusion:
The current task-fMRI literature provides preliminary, hypothesis-generating indications that BD-I and BD-II may not be fully captured by a simple severity-continuum account, but firm subtype-specific conclusions are not yet warranted given the scarcity of BD-II studies and the limited number of direct BD-I/BD-II comparisons. Across studies, BD-I findings have more often been interpreted within neural inefficiency/limited-scalability accounts and reduced task-related DMN suppression, whereas BD-II findings—based on sparse evidence—have been reported as more state- and context-dependent. Larger, harmonized studies with direct BD-I/BD-II comparisons and mood-state stratification are needed to test these provisional patterns.
1 Introduction
Bipolar disorder (BD) is a chronic and severe psychiatric condition associated with substantial functional disability and premature mortality, particularly due to elevated suicide risk (1, 2). It remains a leading contributor to global disability, markedly reducing quality of life and long-term socio-occupational attainment (3, 4). Although BD is defined by the occurrence of manic or hypomanic episodes, symptomatic remission does not necessarily translate into functional recovery. This dissociation is partly attributable to cognitive impairment, which is increasingly recognized as a core feature of BD and a stronger predictor of functional outcome than residual mood symptoms (5, 6).
Working memory (WM)—the temporary maintenance and manipulation of information supporting complex cognition (e.g., comprehension, learning, reasoning)—has long been conceptualized as a central cognitive construct (7). WM impairment is frequently reported in BD and may persist even during periods of emotional stability, supporting its role as a core cognitive phenotype of the illness (8, 9). This makes WM a strong target for a network-based synthesis of task-fMRI findings. WM paradigms (e.g., n-back; delayed match-to-sample, DMTS) permit parametric manipulation of cognitive load and can be combined with emotional distraction, offering a principled way to examine affect–cognition interactions relevant to BD (10–12). In addition, WM tasks elicit highly reproducible engagement of large-scale networks, which provides a common mechanistic framework for integrating results across studies and comparing BD subtypes (13). Over the past decades, functional magnetic resonance imaging (fMRI) has provided key insights into the neural substrates of these deficits, repeatedly implicating aberrant activation patterns within the central executive network (CEN) and the default mode network (DMN) (14, 15). Meta-analyses of WM in healthy individuals consistently delineate a robust and reproducible task-positive network. This network encompasses lateral prefrontal and posterior parietal cortices, and further extends to the anterior insula/dorsal anterior cingulate cortex (ACC), as well as the dorsal premotor and pre-supplementary motor areas. Collectively, these findings suggest that efficient WM processing depends on the coordinated recruitment and integration of multiple executive-control and attentional systems, rather than being subserved solely by the canonical dorsolateral prefrontal–posterior parietal circuit (10, 13, 16).
Neurobiologically, successful WM performance depends on coordinated interactions among large-scale intrinsic networks. During task engagement, the brain typically recruits the CEN—anchored in the dorsolateral prefrontal cortex (DLPFC) and posterior parietal cortex (PPC)—while suppressing the DMN, particularly midline hubs such as the medial prefrontal cortex (mPFC) and posterior cingulate cortex (PCC)/precuneus (17). This anti-correlated dynamic is critical, as insufficient DMN suppression may introduce neural interference and contribute to attentional lapses. The CEN–DMN antagonism is further regulated by the salience network (SN), anchored in the anterior insula and dorsal ACC, which has been proposed to detect behaviorally relevant events and facilitate switching between internally oriented DMN activity and externally oriented CEN engagement (18), a control architecture later formalized in the triple-network model linking SN–CEN–DMN interactions to psychopathology (19). Consistent with cognitive control accounts, the CEN’s conceptual origins trace back to Baddeley’s WM model (20) and align with prefrontal “top-down” control theory (21), while network imaging work delineates the CEN as a fronto-parietal lateral control circuit with extensions to frontopolar/vlFPC regions implicated in strategy integration, inferior parietal lobule (IPL) nodes supporting maintenance/buffering, and premotor areas involved in response preparation (19, 22–25). Beyond cortical networks, subcortical circuits including limbic and striatal systems may operate as an “affective–cognitive gate”, filtering distractor signals to protect limited cognitive resources (26). In parallel, the dorsal attention network (DAN) supports sustained top-down attentional orienting under increasing task demands, and cingulo-opercular systems are often linked to stable task-set maintenance across time (27, 28). 
The DMN tends to operate coherently during rest/internal mentation and typically requires suppression during externally oriented goal-directed tasks, with broader involvement of ventral/rostral anterior cingulate, lateral parietal cortices (adjacent to the angular gyrus), temporal poles, and medial temporal lobe structures (29–32). In addition to these large-scale systems, WM performance can draw on modality-specific support regions; for example, the superior temporal gyrus (STG) has been linked to phonological rehearsal processes (33). In Bipolar I Disorder (BD-I), converging evidence suggests disruptions in both executive recruitment and task-related DMN suppression (34, 35).
Importantly, the neurobiological profile of BD-I appears heterogeneous. While DLPFC hypoactivation is commonly reported, task-based studies have also described paradoxical hyperactivation in executive regions, potentially reflecting inefficient compensatory recruitment under specific cognitive loads, state effects, or illness-related moderators. This intra-subtype variability has been interpreted within a “neural inefficiency” framework, in which prefrontal recruitment may show a non-linear (inverted U-shaped) relationship with task demand—relatively greater activation at lower loads, followed by reduced activation when cognitive demands exceed available capacity (36–39).
Despite these advances, a major limitation in the literature is the persistent tendency to treat BD as a homogeneous entity, potentially obscuring subtype-specific mechanisms. Neuroimaging models have historically been derived predominantly from BD-I cohorts, whereas Bipolar II Disorder (BD-II) has been underrepresented or conflated with BD-I in mixed-sample designs (40, 41). This grouping approach often assumes BD-II is a milder form; however, clinical evidence indicates BD-II confers a substantial and distinct burden, characterized by more frequent and persistent depressive episodes, higher rates of mixed features and rapid cycling, and marked functional impairment, with suicide risk comparable to—or in some reports exceeding—that of BD-I (42–44). BD-II is also more prevalent among women and frequently co-occurs with anxiety disorders, personality pathology, and multiple somatic comorbidities, further underscoring its clinical distinctiveness (44). Consequently, it remains unclear whether WM deficits in BD-II reflect the same neural dysfunctions described in BD-I or instead follow a distinct pathophysiological trajectory linked to its predominant depressive polarity (45). Direct neuroimaging comparisons between BD-I and BD-II remain sparse and yield competing interpretations. Some evidence supports a severity-continuum account, suggesting BD-II shows an intermediate pattern of prefrontal dysfunction between healthy controls and BD-I (38). In contrast, other studies point to neural dissociation, with relatively preserved DLPFC activation but selective alterations in frontopolar and parietal regions in BD-II, implying a distinct mechanistic profile (46). Moreover, while reduced task-related DMN suppression is often highlighted as a hallmark in BD-I (47), it remains uncertain whether this biomarker generalizes to BD-II. 
Crucially, it is also unknown whether BD-II exhibits the same vulnerability to emotional interference as BD-I, or if it retains a distinct capacity for top-down regulation when facing affective distractors.
WM dysfunction is not unique to BD and has also been widely reported in major depressive disorder (MDD). Behavioural evidence from a systematic review and meta-analysis of the n-back task suggests that individuals with unipolar depression show reduced accuracy at higher memory loads (1–3-back) and prolonged response times across loads, consistent with load-sensitive executive WM impairment (48). Task-based neuroimaging meta-analyses further implicate altered recruitment of cognitive control circuitry (including anterior insula and rostral ACC) during cognitive–emotional challenges in MDD (49). Focusing specifically on WM tasks, a coordinate-based meta-analysis reported abnormal activation patterns in both task-positive control regions and default-mode regions in MDD, including disorder-specific reductions in middle frontal gyrus engagement (50). In addition, reduced suppression of the DMN during WM has been observed in remitted MDD and linked to rumination (51). Given the predominantly depressive polarity of BD-II, this MDD literature provides a clinically relevant comparator for interpreting WM-related findings in BD-II.
Given these inconsistencies, this systematic review synthesizes task-based fMRI studies that report BD-I and BD-II findings separately to describe WM-related neural patterns in BD-II and to assess whether the qualitative evidence more strongly aligns with subtype differentiation or a severity-continuum account. To capture the effects of both cognitive load and emotional context, we organized the review by experimental paradigm: standard n-back tasks and their emotionally salient variants (emotional n-back) are analyzed together, supplemented by examination of the DMTS paradigm where inferences about specific encoding and maintenance phases can be drawn. Across all paradigms, our focus extends beyond the classical balance between the CEN and the DMN to encompass the broader control architecture responsible for coordinating neural recruitment, suppression, and gating, including the SN, the DAN, and prefrontal-limbic coupling.
2 Methods
2.1 Systematic review framework
This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines (52). The study protocol was designed to synthesize task-based fMRI evidence on WM and to compare WM-related neural mechanisms between BD-I and BD-II disorders. The review protocol was developed a priori but was not registered in an external registry (e.g., PROSPERO). We acknowledge that the absence of prospective registration may reduce transparency and increase the risk of selective reporting. To enhance reproducibility, complete database-specific search strategies, exact search dates, and database-level retrieval counts are provided in the Supplementary Material (Supplementary Table 1).
2.1.1 Search strategy and data sources
To structure the search, we applied a PICO (53) framework adapted for cross-sectional neuroimaging studies, operationalizing the “Intervention” component as an “Indicator” (WM task paradigms), consistent with prior fMRI systematic reviews (54).
We searched PubMed, Embase, and Web of Science from January 1, 2011 to June 30, 2025, combining controlled vocabulary (e.g., MeSH/Emtree terms) with free-text Title/Abstract terms. We restricted eligibility to studies published from January 2011 onward because the primary aim of this review was to examine subtype-related (BD-I vs BD-II) differences in task-based fMRI correlates of WM. In earlier task-fMRI literature, bipolar samples were frequently reported as a single group without consistent subtype stratification or sufficient subtype-specific results to support our subtype-focused synthesis. Accordingly, the 2011 start date was selected a priori as a pragmatic boundary to prioritize studies with more complete diagnostic characterization and reporting relevant to the current review question. Search terms were organized into four concept domains: Participants (BD-I/BD-II), Indicator (WM paradigms such as n-back, Sternberg/DMTS, spatial/verbal WM), Comparator (healthy controls), and Outcomes (task-based fMRI/BOLD activation or connectivity). Searches were restricted to English-language studies in adults (≥18 years), and reference lists of relevant articles were manually screened. The complete database-specific search strategies (as executed), exact search dates (day/month/year), and database-level retrieval counts prior to deduplication are provided in the Supplementary Material (Supplementary Table 1). Searches were last run on 12/10/2025 (12 October 2025) and yielded 146 records from PubMed, 126 from Embase, and 194 from Web of Science (prior to deduplication).
2.1.2 Subtype-specific eligibility criteria
Records retrieved from all databases were exported and de-duplicated prior to screening. Two reviewers independently screened titles and abstracts, followed by full-text assessment against eligibility criteria. The review process was designed to be fully consensus-based: any disagreements were resolved through discussion between the two reviewers. In the few instances where consensus could not be reached, a third senior reviewer was consulted for a final decision. Given this consensus-oriented workflow, a formal inter-rater reliability statistic (e.g., Cohen’s κ) was not calculated; however, the low number of disputes requiring third-party adjudication serves as a qualitative indicator of high initial agreement between the primary reviewers. The selection process is summarized using a PRISMA 2020 flow diagram.
Studies were included if they: (a) recruited adult participants (≥18 years) with a confirmed diagnosis of BD-I and/or BD-II according to DSM or ICD criteria; (b) employed task-based fMRI to measure BOLD signal changes during WM performance; and (c) reported neuroimaging results for BD-I and BD-II separately, or provided a direct head-to-head comparison. Studies that included mixed BD samples were eligible only if subtype-specific neuroimaging outcomes could be extracted or were reported explicitly.
Exclusion criteria were: (a) non-original research (reviews, meta-analyses, editorials, case reports); (b) studies in which BD subtypes were conflated, unspecified, or not separable; (c) pediatric samples or samples with major neurological comorbidity (or history of severe head trauma); and (d) paradigms primarily indexing response inhibition or sustained attention (e.g., Stroop, Go/No-Go) rather than WM operations, to maintain focus on executive components of WM. Emotion-interference WM variants (e.g., emotional face distractors during n-back) were eligible and were synthesized alongside standard n-back tasks to reflect affect–cognition interaction under WM load.
2.1.3 Formal search string
The final optimized search string was as follows:
(“Bipolar Disorder”[MeSH] OR “bipolar i”[tiab] OR “bipolar ii”[tiab] OR “bipolar 1”[tiab] OR “bipolar 2”[tiab] OR “bd-i”[tiab] OR “bd-ii”[tiab] OR “bipolar type i”[tiab] OR “bipolar type ii”[tiab] OR “bipolar I disorder”[tiab] OR “bipolar II disorder”[tiab] OR “manic depression”[tiab] OR “manic-depressive illness”[tiab]) AND (“Memory, Short-Term”[MeSH] OR “Working Memory”[tiab] OR “WM”[tiab] OR “n-back”[tiab] OR “n back”[tiab] OR “nback”[tiab] OR “Sternberg”[tiab] OR “delayed matching to sample”[tiab] OR “DMTS”[tiab] OR “delayed response task”[tiab] OR “spatial working memory”[tiab] OR “verbal working memory”[tiab]) AND (“Functional Neuroimaging”[MeSH] OR “Magnetic Resonance Imaging”[MeSH] OR “fMRI”[tiab] OR “functional MRI”[tiab] OR “task-based fMRI”[tiab] OR “task fMRI”[tiab] OR “functional magnetic resonance”[tiab] OR “BOLD”[tiab] OR “brain activation”[tiab] OR “neural activation”[tiab]).
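The Boolean structure of this string (terms joined by OR within each concept block, blocks joined by AND) can be illustrated with a minimal sketch. The helper `or_block` and the variable names are hypothetical and are not the authors' tooling; the terms are copied from the string above, with straight quotes substituted for typographic ones. The string as printed contains three AND-joined blocks, which are labeled here after the participant, indicator, and outcome domains described in Section 2.1.1.

```python
# Illustrative sketch only: helper and variable names are hypothetical.
# Terms are copied from the search string above (straight quotes substituted).

def or_block(mesh_terms, tiab_terms):
    """OR together MeSH-tagged and Title/Abstract-tagged terms in one block."""
    tagged = [f'"{t}"[MeSH]' for t in mesh_terms]
    tagged += [f'"{t}"[tiab]' for t in tiab_terms]
    return "(" + " OR ".join(tagged) + ")"

participants = or_block(
    ["Bipolar Disorder"],
    ["bipolar i", "bipolar ii", "bipolar 1", "bipolar 2", "bd-i", "bd-ii",
     "bipolar type i", "bipolar type ii", "bipolar I disorder",
     "bipolar II disorder", "manic depression", "manic-depressive illness"],
)
indicator = or_block(
    ["Memory, Short-Term"],
    ["Working Memory", "WM", "n-back", "n back", "nback", "Sternberg",
     "delayed matching to sample", "DMTS", "delayed response task",
     "spatial working memory", "verbal working memory"],
)
outcomes = or_block(
    ["Functional Neuroimaging", "Magnetic Resonance Imaging"],
    ["fMRI", "functional MRI", "task-based fMRI", "task fMRI",
     "functional magnetic resonance", "BOLD", "brain activation",
     "neural activation"],
)

# Concept blocks are intersected with AND.
query = " AND ".join([participants, indicator, outcomes])
print(query)
```

Keeping each concept block as a separate list makes it easier to adapt the field tags for other databases, although the exact Embase and Web of Science syntax is given only in Supplementary Table 1 and is not reproduced here.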
2.2 Data extraction and quality assessment
Data from eligible studies were independently extracted by two researchers using a standardized form. Extracted variables included study characteristics (author, year), sample demographics (sample size, age, gender, mood state), task parameters (paradigm, load), and neuroimaging findings (anatomical regions and direction of activation/connectivity changes). Discrepancies were resolved by discussion and, when necessary, consultation with a third reviewer. All extracted data were cross-checked for completeness and accuracy prior to synthesis.
Study quality was evaluated using a dual-criteria framework capturing both general methodological rigor and fMRI-specific technical validity. General study quality was assessed with the Newcastle–Ottawa Scale (NOS) (55), applying the case–control NOS to cross-sectional case–control designs and the cohort NOS to the longitudinal cohort study. fMRI-specific methodological rigor was assessed using a checklist adapted from established fMRI reporting recommendations (56) and neuroimaging best practice considerations (57) (Table 1, Supplementary Table 3). The checklist covered five domains: (1) multiple-comparisons correction, (2) head-motion control, (3) sample size and eligibility reporting (transparency), (4) in-scanner task performance reporting, and (5) task engagement/manipulation-check evidence (neural and/or behavioral). Each item was scored binarily (0/1; total 0–5), with “not reported/unclear” scored as 0. Studies were rated High Quality if they met both thresholds (NOS ≥ 7 stars and fMRI technical score ≥ 4), Moderate Quality if they met only one threshold, and Low Quality if they met neither. In addition, studies reporting whole-brain results exclusively at uncorrected thresholds without any multiple-comparisons correction and without a clearly justified corrected ROI/SVC alternative were classified as Low Quality regardless of NOS score (57). Corrected whole-brain inference included voxel-wise FWE/FDR, permutation-based correction, TFCE, or clearly reported cluster-level correction; corrected ROI/SVC approaches were considered acceptable when ROIs were a priori/independently defined and the correction procedure was clearly described.
Table 1. fMRI-specific methodological checklist.

Category: Statistical Analysis
Item: Multiple Comparisons Correction
Criteria for 1 point: Whole-brain inference is corrected (voxel-wise FWE/FDR; permutation/TFCE; cluster-level with reported cluster-forming threshold + corrected cluster significance). ROI/SVC acceptable only with a priori/independent ROI and clearly described correction.
Criteria for 0 points: Whole-brain uncorrected thresholds OR method not stated/unclear.
Rationale: Controls false positives; improves interpretability and reproducibility.

Category: Data Quality
Item: Head Motion Control
Criteria for 1 point: Motion handled with reportable criteria/QC (exclusion thresholds and exclusions; nuisance regressors + QC summary; scrubbing/advanced denoising with criteria).
Criteria for 0 points: No motion QC/procedures, or vague “motion corrected” only.
Rationale: Motion confounds activation/connectivity; group/state differences can bias clinical findings.

Category: Study Design
Item: Sample Size and Eligibility Reporting (Transparency)
Criteria for 1 point: The final analyzed sample size (N) is clearly reported by group and/or timepoint/condition for the primary fMRI analyses and the study provides clear eligibility/inclusion–exclusion criteria for participant selection. If the manuscript explicitly indicates that participants were excluded after acquisition (e.g., due to motion, technical problems, or task non-compliance), the number excluded and reasons (or an explicit statement that no participants were excluded after acquisition) must be reported.
Criteria for 0 points: The final analyzed N is unclear for the primary fMRI analyses and/or eligibility/inclusion–exclusion criteria are insufficiently described; or post-acquisition exclusions are explicitly mentioned but the number excluded and reasons (or a clear statement of no exclusions) are not reported.
Rationale: Small samples lower power/precision and reduce replicability.

Category: Behavioral Data
Item: Task Performance Verification
Criteria for 1 point: Reports in-scanner accuracy/RT with sufficient detail; between-group/state comparability tested or controlled.
Criteria for 0 points: Behavioral data missing/insufficient; comparability unclear.
Rationale: Without performance data, neural differences may reflect engagement/effort differences.

Category: Validity
Item: Task Engagement/Manipulation Check Evidence
Criteria for 1 point: Reports the primary within-task manipulation (e.g., high-load > low-load, 2-back > 0-back, parametric load) with sufficient detail (figure/table/peak coordinates or explicit text) in controls and/or the full sample; OR, if only group-difference results are presented, the study provides a clear manipulation check showing that participants performed the task as expected (e.g., appropriate behavioral summaries across load) and explicitly defines the task contrast(s) used for inference.
Criteria for 0 points: No primary task manipulation effect is shown/describable and no clear manipulation-check evidence is provided (cannot judge whether WM was effectively engaged).
Rationale: Supports construct validity that the paradigm engaged WM circuitry.
FDR, False Discovery Rate; fMRI, Functional Magnetic Resonance Imaging; FWE, Family-Wise Error; N, Sample Size (Number of participants); QC, Quality Control; ROI, Region of Interest; RT, Reaction Time; SVC, Small Volume Correction; TFCE, Threshold-Free Cluster Enhancement; WM, Working Memory.
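The dual-threshold rating rule described in Section 2.2 can be sketched as a small function. This is a hedged illustration of the rule as stated in the text, not the authors' scoring script; the function name `rate_study` and the flag `uncorrected_only` are hypothetical, and the 0–9 range for NOS stars is the standard NOS maximum rather than a value given in this article.

```python
# Sketch of the quality-rating rule as described in the text.
# Function and parameter names are hypothetical.

def rate_study(nos_stars, fmri_score, uncorrected_only=False):
    """Return 'High', 'Moderate', or 'Low' for one study.

    nos_stars:        Newcastle-Ottawa Scale stars (0-9).
    fmri_score:       fMRI checklist total (0-5, one point per domain).
    uncorrected_only: True if whole-brain results are reported only at
                      uncorrected thresholds with no justified corrected
                      ROI/SVC alternative.
    """
    if not 0 <= nos_stars <= 9:
        raise ValueError("NOS stars must be between 0 and 9")
    if not 0 <= fmri_score <= 5:
        raise ValueError("fMRI checklist score must be between 0 and 5")

    # Override: uncorrected-only inference is rated Low regardless of NOS.
    if uncorrected_only:
        return "Low"

    thresholds_met = int(nos_stars >= 7) + int(fmri_score >= 4)
    return {2: "High", 1: "Moderate", 0: "Low"}[thresholds_met]


print(rate_study(8, 5))
print(rate_study(8, 3))
print(rate_study(9, 4, uncorrected_only=True))
```

Placing the override before the threshold count mirrors the wording "regardless of NOS score": a study with strong general methodology is still rated Low when its whole-brain inference is uncorrected.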
2.3 Data synthesis
Given the limited number of direct BD-I vs BD-II comparisons and substantial heterogeneity in paradigms, contrasts, and analytic approaches (e.g., varying n-back loads, ROI vs whole-brain strategies, and connectivity methods), a qualitative synthesis was conducted rather than a quantitative meta-analysis. Evidence was first organized by paradigm, grouping standard n-back studies together with emotion-interference variants (EFNBACK), and synthesizing DMTS studies separately to enable phase-specific inference (encoding vs maintenance) where available. Within each paradigm, findings were mapped onto large-scale network frameworks, focusing primarily on CEN recruitment and task-related DMN suppression while also extracting results relevant to salience/attention systems and fronto-limbic or subcortical modulatory circuits when reported, to support subtype-level mechanistic comparison.
3 Results
3.1 Literature search and selection process
The systematic search of PubMed, Web of Science, and Embase yielded 466 records. After removing 115 duplicates, 351 unique records remained for title/abstract screening. Of these, 276 were excluded because they did not meet the eligibility criteria. Seventy-five full-text articles were retrieved and assessed, and 54 were excluded at the full-text stage. A leading reason for exclusion was failure to specify bipolar subtypes (n = 19), underscoring the frequent conflation of BD subtypes in the literature. Other reasons for exclusion were the use of non-WM paradigms (n = 20), absence of a healthy control group (n = 6), insufficient or non-extractable data for the outcomes of interest (n = 5), non-task-based fMRI designs (n = 2), significant comorbidity (n = 1), and case reports (n = 1). For each excluded study, only the primary reason for exclusion was recorded. Ultimately, 21 studies met inclusion criteria and were included in the qualitative synthesis, with one additional eligible study identified through reference-list screening, resulting in a total of 22 included studies. The selection process is summarized in the PRISMA flow diagram (Figure 1).

Figure 1. PRISMA flowchart of data searching and selection.
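The counts reported in Section 3.1 can be checked against one another with simple arithmetic. The sketch below uses only numbers stated in the text (database-level retrieval counts from Section 2.1.1 and the screening counts from Section 3.1); the variable names are illustrative.

```python
# Consistency check of the PRISMA flow counts reported in the text.
# All numbers are taken from Sections 2.1.1 and 3.1; names are illustrative.

retrieved = {"PubMed": 146, "Embase": 126, "Web of Science": 194}
total_records = sum(retrieved.values())
assert total_records == 466

duplicates_removed = 115
screened = total_records - duplicates_removed
assert screened == 351

excluded_at_screening = 276
full_text_assessed = screened - excluded_at_screening
assert full_text_assessed == 75

full_text_exclusions = {
    "BD subtypes not specified": 19,
    "non-WM paradigm": 20,
    "no healthy control group": 6,
    "insufficient or non-extractable data": 5,
    "non-task-based fMRI design": 2,
    "significant comorbidity": 1,
    "case report": 1,
}
excluded_full_text = sum(full_text_exclusions.values())
assert excluded_full_text == 54

from_databases = full_text_assessed - excluded_full_text
from_reference_lists = 1
included = from_databases + from_reference_lists
assert (from_databases, included) == (21, 22)

print(f"Included studies: {included}")
```

Because each excluded full-text article was assigned a single primary reason, the reason-level counts sum exactly to the 54 full-text exclusions.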
3.2 Characteristics of included studies
Supplementary Table 2 summarizes the characteristics and main findings of the included studies. The final selection consisted of 22 task-based fMRI studies published between 2011 and 2025. After accounting for sample overlap across three reports (59–61) derived from the same cohort, the pooled non-overlapping sample comprised 580 participants with BD and 644 healthy controls (HC). Among participants with BD, 529 had bipolar I disorder (BD-I) and 51 had bipolar II disorder (BD-II). With respect to subtype coverage, the literature predominantly included BD-I samples (n = 19); only one study focused exclusively on BD-II (46), and two studies provided direct head-to-head comparisons between BD-I and BD-II (38, 58).
Regarding clinical status at the time of scanning, 14 studies recruited euthymic participants (36, 38, 47, 58–63, 67, 69–72), four studies assessed participants during depressive episodes (12, 46, 65, 66), and three studies recruited manic participants (35, 64, 67). Mood-state categories were not mutually exclusive because one study assessed participants in two mood states (67), and two studies did not map cleanly onto a single mood episode (68, 73). Concerning experimental paradigms, n-back tasks were used in 18 studies (12, 35, 36, 38, 46, 47, 58–69) (15 standard n-back; 3 EFNBACK), and the DMTS task was used in four studies (70–73). Additionally, six studies (58, 64, 68–70, 72) assessed functional or effective connectivity, and one study (66) applied graph-theory metrics to characterize network topology. Notably, BD-II evidence was scarce relative to BD-I, constraining subtype-specific inference. Figure 2 provides a hierarchical overview of the included task-fMRI studies, organized by working-memory paradigm, primary contrasts/analysis types, and study-level sample characteristics (mood state, subtype coverage, and medication status).

Figure 2. Schematic overview of included task-fMRI studies by paradigm, contrast, and sample characteristics.
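Because the mood-state categories in Section 3.2 overlap, the category counts (14, 4, and 3) do not sum directly to the 22 included studies. A short set-based sketch shows how the figures reconcile. Studies are identified by the reference numbers cited in the text; the variable names are illustrative.

```python
# Reconciling overlapping mood-state categories with the total of 22 studies.
# Integers are the reference numbers cited in Section 3.2; names are illustrative.

euthymic = {36, 38, 47, 58, 59, 60, 61, 62, 63, 67, 69, 70, 71, 72}
depressed = {12, 46, 65, 66}
manic = {35, 64, 67}
no_single_mood_state = {68, 73}

# Category memberships count study 67 twice (euthymic and manic).
memberships = len(euthymic) + len(depressed) + len(manic)
assert memberships == 21

# The set union counts each study once.
classified = euthymic | depressed | manic
assert len(classified) == 20
assert euthymic & manic == {67}

all_studies = classified | no_single_mood_state
assert len(all_studies) == 22

print(f"{memberships} category memberships, {len(all_studies)} unique studies")
```

The same approach applies to the paradigm split, where the 18 n-back and four DMTS studies are disjoint and sum to 22 without adjustment.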
3.3 Behavioral performance
Behavioral outcomes were primarily reported as accuracy and reaction time (RT) during WM performance. Across the 22 included studies, 10 reported no significant BD–HC differences in accuracy and/or RT, 11 reported significant BD-related behavioral differences on at least one index/condition (e.g., accuracy, RT, or d′), and one did not provide extractable group-contrast behavioral results.
3.3.1 N-back/EFNBACK
Among n-back studies, nine reported no statistically significant group differences in accuracy/RT (12, 36, 38, 46, 59–61, 64, 69). In contrast, five studies reported statistically significant group differences that were condition-specific: two studies reported load-sensitive reductions in accuracy and/or RT slowing at higher demand levels (66, 68), and three studies reported significantly lower performance across both 1-back and 2-back conditions (35, 63, 65). In addition, subgroup- or state-specific effects were reported in two studies. One study found that manic BD-I participants showed statistically significant differences in accuracy and/or RT compared to HC (67). The other study (47) stratified euthymic BD-I participants into cognitively impaired (CI) and cognitively preserved (CP) subgroups using RBMT (Rivermead Behavioural Memory Test) and BADS (Behavioural Assessment of the Dysexecutive Syndrome) normative 5th-percentile cutoffs (CI: RBMT screening score ≤7 and/or BADS profile score ≤11; CP: RBMT ≥8 and BADS ≥12). In that study, significant behavioral group differences from HC were reported for the CI subgroup in the 1-back and 2-back conditions, whereas the CP subgroup did not show significant differences. Only three studies included BD-II (one BD-II-only study and two BD-I vs. BD-II comparisons) (38, 46, 58). Across these studies, BD-II showed no statistically significant differences from HC on standard n-back tasks, and one direct comparison reported no statistically significant BD-I vs. BD-II behavioral differences (38). In an EFNBACK study, BD-I, but not BD-II, showed statistically significant RT slowing under emotional distraction at 2-back (58).
3.3.2 DMTS
In DMTS paradigms (n = 4), one study reported no significant behavioral differences (72). Other studies reported either difficulty-dependent reductions in accuracy (70), slower RT without accuracy differences (73), or overall lower WM accuracy in BD-I relative to HC (71).
3.4 Functional fMRI results
Across the 22 included task-based fMRI studies, task-evoked group differences during WM paradigms were most frequently reported in lateral prefrontal and parietal cortices, with reported effects varying across contrast specification, cognitive load, and clinical state.
3.4.1 N-back
In euthymic BD-I cohorts, two studies reported increased activation in right lateral prefrontal regions, such as the middle frontal gyrus/DLPFC and ventrolateral/frontopolar (vlFPC/frontopolar) areas, relative to healthy controls under specific task contrasts (38, 64). Three euthymic BD-I studies reported reduced activation in bilateral middle frontal regions together with increased activation in temporal cortex and/or ACC (59, 61, 62). One euthymic study reported no significant BD-I vs. HC activation differences for its primary contrast (60). One study reported lower right DLPFC activation in a BD-I CI subgroup compared with a CP subgroup (47). In mood-episode samples, two studies reported reduced lateral prefrontal and parietal activation in manic BD-I for 2-back contrasts (35, 67), and one study reported reduced activation in lateral prefrontal and cerebellar regions in depressed BD-I (65). One longitudinal study reported state-related differences between mania and euthymia within BD-I (67). For midline regions reported in terms of deactivation, two studies reported reduced deactivation in medial prefrontal/orbitofrontal and anterior cingulate regions in euthymic BD-I (47, 63), two studies reported reduced deactivation in manic BD-I (35, 64), and one study reported reduced deactivation in depressed BD-I (65). Posterior midline involvement (e.g., PCC and/or precuneus) was reported in two studies during n-back contrasts (65, 68). Studies also reported connectivity and network-level differences during n-back tasks. Within euthymic BD-I, one study reported reduced coordination across DLPFC–parietal–ACC pathways (61). Effective connectivity differences between mPFC and PCC (including differences in direction/sign) were reported in one study with a mixed/unclear mood-state sample (68).
One graph-theory study reported group differences in degree centrality across prefrontal/midline and posterior regions with load-dependent patterns involving parietal nodes (66). Regarding BD subtype, one head-to-head comparison reported that BD-II showed an intermediate right lateral prefrontal activation profile that was not statistically different from either BD-I or HC (38). One BD-II-only study (depressed, unmedicated BD-II) reported reduced parametric load-related recruitment across frontal, parietal, temporal/angular, and posterior regions relative to controls (46).
3.4.2 EFNBACK
In EFNBACK paradigms, two studies reported condition-specific effects involving DLPFC/middle frontal regions together with amygdala and striatal responses (58, 69), and one study reported increased putamen activation to face conditions during 2-back in BD-I relative to controls (12). Under emotional interference, one study reported that during fear-related distraction, BD-II showed significantly stronger negative DLPFC–amygdala functional connectivity than both BD-I and HC, while BD-I did not differ significantly from HC (58). One study reported valence-dependent cingulate-to-amygdala effective connectivity differences (69).
3.4.3 DMTS
Within DMTS paradigms, studies reported phase-specific group effects. During the encoding phase, two studies reported reduced recruitment of lateral prefrontal regions