Congenital adrenal hyperplasia (CAH) due to 21-hydroxylase deficiency is one of the most technically challenging monogenic conditions for molecular diagnosis. This is due to the complex genomic architecture of the CYP21A2 locus and the extensive homology between the functional gene and its pseudogene. Over the past decades, diagnostic approaches have evolved from locus-specific assays, such as allele-specific polymerase chain reaction, to the combined use of Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA). Together, these two methods constituted the historical gold standard for identifying pathogenic variants, deletions and chimeric alleles. Short-read next-generation sequencing expanded variant detection and enabled simultaneous analysis of multiple adrenal genes. However, its performance remains limited within the RCCX region because of read misalignment and its inability to resolve complex structural rearrangements. Recently, long-read sequencing (LRS) has emerged as a single-platform technology capable of resolving the CYP21A2–CYP21A1P module, accurately detecting all variant classes and directly determining cis/trans phase. Comparative studies have shown complete concordance with standard methods while revealing additional rearrangements that were previously undetectable. These results position LRS as a future reference method for high-resolution genotyping. These advances extend beyond diagnostic refinement to population-level strategies: integrating molecular testing into newborn screening algorithms reduces false positives, accelerates referral pathways, and improves prediction of salt-wasting risk. In this review, we summarize the historical development, technical limitations, and clinical implications of current and emerging molecular approaches for CAH diagnosis. We highlight how LRS and integrative analytical tools are reshaping clinical practice, refining genetic counselling, and guiding personalized therapeutic strategies.
Collectively, these innovations represent a decisive step toward precision endocrinology.
Introduction
Congenital adrenal hyperplasia (CAH) encompasses a group of autosomal recessive disorders of adrenal steroidogenesis that impair the synthesis of cortisol and aldosterone. Approximately 90–95% of all CAH cases are caused by 21-hydroxylase deficiency (21-OHD), due to pathogenic variants in the CYP21A2 gene (1). Loss of 21-hydroxylase activity impairs the conversion of progesterone and 17-hydroxyprogesterone (17-OHP) into downstream metabolites. The result is cortisol and aldosterone deficiency, excessive adrenocorticotropic hormone (ACTH) stimulation, adrenal hyperplasia, accumulation of androgen precursors and excess androgen production, which explains the virilization characteristic of the disorder (1).
The first detailed description of CAH dates back to 1865. It documented the life and post-mortem findings of Giuseppe Marzo, a phenotypic male with internal female reproductive organs and markedly enlarged adrenal glands, who died at the age of 44 during an apparent Addisonian crisis (2). In 1950, Wilkins and colleagues (3) introduced cortisone therapy for the classical forms, transforming the natural course of the disease. Years later, Bongiovanni and Root (4) identified 21-OHD as the biochemical cause of CAH, linking the disorder to a specific enzymatic defect in the steroidogenic pathway.
21-OHD exhibits a broad phenotypic spectrum determined by the degree of residual enzymatic activity. The classical forms (salt-wasting, SW, and simple virilizing, SV) affect 1:13,000 to 1:15,000 live births. They are the most severe forms and present with prenatal androgen excess, leading to virilized external genitalia in 46,XX newborns (1). The SW form, which accounts for 75–90% of classical cases, additionally causes life-threatening aldosterone deficiency with salt-wasting crises, and represents the most severe end of the 21-OHD spectrum. Without treatment, both classical forms may cause progressive postnatal virilization, advanced bone age, and early pubarche. In contrast, the nonclassical (NC) form is milder, with normal genitalia at birth and later-onset hyperandrogenic features (5).
Clinical assessment and hormonal profiling are essential for diagnosing 21-OHD, with serum 17-OHP serving as the key biochemical marker. Because of the risk of lethal salt-wasting crises, 17-OHP measurement is included in many newborn screening programs worldwide (6). In the classical forms, 17-OHP levels are markedly elevated, particularly in SW cases. However, 17-OHP concentration alone may be insufficient to distinguish the SW from the SV subtype (7). In NC forms, 17-OHP may be only mildly elevated, often requiring ACTH stimulation testing for confirmation (8). Biochemical testing is limited in borderline results and atypical presentations. For this reason, molecular testing plays an increasingly important role in confirming the diagnosis, predicting disease severity, and guiding genetic counseling.
It was not until the 1980s, three decades after the introduction of cortisone therapy, that the cloning and characterization of the CYP21A2 gene marked a turning point in the understanding of 21-OHD. This milestone revealed the presence of a highly homologous pseudogene, CYP21A1P, located within the complex RCCX module, an insight that helped explain the high frequency of recombination events underlying the disorder (9, 10). These early molecular discoveries laid the groundwork for subsequent advances in genetic analysis, which would later refine diagnostic accuracy and deepen our understanding of disease heterogeneity.
Over the following decades, molecular diagnosis evolved from labor-intensive, low-resolution techniques to increasingly comprehensive approaches. Sanger sequencing combined with multiplex ligation-dependent probe amplification (MLPA) constituted the classical standard, but limitations remained in resolving complex gene rearrangements and allelic phase. Short-read next-generation sequencing (NGS) expanded variant detection but remained susceptible to interference from the pseudogene. Most recently, long-read sequencing has emerged as a transformative single-assay approach. It provides near-complete CYP21A2 characterization, including haplotyping, structural variant detection, and cis/trans phasing (11, 12).
As genetic insights increasingly shape diagnostic accuracy and therapeutic decision-making, a comprehensive overview of the molecular foundations of the disorder becomes essential. This review synthesizes current knowledge on the genetic architecture of the CYP21 locus and the evolution of its molecular diagnostic methodologies, highlighting future perspectives toward faster, more accurate, and clinically applicable diagnosis of CAH. Variant nomenclature follows Human Genome Variation Society (HGVS) recommendations and is based on the CYP21A2 reference sequence NM_000500.9 (protein: NP_000491.4). Legacy names are provided where relevant to align with earlier literature.
Genetic background of the CYP21A2 gene and CAH due to 21-OH deficiency
The CYP21A2 gene is located within the RCCX module on chromosome 6p21.3, in the class III region of the major histocompatibility complex (MHC). This region spans approximately 30 kb and is characterized by a complex arrangement of tandemly repeated genes, namely C4A/C4B, CYP21A1P/CYP21A2, TNXA/TNXB (Tenascin X), and STK19/STK19B (Serine/Threonine Kinase, formerly RP1/RP2) (13–15). The functional gene CYP21A2 lies adjacent to C4B, extends over approximately 3.4 kb, and encodes a cytochrome P450 enzyme of 495 amino acids. CYP21A1P, adjacent to C4A, is a pseudogene carrying multiple inactivating variants that preclude the production of a functional protein (9, 10) (Figure 1). Both genes contain 10 exons and share remarkably high sequence identity: approximately 98% in exons and 96% in introns (9, 10).

Figure 1. Locus RCCX: schematic representation of the RCCX locus within the MHC region on chromosome 6p21.3, showing the modular arrangement RP–C4–CYP21A1P/CYP21A2–TNX. The functional CYP21A2 gene resides adjacent to its highly homologous pseudogene CYP21A1P. The flanking TNXA/TNXB genes predispose this region to non-allelic homologous recombination, leading to gene conversions, deletions, and chimeric alleles associated with congenital adrenal hyperplasia. Major CYP21A2 pathogenic variants derived from the CYP21A1P pseudogene across the exonic structure of CYP21A2: schematic representation of the ten exons of CYP21A2. It indicates the location of the most frequent pseudogene-derived pathogenic variants associated with 21-hydroxylase deficiency, including p.(Pro31Leu), p.(Gly111fs), c.293-13C>G, p.(Ile173Asn), the exon 6 cluster (p.[Ile237Asn;Val238Glu;Met240Lys]), p.(Val282Leu), p.(Leu308fs), p.(Gln319Ter), p.(Arg357Trp), and p.(Pro454Ser). These variants arise predominantly through microconversion events from CYP21A1P and represent the majority of disease-causing alleles worldwide. Adapted from Concolino et al. (16). The schematic is not drawn to scale: exon sizes, intergenic distances, and relative gene lengths are illustrative only. Arrows indicate regions of gene conversion or recombination events within the RCCX locus and do not represent physical distances.
The CYP21A1P pseudogene is inactive because it harbors numerous deleterious variants across its promoter and coding regions. These include seven missense substitutions, an exon 3 frameshift, the intron 2 splice variant c.293-13C>G, and truncating or frameshift variants in exons 7 and 8. The three exon 6 substitutions [p.(Ile237Asn), p.(Val238Glu), p.(Met240Lys)] constitute the classic "E6 cluster." Promoter single-nucleotide variants (SNVs) further reduce transcription to ~20% of normal. Together, these changes abolish enzymatic activity and make CYP21A1P the main donor sequence in gene conversion events affecting CYP21A2 (16). The high degree of homology and the tandem configuration make this region particularly prone to misalignment during meiosis, which fosters recurrent recombination through unequal crossover and gene conversion. These processes are the predominant mechanisms generating pathogenic variants in 21-OHD (16–19).
Overall, 90–95% of pathogenic CYP21A2 variants arise from intergenic recombination between the active gene and its pseudogene within the RCCX module. As a result, a limited set of pathogenic variants with well-established phenotypic consequences is consistently observed across populations (Tables 1, 2). Among these, microconversion events account for 70–75% of pathogenic alleles (5, 16, 19, 20). In these events, one or more deleterious pseudogene-derived variants are transferred into CYP21A2 during meiosis. The remaining 20–25% result from unequal meiotic crossovers, which give rise to extensive genomic rearrangements, including large deletions, duplications, and chimeric CYP21A1P/CYP21A2 genes. In contrast, only 5–10% of pathogenic alleles carry small variants that arose independently of gene conversion. Some of these show founder effects, providing valuable insights into population history and migration patterns (5, 16, 19–22).
Table 1. Description, genomic location, and residual enzymatic activity of the most frequent CYP21A2 pathogenic variants, including large gene rearrangements (whole-gene deletions and gene conversions) and recurrent point mutations, reported in different populations.
Pathogenic variant | Location | Enzymatic activity (%)* | Associated phenotype | Group | Present in pseudogene
Whole-gene deletion | Entire gene | 0 | SW | Null | n/a
Large CYP21A2 rearrangements** | Exons 3-8 | 0 | SW | Null | n/a
p.(Pro31Leu) | Exon 1 | 30-60 | NC | C | Yes
c.293-13C>G | Intron 2 | <2 | SW/SV | A | Yes
8-bp deletion*** | Exon 3 | 0 | SW | Null | Yes
p.(Ile173Asn) | Exon 4 | 3-7 | SV | B | Yes
E6 cluster**** | Exon 6 | 0 | SW | Null | Yes
p.(Val282Leu) | Exon 7 | 18 | NC | C | Yes
p.(Leu308Phefs*21) | Exon 7 | 0 | SW | Null | Yes
p.(Gln319Ter) | Exon 8 | 0 | SW | Null | Yes
p.(Arg357Trp) | Exon 8 | 2 | SW/SV | A | Yes
p.(Pro454Ser) | Exon 10 | 20-66 | NC | C | Yes
*The effect of each variant on enzymatic activity is listed as percent of wild-type activity when mutant cDNA is transfected into cultured cells and assayed in intact cells. Enzymatic activity values are reported as ranges, reflecting data from different experimental studies and cohorts. Because these estimates originate from heterogeneous in vitro expression models and assay conditions, direct quantitative comparisons across variants should be interpreted with caution. **Large CYP21A2 rearrangements are structural rearrangements involving the transfer of pseudogene-derived sequence, extending from a point between exons 3 and 8 of CYP21A1P to the corresponding point in CYP21A2. They yield a non-functional gene whose 5' end corresponds to CYP21A1P and whose 3' end corresponds to CYP21A2. This causes complete loss of 21-hydroxylase activity, so these alleles are classified as null; they are termed gene chimeras. Whole-gene deletions, by contrast, remove the entire CYP21A2 gene. ***8-bp deletion = c.332_339del. ****Exon 6 cluster = p.[Ile237Asn; Val238Glu; Met240Lys]. Adapted from Speiser 1992, White 1994 and Concolino 2025 (16, 48, 50).
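Table 2. Frequency of CYP21A2 pathogenic variants across countries.
Country (reference) | Alleles (n)* | Del/Conv (%) | p.(Pro31Leu) (%) | c.293-13C>G (%) | c.332_339del (%) | p.(Ile173Asn) (%) | E6 cluster** (%) | p.(Val282Leu) (%) | p.L308fs (%) | p.(Gln319Ter) (%) | p.(Arg357Trp) (%) | p.(Pro454Ser) (%) | Variants derived from the pseudogene (%)
Sweden (31) | 186 | 29.8 | 8.5 | 30.0 | 2.1 | 20.3 | 0.0 | 7.0 | 0.5 | 3.2 | 3.6 | NR | 100
Mexico (32) | 94 | 1.0 | NR | 47.9 | 1.0 | 11.7 | NR | 8.5 | 1.6 | 4.2 | 2.4 | NR | 94.5
Chile (71, 72) | 164 | 19.5 | 1.6 | 15.8 | 1.6 | 7.3 | 1.1 | 2.4 | 0.0 | 7.9 | 4.5 | 2.1 | 64.4
Spain (94) | 354 | 19.2 | 2.6 | 17.5 | 3.3 | 3.7 | 1.0 | 33.9 | 1.4 | 9.3 | 9.7 | 0.5 | 93.9
Central Europe (30) | 864 | 30.6 | NR | 31.2 | NR | 14.5 | NR | 3.4 | NR | 2.6 | NR | NR | 92
Portugal (40) | 112 | 25.9 | 1.8 | 9.8 | 2.7 | 9.8 | 1.8 | 25.9 | 4.4 | 6.3 | 1.8 | 0.3 | 88.4
USA (26) | 364 | 31.9 | 0.8 | 23.4 | NR | 12.6 | 1.1 | 12.6 | 0.3 | 3.3 | 3.6 | 0.5 | 89.6
Argentina (27) | 866 | 11.2 | 0.7 | 20.6 | 0.8 | 8.2 | 2.0 | 26.2 | NR | 6.7 | 4.2 | 1.4 | 82
Argentina (102) | 120 | 36.1 | 0.9 | 12.2 | NR | 8.6 | 0.7 | 42.6 | 0.4 | 4.4 | 2.2 | 2.8 | 94
Germany (47) | 310 | 27.4 | 2.6 | 30.3 | 1.2 | 19.7 | 2.1 | 2.9 | 0.3 | 4.8 | 4.3 | 0.7 | 98.1
USA (28) | 3005 | 20.0 | 3.7 | 22.9 | 2.1 | 8.2 | 0.3 | 23.9 | 1.1 | 3.5 | 7.4 | NR | 88.9
Brazil (22) | 856 | 9.0 | 0.6 | 21.1 | 1.8 | 7.5 | 1.2 | 26.6 | 2.2 | 6.1 | 5.4 | 1.4 | 82.9
China (29) | 310 | 27 | 1.9 | 29 | 1.0 | 12.9 | 1.9 | 4.2 | 2.3 | 4.8 | 5.5 | NR | 90.6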
*Alleles (n): number of alleles analyzed. **Exon 6 cluster = p.[Ile237Asn; Val238Glu; Met240Lys]. NR: not reported.
Legacy names: p.(Pro31Leu) – P30L; c.293-13C>G – I2 splice; p.(Ile173Asn) – I172N; p.(Val282Leu) – V281L; p.(Gln319Ter) – Q318X; p.(Arg357Trp) – R356W; p.(Pro454Ser) – P453S.
Population-based genotyping studies have substantially advanced the molecular understanding of 21-OHD by delineating allele frequencies and genotype-phenotype correlations across diverse populations. To date, more than 300 pathogenic variants have been identified in CYP21A2, including point variants, small insertions/deletions, and large structural rearrangements (23). Across global cohorts, c.293-13C>G (I2 splice) consistently emerges as the most prevalent pathogenic variant, accounting for approximately 20–25% of pathogenic alleles (24). It is followed in frequency by p.(Ile173Asn) (15–20%), p.(Val282Leu) (10–15%), p.(Gln319Ter) (5–10%), and p.(Pro31Leu) (3–5%), although notable regional variation persists (22, 25–29) (Table 2). The frequencies of individual variants vary substantially by geography. Even so, pseudogene-derived variants consistently account for most disease-causing alleles across populations. European and North American populations have long-standing newborn screening programs that facilitate early detection. In these populations, large gene deletions and CYP21A1P/CYP21A2 chimeric conversions, predominantly associated with SW phenotypes, constitute 20–30% of pathogenic alleles (25, 28, 30, 31). Conversely, studies from Latin American countries such as Brazil, Mexico, and Argentina, which implemented systematic newborn screening only more recently, report lower frequencies (~8–12%) of large rearrangements (22, 27, 32). This discrepancy is likely due to historical underdiagnosis of severe cases in the pre-screening era. That underdiagnosis produced differences in the clinically ascertained cohorts available for genotyping.
Nevertheless, population-based studies continue to reveal rare or population-specific alleles. These findings highlight the extensive allelic heterogeneity of 21-OHD and the need for ethnically tailored genetic panels for accurate diagnosis and effective genetic counseling (5, 16, 19, 22, 28, 33–35). Novel variants reported across studies span several alteration types (26, 36–39):
- missense variants [p.(Gly424Ser), p.(Gly291Arg), p.(Ser301Tyr), p.(Arg483Gln), p.(Arg342Trp)];
- nonsense variants [p.(Tyr376Ter), p.(Arg445Ter)];
- frameshift variants (c.995_996insA, c.1123delC, c.1367delA);
- splicing variants;
- large deletions/conversions.

Functional characterization indicates that most novel variants impair protein function by disrupting conserved residues or structural stability. Tardy et al. showed that a series of rare missense substitutions impair 21-hydroxylase activity to varying degrees, from near-null to residual levels compatible with nonclassical CAH (37). The mechanisms were disruption of heme coordination, active-site geometry, or overall protein stability. In a large Brazilian cohort, nine novel or rare variants were identified. Several of them [p.(Arg408Cys), c.1450_1451insC, p.(His365Tyr), c.293-2C>G, p.(Gly424Ser)] exhibited clear founder effects and shared haplotypes with Spanish and Portuguese patients, reflecting the historical Iberian contribution to Brazilian ancestry (22, 27, 40, 41). Founder effects have significantly influenced the global distribution of CYP21A2 variants, shaping the mutational landscape of 21-OHD. In certain populations, specific pathogenic alleles became disproportionately frequent because they originated in a small ancestral group and were then amplified by genetic drift, geographic isolation, or population expansion.
The p.(Val282Leu) variant, one of the most common causes of nonclassical CAH, illustrates this phenomenon. It is highly prevalent in Mediterranean and Latin American populations owing to Iberian migration, and it is strongly associated with the human leukocyte antigen (HLA) B14 allele (42, 43). The p.(Arg426His) variant shows a similar founder pattern in Austrian and Spanish cohorts. In these cohorts, it consistently correlates with the salt-wasting phenotype and severe enzymatic impairment (41, 44). More recently, p.(Gly424Ser) has been identified as a likely founder allele in certain European and South American cohorts (36, 41, 45). It exhibits recurrent haplotypes and strong linkage disequilibrium, and is associated with both simple virilizing and nonclassical phenotypes. Reports from Spanish and Chinese cohorts have added further novel alleles, supported by shared ancestral haplotypes and by structural predictions of impaired protein stability (29, 34). These founder variants reflect the evolutionary and demographic history of affected populations. They also carry significant implications for genetic counseling, carrier screening, and the interpretation of genotype-phenotype correlations in 21-OHD.
Genotype-phenotype correlation
Many studies have investigated genotype-phenotype correlations in large national and multiethnic cohorts. In 21-OHD, a strong correlation between genotype and phenotype has been consistently reported across multiple populations (22, 26, 46, 47). Functional in vitro studies have allowed CYP21A2 pathogenic variants to be classified into four groups according to residual enzymatic activity: group 0 (null, 0%), group A (<2%), group B (3–7%), and group C (20–60%), as summarized in Table 1 (18, 31, 48–50). This functional classification provides a useful framework for associating molecular defects with clinical presentation. However, the activity estimates come from heterogeneous experimental systems, including different expression models (e.g., COS-1 or HEK293 cells), assay substrates, and normalization strategies. Consequently, reported values vary across studies and represent approximate functional ranges rather than strictly comparable measurements. Direct quantitative comparisons between variants should therefore be interpreted cautiously. In general, severe genotypes leading to SW show the strongest correlation with clinical phenotype. Reported concordance rates are:
- 97–100% for null genotypes with the SW form;
- 79–96% for group A genotypes with the SW form;
- 53–87% for group B genotypes with the SV form;
- 65–100% for group C genotypes with the NC form (19, 26, 31, 35, 47).
Most affected individuals are compound heterozygotes, and the clinical outcome is typically determined by the allele that retains the highest residual enzyme activity. Consequently, the SW form generally results from combinations of two severe alleles (groups null or A). The SV form arises in patients homozygous for group B variants, or compound heterozygous for a group B variant and a null or group A variant. The NC phenotype usually manifests in individuals carrying two mild (group C) alleles, or a mild allele combined with a more severe one (group null, A or B) (22, 26, 47, 48).
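The "milder allele determines the phenotype" rule can be written as a short lookup. This is a simplified sketch, assuming each allele has already been assigned to one of the Table 1 groups. The names `Group`, `PHENOTYPE_BY_MILDER_ALLELE` and `predicted_phenotype` are hypothetical. The function deliberately ignores the sources of discordance discussed in the next paragraph.

```python
from enum import IntEnum


class Group(IntEnum):
    """Functional severity groups (Table 1), ordered from most to least severe."""
    NULL = 0  # ~0% residual 21-hydroxylase activity
    A = 1     # <2%
    B = 2     # ~3-7%
    C = 3     # ~20-60%


# Expected clinical form, keyed by the milder of the two alleles (hypothetical table).
PHENOTYPE_BY_MILDER_ALLELE = {
    Group.NULL: "SW",
    Group.A: "SW",
    Group.B: "SV",
    Group.C: "NC",
}


def predicted_phenotype(allele1: Group, allele2: Group) -> str:
    """Hypothetical helper: predict the 21-OHD form of a biallelic genotype.

    The allele retaining more residual activity (higher group value) is
    assumed to drive the phenotype; cis effects and modifiers are ignored.
    """
    milder = max(allele1, allele2)
    return PHENOTYPE_BY_MILDER_ALLELE[milder]


if __name__ == "__main__":
    # Whole-gene deletion (null) + c.293-13C>G (group A)
    print(predicted_phenotype(Group.NULL, Group.A))  # SW
    # p.(Ile173Asn) (group B) + p.(Gln319Ter) (null)
    print(predicted_phenotype(Group.B, Group.NULL))  # SV
    # p.(Val282Leu) (group C) + p.(Ile173Asn) (group B)
    print(predicted_phenotype(Group.C, Group.B))  # NC
```

The lookup encodes only the general rule. It cannot reproduce the imperfect concordance reported for groups A, B and C, which is exactly where careful molecular characterization matters.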
However, the phenotypic spectrum is continuous, and the boundaries between SW, SV, and NC forms are not always clear-cut, especially in males or in individuals with intermediate enzyme activity. Although the genotype-phenotype correlation in 21-OHD is generally strong, discordant cases are well documented (25, 28) (see the Pitfalls section). Importantly, genotype-phenotype discordance in CAH reflects more than the primary CYP21A2 pathogenic variant. It also reflects allelic configuration (cis/trans), promoter strength, splicing efficiency, copy number variation, and extra-adrenal metabolism. Therefore, apparent inconsistencies between genotype and clinical severity frequently result from incomplete molecular characterization rather than a true biological discrepancy.
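When long reads are not available, allelic configuration is usually established by testing both parents for the variants found in the proband. The sketch below encodes that inference under loudly stated assumptions:
- the proband is heterozygous for both variants;
- both parents were tested for both variants;
- no de novo event or parental deletion masks an allele.

The function name `infer_phase` is hypothetical.

```python
def infer_phase(v1: str, v2: str, mother: set[str], father: set[str]) -> str:
    """Hypothetical helper: infer the cis/trans configuration of two variants
    found in a proband from the variants detected in each parent.

    Assumes both variants are heterozygous in the proband, both parents were
    genotyped for both variants, and no de novo event or parental deletion
    hides an allele. Any other pattern is reported as "undetermined".
    """
    m1, m2 = v1 in mother, v2 in mother
    f1, f2 = v1 in father, v2 in father

    # One variant inherited from each parent: the variants lie on different alleles.
    if (m1 and not m2 and f2 and not f1) or (f1 and not f2 and m2 and not m1):
        return "trans"
    # Both variants carried by one parent and neither by the other: same allele.
    if (m1 and m2 and not (f1 or f2)) or (f1 and f2 and not (m1 or m2)):
        return "cis"
    return "undetermined"


if __name__ == "__main__":
    v1, v2 = "p.(Ile173Asn)", "c.293-13C>G"
    print(infer_phase(v1, v2, mother={v1}, father={v2}))       # trans
    print(infer_phase(v1, v2, mother={v1, v2}, father=set()))  # cis
    print(infer_phase(v1, v2, mother={v1, v2}, father={v2}))   # undetermined
```

A "cis" result leaves the proband's second allele unexplained. In that case, a search for a variant missed by sequencing is warranted, for instance a deletion or conversion detectable by dosage analysis.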
Molecular diagnostic methods
To date, a variety of molecular techniques have been used for the genetic analysis of 21-OHD. Overall, these techniques can be classified into three categories, which can be complementary (51):
Analysis of duplications and deletions (Southern blot and MLPA).
Targeted genotyping methods for common CYP21A2 variants (allele-specific dot blot, allele-specific polymerase chain reaction (PCR), SNaPshot minisequencing).
Whole-gene sequencing (Sanger sequencing, next-generation sequencing and long-read sequencing).
Analysis of duplications and deletions
Initial methodologies for the molecular diagnosis of 21-OHD relied on Southern blot. This technique enabled the detection of common CYP21A2 deletions or conversions but lacked resolution for single-nucleotide variants. Subsequently, MLPA became the standard method for detecting deletions and large gene conversions.
Southern blot
Southern blot represented a landmark technique in the molecular analysis of CYP21A2. Genomic DNA is digested with restriction enzymes and hybridized with specific probes, so that the two loci appear as distinct fragments. This permits identification of structural defects typically associated with severe, salt-wasting phenotypes (48). Despite its diagnostic value, Southern blot is labor-intensive and time-consuming. It typically requires 5–10 days, depending on laboratory infrastructure, probe characteristics, and autoradiography exposure times. Early implementations relied on radioactively labeled DNA probes, raising concerns about laboratory safety and the storage and disposal of radioactive waste. Subsequent developments introduced fluorescently labeled probes. These allowed signal detection by fluorescence imaging systems, with the advantages of digital quantification, improved safety, and multiplexing capabilities (52).
Multiplex ligation-dependent probe amplification in CYP21A2 genotyping
Introduced in the early 2000s, MLPA represented a major advance in the molecular diagnosis of CAH. It provided a rapid, sensitive, and non-radioactive method to detect copy-number variation (CNV) within the CYP21A2 locus. MLPA replaced Southern blot by allowing simultaneous quantification of up to 50 genomic targets through multiplex probe hybridization, ligation, and fluorescent PCR amplification. Fragment analysis by capillary electrophoresis enables detection of heterozygous deletions, duplications, and CYP21A1P/CYP21A2 chimeric alleles arising from unequal crossover. When combined with Sanger sequencing, MLPA provides near-complete genotypic resolution. The combination remains the diagnostic gold standard recommended by the European Molecular Genetics Quality Network (EMQN) guidelines (53). Approximately 2–6% of affected alleles harbor two or more point variants. The combined MLPA-Sanger workflow therefore requires parental segregation analysis to determine whether variants are in cis or in trans. This step is critical for confirming compound heterozygosity and ensuring accurate genetic counseling (22, 25, 31, 54, 55). Although MLPA is highly reliable for dosage analysis, it cannot define breakpoints or resolve complex hybrid alleles at nucleotide resolution. For example, it cannot distinguish the attenuated chimeras CH-4 and CH-9 from the classic chimera CH-6 (21, 56, 57). These limitations underscore the emerging role of long-read sequencing technologies, which can directly phase variants and comprehensively characterize the RCCX locus in a single assay (58).
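The dosage comparison at the heart of MLPA can be sketched as follows. This is a simplified model, not the algorithm of any particular analysis software:
1. Each probe's peak area is normalized to the median of the reference probes in the same run.
2. That ratio is divided by the same ratio measured in a normal control.

A dosage quotient (DQ) near 1.0 suggests two copies, and a DQ near 0.5 suggests one copy. Probe names, function names and the copy-number cut-offs are hypothetical and illustrative.

```python
from statistics import median


def dosage_quotients(sample: dict[str, float], control: dict[str, float],
                     reference_probes: list[str]) -> dict[str, float]:
    """Simplified MLPA dosage quotient (DQ) for every non-reference probe.

    Peak areas are normalized within each run to the median of the reference
    probes; the sample ratio is then divided by the control ratio.
    """
    s_ref = median(sample[p] for p in reference_probes)
    c_ref = median(control[p] for p in reference_probes)
    return {
        probe: (sample[probe] / s_ref) / (control[probe] / c_ref)
        for probe in sample
        if probe not in reference_probes
    }


def copy_state(dq: float) -> str:
    """Map a DQ to an approximate copy-number call.

    Illustrative cut-offs only; laboratories use validated kit-specific ranges.
    """
    if dq < 0.2:
        return "0 copies (homozygous deletion)"
    if 0.4 <= dq <= 0.65:
        return "1 copy (heterozygous deletion)"
    if 0.8 <= dq <= 1.2:
        return "2 copies (normal)"
    if 1.3 <= dq <= 1.65:
        return "3 copies (duplication)"
    return "ambiguous"


if __name__ == "__main__":
    refs = ["REF_1", "REF_2", "REF_3"]  # hypothetical reference probes
    sample = {"REF_1": 1000, "REF_2": 1100, "REF_3": 900,
              "CYP21A2_probe_a": 500, "CYP21A2_probe_b": 1000}
    control = {"REF_1": 1000, "REF_2": 1000, "REF_3": 1000,
               "CYP21A2_probe_a": 1000, "CYP21A2_probe_b": 1000}
    for probe, dq in dosage_quotients(sample, control, refs).items():
        print(probe, round(dq, 2), copy_state(dq))
```

In the RCCX context, a single DQ is rarely interpreted alone. Gene, pseudogene and flanking probes are read together, because module number and chimeric junctions alter several probes at once.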
Targeted genotyping methods for common CYP21A2 variants
Targeted genotyping methods for common CYP21A2 variants rely on the fact that approximately 75% of pathogenic variants, especially SNVs and small insertions and deletions (indels), are derived from the nonfunctional pseudogene by gene conversion events (56). These variants can be detected by specific molecular techniques because they either create or destroy specific DNA sequences or restriction sites. Several methods and strategies have been described, each covering a different range of pathogenic variants.
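Detection through creation or destruction of a restriction site can be illustrated with a toy amplicon. This is a sketch only. The sequences are invented and are not CYP21A2 sequence. The EcoRI motif GAATTC (cut after the G) is used purely as a familiar example. In practice, the enzyme is chosen per variant, and the amplicon must be generated specifically from CYP21A2 rather than CYP21A1P. The function names are hypothetical.

```python
def find_sites(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def site_effect(ref: str, alt: str, motif: str) -> str:
    """Classify how a variant changes the number of recognition sites."""
    n_ref, n_alt = len(find_sites(ref, motif)), len(find_sites(alt, motif))
    if n_alt > n_ref:
        return "site created"
    if n_alt < n_ref:
        return "site destroyed"
    return "no change"


def fragment_lengths(seq: str, motif: str, cut_offset: int) -> list[int]:
    """Fragment sizes after complete digestion, cutting cut_offset bases
    into each recognition site (1 for EcoRI, G^AATTC)."""
    cuts = [p + cut_offset for p in find_sites(seq, motif)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy amplicons (NOT CYP21A2 sequence): a C>G change destroys GAATTC.
    ref = "ACGTGAATTCTTGA"
    alt = "ACGTGAATTGTTGA"
    print(site_effect(ref, alt, "GAATTC"))     # site destroyed
    print(fragment_lengths(ref, "GAATTC", 1))  # [5, 9]
    print(fragment_lengths(alt, "GAATTC", 1))  # [14]
```

On a gel, the wild-type allele would give two fragments and the variant allele one uncut band. A heterozygote would show all three.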
The targeted genotype method was