Genome-wide profiling of polymorphic short tandem repeats and their influence on gene expression and trait variation in diverse rice populations

Short tandem repeats (STRs), comprising repetitive DNA sequence motifs of 1 bp–6 bp, account for a significant portion of polymorphic variation in eukaryotic genomes (Willems et al., 2016; Fotsing et al., 2019; Zhang et al., 2022). STRs are typically unstable and hypervariable, with average mutation rates approximately 10 to 10,000 times higher than those observed in other regions of the genome (Legendre et al., 2007; Press et al., 2014). The vast majority of mutations in STRs are length polymorphisms, which are primarily thought to arise from replication-associated strand slippage (Strand et al., 1993; Lai and Sun, 2003).
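To make the definition above concrete, the following minimal sketch scans a DNA string for perfect tandem repeats with motifs of 1 to 6 bp. It is an illustration only, not the genotyping procedure used in this study; the function name `find_strs`, the `min_copies` threshold, and the example sequence are assumptions introduced here.

```python
import re

def find_strs(seq, min_copies=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats.

    Illustrative only: min_copies is an assumed threshold, and
    imperfect (interrupted) repeats are not handled.
    """
    hits = []
    for k in range(1, max_motif + 1):
        # A k-bp motif followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "CGCG"), so each tract is reported once.
            if motif in (motif + motif)[1:-1]:
                continue
            copies = (m.end() - m.start()) // k
            hits.append((m.start(), motif, copies))
    return sorted(hits)

# Hypothetical sequence containing four copies of a CGG motif.
example = "TTGACGGCGGCGGCGGAT"
print(find_strs(example))  # [(4, 'CGG', 4)]
```

Under this representation, a length polymorphism is simply a difference in `copies` at the same locus between two individuals.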

Although STRs are ubiquitous across genomes, they were historically considered nonfunctional sequences until the discovery that the expansion of a CGG repeat in the 5′ untranslated region (5′ UTR) of the Fragile X messenger ribonucleoprotein 1 (FMR1) gene causes fragile X syndrome (Fu et al., 1991; Kremer et al., 1991; Verkerk et al., 1991). To date, dozens of STR loci have been associated with heritable diseases in humans, including Huntington's disease, spinocerebellar ataxias, and various other neurological disorders. This indicates that STRs contribute substantially to heritable phenotypic variation in natural populations (Mirkin, 2007; Gymrek, 2017). Owing to their widespread presence in regulatory regions across eukaryotic genomes, STRs are now believed to play a role in regulating gene expression. In agreement with this idea, numerous expression-associated STRs (eSTRs) have been identified by surveying the association between STR length variation and gene expression in diverse human populations (Legendre et al., 2007; Bilgin et al., 2015; Gymrek et al., 2016; Fotsing et al., 2019; Jakubosky et al., 2020; Shi et al., 2023).
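The idea behind an eSTR survey can be sketched as a per-locus test of whether STR length tracks expression across individuals. The example below is a simplified illustration under stated assumptions: it fits an ordinary least-squares slope and a Pearson correlation to toy values, whereas published eSTR analyses typically also adjust for covariates such as population structure. The function name `length_expression_association` and all numbers are hypothetical.

```python
import math

def length_expression_association(lengths, expression):
    """Return (slope, pearson_r) for expression regressed on STR length.

    Simplified illustration: no covariates, no multiple-testing control.
    """
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(lengths, expression))
    slope = sxy / sxx                  # change in expression per repeat unit
    r = sxy / math.sqrt(sxx * syy)     # strength of the linear association
    return slope, r

# Toy data (hypothetical): repeat copy number and expression in four samples.
str_lengths = [10, 11, 12, 13]
expr_values = [2.0, 2.5, 3.0, 3.5]
slope, r = length_expression_association(str_lengths, expr_values)
print(slope, r)
```

A locus would be flagged as a candidate eSTR when the association remains significant after correction across all tested STR and gene pairs.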

Although our understanding of STRs in nonhuman species remains incomplete, emerging evidence highlights the significant contributions of STRs to complex traits in plants. In Arabidopsis (Arabidopsis thaliana), variation in the length of natural polyglutamine (polyQ)-encoding STRs in the coding sequence of EARLY FLOWERING 3 (ELF3) substantially affects flowering time (Undurraga et al., 2012; Jung et al., 2020). Moreover, the full-length STR that encodes the polyQ tract of PHYTOCHROME AND FLOWERING TIME 1 (PFT1) is critical for proper flowering time and the shade-avoidance response in Arabidopsis (Sureshkumar et al., 2009; Rival et al., 2014). Additionally, accumulating evidence suggests that STRs influence transcript abundance in plants. For instance, a variable STR site in the cis-regulatory region of CONSTANS contributes to variation in flowering time among diverse Arabidopsis accessions (Rosas et al., 2014). In tobacco (Nicotiana tabacum), the GAA repeat in the 5′ UTR of the pollen tube-expressed gene ntp303 is vital for determining mRNA stability and translation efficiency during pollen tube growth (Hulzink et al., 2002).

In contrast to the extensive investigations conducted in human populations, our understanding of STRs in plants remains limited. A study examining the association between gene expression and STRs in a natural population of Arabidopsis revealed the importance of STRs in regulating gene expression (Reinar et al., 2021). The authors also showed that length variations of protein-coding STRs within structurally disordered regions modulate protein function (Reinar et al., 2023). Rice (Oryza sativa L.) is a major staple food crop worldwide and serves as a model monocot with the smallest genome among the major cereals (International Rice Genome Sequencing Project, 2005). Over the past decade, advances in sequencing technologies have led to the accumulation of large-scale sequencing data for rice genomes and the identification of genomic variants associated with various agronomic traits through genome-wide association studies (GWAS) and other genetic mapping methods (Huang et al., 2010, 2011; Chen et al., 2014; Xie et al., 2015; Ming et al., 2023; Wang et al., 2023; He et al., 2024a). Despite the abundance of STRs throughout rice genomes, their functions have often been overlooked. Recently, a pan-genome-based survey encompassing 231 rice genome assemblies highlighted the significance of tandem repeats, including STRs, in regulating gene expression and agronomic traits (He et al., 2024b). However, owing to the limited sample size, STR characteristics and their contributions to gene expression in rice remain underexplored. Constructing a comprehensive map of STRs from a large-scale population is therefore essential to enhance our understanding of the role of STRs in transcriptional regulation and trait variation.

In this study, we conducted a genome-wide investigation of polymorphic STRs with motif lengths ranging from 1 bp to 6 bp across 4726 rice accessions and analyzed their characteristics in detail. We generated a transcriptome dataset for 127 rice accessions to evaluate the influence of STRs on gene expression. Additionally, we found that many STRs have undergone genetic divergence, including some that are in high linkage disequilibrium (LD) with GWAS-based lead single nucleotide polymorphisms (SNPs) related to various agronomic traits. Furthermore, we obtained experimental validation for the role of an STR in regulating gene expression and trait variation, highlighting the potential of utilizing STRs to improve agronomic traits via genome editing. Our findings, based on a systematic survey of STRs in a rice population, deepen our understanding of the impact of STRs on transcriptional regulation.
