Machine-learning and gene-synergy networks reveal interpretable drivers of antibiotic resistance in Staphylococcus aureus

Staphylococcus aureus (S. aureus) causes more than 100 000 deaths worldwide annually owing to antimicrobial resistance (AMR) and remains a persistent threat because of its ever-evolving resistance mechanisms [1,2]. It acquires resistance through multiple strategies, including horizontal gene transfer, chromosomal mutations, efflux pump activity, and enzymatic modification or inactivation of antibiotics [1,3]. Moreover, new drug resistance mechanisms have emerged and spread globally, reducing the efficacy of current treatments against common bacteria that cause severe and often deadly infections [4].

Machine learning approaches have been applied to large-scale genome sequencing and drug susceptibility datasets to uncover potential genetic determinants that shape AMR [5–8]. These models typically rely on two major categories of genomic features: (1) predefined markers such as known resistance genes, which are interpretable but limited by prior knowledge [5,9–11]; and (2) reference-agnostic k-mer representations, which can capture novel and complex determinants without needing a reference genome [12–15]. However, k-mer-based approaches often result in high-dimensional models with limited biological interpretability [12,16]. Thus, an ideal framework would balance the discovery power of k-mers with the interpretability offered by gene-based features.
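To make the dimensionality issue concrete, the following minimal Python sketch builds a reference-free k-mer presence/absence matrix from raw sequences. It is an illustration only, not the pipeline used in this study: the function names `kmers` and `presence_matrix` and the toy sequences are hypothetical, and practical tools usually also collapse each k-mer with its reverse complement, which is omitted here for brevity.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers in seq, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    alphabet = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    }


def presence_matrix(
    genomes: Dict[str, str], k: int
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a binary presence/absence vector per genome over the pooled k-mer vocabulary."""
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    # The vocabulary is the union of all observed k-mers; no reference genome is needed.
    vocab = sorted(set().union(*per_genome.values()))
    matrix = {
        name: [int(km in observed) for km in vocab]
        for name, observed in per_genome.items()
    }
    return vocab, matrix


# Toy example (hypothetical sequences): a single substitution changes several k-mers.
toy = {"isolate_a": "ACGTAC", "isolate_b": "ACGTTC"}
vocab, matrix = presence_matrix(toy, k=4)
```

Because the vocabulary is the union of every k-mer seen in any isolate, the number of columns grows quickly with sequence diversity, and each column is an anonymous string rather than a named gene, which is the interpretability gap described above.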

Furthermore, resistance is often governed by coordinated genetic networks rather than single genes. One example is the interplay of regulatory elements, in which mecR1/mecI-mediated control of mecA operates alongside additional two-component systems that modulate cell-wall homeostasis [17–19]. These observations motivate approaches that retain genomic context, yield interpretable gene-level features, and explicitly consider multi-gene effects (genetic interactions [GI]/epistasis) that are actionable for combination strategies [20,21].

Therefore, we developed a fine-grained and reference-agnostic gene-context 22-mer (gkmer) representation that links sequence signals to specific genes and functional domains, thus enabling interpretable gene-level features while retaining the discovery power of k-mer analysis. Specifically, we introduced: (1) a two-step random forest pipeline (RF1 and RF2) that transitions from gkmer screening and mapping to interpretable gene-level models; (2) co-information-based gene-synergy networks that capture higher-order (epistasis-like) effects beyond pairwise associations; and (3) protein-structure mapping to anchor features in putative functional regions. Collectively, these innovations establish a generalizable framework that extends k-mer analysis beyond black-box prediction, thus providing a systematic and interpretable approach for modelling AMR, uncovering higher-order GI, and generating mechanistic hypotheses.
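As an illustration of the co-information idea in point (2), the sketch below computes a three-variable synergy score for two binary gene features and a resistance label using plug-in entropy estimates. This is a generic textbook form, not necessarily the exact estimator or sign convention used in this work: here synergy is taken as I(G1,G2;Y) − I(G1;Y) − I(G2;Y), so positive values mean the gene pair carries more information about the phenotype jointly than the two genes do separately. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(*cols: Sequence[Hashable]) -> float:
    """Plug-in joint Shannon entropy (bits) of one or more aligned columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def synergy(g1: Sequence[int], g2: Sequence[int], y: Sequence[int]) -> float:
    """Assumed convention: I(G1,G2;Y) - I(G1;Y) - I(G2;Y); > 0 indicates synergy."""
    joint = list(zip(g1, g2))  # treat the gene pair as a single joint variable
    return (
        mutual_information(joint, y)
        - mutual_information(g1, y)
        - mutual_information(g2, y)
    )


# Toy data: an XOR-like phenotype is invisible to each gene alone,
# but fully determined by the pair.
gene_a = [0, 0, 1, 1]
gene_b = [0, 1, 0, 1]
xor_label = [0, 1, 1, 0]
xor_score = synergy(gene_a, gene_b, xor_label)

# Two identical genes that each determine the label are redundant, not synergistic.
dup_score = synergy(xor_label, xor_label, xor_label)
```

In a network setting, one plausible use of such a score is as an edge weight between gene pairs, keeping only pairs whose score exceeds a permutation-based threshold; that thresholding step is an assumption here and is not shown.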
