Background Chronic kidney disease (CKD) affects approximately 850 million individuals worldwide and remains a leading cause of morbidity, premature mortality, and escalating healthcare costs. Despite the availability of clinical biomarkers, CKD progression to end-stage renal disease (ESRD) is frequently identified late, limiting opportunities for preventive intervention. Conventional predictive models have relied predominantly on static cross-sectional laboratory values, failing to capture the temporal dynamics of disease trajectory that longitudinal claims data can provide.
Objective This study proposes a novel hybrid machine learning framework, XGBoost-LSTM-Attention (XLA), that integrates gradient-boosted feature selection with long short-term memory (LSTM) networks and a temporal attention mechanism to improve early prediction of CKD progression from Stage 3 to Stages 4/5 or ESRD using longitudinal claims-based features.
Methods We conducted two complementary analyses. Primary analysis: a cross-sectional validation using real NHANES 2015–2018 data (n=701 CKD Stage 3 adults) predicting significant proteinuria (UACR ≥30 mg/g) from clinical features excluding UACR. Supplementary analysis: an evaluation of XLA on an NHANES-calibrated synthetic longitudinal cohort (n=8,412) with simulated quarterly measurements, intended to approximate real-world longitudinal data conditions. All models were evaluated using 5-fold stratified cross-validation.
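The evaluation protocol named above (5-fold stratified cross-validation scored by AUC-ROC) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `stratified_folds` and `auc_roc` are hypothetical, the round-robin fold assignment is one simple way to preserve class balance, and the toy labels and scores are invented for the example.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class balance.

    Hypothetical helper: indices of each class are shuffled, then dealt
    round-robin across the folds.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive outranks a
    random negative, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): 10 positives and 40 negatives.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)

# Each held-out fold keeps the 1:4 class ratio of the full sample.
fold_positives = [sum(labels[i] for i in fold) for fold in folds]

# AUC-ROC on four invented predictions.
toy_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a full run, each fold would serve once as the held-out set, a model would be fitted on the remaining four, and `auc_roc` would be computed on the held-out predictions.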
Results In the primary NHANES cross-sectional analysis, the XLA framework achieved an AUC-ROC of 0.684 (95% CI: 0.641–0.727), with all models performing comparably (AUC 0.684–0.710), confirming that cross-sectional clinical features alone provide limited signal for proteinuria prediction and underscoring the necessity of UACR measurement. In the longitudinal supplementary analysis, XLA achieved an AUC-ROC of 0.994 versus 0.939 for the best cross-sectional baseline (an absolute gain of 0.055, or 5.5 percentage points), demonstrating that temporal trajectory features, particularly eGFR slope and RAAS adherence trends, confer substantial incremental predictive value when longitudinal data are available.
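To make the eGFR slope feature concrete, the sketch below fits an ordinary least-squares line through equally spaced quarterly eGFR values and returns the slope per year. The abstract does not state how the slope was computed, so the OLS definition, the function name `egfr_slope`, and the example values are all assumptions made for illustration.

```python
def egfr_slope(values, interval_years=0.25):
    """Ordinary least-squares slope of eGFR over time.

    Assumes equally spaced measurements (quarterly by default) and returns
    the change in eGFR per year. Hypothetical helper, not the authors' code.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are needed for a slope")
    times = [i * interval_years for i in range(n)]
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx

# Invented example: eGFR falling by 1 unit per quarter.
slope = egfr_slope([50.0, 49.0, 48.0, 47.0])
```

A more negative slope corresponds to faster decline, which is the kind of trajectory signal a single cross-sectional eGFR value cannot carry.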
Conclusion The XLA framework demonstrates meaningful advantages over traditional approaches when applied to longitudinal claims data. Cross-sectional findings highlight the irreplaceable role of direct UACR measurement in CKD risk stratification. Together, these results provide actionable evidence for both the limitations of static prediction and the promise of trajectory-based approaches in value-based care programs managing large CKD populations.
Competing Interest Statement The authors have declared no competing interest.
Funding Statement This study did not receive any external funding. The research was conducted independently by the authors.
Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The primary analysis used NHANES 2015–2018 public-use data, which were openly available before the initiation of this study and are freely accessible (https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2015). The secondary analysis used an NHANES-calibrated synthetic longitudinal cohort. No private, restricted, or patient-identifiable data were used.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes