Predictive health index: integrating body composition and heart rate variability metrics with artificial intelligence to predict chronic disease risk and specific chronic non-communicable diseases

Study design and sample population distribution

The aim of this study was to validate the PHI for chronic cardiovascular, metabolic, psychological, neoplastic, and inflammatory disorders. Data were obtained from two large multicenter studies employing non-invasive diagnostic technologies for body composition analysis (BIA-ACC®>, Biotekna, Marcon, Italy) and HRV analysis (PPG® Stress Flow, Biotekna, Marcon, Italy). Data collection occurred between February 1, 2018, and December 31, 2023, including 35,405 individuals [28, 30].

The study population consisted of 35,405 Caucasian individuals (11,443 men and 23,962 women) aged from 18 to 90 years, with a median age of 54 years. Participants were clustered into one healthy control group (CTRL) and five clinical groups based on chronic disease patient history. Ethical approval for data usage and incorporation of the BIA-ACC® and PPG Stress Flow® devices were ensured by the Ethics Committee of the University Research Institute of Maternal and Child Health and Precision Medicine, National and Kapodistrian University of Athens, Athens, Greece [28, 30]. All procedures adhered to the ethical standards of the Committee Responsible for Human Experimentation and the 1975 Declaration of Helsinki, as revised in 1983, while all apparatuses have been CE-certified as non-invasive medical devices employed for diagnostic and monitoring purposes since 2004.

The CTRL included individuals with no NCDs nor declaring medically unexplained symptoms (MUS). The clinical groups consisted of five distinct clusters, specifically:

Cardiovascular disorders (referred to as hypertension, atherosclerosis, myocardial infarction, heart failure, cardiac arrhythmia, cardiomyopathies, stroke),

Metabolic syndrome (referred to as obesity, dyslipidemia, insulin resistance, hyperglycemia, type 2 diabetes mellitus),

Psychological disorders (referred to as of anxiety or depression).

Neoplastic disorders.

Chronic inflammatory disorders (referred to as rheumatoid arthritis, inflammatory bowel disease, ankylosing spondylitis, psoriasis, Crohn’s disease, psoriatic arthritis, Hashimoto’s thyroiditis, celiac disease, or other chronic inflammatory conditions).

Table 1 Summarizes the population demographics and dataset split into training and testing subsets. Notably, women were more than twice represented in the dataset when compared to men, highlighting a gender imbalance in the study population (Fig. 1)Fig. 1figure 1

Age distribution of the study population. This population consisted of 11,443 Caucasian men and 23,962 Caucasian women, aged 18 to 90 years (median age: 54 years). Data were extracted from two large multicenter studies using the non-invasive medical devices BIA-ACC and PPG Stress Flow, between 01/02/2018 and 12/31/2023

Table 1. The median [5th-95th percentile] of age, train and test data size in the studied populations stratified by class of NCDs. Summaries are reported as median [5th–95th percentile] for consistency with the original analytic plan; future extensions of the work will provide [2.5th–97.5th].

Non-invasive devices for data collection

Non-invasive diagnostic devices were employed to collect physiological and health-related data. Dual-frequency bioimpedance measurements were conducted with the BIA-ACC® medical device (BioTekna, Marcon, Italy) for extensive body composition analysis. This device offers an advanced apparatus, as previously described in Peppa et al. [26,27,28, 30]. The parameters provided were HPA axis index (PA), extracellular water percentage (ECW%), intracellular water percentage (ICW%), total body water percentage (TBW%), T-score, S-score, fat mass percentage (FM%), intramuscular adipose tissue percentage (IMAT%), and extracellular matrix percentage (EC matrix%).

HRV analysis was administrated using the medical device PPG Stress Flow® (BioTekna – Italy) [30], an optoelectronic plethysmograph measuring the change in blood volume for analysis and biofeedback of HRV of the autonomic nervous system measured parameters such as average heart rate (BPM), root mean square of successive differences (RMSSD), standard deviation of NN intervals (SDNN), low-frequency/very low-frequency ratio (LF/VLF), and total power.

Table 2. All parameters used to validate the PHI.

Table 2 Includes the parameters employed to calculate the PHIStatistical analysesStatistical analysis and machine learning model

Analyses were performed in R (v. 4.2.2, The R Foundation for Statistical Computing). Continuous variables are presented, unless otherwise specified, as median [5th–95th percentile]; categorical variables as n (%). Statistical significance was set at p < 0.05. To construct the PHI, five independent binary classification random forest models (one for each clinical category compared to CTRL) were trained using the randomForest package (Liaw & Wiener, 2002). No hyperparameter optimization was performed; the model was therefore trained with the package default settings. Variable importance was assessed using mean decrease in Gini impurity. The dataset was divided via a 75%/25% stratified random split at the subject level (Training/Test), maintaining proportions between groups. Validation included the Mann-Whitney U test on predicted score distributions, ROC curves, and AUC (area under the ROC curve). The probability of disease was estimated based on predicted values using a generalized additive model (GAM). No 95% confidence intervals for AUC were calculated in this version; such an analysis is planned for future methodological extension.

The PHI was assessed by means of statistical and ML methods aiming to obtain valuable results and validate the predictive performance, focusing on the use of random forest models to predict health outcomes based on specific physiological parameters.

Step-by-step description

1.

Dataset preparation and loading

·Load the complete dataset containing both CTRL and CASE data.

·Randomly split the dataset into two parts:

·Trainingset (75% of the CTRL dataset + 75% of the CASE dataset)

·Testset (25% of the remaining CTRL dataset + 25% of the remaining CASE dataset)

2.

Random forest algorithm model training

 ·Use the “training set” to train the random forest algorithm:

·Input: PPG Stress Flow + BIA-ACC® parameters

·Output: predicted value: probability of presence or absence of pathology/disorder

3.

Validation metrics

 ·Validate the trained random forest model employing the test set using:

Dataset preparation and loading

Comprising both CTRL and case (CASE) data, the complete dataset was merged and randomly split into two parts, as follows: (a) training set (75% of the CTRL dataset + 75% of the CASE dataset) and (b) test set (25% of the remaining CTRL dataset + 25% of the remaining CASE dataset) operational accuracy of the model and to ensure that all clinical categories remain proportionally represented.

Handling of missing data: Main analyses were conducted using complete-case analysis: records missing one or more required parameters were excluded during pre-processing. No imputation of missing data was performed. This choice is discussed among the limitations.

Model training

The training set was employed to train a random forest algorithm using as input the features derived from the BIA-ACC® and PPG Stress Flow®> metrics. The target variable was identified and set, referring to the presence or absence of pathology. Next, a random forest classifier is initialized using a machine learning library [35, 36]. Training is initiated by running the model with the training set, using the defined features and target variable. This training enables the random forest algorithm to identify patterns and feature relations that influence predicting pathology. In addition to the single 75/25 stratified train–test split, we implemented 10-fold cross-validation to further assess the robustness and generalizability of the random forest models. Cross-validation was conducted at the subject level to prevent data leakage, with folds stratified to preserve the proportion of cases and controls within each clinical category. Performance metrics from each fold were averaged to obtain mean estimates with 95% confidence intervals, which are reported alongside the test-set results.

Validation metrics

In this step, the trained random forest model obtained using the test set through various statistical and graphical methods. For all analyses, a p-value of 0.05 was considered statistically significant, while bioinformatic analyses were performed using R software (version 4.2.2, The R Foundation for Statistical Computing). The probability of disease is estimated based on random forest predicted values using a GAM. Model performance was quantified using a comprehensive set of metrics: accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score, and AUC. Confusion matrices were also derived for each clinical category. To provide robust estimates, 95% confidence intervals were computed for all metrics, including AUC, via stratified bootstrapping with 1000 resamples. ROC curves with 95% confidence bands were plotted for each model. Variable importance was assessed using the mean decrease in Gini impurity, as implemented in the randomForest package, and features were ranked accordingly.

Furthermore, a box-whisker plot is generated to visually represent the distribution of the predicted probabilities, thereby providing a graphical representation of the data spread. The disorder probability is calculated by fitting the disorder with predicted values obtained with the random forest method using a generalized additive model (GAM) [37]. GAM can be considered a generalized linear model (GLM), in which the linear predictor is given by a user-specified sum of smooth functions of the covariates, plus a conventional parametric component of the linear predictor.

Additionally, the ROC curve metric is plotted to evaluate the model’s performance in terms of sensitivity and specificity, illustrating the trade-off between true and false positive rates. Finally, the AUC value is calculated to quantify predictive accuracy, with a higher AUC indicating better model performance. These validation steps ensure that the model’s predictions are reliable and accurate [38].

For each clinical category, specific statistical tests were conducted and the model’s performance metrics were calculated to evaluate the effectiveness of the random forest in predicting the presence or absence of disorders and diseases [Predictive Health Index, PHI]. Additionally, key predictive variables were derived from the analysis. Variable importance identified the most influential features per case, including age, HPA axis index, and specific bioimpedance and HRV metrics providing an interpretation of their role in the context of the analyzed disorders. In this light, predicted probability distribution was conducted presenting the correlation between the model’s predicted values and the associated probabilities are examined, highlighting the linearity of the relations.

Outcome definition

For each of the five clinical categories, independent binary models (CASE vs. CTRL) were trained. No multi-label classification was implemented.

Comments (0)

No login
gif