Predictive performance of ISS and NISS for clinical outcomes in severely injured trauma patients: a retrospective registry study

The present study provides a comprehensive evaluation of the Injury Severity Score (ISS) and the New Injury Severity Score (NISS) in predicting clinical outcomes and resource utilization in a cohort of 1,112 severely injured trauma patients (NISS ≥ 16). The two scores were discordant (NISS > ISS) in most patients, with NISS reclassifying 76% of patients to a higher severity score. However, this upward shift in predicted risk did not consistently translate into improved discriminative performance. To our knowledge, few studies have compared the predictive performance of NISS and ISS for clinical outcomes in a cohort restricted to severely injured trauma patients [8].

In-hospital mortality

For in-hospital mortality, no statistically significant differences in discriminative power were observed between the two scoring systems in the whole cohort or any specific subgroup. Both scores reached their highest AUC values for mortality in the thorax and extremity subgroups (AUC > 0.80).

Earlier studies have indicated that NISS has similar or better predictive value for mortality in trauma patients. Large-scale registry studies by Lavoie et al. [19] and Harwood et al. [4] found that NISS demonstrated better discrimination in predicting mortality in trauma patients. In contrast, several smaller-scale studies have found no statistically significant difference between NISS and ISS in predicting mortality [3, 5, 7, 8]. This suggests that while large-scale registry studies can detect a marginal statistical benefit of NISS over ISS, the difference may be too subtle to be captured in smaller-scale studies or to provide a meaningful advantage in routine clinical survival benchmarking.

Hospital and ICU length of stay

NISS demonstrated a significant discriminative advantage for prolonged hospital stay in the extremity subgroup (AUC 0.654 vs. 0.585, p = 0.003). This likely reflects the cumulative surgical and rehabilitative burden associated with multiple orthopedic injuries [20, 21]. Unlike the ISS, which limits the contribution of a single anatomical region, the NISS aggregates multiple severe injuries within the same region, effectively capturing the increased resource utilization associated with complex orthopedic trauma.
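The difference between the two calculations can be illustrated with a minimal Python sketch, assuming the standard definitions: ISS sums the squares of the highest Abbreviated Injury Scale (AIS) severity in each of the three most severely injured body regions, whereas NISS sums the squares of the three highest AIS severities regardless of region, and any AIS 6 injury sets either score to 75. The function names, region labels, and example patient are hypothetical and are not taken from the study registry.

```python
from collections import defaultdict


def iss(injuries):
    """Injury Severity Score from (body_region, ais_severity) pairs."""
    if any(ais == 6 for _, ais in injuries):
        return 75  # by convention, any AIS 6 injury gives the maximum score
    worst_per_region = defaultdict(int)
    for region, ais in injuries:
        worst_per_region[region] = max(worst_per_region[region], ais)
    # Only the single worst injury of each region can contribute.
    top_three = sorted(worst_per_region.values(), reverse=True)[:3]
    return sum(a * a for a in top_three)


def niss(injuries):
    """New Injury Severity Score from (body_region, ais_severity) pairs."""
    if any(ais == 6 for _, ais in injuries):
        return 75
    # The three most severe injuries contribute, even within one region.
    top_three = sorted((ais for _, ais in injuries), reverse=True)[:3]
    return sum(a * a for a in top_three)


# Hypothetical patient: three AIS 3 extremity injuries and one AIS 2 thoracic injury.
patient = [
    ("extremities", 3),
    ("extremities", 3),
    ("extremities", 3),
    ("thorax", 2),
]

patient_iss = iss(patient)    # 3^2 + 2^2 = 13
patient_niss = niss(patient)  # 3^2 + 3^2 + 3^2 = 27
```

In this hypothetical case the patient would be classified as severely injured by NISS (≥ 16) but not by ISS, which is the kind of upward reclassification expected with multiple orthopedic injuries in a single region.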

In contrast, both scores showed fair and comparable discrimination for prolonged ICU stay. Notably, both systems showed better discrimination for prolonged ICU stay than for prolonged hospital stay in all groups, suggesting that overall injury severity is more closely related to ICU stay than to total hospital stay.

Prior literature is inconsistent regarding the predictive power of NISS and ISS for prolonged hospital and ICU stay. Our findings align with studies by Harwood et al. [4], Lavoie et al. [12] and Ede et al. [10], which found that NISS demonstrated better discrimination for predicting prolonged hospital stay. Conversely, an earlier study by Tamim et al. [5] found ISS to predict prolonged hospital stay better than NISS in trauma patients. Some studies have also found NISS to be a better predictor of prolonged ICU stay than ISS [4, 8].

Blood transfusions and intubation

A key finding of this study is the superior discriminative performance of ISS over NISS in predicting blood transfusions across the total cohort (AUC: 0.669 vs. 0.630, p = 0.019), as well as in the head (AUC: 0.737 vs. 0.672, p = 0.004) and thorax (AUC: 0.656 vs. 0.613, p = 0.006) subgroups. This finding suggests that the anatomic diversity required by the ISS calculation may serve as a better proxy for the need for blood transfusions. ISS also demonstrated better discrimination for intubation in patients with head injuries (AUC: 0.817 vs. 0.787, p = 0.038). This is likely because head injury severity is the most important driver of the need for intubation [22]. While the NISS captures localized injury density, which reclassifies most patients to a higher injury severity, this increased severity does not translate to higher rates of blood transfusions or intubation.

Our findings differ from those of an earlier study, which found no significant difference between NISS and ISS in discrimination for blood transfusions in patients with musculoskeletal injuries [10]. To our knowledge, no prior study has analyzed the discriminative ability of NISS and ISS in predicting intubation in patients with head injuries, making this a novel finding of our study. A previous study by Jin et al. [7] found no significant difference between the scoring systems in predicting intubation in patients with thoracic trauma, and a study by Honarmand et al. [23] found NISS to demonstrate better discrimination in trauma patients.

Model calibration and stability

While both scores exhibited moderate discrimination in the whole cohort, both also failed calibration for multiple outcomes. This suggests that while both scoring systems are effective in ranking patients by risk, they lack the precision required to accurately predict absolute risk. This finding was also observed in the head trauma subgroup. In contrast, both scores showed acceptable calibration in thorax and extremity injury subgroups for most outcomes.
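The distinction between ranking patients by risk and predicting absolute risk can be illustrated with a minimal Python sketch. It assumes the AUC is computed as the proportion of event/non-event pairs in which the patient with the event received the higher predicted risk (ties counted as one half). The risks and outcomes are hypothetical and are not study data.

```python
def auc(scores, outcomes):
    """AUC as the share of (event, non-event) pairs ranked correctly."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = event occurred).
risk = [0.10, 0.20, 0.40, 0.60, 0.80, 0.90]
outcome = [0, 0, 1, 0, 1, 1]

# A systematically miscalibrated model: every predicted risk is halved.
halved_risk = [r / 2 for r in risk]

auc_original = auc(risk, outcome)
auc_halved = auc(halved_risk, outcome)            # identical: ranking is unchanged

observed_rate = sum(outcome) / len(outcome)       # 0.5
mean_original = sum(risk) / len(risk)             # close to the observed rate
mean_halved = sum(halved_risk) / len(halved_risk) # about half the observed rate
```

Because the AUC depends only on the ordering of predictions, halving every predicted risk leaves discrimination unchanged while the mean predicted risk falls well below the observed event rate, which is a calibration failure.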

Calibration estimated by the Hosmer-Lemeshow (H-L) test should be interpreted with caution, especially when comparing calibration between subgroups or studies with different sample sizes. The H-L test is known to be sensitive to sample size, often yielding significant p-values for clinically negligible deviations in large samples [24]. This statistical phenomenon may partly explain why both scoring systems failed calibration in the whole cohort for multiple outcomes but showed acceptable calibration for most outcomes in the smaller thorax and extremity subgroups.
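The sample size sensitivity of the H-L test can be shown with a minimal Python sketch. It assumes the usual form of the statistic, the sum over risk groups of (O - E)^2 / (E(1 - E/n)), compared against a chi-square distribution with g - 2 degrees of freedom for g groups. The group counts are hypothetical and are not study data, and the helper `chi2_sf_even_df` is an illustrative closed-form tail probability that is valid only for even degrees of freedom.

```python
import math


def hosmer_lemeshow(observed, expected, sizes):
    """H-L statistic from observed events, expected events, and group sizes."""
    return sum(
        (o - e) ** 2 / (e * (1 - e / n))
        for o, e, n in zip(observed, expected, sizes)
    )


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical small sample: 6 risk groups of 50 patients each.
sizes_small = [50] * 6
expected_small = [2.5, 5, 10, 15, 20, 30]
observed_small = [3, 6, 11, 16, 21, 31]

# Same observed and expected event rates in a sample 20 times larger.
scale = 20
sizes_large = [n * scale for n in sizes_small]
expected_large = [e * scale for e in expected_small]
observed_large = [o * scale for o in observed_small]

df = len(sizes_small) - 2  # g - 2 degrees of freedom
hl_small = hosmer_lemeshow(observed_small, expected_small, sizes_small)
hl_large = hosmer_lemeshow(observed_large, expected_large, sizes_large)
p_small = chi2_sf_even_df(hl_small, df)  # not significant at 0.05
p_large = chi2_sf_even_df(hl_large, df)  # significant at 0.05
```

With identical observed and expected event rates in every group, the statistic grows in proportion to the sample size, so the same degree of miscalibration that passes the test in the small sample fails it in the large one.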

Beyond sample size, the improved calibration in these subgroups likely reflects the homogeneity of patients with specific injury patterns. Severely injured trauma patients form a diverse cohort in which identical severity scores can represent vastly different physiological trajectories. By isolating specific injury patterns, we evaluate a more clinically uniform population in which the relationship between trauma scores and outcomes becomes more consistent. In these homogeneous groups, anatomic scores align more closely with the actual clinical course, allowing them to function as more reliable indicators of absolute risk.

Strengths and limitations

This study has several limitations. Its retrospective, registry-based nature meant that analyses were constrained by the availability and accuracy of recorded data. Additionally, the single-center setting may limit the generalizability of our findings to other trauma systems that differ in patient populations, management protocols, or available resources.

Despite these limitations, the study has notable strengths. The cohort data were prospectively collected and represent a comprehensive sample of severely injured patients treated at a tertiary trauma center. We had access to comprehensive data for the whole cohort and a sample size that enabled subgroup analysis.
