Deep phenotyping of blood cell data reveals novel clinical biomarkers

Introduction The complete blood count with differential (CBD) is one of the most commonly performed blood tests worldwide and is used in nearly all areas of medicine. Although modern CBD analyzers generate flow-cytometry-based single-cell measurements, the resulting CBD markers are limited to coarse summary features, such as total cell counts and average cell sizes. As a result, these markers cannot detect subtle shifts in cell populations that may signal early-stage pathogenesis. Here, we evaluate whether AI-based analysis of the raw single-cell data underlying the CBD can be used to develop novel, clinically prognostic biomarkers across patient settings.

Method We developed two complementary methods for biomarker discovery from CBD tests and evaluated them using longitudinal data from an academic medical center. To create interpretable biomarkers, we clustered cells into physiologically meaningful sub-populations and summarized each sub-population with robust statistics. In parallel, we developed self-supervised autoencoders to extract novel non-linear markers. We evaluated the utility of these clustering (CLS) and autoencoder (AE) markers for patient prognostication across a range of outcomes: mortality, inpatient admission, and future disease development.
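The abstract does not name the clustering algorithm, the analyzer channels, or the summary statistics used. As a minimal sketch only, assuming k-means (a swapped-in stand-in, not necessarily the authors' choice) fitted to pooled per-cell channel vectors, the snippet below shows how a clustering step can be turned into per-sample markers: the fraction of a sample's cells in each sub-population, and the median of each channel within that sub-population, the median being one example of a robust summary. The names `fit_kmeans`, `summarize_sample`, and the `cls{j}_...` feature keys are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each cell is a tuple of analyzer channel values
# (e.g. scatter or fluorescence); the real channel set is not given in the abstract.
Cell = Tuple[float, ...]


def _sq_dist(a: Cell, b: Cell) -> float:
    """Squared Euclidean distance between two cells."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(cell: Cell, centroids: Sequence[Cell]) -> int:
    """Index of the centroid closest to `cell`."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(cell, centroids[j]))


def fit_kmeans(cells: List[Cell], k: int, max_iter: int = 100) -> List[Cell]:
    """Lloyd's k-means with deterministic farthest-first initialization.

    Assumption: k-means stands in for the unspecified clustering method.
    """
    centroids: List[Cell] = [cells[0]]
    while len(centroids) < k:
        # Seed the next centroid at the cell farthest from all current centroids.
        centroids.append(max(cells, key=lambda c: min(_sq_dist(c, m) for m in centroids)))
    for _ in range(max_iter):
        groups: List[List[Cell]] = [[] for _ in range(k)]
        for cell in cells:
            groups[_nearest(cell, centroids)].append(cell)
        # Move each centroid to the mean of its cells; leave empty clusters in place.
        updated = [
            tuple(statistics.fmean(ch) for ch in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return centroids


def summarize_sample(cells: List[Cell], centroids: Sequence[Cell]) -> Dict[str, float]:
    """Per-sample markers: cluster fractions plus per-cluster channel medians."""
    groups: List[List[Cell]] = [[] for _ in centroids]
    for cell in cells:
        groups[_nearest(cell, centroids)].append(cell)
    features: Dict[str, float] = {}
    for j, g in enumerate(groups):
        features[f"cls{j}_frac"] = len(g) / len(cells)
        for ch in range(len(centroids[0])):
            # The median resists a few outlying cells; NaN marks an empty cluster.
            features[f"cls{j}_median_ch{ch}"] = (
                statistics.median(c[ch] for c in g) if g else math.nan
            )
    return features
```

For example, fitting two clusters on a small pooled set with one group of cells near (0, 0) and another near (10, 10), then summarizing a sample with one cell near the first group and three near the second, yields `cls0_frac = 0.25` and `cls1_frac = 0.75`. Farthest-first seeding is used here only so the sketch is deterministic; a production pipeline would more likely use a library implementation with multiple restarts.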

Results Our study included 242,623 CBD samples from 127,545 patients. Both the clustering and autoencoder approaches generated hundreds of new clinical biomarkers. Many biomarkers showed strong prognostic associations with all-cause mortality, inpatient admission, and the development of anemia, cancer, or cardiovascular disease, and these associations remained significant after adjustment for demographics and clinical CBD markers. A large subset of these prognostic markers were also highly novel: they had low correlations with existing CBD markers while showing significant correlations with broader physiologic signals, such as inflammatory, hormonal, infectious, and coagulopathic markers.
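The novelty criterion (low correlation with existing CBD markers) can be made concrete. As a sketch, assuming Pearson correlation and a hypothetical cutoff of 0.3 on the maximum absolute correlation (the abstract specifies neither the correlation measure nor a threshold), a new marker is flagged as novel when no existing marker tracks it closely in a linear sense. The function names and the marker names used in the example are illustrative, not taken from the study.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length marker vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant marker")
    return sxy / math.sqrt(sxx * syy)


def max_abs_correlation(
    new_marker: Sequence[float], existing: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Name of the most correlated existing marker and its |r| with the new marker."""
    scores = {name: abs(pearson_r(new_marker, vals)) for name, vals in existing.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def is_novel(
    new_marker: Sequence[float],
    existing: Dict[str, Sequence[float]],
    threshold: float = 0.3,  # hypothetical cutoff, not reported in the abstract
) -> bool:
    """Flag a marker as novel if no existing marker reaches |r| >= threshold."""
    return max_abs_correlation(new_marker, existing)[1] < threshold
```

A marker that is a rescaled copy of an existing one (for instance, `[1, 2, 3, 4, 5]` against `[2, 4, 6, 8, 10]`) has |r| = 1 and is not novel, whereas an alternating pattern such as `[1, -1, 1, -1, 1]` is uncorrelated with that same linear trend and passes the check. Pearson correlation only captures linear redundancy; a rank-based measure would be a reasonable alternative for skewed markers.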

Conclusion Collectively, these results demonstrate how modern AI techniques can enable deeper phenotyping of routine clinical blood counts, generating novel biomarkers that capture subtler physiologic signals than the markers currently used in clinical practice.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was not specifically supported by grant funds. BHF lists broader research program support from the Brotman Baty Institute.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study was performed under a research protocol approved by the University of Washington Medicine Institutional Review Board, under a waiver of informed consent due to minimal risk.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
