Explainable rule-based prediction of cultivation media for microbes

Available online 14 October 2025

Computational and Structural Biotechnology Journal

Abstract

Knowledge of microbial growth preferences remains dispersed, often confined to research articles or the experience of individual experts, so the design of new experiments relies heavily on manual expertise and literature searches. Previous computational efforts have predicted media from phylogenetic similarity or used genomic data for trait modeling, but their predictions often lack a transparent biological rationale or depend on biased features (e.g., incomplete genome annotations). To address this need for greater interpretability, we analyzed the recently introduced KG-Microbe dataset, a harmonized resource of microbial organismal traits, to elucidate growth media preferences. We developed a simple, rule-based classifier from these traits and rigorously compared its performance and interpretive power with those of a high-performing black-box model. Although the black-box model was slightly more accurate, the rule-based system is transparent and generates verifiable, biologically plausible rules, which makes it a more sustainable and insightful framework. To further explore feature importance, we applied SHAP to the black-box model and compared the results with a rule-based feature-importance method. This comparison uncovered limitations of post-hoc interpretation and confirmed the reliability of the rule-based approach. Finally, drawing on the resulting rule set, together with insights from a large language model (LLM) and domain experts, we propose strategies to advance microbial research.
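To make the idea of a rule-based classifier over organismal traits concrete, the following is a minimal sketch and not the classifier developed in this work. It assumes each organism is described by a set of categorical traits and a single medium label, learns single-trait rules of the form "IF trait THEN medium", and returns the rule that fired alongside each prediction. The trait names, media names, toy records, and the `min_support` and `min_precision` thresholds are all hypothetical and are not taken from KG-Microbe.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (set of traits, medium label). Not real KG-Microbe data.
records = [
    ({"halophile", "aerobe"}, "marine_broth"),
    ({"halophile", "aerobe"}, "marine_broth"),
    ({"anaerobe", "mesophile"}, "anaerobic_medium"),
    ({"anaerobe", "mesophile"}, "anaerobic_medium"),
    ({"aerobe", "mesophile"}, "nutrient_agar"),
    ({"aerobe", "mesophile"}, "nutrient_agar"),
    ({"aerobe", "mesophile"}, "nutrient_agar"),
]


def learn_rules(records, min_support=2, min_precision=0.8):
    """Learn single-trait rules 'IF trait THEN medium' (thresholds are assumptions)."""
    by_trait = defaultdict(Counter)
    for traits, medium in records:
        for trait in traits:
            by_trait[trait][medium] += 1

    rules = []
    for trait, counts in by_trait.items():
        medium, hits = counts.most_common(1)[0]
        support = sum(counts.values())   # organisms carrying this trait
        precision = hits / support       # fraction of them grown on `medium`
        if support >= min_support and precision >= min_precision:
            rules.append((trait, medium, precision, support))

    # Most precise rules first; ties broken by support, then by trait name.
    rules.sort(key=lambda r: (-r[2], -r[3], r[0]))
    return rules


def predict(rules, default, traits):
    """Return (medium, explanation) using the first rule whose trait is present."""
    for trait, medium, precision, support in rules:
        if trait in traits:
            reason = (f"IF {trait} THEN {medium} "
                      f"(precision={precision:.2f}, support={support})")
            return medium, reason
    return default, "no rule matched; falling back to the most common medium"


rules = learn_rules(records)
default = Counter(medium for _, medium in records).most_common(1)[0][0]

medium, reason = predict(rules, default, {"halophile", "psychrophile"})
print(medium, "<-", reason)
```

Because every prediction is paired with the rule that produced it, a domain expert can check each rule directly against the literature, which is the property that post-hoc explanations of a black-box model cannot guarantee.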

Graphical Abstract