Urinary tract infections (UTIs) are among the most common bacterial infections, affecting over 150 million people globally each year.1 UTIs can be chronic and recurrent2 and can lead to significant complications, including pyelonephritis and sepsis, especially in children, women and elderly individuals. In severe cases, the mortality rate may reach 50%, and in cases of septic shock, it may even reach 80%.3
The most common agents among gram-negative uropathogens are uropathogenic Escherichia coli (E. coli) and Klebsiella pneumoniae (K. pneumoniae).4,5 These two gram-negative bacterial species inhabit the gut and form part of the normal microbiota, even in healthy individuals. On CHROMID® CPS® Elite medium, uropathogenic E. coli typically presents as pink to burgundy colonies, while K. pneumoniae appears blue-green to blue-grey after at least 16 hours of incubation at 37 °C.6,7 Despite this colour-based differentiation, the process of colony identification remains manual and relies on expert interpretation, which can be variable and is a bottleneck in the rapid reporting of results.
Urine culture is a diagnostic tool that detects and quantifies pathogenic microorganisms and determines their antibiotic susceptibility patterns. It is the most commonly employed method for diagnosing UTIs due to its ease of use and optimal results.8,9 The midstream urine method is the most widely used because it is simple and practical. Collecting the middle portion of the urine stream helps reduce contamination from skin and genital microbiota that can affect the initial flow.10 Alternative methods, such as suprapubic aspiration and straight catheterisation, may result in lower contamination rates than the midstream method; however, urine culture remains the gold standard for diagnosing UTIs.11
While urine culture is used for diagnosing complicated and upper UTIs,12 there are two major challenges in managing the treatment process. The first challenge is the increasing resistance of microorganisms to antibiotics, which is largely attributed to their misuse and overuse. Several studies have highlighted the emergence of antibiotic resistance among certain pathogens.13,14 The side effects of broad-spectrum antibiotics and the emergence of antibiotic resistance have led researchers to explore new methods for managing them.15 Although guidelines such as those from the Infectious Diseases Society of America (IDSA) and The European Association of Urology (EAU) recommend first-line agents with a narrow spectrum of activity (eg, nitrofurantoin, fosfomycin, or trimethoprim-sulfamethoxazole) for uncomplicated UTIs, in clinical practice, broader-spectrum antibiotics are frequently prescribed empirically, particularly in hospital settings and for complicated infections, without waiting for urine culture results.16 This practice is common for both community-acquired and hospital-associated UTIs. However, the result of the culture is typically available between 48 and 72 hours after sampling.17 This delay forces empiric therapy, which contributes to overuse and resistance. If the treatment is appropriate according to the antibiogram results, the physician usually proceeds with the antibiotic or de-escalates to a narrower spectrum agent, a core strategy of antimicrobial stewardship. De-escalation is usually preferred to reduce the risk of increased antibiotic resistance.18 Consequently, the prolonged diagnostic timeline (48–72 hours) inherently delays this critical stewardship intervention, forcing an extended period of empirical, often broad-spectrum therapy. Any such delay not only increases individual patient risk but also undermines institutional antimicrobial stewardship efforts. 
Furthermore, the excessive utilisation of antibiotics may precipitate the emergence of bacterial resistance, particularly within hospital settings and intensive care units (ICUs), which can ultimately result in mortality in severe cases. Indeed, it has been claimed that highly antibiotic-resistant bacteria are among the most common causes of death in hospitals.19 Therefore, bridging this critical time gap between sample collection and species-directed therapy is paramount. Artificial intelligence (AI), capable of analyzing digital images of growing colonies within hours, emerges as a transformative tool to directly address this delay, potentially compressing the diagnostic timeline and informing clinical decisions much earlier.
The second challenge is the time requirement for the diagnosis. The results of a urine culture may take between 16 and 72 hours20 to be available, and this prolonged period is associated with increased risks. For instance, in sepsis management, each hour of delay in effective antimicrobial therapy has been shown to significantly increase mortality.21 This delay could be a life-threatening factor for some specific hospitalised patient groups, such as immunocompromised individuals and elderly individuals.22 Moreover, a further 16 hours are required to complete the antibiotic susceptibility test (AST) following the detection of the pathogen. Therefore, studies have attempted to develop ways to achieve quicker results.23,24
The field of AI in healthcare has recently attracted the attention of numerous researchers.23,25 The number of studies examining the application of automation and AI in medical laboratories is increasing, particularly in the field of microbiology.26–28 In the context of microbiology, deep learning models such as YOLO (You Only Look Once) can be trained on large datasets of digital colony images. These models learn to detect and classify pathogens based on distinguishing visual features, including colony colour, morphology, texture, and spatial distribution, that are typically interpreted by human specialists. This allows for the automation of a key, time-consuming step in the diagnostic workflow. Several studies have evaluated the effectiveness of AI-assisted systems for the rapid and automated detection of specific pathogens under specific conditions.28–31 The automation implemented in these studies enabled rapid AI-assisted identification, with results transmitted promptly to clinicians via laboratory software. AI may therefore prove to be a transformative force in this field.32,33 However, to the best of our knowledge, few studies have used common media, such as chromogenic agar, to detect multiple bacterial species.
This study aims to develop an AI approach to rapidly identify E. coli and K. pneumoniae colonies in urine cultures. Unlike studies focusing on single pathogens or proprietary media, this approach leverages the standard laboratory workflow to provide early, simultaneous detection of the two most prevalent UTI pathogens. Detecting these two major UTI pathogens at their earliest growth stage on widely used chromogenic agar could make antibiotic susceptibility results available more quickly, enabling clinicians to initiate targeted therapy sooner. Successful implementation of such a model holds promise not only for faster pathogen detection but also for seamless integration into automated laboratory systems, standardizing interpretation, reducing technologist hands-on time, and ultimately accelerating the reporting of antibiotic susceptibility results to clinicians.
Materials and Methods
This study uses an image processing algorithm, YOLOv12s (You Only Look Once version 12), to rapidly identify E. coli and K. pneumoniae via chromogenic agar Petri dish images of urine cultures cultivated on the Copan Walk Away Specimen Processor Lab (WASP) system at the ISLAB-2 Central Core Laboratory located in a research hospital.
WASP System
The WASP Lab (by Copan WASP®, Italy) was developed to process culture tests in an automated robotic system. Culture samples are processed by robots with a 1 µL loop (so 1 colony corresponds to 1000 cfu/mL), and high-resolution digital images of the Petri dish are acquired before and after 16 or 18 hours of cultivation (the time is configurable) for urine cultures, using a 16-megapixel tri-linear colour camera that performs push-broom line scanning under a calibrated white LED lighting system. A specialist analyses the images of bacterial growth and decides to perform further analysis if there is no other bacterial contamination; if three or more bacterial types are present, the result is reported as contamination. Further analysis comprises identification followed by the AST, which is performed by a technician. K. pneumoniae is identified via matrix-assisted laser desorption/ionisation–time-of-flight mass spectrometry (MALDI-TOF MS) (bioMérieux, Marcy l’Etoile, France). MALDI-TOF MS is recognised as the current gold standard for resolving microbial identification issues in clinical microbiology, due to its rapid and accurate identification capabilities. E. coli is identified by a specialist on the basis of the appearance of typical colonies on the chromID CPS Elite chromogenic medium (bioMérieux, Marcy l’Etoile, France). The chromogenic medium enables the direct identification of E. coli through the activity of the species-specific β-glucuronidase enzyme, which causes the colonies to turn pink to burgundy. This approach is standard practice in clinical microbiology. The referenced validation study for this specific medium demonstrates high sensitivity (98.1%–99.0%) and specificity (99.1%) for the direct identification of E. coli based on this chromogenic phenotype at 16 hours.34 All E. coli isolates in this study exhibited this typical morphology, and their antimicrobial susceptibility profiles were consistent with E. coli.
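The two plate-reading rules stated above (the 1 µL loop conversion and the contamination threshold) can be written down as a minimal Python sketch. The function and constant names are hypothetical and chosen only for illustration; they are not part of the WASP software.

```python
# Minimal sketch of the plate-reading rules described in the text.
# Names are hypothetical; the numeric rules come from the text above.

CFU_PER_ML_PER_COLONY = 1000   # 1 µL loop: 1 colony corresponds to 1000 cfu/mL
CONTAMINATION_MIN_TYPES = 3    # three or more bacterial types -> contamination


def colonies_to_cfu_per_ml(colony_count: int) -> int:
    """Convert a colony count on a 1 µL-loop plate to cfu/mL."""
    if colony_count < 0:
        raise ValueError("colony_count must be non-negative")
    return colony_count * CFU_PER_ML_PER_COLONY


def is_contaminated(distinct_bacterial_types: int) -> bool:
    """Apply the 'three or more bacterial types' contamination rule."""
    return distinct_bacterial_types >= CONTAMINATION_MIN_TYPES


# Example: a plate with 50 colonies and two distinct colony types.
example_cfu = colonies_to_cfu_per_ml(50)
example_contaminated = is_contaminated(2)
```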
Data Collection
In this study, 1547 chromogenic agar Petri dish images of urine cultures, comprising 850 E. coli and 697 K. pneumoniae images, were collected from the WASP Lab system (Figure 1). The urine samples were collected from patients admitted to various departments (eg, Intensive Care Units, Internal Medicine, and Outpatient Clinics) of the research hospital between November 1st, 2023, and February 6th, 2024. The images were captured at high resolution by the WASP system shortly after the 16-hour incubation period. All samples were routine clinical specimens submitted for culture, and the study utilized anonymized, residual clinical data approved by the ethics committee; thus, no specific patient recruitment was performed. Figures 2a and 3a illustrate the growth of E. coli and K. pneumoniae at the 16th hour, respectively, on Petri dishes. In these Petri dishes, each dot represents a bacterial colony. The mean number of bacterial colonies per Petri dish image was 50, although some samples had fewer than 10 colonies. Additionally, some samples contained both bacterial types, and some displayed different colony colours, such as gold.
Figure 1 Standard treatment process for Escherichia coli and Klebsiella pneumoniae.
Figure 2 An example of the model output for predicting Escherichia coli. (a) An example sample agar Petri dish image of E. coli urine cultures is presented. To develop the model, each colony was plotted into bounding boxes and labelled on the basis of the bacterial name: E. coli with green boxes and K. pneumoniae (not shown in this figure) with blue boxes. The red cross indicates the colony selected on the Copan WASP software by the microbiology specialist for antibiogram analysis. (b) Representation of the model outputs, using the sample in a. (c) Zoomed view of the model output in b.
Figure 3 An example of the model output for predicting Klebsiella pneumoniae. (a) An example sample agar Petri dish image of a K. pneumoniae urine culture is presented. To develop the model, each colony was plotted into bounding boxes and labelled on the basis of the bacterial name: K. pneumoniae with blue boxes and E. coli (not shown in this figure) with green boxes. The red cross indicates the colony selected on the Copan WASP software by the microbiology specialist for antibiogram analysis. (b) Representation of the model outputs, using the sample in a. (c) Zoomed view of the model output in b.
AI Image Processing
A total of 1547 images were used to develop the YOLOv12 image processing model. Two specialists reviewed the images and labelled them as ID1, referring to E. coli, and ID2, referring to K. pneumoniae. We divided the dataset into two parts: training and testing. For training, 100 images were randomly selected from the dataset, 50 for E. coli and 50 for K. pneumoniae, and were named TrainDB. The remaining 1447 images were used for testing the AI model, 800 for E. coli and 647 for K. pneumoniae, and were named TestDB.
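A per-class random split of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the image identifiers, the seed, and the function name are hypothetical. With the class totals given under Data Collection (850 and 697), holding out 50 images per class leaves 800 and 647 images for testing.

```python
import random


def split_per_class(labels, n_train_per_class=50, seed=0):
    """Randomly pick n_train_per_class images of each class for training.

    labels: dict mapping image id -> class id (1 = E. coli, 2 = K. pneumoniae).
    Returns (train_ids, test_ids) as sorted lists.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    by_class = {}
    for image_id, class_id in labels.items():
        by_class.setdefault(class_id, []).append(image_id)

    train_ids = []
    for class_id in sorted(by_class):
        candidates = sorted(by_class[class_id])
        train_ids.extend(rng.sample(candidates, n_train_per_class))

    train_set = set(train_ids)
    test_ids = [i for i in sorted(labels) if i not in train_set]
    return sorted(train_ids), test_ids


# Hypothetical image ids standing in for the 850 + 697 plate images.
labels = {f"ecoli_{i:04d}": 1 for i in range(850)}
labels.update({f"kpn_{i:04d}": 2 for i in range(697)})

train_db, test_db = split_per_class(labels)
```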
In YOLO architectures, each object is represented by a coordinate-based rectangular box (x, y, width, height). The expert annotations were converted to the class labels and coordinate information required by YOLOv12. YOLOv12 is an advanced model that offers higher accuracy than previous convolutional neural network (CNN)-based versions and performs real-time object detection. As an attention-centric variant of the YOLO family, it incorporates efficient attention mechanisms that enhance feature extraction and localization accuracy.35 This architecture provides a superior balance of speed and accuracy, making it well suited to the precise and rapid identification of bacterial colonies on agar plates, which is critical for clinical integration and effective antibiotic stewardship. Experts drew a rectangular bounding box around each colony in every TrainDB image, completely encompassing the colony while minimizing the background. Before model training, all images were resized to 640×640 pixels to match YOLOv12’s recommended input size. During training, the images were augmented using fixed rotations (90°, −90°, and 180°), Gaussian blur, small-angle random rotations between −15° and +15°, horizontal and vertical shifts of up to 10%, ±10% scaling, shear transformations of up to 10°, vertical (20%) and horizontal (50%) flips, and colour adjustments in which hue was varied by 1.5%, saturation by 70%, and value by 40%. These augmentations were applied to increase the model’s generalization capacity.
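To make the box representation concrete, the sketch below converts a pixel-space bounding box into a normalised YOLO-style label line (class index, box centre x and y, width, height, each coordinate divided by the image size). The zero-based class index, the function name, and the example coordinates are assumptions made for illustration; the text does not state how ID1 and ID2 were mapped to class indices.

```python
def to_yolo_line(class_index, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format one bounding box as 'class x_center y_center width height'.

    All four box values are normalised to the range 0..1 by the image size.
    """
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_index} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical 40x40 pixel colony box on a 640x640 image, class index 0.
line = to_yolo_line(0, 300, 310, 340, 350, 640, 640)
print(line)  # 0 0.500000 0.515625 0.062500 0.062500
```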
Although the number of TrainDB images in this study is small, each Petri dish image contains dozens of colonies. Because YOLO architectures learn at the object (colony) level, rather than the image level, the 100 images in the training set provide a large sample that represents all variations in colonies. Data augmentation increased the diversity in colony location, orientation, and sharpness, thus minimizing the model’s data gap. As shown in Figure S4, the total number of colonies for both classes is quite high and this object-level sample diversity is also confirmed by the class and spatial distribution.
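The augmentation ranges described in this section could be passed to the Ultralytics trainer roughly as sketched below. This is one plausible mapping, not the authors' configuration: the argument names (`degrees`, `translate`, `scale`, `shear`, `flipud`, `fliplr`, `hsv_h`, `hsv_s`, `hsv_v`, `imgsz`) are Ultralytics training arguments, but the weights file name `yolo12s.pt`, the dataset YAML path, and the epoch count are assumptions. The fixed 90°/180° rotations and Gaussian blur are omitted because the text does not say how they were applied.

```python
def augmentation_kwargs():
    """Augmentation settings from the text, keyed by Ultralytics argument names."""
    return {
        "imgsz": 640,        # images resized to 640x640
        "degrees": 15.0,     # random rotation between -15 and +15 degrees
        "translate": 0.10,   # shifts of up to 10%
        "scale": 0.10,       # +/-10% scaling
        "shear": 10.0,       # shear of up to 10 degrees
        "flipud": 0.20,      # vertical flip (20%)
        "fliplr": 0.50,      # horizontal flip (50%)
        "hsv_h": 0.015,      # hue varied by 1.5%
        "hsv_s": 0.70,       # saturation varied by 70%
        "hsv_v": 0.40,       # value varied by 40%
    }


def fine_tune(data_yaml, weights="yolo12s.pt", epochs=100):
    """Fine-tune pretrained weights; weights name and epochs are assumptions."""
    from ultralytics import YOLO  # imported lazily so the dict above is usable alone

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=epochs, **augmentation_kwargs())
```

Calling `fine_tune("traindb.yaml")` would need the `ultralytics` package and a dataset description file; both the file name and its contents are hypothetical here.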
In the class frequency plot, label “1” represents E. coli (≈37,000 samples), and label “2” represents ID2 K. pneumoniae (≈32,000 samples). A single dot, in the upper right, represents an outlier sample. This outlier could be an enormous colony or an incorrectly drawn bounding box. This outlier was used in training without being removed from TrainDB. The 2D density map is shown in the lower left panel. These represent the normalized coordinates of the bounding box center of each colony. This distribution indicates that the colonies are concentrated in the central region of the images, consistent with the Petri dish structure. The lower right panel shows the normalized width and height values of the bounding boxes (range from 0 to 1). It demonstrates that the colonies are small and consistent in size. These analyses show that TrainDB is balanced, structurally consistent, and suitable for model training for both classes.
In addition, external validation was conducted on an Out-of-Distribution (OOD) External Test Set of 91 samples (60 samples for ID1: E. coli and 31 samples for ID2: K. pneumoniae) from an independent study to assess the generalisability36 of the findings. The OOD External Test Set was obtained from another published study in which the samples were used to predict positive, negative, and uncertain urine tests.37
No preprocessing or data augmentation was applied to TestDB and the OOD External Test Set; the images were given directly to the model, and during the prediction phase, the YOLOv12 model automatically scaled the images to 640×640 pixels. This external dataset was used to assess the model’s generalizability in performing colour-based classification on an independent image set, acknowledging that species-level confirmation was not available.
Proposed AI Model
YOLOv12 is a real-time object detection35 model based on convolutional neural networks and provides higher accuracy than previous YOLO models. The YOLOv12 architecture can be trained in detection, segmentation, pose estimation, and classification modes. In this study, the YOLOv12 model was not trained from scratch; instead, we fine-tuned the pretrained YOLOv12 weights provided by Ultralytics using our annotated TrainDB dataset. While the YOLOv12 architecture can be trained to classify a whole Petri dish, we used it in object detection mode to identify different types of bacterial colonies within a single dish. For transparency and reproducibility, our trained model, sample Petri dish images, and the visual interface code of our application are shared on https://github.com/serpilustebay/chromogenic-urine-bacteria-detection. The complete set of preprocessing steps and model implementation scripts is available upon request and can be shared via GitHub. The best hyperparameters are detailed in the Supplemental Data Box File. For each image, the trained model outputs the coordinates of the detected colonies, along with the probabilities of their belonging to the corresponding bacterial class. Image-level classification was then performed by majority voting over the class labels (ID1 for E. coli or ID2 for K. pneumoniae) of the detected colonies: each Petri dish image was assigned the most frequently detected colony ID (Figures 2 and 3). It is important to note that this single-label, majority voting approach is a simplification for this initial validation study. In a clinical setting, samples containing a significant mixture of pathogens would require a multi-label classification framework to accurately report the presence of each pathogen.
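The image-level majority vote can be sketched in a few lines of Python. The function name is hypothetical, and two behaviours are assumptions rather than details from the text: returning `None` when no colony is detected, and returning `None` on an exact tie between the two classes.

```python
from collections import Counter

CLASS_NAMES = {1: "E. coli", 2: "K. pneumoniae"}  # ID1 and ID2 as defined in the text


def classify_plate(colony_class_ids):
    """Return the most frequently detected colony ID for one plate image.

    colony_class_ids: iterable of per-colony class IDs predicted by the detector.
    Returns None when there are no detections or when the vote is tied
    (both are assumptions made for this sketch).
    """
    counts = Counter(colony_class_ids)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical plate: 12 colonies detected as ID1 and 3 as ID2.
plate_id = classify_plate([1] * 12 + [2] * 3)
plate_name = CLASS_NAMES.get(plate_id, "no decision")
```

A multi-label variant, as mentioned for mixed cultures, would instead report every class whose colony count exceeds some threshold; that threshold is not specified in the text.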
The prediction model was tested using 1447 Petri images, of which 800 were labelled E. coli, and 647 K. pneumoniae. The model results were evaluated by comparing them with the labels provided by experts. In addition, external validation was conducted using 91 samples from an independent study37 to assess the generalisability of the findings.
Performance Evaluation Metrics
We evaluated the performance of the model using accuracy, precision, recall, F1 score, specificity, false positive rate (FPR), false negative rate (FNR), and Hamming loss metrics. Accuracy measures the overall correctness of a model by calculating the ratio of correctly predicted instances to total instances (Accuracy = (TP + TN) / (TP + TN + FP + FN)). Precision, also known as the positive predictive value, indicates the proportion of true positive results among all positive predictions made by the model (Precision = TP / (TP + FP)). Recall, or sensitivity, measures the model’s ability to identify all relevant instances by calculating the proportion of true positive results out of the actual positives (Recall = TP / (TP + FN)). The F1 score is the harmonic mean of precision and recall (F1 = 2 × (Precision × Recall) / (Precision + Recall)). Specificity represents the proportion of true negative results among all actual negatives (Specificity = TN / (TN + FP)). The false positive rate (FPR) corresponds to the fraction of negative samples incorrectly classified as positive (FPR = FP / (FP + TN)), while the false negative rate (FNR) denotes the proportion of positive samples incorrectly classified as negative (FNR = FN / (FN + TP)). Hamming loss measures the proportion of incorrectly predicted labels, offering a useful complementary metric, especially in multi-label classification tasks (Hamming Loss = (FP + FN) / Total Samples).
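The formulas above translate directly into code. The following Python sketch computes all eight metrics from the four confusion-matrix counts; the function name, the zero-denominator convention (returning 0.0), and the example counts are assumptions for illustration and are not the study's results.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (a convention of this sketch)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics defined in the text from confusion counts."""
    total = tp + tn + fp + fn
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(tn, tn + fp),
        "fpr": _ratio(fp, fp + tn),
        "fnr": _ratio(fn, fn + tp),
        "hamming_loss": _ratio(fp + fn, total),
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=90, tn=80, fp=10, fn=20)
```

By construction, FPR equals 1 − specificity and FNR equals 1 − recall, and for a single-label binary task the Hamming loss equals 1 − accuracy.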
Ethical Approval
The study was conducted in accordance with the principles outlined in the Declaration of Helsinki. The authors obtained ethical approval (ID 2024/188) from Sancaktepe Sehit Prof Dr Ilhan Varank Training and Research Hospital to conduct the study. Patients’ confidentiality was protected under the hospital’s policies.
Results
The YOLOv12 colony detection model was trained using the TrainDB dataset. The hyperparameter search consisted of 50 iterations, each testing a different hyperparameter combination. All experiments were conducted on a local workstation equipped with a 12th Generation Intel Core i5-1240P processor and 16 GB of RAM, running the Windows operating system. The variation of the fitness values over the iterations of the hyperparameter search is shown in Figure S2. As a result of this process, the best hyperparameters were determined, and the model was tuned using these optimal values.
Figure S3 shows the loss functions and the changes in mAP50 and mAP50–95 across epochs during the training and validation phases of the YOLOv12 model tuned with the best hyperparameters. The train/val loss and mAP curves show that the model exhibits steady convergence throughout the training process, with decreasing losses and steadily increasing accuracy metrics. These findings confirm that the YOLOv12 model performs balanced learning without overfitting.
The YOLOv12 model was able to detect colonies on the sample and predict the probability of each detected colony being E. coli or K. pneumoniae. Figures 2 and 3 present the classification results of Petri images according to majority voting. The model was also able to detect more complex samples where multiple bacterial types exist in the same agar Petri dish (Figure 4). Figure 4 reveals the predictive capability of the YOLOv12 model, which proves the potential of the automated bacterial colony detection process, reducing the workload of the laboratory, as traditionally, detection is performed by an expert. Our results also demonstrate a transparent AI process, which explains the generation of the output. Most AI studies are criticised for being black boxes.36,38,39
Figure 4 An example of the model output for a mixed culture. (a) A sample agar Petri dish image containing both Escherichia coli and Klebsiella pneumoniae colonies is presented. To develop the model, each colony was plotted into bounding boxes and labelled on the basis of the bacterial name: E. coli with green boxes and K. pneumoniae with blue boxes. The red crosses indicate the colonies selected on the Copan WASP software by the microbiology specialist for antibiogram analysis. (b) Representation of the model outputs, using the sample in a, where the model correctly identifies both bacterial species. (c) Zoomed view of the model output in b.
The performance results of the model on TestDB are presented in Table 1. These results were obtained by combining colony-level predictions using the majority voting method. An image-level classification accuracy of 0.99 (99%) was achieved. In addition, high values were obtained for all key classification metrics, including precision (0.99), recall/sensitivity (0.99), specificity (0.99), and F1 score (0.99). The extremely low false positive rate (FPR = 0.001) and false negative rate (FNR = 0.004) indicate that the model is minimally prone to error. The Hamming loss of 0.002 also confirms that the misclassification rate is very low. Overall, the proposed model exhibited stable and highly sensitive classification performance on TestDB.
Figure S1 Confusion matrix for the proposed model.
Table 1 Performance Metrics of the Proposed Model on Different Datasets
Although the model outperformed the other methods, testing on 1,447 Petri dish images revealed four instances in which the model misclassified the bacterial type (Figure 5). Specifically, on two occasions E. coli was misidentified as K. pneumoniae, as shown in Figure 5a, and on two occasions K. pneumoniae was misidentified as E. coli, as shown in Figure 5b. Aside from these four incorrect classifications, there was only one case in which the model could not make any prediction (Figure 5c). One reason for the misclassifications is that the model detects round, standalone colonies: there are more standalone K. pneumoniae colonies in Figure 5a and more standalone E. coli colonies in Figure 5b, so the majority vote favoured the wrong class. In Figure 5c, the E. coli colonies were gold coloured, which is unusual for E. coli at the 16th hour. A few non-lactose-fermenting variants of E. coli can produce different colours on different agar.40,41 It is therefore understandable that the model made no prediction for this image, which the expert classified as E. coli.
Figure 5 Examples of misclassified Petri dish images. In these images, the red cross indicates a colony selected by a microbiology specialist for antibiogram analysis, while the green check mark indicates a colony selected and processed by a laboratory technician on the Copan WASP software. (a) An image that was identified as E. coli by experts, but the AI model misclassified it as K. pneumoniae. (b) An image that was misclassified as E. coli by the AI model, although experts had identified it as K. pneumoniae. (c) An image identified as E. coli by experts, which the AI model was unable to classify as either E. coli or K. pneumoniae.
Additional validation was performed on the OOD External Test Set, obtained from a source completely independent of TrainDB and TestDB, to assess the generalizability of the model. It consists of 91 Petri dish images collected under different laboratory and imaging conditions. The proposed model achieved 100% accuracy on this OOD External Test Set, correctly classifying all samples. Precision, recall, specificity, and F1-score values reached 1.00, while FPR, FNR, and Hamming loss were calculated as 0. It should be emphasized, however, that this 100% accuracy reflects agreement with colour-based assumptions rather than biologically confirmed species-level identification. It is crucial to interpret this perfect result as validation of the model’s consistent colour-classification performance, not as definitive proof of species identification accuracy.
To position our approach relative to the existing literature, we compared it with DenseNet201, ResNet50, MobileNetV3Large, Xception, and InceptionV3, which are widely used deep learning models in image processing. These networks share similar convolutional layers but differ in their structure and use. DenseNet improves gradient flow by feeding each layer the outputs of all previous layers. ResNet50 uses skip connections to train deeper networks and improve performance. MobileNetV3Large is a lightweight CNN designed for mobile devices that employs depthwise separable convolutions to reduce computational costs. Xception also gains efficiency from depthwise separable convolutions, which reduce the number of required parameters. InceptionV3 captures features at different scales by using assorted convolution kernels and factorisation techniques. In terms of size, MobileNetV3Large is the lightest model, comprising approximately 5.4 million parameters and 88 layers. DenseNet201 has approximately 20 million parameters and 201 layers, making it the model with the largest number of layers in this list. Xception comprises approximately 22 million parameters and 71 layers. InceptionV3 has approximately 24 million parameters and 48 layers, while ResNet50 (approximately 25 million parameters and 50 layers) is the model with the largest number of parameters in this list. To ensure a fair comparison, all models were initialised with ImageNet weights before training. A fully connected (dense) layer with two outputs was added as the final layer of each model. Furthermore, to prevent the neural networks from overfitting, we employed the Dropout regularisation technique. All methods were optimised over the same search space with the Tree-structured Parzen Estimator (TPE) method to find the parameters that achieve the highest accuracy. The results obtained are listed in Table 2.
Table 2 A Comparison of the Deep Learning Model Performances
In general, the high accuracy and reliability of YOLOv12 make it the most suitable model for this study, and its superior performance in real-time applications suggests that the model can be used to develop a novel detection system by integrating autonomous robotics with antibiotic susceptibility testing. The development of an autonomous detection system could result in significant savings in terms of time and cost.
Discussion
The primary objective of this study was to develop an accurate AI model for pathogen identification. Our finding of 99% accuracy on the internal test set directly fulfils this objective, demonstrating that the YOLOv12 architecture can reliably learn the distinct phenotypic signatures of E. coli and K. pneumoniae on chromogenic agar. This high level of accuracy is the foundational requirement for any subsequent clinical integration. To our knowledge, this is the first study to report an AI-assisted method for identifying E. coli and K. pneumoniae from nonspecific chromogenic agar images of urine cultures.
A core motivation for this work was to address the critical diagnostic delay in UTI management. The proposed YOLOv12 model enables rapid identification of the bacterial type, producing a classification decision in about half a second per image (15.8 ms preprocessing, 474.3 ms inference, and 13.5 ms postprocessing) as soon as the 16-hour incubation image is captured. By meeting this objective of speed, we establish the technical feasibility of shortening the timeline to targeted therapy by approximately 18 hours, a key step for antimicrobial stewardship42 and for reducing the time required for diagnosis.20,42 Indeed, several studies have shown that automation increases efficiency and reduces costs by reducing time and staff requirements.29,43 For example, Faron et al30 compared the turnaround time (TAT) of technicians and automated WASPLab software for analysing urine cultures on chromID agars and reported that the software reduced the TAT by 4 hours and 42 minutes for negative cultures and 3 hours and 28 minutes for positive cultures.
This study proposes an automated diagnostic workflow integrating our YOLOv12 model. Following standard urine culture inoculation, the system captures agar plate images at 16 hours (or 18 hours if growth is delayed). YOLOv12 analyzes these images to provide rapid phenotypic identification of E. coli or K. pneumoniae/Klebsiella, Enterobacter, Serratia, Citrobacter (KESC)-group colonies. This result can automatically trigger appropriate antibiotic susceptibility testing (AST). An AST of gram-negative bacteria can be completed in approximately 11.7 hours on the Vitek2 (bioMérieux, Marcy l’Etoile, France).44 Once all tests are complete, the clinician can review the results after approximately 24–30 hours. By providing pathogen identification ~18 hours earlier than conventional methods, this system enables clinicians to make targeted therapy decisions faster, which is critical for high-risk patients (eg, patients with AIDS and recipients of cancer chemotherapy or other immunosuppressive medication).45
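The timing figures quoted in the Discussion can be combined in a short back-of-the-envelope calculation. The Python sketch below adds the per-image processing stages and then adds incubation and AST durations; it deliberately ignores transport, plate handling, and reporting time, which the text does not quantify, so the totals are a lower bound under that assumption. All names are hypothetical.

```python
# Per-image processing stages reported for the model, in milliseconds.
LATENCY_MS = {"preprocess": 15.8, "inference": 474.3, "postprocess": 13.5}

INCUBATION_HOURS = (16.0, 18.0)  # image capture at 16 h, or 18 h if growth is delayed
AST_HOURS = 11.7                 # approximate Vitek2 AST duration for gram-negatives


def total_latency_seconds():
    """Sum of the three processing stages, converted to seconds."""
    return sum(LATENCY_MS.values()) / 1000.0


def hours_to_ast_result(incubation_hours):
    """Incubation plus AST time; model latency is negligible at this scale."""
    return incubation_hours + AST_HOURS


latency_s = total_latency_seconds()
timeline_h = [hours_to_ast_result(h) for h in INCUBATION_HOURS]
```

Under these assumptions the model adds roughly half a second per plate, and the incubation plus AST steps take about 27.7 to 29.7 hours, which falls within the 24–30 hour window stated above.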
Robots in automated systems inoculate samples according to the type specified by the laboratory director. Only the BD Kiestra total laboratory automation system (BD Kiestra B.V., Drachten, Netherlands) and the WASPLab system (Copan Italia S.p.A., Brescia, Italy) utilise robots for inoculation.31 Therefore, a robotic system can standardise the culture and help specialists evaluate it accurately. In the present study, the agar plates were inoculated using automated robots, and streaking lines were applied according to the standard protocol. The implementation of standardised procedures may enhance the efficacy of AI in such studies.
Study Limitations
The proposed YOLOv12 model accurately detects the two most common gram-negative bacteria and was able to detect multiple bacterial types on a single agar Petri dish. However, on one occasion the model could not classify a bacterial type that the expert had labelled E. coli (Figure 5c). This was due to its rare gold colour at the 16th hour on chromID agar; E. coli can rarely be observed in other colours, such as white and gold.41 We therefore excluded that image from the dataset. Nevertheless, if the model were trained with a greater number of rare-case images, it could learn to detect such cases and, in turn, determine the most suitable AST type rapidly. Since Vitek2 systems use different types of AST cards for gram-positive and gram-negative bacteria, even detecting the bacterial group, whether gram-positive or gram-negative, could save time in an automated system.
While this study achieved its objective of proof-of-concept identification based on colony phenotype, a key limitation for clinical translation is the reliance on this phenotypic ground truth, which limits the certainty with which we can attribute the model’s performance to species identification versus colour/shape recognition. The primary conclusion of this study is therefore that the model accurately classifies colony phenotypes corresponding to key diagnostic pathways, rather than providing conclusive species-level identification. Accordingly, a primary objective for future work must be to validate the model against a fully MALDI-TOF-confirmed dataset to establish species-level accuracy beyond chromogenic morphology.
Furthermore, there were four occasions where the model misclassified the images (Figure 5b and c). This was due to the way the model was designed: it identifies colonies within bounding boxes and uses majority voting to classify an image, which forces each image into a single class. However, some samples contained multiple bacterial types. Although the model assigned a single class via majority voting, the experts stated that these samples should not be considered incorrectly classified but should instead have been assigned to both classes. A better approach for the automated AST system would therefore be to treat such samples as a multi-label problem. Multi-label classification allows a sample to be assigned to more than one class rather than to a single class. When more than 10 colonies of two or more bacterial species are detected in an image, the AI model can activate the multi-label mechanism and label them accordingly.
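The difference between majority voting and the proposed multi-label rule can be illustrated with a minimal sketch. It is not the study's implementation: the function names are hypothetical, the input is assumed to be a plain list of per-colony class labels (one per detected colony), and the threshold is interpreted as "a class is reported when more than 10 of its colonies are detected", which is one possible reading of the rule described above.

```python
from collections import Counter

MIN_COLONIES = 10  # threshold from the text ("more than 10 colonies"); interpretation assumed


def majority_vote(detections):
    """Return the single most frequent class among per-colony detections."""
    counts = Counter(detections)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def multilabel_vote(detections, min_colonies=MIN_COLONIES):
    """Return every class with more than `min_colonies` colonies when at least
    two classes qualify; otherwise fall back to the majority-vote class."""
    counts = Counter(detections)
    labels = sorted(cls for cls, n in counts.items() if n > min_colonies)
    if len(labels) >= 2:
        return labels
    winner = majority_vote(detections)
    return [winner] if winner is not None else []


if __name__ == "__main__":
    mixed_plate = ["E. coli"] * 30 + ["KESC"] * 12
    print(majority_vote(mixed_plate))    # single class only
    print(multilabel_vote(mixed_plate))  # both classes reported
```

With this rule, a plate carrying 30 E. coli-like and 12 KESC-like colonies is labelled with both classes, whereas majority voting would report E. coli alone; a plate with only a few stray colonies of a second class still receives a single label.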
The restricted scope of pathogen detection can be considered another limitation of this study. Because members of the KESC group produce the same colours on chromogenic agar, it is challenging to differentiate between them solely on the basis of colour. However, a notable advantage of this approach is that the same type of AST is employed for this group. Accordingly, the model developed in this study can be employed to select the optimal type of AST for the most prevalent bacterial species associated with UTIs, specifically E. coli and K. pneumoniae.
Another limitation is the data used for external validation. We were unable to find chromogenic agar images of bacteria identified by MALDI-TOF MS with which to test our model. We therefore had to use images from a study that classified urine cultures as positive, negative or contaminated without naming the bacteria, and we assumed that all green colonies were K. pneumoniae-like in order to test our model.37 The most critical consideration for the external validation is thus the reliance on presumed species labels based on colony morphology. Consequently, the 100% accuracy primarily demonstrates robust generalizability of the phenotypic classification rule, not of microbiological species identification.
Conclusion
To conclude, this paper has shown that the YOLOv12 model can effectively and rapidly perform phenotypic classification of the most common gram-negative uropathogens on chromogenic agar. This includes not only E. coli but also the KESC group, whose members share the characteristic blue-green phenotype and often uniform AST profiles, thereby broadening the clinical utility of the system. The primary value of this approach lies in workflow acceleration; by providing reliable phenotypic classification at the 16–18 hour mark, it has the potential to shorten the time to actionable information for clinicians. Transparency is afforded by the model’s output of visual bounding boxes with per-colony probabilities, which allows for specialist review. It is crucial to acknowledge that the model’s current clinical utility remains hypothetical. Its promise is contingent upon future validation using isolates with definitive species identification (eg, MALDI-TOF MS) and more robust, diverse training data. This includes external validation on datasets with confirmed identities, as the perfect performance reported here was measured against colour-based assumptions. While this AI model is not a replacement for gold-standard identification or AST, it could serve as a critical trigger to expedite these downstream processes. Therefore, if validated under these conditions, such a model could form a foundational component of future antimicrobial stewardship programs by potentially reducing reliance on prolonged empirical therapy. This impact, however, requires direct confirmation through prospective clinical studies that measure the effects on antibiotic use and patient outcomes.
From a risk management perspective, shortening the diagnostic timeline addresses a critical vulnerability in current practice. Wider adoption of validated AI-assisted microbiology could therefore improve patient outcomes, enhance hospital efficiency, and contribute to policy goals aimed at combating antimicrobial resistance. Future research must focus on the essential steps of expanding validation with definitive ground truth, incorporating additional pathogens, evaluating the real-world impact of such systems within diverse healthcare settings, and implementing a multi-label framework to accurately identify polymicrobial cultures.
Abbreviations
E. coli, Escherichia coli; K. pneumoniae, Klebsiella pneumoniae; UTI, urinary tract infection; AI, artificial intelligence; WASP, Walk Away Specimen Processor Lab; AST, antibiotic susceptibility test; MALDI-TOF MS, matrix-assisted laser desorption/ionisation–time-of-flight mass spectrometry; FPR, false positive rate; FNR, false negative rate; KESC, Klebsiella spp., Enterobacter spp., Serratia spp., and Citrobacter spp.; YOLO, You Only Look Once.
Disclosure
The authors declare no conflicts of interest in this work.
References
1. Yang J, Eyre DW, Lu L, Clifton DA. Interpretable machine learning-based decision support for prediction of antibiotic resistance for complicated urinary tract infections. Npj Antimicrob Resist. 2023;1(1):1–9. doi:10.1038/s44259-023-00015-2
2. Tamadonfar KO, Omattage NS, Spaulding CN, Hultgren SJ. Reaching the End of the Line: urinary Tract Infections. Microbiol Spectr. 2019;7(3). doi:10.1128/microbiolspec.BAI-0014-2019
3. Rodríguez-Guerrero E, Moya-López J, Expósito-Ruiz M, Navarro-Marí JM, Gutiérrez-Fernández J. Preliminary reading of antibiogram by microdilution for clinical isolates in urine culture. Eur J Clin Microbiol Infect Dis. 2024;43(3):517–524. doi:10.1007/s10096-024-04747-5
4. Vachvanichsanong P, McNeil EB, Dissaneewate P. Extended-spectrum beta-lactamase Escherichia coli and Klebsiella pneumoniae urinary tract infections. Epidemiol Infect. 2020;149:e12. doi:10.1017/S0950268820003015
5. Flores-Mireles AL, Walker JN, Caparon M, Hultgren SJ. Urinary tract infections: epidemiology, mechanisms of infection and treatment options. Nat Rev Microbiol. 2015;13(5):269–284. doi:10.1038/nrmicro3432
6. Yarbrough ML, Wallace MA, Marshall C, Mathias E, Burnham CA. Culture of Urine Specimens by Use of chromID CPS Elite Medium Can Expedite Escherichia coli Identification and Reduce Hands-On Time in the Clinical Laboratory. J Clin Microbiol. 2016;54(11):2767–2773. doi:10.1128/JCM.01376-16
7. Payne M, Roscoe D. Evaluation of two chromogenic media for the isolation and identification of urinary tract pathogens. Eur J Clin Microbiol Infect Dis. 2015;34(2):303–308. doi:10.1007/s10096-014-2235-3
8. Wojno KJ, Baunoch D, Luke N, et al. Multiplex PCR Based Urinary Tract Infection (UTI) Analysis Compared to Traditional Urine Culture in Identifying Significant Pathogens in Symptomatic Patients. Urology. 2020;136:119–126. doi:10.1016/j.urology.2019.10.018
9. Senthinathan A, Craven BC, Morris AM, Penner M, Tu K, Jaglal SB. Examining antibiotic prescribing and urine culture testing for urinary tract infections (UTIs) in a primary care spinal cord injury (SCI) cohort. Spinal Cord. 2023;61(6):345–351. doi:10.1038/s41393-023-00899-x
10. Cao Y, Gao F, Chen W. Comparison of different urine culture methods in urinary tract infection. Transl Androl Urol. 2022;11(2):260–267. doi:10.21037/tau-22-73
11. Ippoliti R, Allievi I, Rocchetti A. UF-5000 flow cytometer: a new technology to support microbiologists’ interpretation of suspected urinary tract infections. MicrobiologyOpen. 2020;9(3):e987. doi:10.1002/mbo3.987
12. Long B, Koyfman A. The Emergency Department Diagnosis and Management of Urinary Tract Infection. Emerg Med Clin North Am. 2018;36(4):685–710. doi:10.1016/j.emc.2018.06.003
13. Kotov SV, Pulbere SA, Alesina NV, et al. The problem of antibiotic resistance in patients with urinary tract infection. Urol Mosc Russ. 2021;2021(1):5–12.
14. Thaulow CM, Lindemann PC, Klingenberg C. Antibiotic resistance in paediatric UTIs in Norway. Tidsskr Den Nor Laegeforening Tidsskr Prakt Med Ny Raekke. 2021;141(10):889. doi:10.4045/tidsskr.20.0889
15. Şirin MC, Cezaroğlu Y, Sesli Çetin E, Arıdoğan B, Trak D, Arslan Y. Antibacterial and antibiofilm efficacy of colistin & meropenem conjugated silver nanoparticles against Escherichia coli and Klebsiella pneumoniae. J Basic Microbiol. 2023;63(12):1397–1411. doi:10.1002/jobm.202300440
16. Koh SWC, Tsm N, Loh VWK, et al. Antibiotic treatment failure of uncomplicated urinary tract infections in primary care. Antimicrob Resist Infect Control. 2023;12(1):73. doi:10.1186/s13756-023-01282-4
17. Lammers RL, Gibson S, Kovacs D, Sears W, Strachan G. Comparison of test characteristics of urine dipstick and urinalysis at various test cutoff points. Ann Emerg Med. 2001;38(5):505–512. doi:10.1067/mem.2001.119427
18. Goebel MC, Trautner BW, Grigoryan L. The Five Ds of Outpatient Antibiotic Stewardship for Urinary Tract Infections. Clin Microbiol Rev. 2021;34(4):e0000320. doi:10.1128/CMR.00003-20
19. Liu JY, Dickter JK. Nosocomial Infections: a History of Hospital-Acquired Infections. Gastrointest Endosc Clin N Am. 2020;30(4):637–652. doi:10.1016/j.giec.2020.06.001
20. Bailey AL, Burnham CAD. Reducing the time between inoculation and first-read of urine cultures using total lab automation significantly reduces turn-around-time of positive culture results with minimal loss of first-read sensitivity. Eur J Clin Microbiol Infect Dis. 2019;38(6):1135–1141. doi:10.1007/s10096-019-03512-3
21. Kumar A, Roberts D, Wood KE, et al. Duration of hypotension before initiation of effective antimicrobial therapy is the critical determinant of survival in human septic shock. Crit Care Med. 2006;34(6):1589–1596. doi:10.1097/01.CCM.0000217961.75225.E9
22. Tilahun M, Gedefie A, Sahle Z. Asymptomatic Carriage Rate, Multidrug Resistance Level, and Associated Risk Factors of Enterococcus in Clinical Samples among HIV-Positive Patients Attending at Debre Birhan Comprehensive Specialized Hospital, North Showa, Ethiopia. BioMed Res Int. 2023;2023:7310856. doi:10.1155/2023/7310856
23. Roux-Dalvai F, Gotti C, Leclercq M, et al. Fast and Accurate Bacterial Species Identification in Urine Specimens Using LC-MS/MS Mass Spectrometry and Machine Learning. Mol Cell Proteomics MCP. 2019;18(12):2492–2505. doi:10.1074/mcp.TIR119.001559
24. Gur’ev AS, Tigasson M, Shalatova OY, et al. Fast antibiotic susceptibility testing of urine microflora using a microbiological analyzer based on coherent fluctuation nephelometry. Braz J Microbiol Publ Braz Soc Microbiol. 2022;53(1):195–204. doi:10.1007/s42770-021-00671-4
25. Ustebay S, Sarmis A, Kaya GK, Sujan M. A comparison of machine learning algorithms in predicting COVID-19 prognostics. Intern Emerg Med. 2023;18(1):229–239. doi:10.1007/s11739-022-03101-x
26. Hartmann R, Jeckel H, Jelli E, et al. Quantitative image analysis of microbial communities with BiofilmQ. Nat Microbiol. 2021;6(2):151–156. doi:10.1038/s41564-020-00817-4
27. Croxatto A, Prod’hom G, Faverjon F, Rochais Y, Greub G. Laboratory automation in clinical bacteriology: what system to choose? Clin Microbiol Infect. 2016;22(3):217–235. doi:10.1016/j.cmi.2015.09.030
28. Goneau LW, Mazzulli A, Trimi X, Cabrera A, Lo P, Mazzulli T. Evaluating the preservation and isolation of stool pathogens using the COPAN FecalSwabTM Transport System and Walk-Away Specimen Processor. Diagn Microbiol Infect Dis. 2019;94(1):15–21. doi:10.1016/j.diagmicrobio.2018.11.020
29. Gao J, Chen Q, Peng Y, Jiang N, Shi Y, Ying C. Copan Walk Away Specimen Processor (WASP) Automated System for Pathogen Detection in Female Reproductive Tract Specimens. Front Cell Infect Microbiol. 2021;11:770367. doi:10.3389/fcimb.2021.770367
30. Faron ML, Buchan BW, Samra H, Ledeboer NA. Evaluation of WASPLab Software To Automatically Read chromID CPS Elite Agar for Reporting of Urine Cultures. J Clin Microbiol. 2019;58(1):e00540–19. doi:10.1128/JCM.00540-19
31. Quiblier C, Jetter M, Rominski M, et al. Performance of Copan WASP for Routine Urine Microbiology. J Clin Microbiol. 2016;54(3):585–592. doi:10.1128/JCM.02577-15
32. Porte L, Alfaro MJ, Varela C, Reyes J, Weitzel T. Automated urine culture system with reduced turnaround time: a prospective real-world evaluation. Diagn Microbiol Infect Dis. 2025;112(3):116826. doi:10.1016/j.diagmicrobio.2025.116826
33. Vandenberg O, Durand G, Hallin M, et al. Consolidation of Clinical Microbiology Laboratories and Introduction of Transformative Technologies. Clin Microbiol Rev. 2020;33(2):e00057–19. doi:10.1128/CMR.00057-19
34. Rigaill J, Verhoeven PO, Mahinc C, et al. Evaluation of New bioMérieux Chromogenic CPS Media for Detection of Urinary Tract Pathogens. J Clin Microbiol. 2015;53(8):2701–2702. doi:10.1128/JCM.00941-15
35. Khanam R, Hussain M. A Review of YOLOv12: attention-Based Enhancements vs. Previous Versions. arXiv. 2025. doi:10.48550/arXiv.2504.11995
36. Adadi A, Berrada M. Peeking Inside the Black-Box: a Survey on Explainable Artificial Intelligence (XAI). IEEE Access. 2018;6:52138–52160. doi:10.1109/ACCESS.2018.2870052
37. da Silva GR, Rosmaninho IB, Zancul E, et al. Image dataset of urine test results on petri dishes for deep learning classification. Data Brief. 2023;47(47):109034. doi:10.1016/j.dib.2023.109034
38. Burkart N, Huber MF. A Survey on the Explainability of Supervised Machine Learning. J Artif Int Res. 2021;70:245–317. doi:10.1613/jair.1.12228
39. Tjoa E, Guan C. A Survey on Explainable Artificial Intelligence (XAI): toward Medical XAI. IEEE Trans Neural Netw Learn Syst. 2021;32(11):4793–4813. doi:10.1109/TNNLS.2020.3027314
40. Akter L, Haque R, Salam Md A. Comparative evaluation of chromogenic agar medium and conventional culture system for isolation and presumptive identification of uropathogens. Pak J Med Sci. 2014;30(5):1033–1038. doi:10.12669/pjms.305.5243
41. Santos ACM, Fuga B, Esposito F, et al. Unveiling the Virulent Genotype and Unusual Biochemical Behavior of Escherichia coli ST59. Appl Environ Microbiol. 2021;87(16):e0074321. doi:10.1128/AEM.00743-21
42. Dromigny JA, Ndoye B, Macondo EA, Nabeth P, Siby T, Perrier-Gros-Claude JD. Increasing prevalence of antimicrobial resistance among Enterobacteriaceae uropathogens in Dakar, Senegal: a multicenter study. Diagn Microbiol Infect Dis. 2003;47(4):595–600. doi:10.1016/S0732-8893(03)00155-X
43. Yarbrough ML, Lainhart W, McMullen AR, Anderson NW, Burnham CAD. Impact of total laboratory automation on workflow and specimen processing time for culture of urine specimens. Eur J Clin Microbiol Infect Dis. 2018;37(12):2405–2411. doi:10.1007/s10096-018-3391-7
44. Richter SS, Dominguez EL, Hupp AA, Griffis M, MacVane SH. Evaluation of MicroScan and VITEK 2 systems for susceptibility testing of Enterobacterales with updated breakpoints. J Clin Microbiol. 2025;63(6):e0004825. doi:10.1128/jcm.00048-25
45. Kreitmann L, Helms J, Martin-Loeches I, et al. ICU-acquired infections in immunocompromised patients. Intensive Care Med. 2024;50(3):332–349. doi:10.1007/s00134-023-07295-2