The study employed a comprehensive dataset of anterior segment slit-lamp images encompassing 12 common ocular conditions, systematically categorized according to standard ophthalmological references including EyeWiki [25], Wills Eye Manual [26], and Digital Reference of Ophthalmology [27]. These conditions were further grouped by anatomical region: (1) pupillary zone pathologies, comprising cataract, intraocular lens, and lens dislocation; (2) corneal disorders, including keratitis, corneal scarring, corneal dystrophy, and corneal tumors; and (3) conjunctival abnormalities, consisting of pinguecula, pterygium, subconjunctival hemorrhage, conjunctival cyst, pigmented nevus, and conjunctival tumors. Because tumorous lesions may involve both corneal and conjunctival tissues, they were consolidated into a unified category of corneal/conjunctival tumor. Representative images of the 12 conditions are provided in Fig. 1.
Fig. 1 Example images of the 12 ocular anterior segment diseases
All images were acquired at the Second Affiliated Hospital of Zhejiang University using standardized Topcon SL-D701 slit-lamp biomicroscopes equipped with DC-4 digital cameras between November 2016 and July 2024. Inclusion criteria required images with adequate clarity to allow definitive lesion identification and visualization of at least one major anterior segment region. Images were excluded if they contained non-target pathologies or were captured with slit-beam illumination or cobalt blue light.
Labeling
A dedicated annotation team was established to ensure high-quality and consistent labeling of all slit-lamp images. The team consisted of one junior ophthalmologist (JO) with over 3 years of clinical experience, one senior ophthalmologist (SO) with more than 5 years of experience, and one specialized ophthalmologist with more than 10 years of expertise. The JO and SO independently annotated the images using the VGG Image Annotator tool, and all annotations were subsequently reviewed and confirmed by the specialized ophthalmologist to resolve discrepancies. For each of the 12 target lesions, bounding boxes were applied to delineate lesion boundaries, accompanied by categorical labels specifying lesion type.
Development of the SSOD system
As illustrated in Fig. 2a and previously described [28], our screening framework builds upon the Mean-Teacher semi-supervised object detection (SSOD) paradigm, enhanced with two specialized modules designed to address key challenges in medical image analysis.
Fig. 2 The network architecture of the SSOD system. (a) The overall structure of the SSOD algorithm, (b) the Category Control Embed (CCE) module, (c) the Out-of-distribution Detection Fusion Classifier (OODFC) module
The overall training objective combines supervised and unsupervised components:
$$\mathcal{L}=\mathcal{L}_{\text{sup}}+\lambda\,\mathcal{L}_{\text{unsup}}$$
where \(\lambda\) balances the contributions of the supervised and unsupervised losses. The supervised loss \(\mathcal{L}_{\text{sup}}\) is computed on the labeled images, while the unsupervised loss incorporates consistency regularization with refined pseudo-labeling:
$$\mathcal{L}_{\text{unsup}}=\sum_{i=1}^{N}\left\|f_{\theta}\left(x_{i}\right)-f_{\theta}\left(T\left(x_{i}\right)\right)\right\|^{2}$$
where \(T\left(x_{i}\right)\) represents a transformed version of \(x_{i}\), and \(f_{\theta}\) denotes the model's prediction function.
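The combined objective and the consistency term described above can be illustrated with a minimal sketch. It assumes a squared L2 distance between the predictions on an input and on its transformed version, and a single scalar weight on the unsupervised term. The names `consistency_loss`, `total_loss`, `predict`, `transform`, and `weight` are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def consistency_loss(
    predict: Callable[[Vector], Vector],
    transform: Callable[[Vector], Vector],
    batch: Sequence[Vector],
) -> float:
    """Sum of squared L2 distances between predict(x) and predict(T(x))."""
    total = 0.0
    for x in batch:
        original = predict(x)
        perturbed = predict(transform(x))
        total += sum((a - b) ** 2 for a, b in zip(original, perturbed))
    return total


def total_loss(supervised: float, unsupervised: float, weight: float) -> float:
    """Supervised loss plus the weighted unsupervised (consistency) loss."""
    return supervised + weight * unsupervised


# Toy stand-ins (assumptions): a linear "model" and a shift as the transform T.
predict = lambda x: [2.0 * v for v in x]
transform = lambda x: [v + 1.0 for v in x]

unsup = consistency_loss(predict, transform, [[0.0, 1.0]])
print(unsup, total_loss(1.0, unsup, 0.5))
```

In a real detector the prediction would be a set of boxes and class scores rather than a flat vector, so the distance would be taken over matched outputs; the sketch only shows how the two loss terms are combined.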
The first module, Category Control Embed (CCE), mitigates class imbalance by constructing a dynamic Foreground Information Library that selectively oversamples rare categories while undersampling dominant ones. Foreground segments extracted from labeled data are blended with unlabeled images to generate balanced synthetic training samples, formulated as:
$$x_{i}^{\text{mix}}=\beta\,x_{i}^{\text{fg}}+\left(1-\beta\right)x_{j},\quad y_{i}^{\text{mix}}=\text{Mix}\left(y_{i},y_{j}\right)=y_{i},$$
where \(x_{i}^{\text{fg}}\) is an augmented foreground segment, and \(x_{j}\) is a randomly selected region from an unlabeled image fused with \(x_{i}^{\text{fg}}\), ensuring equitable class representation during model optimization (Fig. 2b).
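The blending step of the CCE module can be sketched as a pixel-wise convex combination of a foreground patch and an equally sized region of an unlabeled image, weighted by β. The sampling rule shown here (probabilities inversely proportional to class frequency) is an assumption used to illustrate oversampling of rare categories; the source does not specify the exact rule. The function names and the class counts in the example are hypothetical.

```python
from typing import Dict, List

Patch = List[List[float]]


def blend_patch(foreground: Patch, background: Patch, beta: float) -> Patch:
    """Pixel-wise blend: beta * foreground + (1 - beta) * background."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(foreground) != len(background) or any(
        len(f_row) != len(b_row) for f_row, b_row in zip(foreground, background)
    ):
        raise ValueError("patches must have the same shape")
    return [
        [beta * f + (1.0 - beta) * b for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(foreground, background)
    ]


def inverse_frequency_weights(class_counts: Dict[str, int]) -> Dict[str, float]:
    """Sampling probabilities proportional to 1 / count (assumed rule)."""
    inverse = {name: 1.0 / n for name, n in class_counts.items() if n > 0}
    norm = sum(inverse.values())
    return {name: w / norm for name, w in inverse.items()}


# Hypothetical counts: the rarer class receives the larger sampling probability.
weights = inverse_frequency_weights({"cataract": 100, "corneal/conjunctival tumor": 25})
mixed = blend_patch([[1.0, 1.0]], [[0.0, 0.5]], beta=0.5)
print(weights, mixed)
```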
The second module, Out-of-Distribution Detection Fusion Classifier (OODFC), tackles the open-set problem by introducing an auxiliary detector to identify previously unseen categories within unlabeled data. Its predictions are fused with the teacher model outputs using a category-specific adaptive threshold:
$$\tau_{c}=\max\left(0,\min\left(1,e^{\gamma\cdot\left(AP_{c}-1\right)}\right)\right)$$
where \(\tau_{c}\) determines whether to retain the label of an unknown category, \(AP_{c}\) is the Average Precision of category \(c\) from the fully supervised teacher network, and \(\gamma\) controls the growth rate of the exponential function. This design reduces misclassification of novel lesions while maintaining high accuracy for known categories (Fig. 2c).
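A minimal sketch of the category-specific adaptive threshold follows, assuming it is the exponential of γ times (AP − 1), clamped to the interval [0, 1]. The decision rule in `keep_unknown_label` (retain the unknown label when the auxiliary detector's score reaches the threshold) is a hypothetical reading of how the threshold is applied, not a confirmed detail of the OODFC module.

```python
import math


def adaptive_threshold(average_precision: float, gamma: float) -> float:
    """Clamp exp(gamma * (AP - 1)) to [0, 1]; a higher AP gives a higher threshold."""
    if not 0.0 <= average_precision <= 1.0:
        raise ValueError("average_precision must lie in [0, 1]")
    return max(0.0, min(1.0, math.exp(gamma * (average_precision - 1.0))))


def keep_unknown_label(ood_score: float, average_precision: float, gamma: float) -> bool:
    """Assumed rule: keep the 'unknown' label if the OOD score reaches the threshold."""
    return ood_score >= adaptive_threshold(average_precision, gamma)


# Example with hypothetical values for AP, gamma, and the OOD score.
print(adaptive_threshold(0.5, 2.0), keep_unknown_label(0.9, 0.5, 2.0))
```

Under this reading, a category that the supervised teacher already detects reliably (AP close to 1) receives a threshold close to 1, so its predictions are rarely overridden by the unknown label.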
Both modules are seamlessly integrated into a Mean-Teacher [29,30,31,32] pipeline that employs ResNet-50 [33] as the backbone, Feature Pyramid Network (FPN) [34] as the neck, and Fully Convolutional One-Stage Object Detector (FCOS) [35] as the detection head. Within the teacher–student paradigm, the teacher model is updated via Exponential Moving Average (EMA),
$$\bar{\theta}_{t}=\alpha\,\bar{\theta}_{t-1}+\left(1-\alpha\right)\theta_{t}^{\text{student}}$$
where \(\bar{\theta}_{t}\) denotes the teacher parameters at step \(t\), \(\alpha\) is a decay coefficient, and \(\theta_{t}^{\text{student}}\) denotes the current student model parameters, ensuring stable evolution and high-quality pseudo-label generation from both original and CCE-augmented data. By unifying CCE and OODFC within the SSOD framework, the proposed system effectively addresses the dual challenges of class imbalance and open-set recognition while maintaining computational efficiency, thereby achieving superior performance in semi-supervised detection of ocular anterior segment lesions.
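The EMA update of the teacher can be sketched as follows, assuming parameters are stored as a flat mapping from name to value and that the "current model parameters" are those of the student. The function name `ema_update` and the dictionary layout are illustrative assumptions; a real implementation would operate on framework tensors.

```python
from typing import Dict


def ema_update(
    teacher: Dict[str, float], student: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Return new teacher parameters: alpha * teacher + (1 - alpha) * student."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


# Hypothetical single-parameter example with decay 0.9.
teacher = ema_update({"w": 1.0}, {"w": 0.0}, alpha=0.9)
print(teacher)
```

A decay close to 1 makes the teacher change slowly, which is what gives the pseudo-labels their stability across training steps.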
In our experiments, a rigorous data partitioning scheme was implemented. The training set included 31% of single-lesion images spanning 12 pathological categories, 50% of unknown-class samples, and all unlabeled images, while the validation set comprised the remaining 69% of single-lesion images, the other 50% of unknown-class samples, and all multi-lesion cases. Single-lesion images facilitated focused learning of core pathological features, whereas multi-lesion images were reserved exclusively for validation to evaluate performance on complex, clinically representative scenarios. The partial inclusion of unknown-class data in training enabled controlled adaptation to open-set conditions. This partitioning strategy ensured comprehensive exposure to fundamental lesion patterns while maintaining a realistic open-set evaluation environment. Detailed data distribution, including sample counts per category and split, is provided in Table 1.
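The partitioning described above can be sketched as below. The sketch assumes a simple seeded random split over image identifiers; whether the original split was random or stratified by category is not stated, and the helper names `split_fraction` and `partition` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_fraction(
    items: Sequence[str], train_fraction: float, seed: int
) -> Tuple[List[str], List[str]]:
    """Shuffle with a fixed seed and cut at the requested training fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def partition(
    single: Sequence[str],
    unknown: Sequence[str],
    multi: Sequence[str],
    unlabeled: Sequence[str],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """31% single-lesion and 50% unknown-class images go to training with all
    unlabeled images; the remainder and all multi-lesion images go to validation."""
    single_train, single_val = split_fraction(single, 0.31, seed)
    unknown_train, unknown_val = split_fraction(unknown, 0.50, seed + 1)
    return {
        "train": single_train + unknown_train + list(unlabeled),
        "val": single_val + unknown_val + list(multi),
    }


# Hypothetical identifiers, used only to show the resulting set sizes.
parts = partition(
    [f"single_{i}" for i in range(100)],
    [f"unknown_{i}" for i in range(10)],
    [f"multi_{i}" for i in range(4)],
    [f"unlabeled_{i}" for i in range(20)],
)
print(len(parts["train"]), len(parts["val"]))
```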
Table 1 Statistics of the slit-lamp image dataset
Evaluation of the lesion detection ability of the SSOD system
We implemented a comprehensive dual evaluation framework combining quantitative metrics and clinical assessment to rigorously evaluate model performance. Quantitative evaluation employed recall and Average Precision (AP) metrics following standard object detection protocols with the MMDetection object detection benchmark [36]. In our experiments, AP is computed as the exact area under the Precision-Recall curve by numerical integration, then averaged over Intersection over Union (IoU) thresholds, categories, and object-area ranges. Recall was reported as a global average recall. Mean Average Precision (mAP), defined as the mean of AP values across all classes, was used as an overall indicator of model performance across the dataset. In addition, precision, recall, and F1-score were reported at a confidence threshold of 0.3, which was selected to balance sensitivity and specificity in a manner consistent with clinical screening requirements.
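The operating-point metrics can be illustrated with a short sketch that keeps detections scoring at least 0.3, greedily matches them to ground-truth boxes of the same label, and derives precision, recall, and F1. The IoU matching threshold of 0.5, the greedy matching order, and the dictionary layout of boxes are assumptions made for illustration; this is not the MMDetection evaluation code.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(
    detections: Sequence[Dict],
    ground_truth: Sequence[Dict],
    score_thr: float = 0.3,
    iou_thr: float = 0.5,  # assumed matching threshold
) -> Tuple[float, float, float]:
    """Greedy one-to-one matching of confident detections to same-label boxes."""
    kept = sorted(
        (d for d in detections if d["score"] >= score_thr),
        key=lambda d: d["score"],
        reverse=True,
    )
    matched: set = set()
    true_pos = 0
    for det in kept:
        best_idx, best_iou = None, iou_thr
        for idx, gt in enumerate(ground_truth):
            if idx in matched or gt["label"] != det["label"]:
                continue
            overlap = iou(det["box"], gt["box"])
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            true_pos += 1
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1
```

Lowering the confidence threshold keeps more detections, which tends to raise recall at the cost of precision; this is the trade-off the 0.3 operating point is meant to balance for screening.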
Three model configurations were systematically compared: SSOD_1, treating ocular trauma as an unknown class; SSOD_2, treating intraocular lens, corneal dystrophy, and subconjunctival hemorrhage as unknown classes, selected as representative diseases from distinct anatomical regions; and YOLOv8 [37] as a benchmark. Performance was evaluated separately on single-lesion and multi-lesion images, with the latter assessing model capability in complex clinical scenarios.
For qualitative clinical evaluation, the SSOD models were compared with YOLOv8 using three criteria: diagnostic accuracy (scored 1 for correct, 0 for incorrect diagnosis, reflecting misdiagnosis), lesion comprehensiveness (scored 1 for complete detection, 0 for missed lesions, reflecting underdiagnosis), and localization precision (scored 1 for accurate bounding boxes, 0.5 for partially accurate boxes with acceptable positional deviations, and 0 for clearly inaccurate boxes). The composite clinical score, summing these three components, ranged from 0 to 3. Evaluations were performed separately on single-lesion and multi-lesion images.
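The composite clinical score described above can be written as a small helper that validates each component against its allowed values and sums them. The function name `clinical_score` and its argument names are hypothetical.

```python
def clinical_score(
    diagnostic_accuracy: float,
    lesion_comprehensiveness: float,
    localization_precision: float,
) -> float:
    """Sum of the three components; the result ranges from 0 to 3."""
    if diagnostic_accuracy not in (0, 1):
        raise ValueError("diagnostic accuracy must be 0 or 1")
    if lesion_comprehensiveness not in (0, 1):
        raise ValueError("lesion comprehensiveness must be 0 or 1")
    if localization_precision not in (0, 0.5, 1):
        raise ValueError("localization precision must be 0, 0.5, or 1")
    return float(
        diagnostic_accuracy + lesion_comprehensiveness + localization_precision
    )


# Correct diagnosis, all lesions found, partially accurate bounding box.
print(clinical_score(1, 1, 0.5))
```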
To establish clinical benchmarks, three junior ophthalmologists and one senior ophthalmologist, independent of the original annotation team, assessed 127 single-lesion images covering all 12 disease categories and 15 randomly selected multi-lesion images from the validation set. This blinded expert evaluation enabled direct comparison between human and model performance under identical assessment criteria.
Statistical analysis
For quantitative evaluation, AP and recall metrics were computed to assess lesion detection performance of the SSOD and YOLOv8 systems. In addition, to facilitate operating-point-based comparison, precision, recall, and F1-score were further calculated at a confidence threshold of 0.3.
For clinical evaluation, three key diagnostic measures (diagnostic accuracy, lesion comprehensiveness, and localization precision) were scored manually to compare the performance of SSOD, YOLOv8, and ophthalmologists.