Impact of Transfer Learning on Convolutional Neural Networks for Odontogenic Tumor Diagnosis

Images constitute the fundamental input in any computer vision–based classification project, particularly in medical and histopathological applications [9, 10, 11, 25]. Each image encodes morphological information that a convolutional neural network (CNN) can learn to recognize, classify, or segment, such as cellular patterns, tissue organization, textures, and color variations [9, 10]. In histopathology, the use of digitized slides enables CNNs to distinguish tumor types according to their visual characteristics, emulating the diagnostic reasoning process of pathologists at greater scale and speed [10, 26]. The quality, quantity, annotation accuracy, and preprocessing of these images directly influence model performance [27, 28]. Data augmentation, for instance, remains an essential strategy to simulate the natural variability observed in clinical settings and to increase robustness against artifacts or staining variations [29, 30].
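The augmentation idea can be illustrated with a minimal sketch. It assumes a patch is a small square grid of pixel values and uses flips and 90-degree rotations, a common choice for histology because tissue sections have no canonical orientation. The function names are hypothetical, and this is not the augmentation pipeline used in the study.

```python
from typing import List

Patch = List[List[int]]  # hypothetical representation: rows of pixel values


def hflip(patch: Patch) -> Patch:
    """Mirror the patch left to right."""
    return [row[::-1] for row in patch]


def vflip(patch: Patch) -> Patch:
    """Mirror the patch top to bottom."""
    return [row[:] for row in patch[::-1]]


def rot90(patch: Patch) -> Patch:
    """Rotate the patch 90 degrees clockwise."""
    return [list(col) for col in zip(*patch[::-1])]


def dihedral_variants(patch: Patch) -> List[Patch]:
    """Return the 8 flip/rotation variants of a square patch (original included)."""
    variants = []
    current = patch
    for _ in range(4):
        variants.append(current)         # rotation by 0, 90, 180, 270 degrees
        variants.append(hflip(current))  # the mirrored counterpart of each rotation
        current = rot90(current)
    return variants


# Toy 2x2 "patch"; a real patch would be an RGB image tile.
toy_patch = [[1, 2], [3, 4]]
augmented = dihedral_variants(toy_patch)
```

Each variant keeps the original label, so one annotated patch yields up to eight training samples without altering its histological content.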

ImageNet [17] is a large-scale database containing millions of labeled natural images from thousands of categories. For more than a decade, it has been widely used as a reference dataset for training deep neural networks in visual recognition tasks [31]. Models pre-trained on ImageNet acquire a broad representation of the visual world, learning to recognize generic patterns such as edges, textures, and spatial structures. When these pre-trained weights are transferred to new domains, the network does not start from random initialization but from a set of generalized visual features already optimized for pattern recognition [31]. This strategy, known as transfer learning, offers several advantages: the initial layers of a CNN capture universal visual features that can be reused for specialized domains such as medical imaging [32], allowing efficient learning even when the available dataset is relatively small. Pre-trained models usually converge faster, achieve higher accuracy, and generalize better, acting as an implicit regularization mechanism that reduces overfitting [32, 33].

In the present study, the images represented real clinical cases of odontogenic tumors, and ImageNet pretraining allowed the models to leverage prior visual knowledge to improve classification performance across three histological entities: AOT, ameloblastoma, and ameloblastic carcinoma. This approach is particularly relevant given that these lesions belong to a morphological continuum, from benign to malignant, in which the visual differences between classes can be subtle.

The clinical value of this study must be framed by the fact that differentiating AOT from ameloblastoma is generally straightforward for most experienced pathologists when representative tissue sections are available. However, this does not diminish the interest in evaluating how AI behaves when classifying AOT, since non-representative, fragmented, or poorly preserved biopsies may occasionally obscure classic features. More importantly, the true diagnostic challenge lies in distinguishing ameloblastoma from ameloblastic carcinoma, an area characterized by overlapping architectural features, variable cytologic atypia, and documented interobserver variability, including findings already reported by our group [14]. AI-based tools, particularly CNN-driven classification and future AI-assisted segmentation, hold potential as objective adjuncts in these complex scenarios by consistently highlighting architectural and cytological cues that may be subtle or inconsistently appreciated.

The clinical utility of AI becomes even more apparent when considering rarer odontogenic entities such as adenoid ameloblastoma, primordial odontogenic tumor, and dentinogenic ghost cell tumor, which are infrequently encountered in routine practice and may not be readily recognized by general pathologists. In this context, the development and benchmarking of AI models contribute not only to supporting difficult differential diagnoses but also to expanding diagnostic reliability in uncommon tumors where familiarity is limited. This perspective aligns with the broader goal of enhancing diagnostic consistency through computational tools while acknowledging that the models remain experimental and require further validation before clinical integration.

Transfer learning thus plays a critical role in enhancing feature discrimination, enabling CNNs to capture architectural and cytological nuances that may escape manual interpretation. Such behavior is consistent with previous studies showing that ImageNet pretraining enhances learning efficiency and diagnostic precision in medical imaging tasks involving limited and heterogeneous datasets [13, 14, 34, 35, 36].

The application of artificial intelligence in histological image classification has grown rapidly, largely due to the transformative impact of convolutional neural networks on medical image analysis [9, 25]. These models can identify microscopic patterns that are often imperceptible to the human eye, such as variations in epithelial organization, nuclear morphology, and stromal characteristics [14, 34, 35, 36]. In the context of odontogenic tumors, CNNs can automatically analyze regions of interest (ROIs) from digitized slides, detecting subtle differences in epithelial architecture, cell arrangements, and stromal textures that are essential for distinguishing among histological subtypes. This capability not only emulates but can also augment the diagnostic reasoning process of oral and maxillofacial pathologists, allowing large-scale, reproducible, and time-efficient analyses that support clinical decision-making.

Only a few studies have implemented CNNs using histological slides for the diagnosis of odontogenic lesions, and most of them have focused on benign or inflammatory cysts rather than true neoplasms. Florindo et al. [37] applied Bouligand–Minkowski fractal descriptors combined with machine learning to odontogenic keratocysts (OKCs), achieving 98% accuracy in discriminating OKCs from radicular cysts and 68% in discriminating OKC subtypes. More recently, Cai et al. [13] developed an InceptionV3-based model for whole-slide image classification and prognosis of OKCs, distinguishing syndromic from non-syndromic lesions with diagnostic and prognostic AUCs of 0.935 and 0.840, respectively. Similarly, Rao et al. [38] used VGG16 and DenseNet169 to classify cystic lesions, reporting 93% accuracy in OKC versus non-OKC discrimination. Among the few studies addressing neoplastic entities, Giraldo-Roldán et al. [14] classified ameloblastomas and ameloblastic carcinomas using ResNet50, DenseNet, and VGG16, reaching an accuracy of 0.98 and an AUC of 0.98. Additionally, Kim et al. [39] investigated the concordance between clinical and histopathological diagnoses of oral lesions using ChatGPT-4 and a Bayesian model (ORAD), obtaining 41.4% and 45.6% diagnostic concordance, respectively.

Taken together, these studies highlight the growing potential of deep learning in oral pathology but also reveal a gap: most investigations are limited to binary or small-class problems. The present study expands this field by addressing a more challenging multiclass scenario and by analyzing how data variability and model complexity interact under transfer learning, providing methodological evidence for future diagnostic frameworks in oral and maxillofacial pathology.

The ability of deep learning models to achieve multiclass diagnosis is crucial for this clinical application, as odontogenic tumors often share overlapping histological patterns such as palisaded epithelial arrangements, cellular nests, and calcification foci [2, 40]. However, subtle but diagnostically significant differences, such as the degree of cellular atypia, nuclear polarity, and stromal invasion, can be quantitatively captured by CNNs trained on high-quality, well-annotated datasets. Preprocessing strategies such as color normalization and class balancing help mitigate biases caused by staining variability and unequal sample sizes [27, 28], while data augmentation and transfer learning further enhance generalization and reduce overfitting, particularly in rare pathologies with limited case numbers [30]. In this context, the approach adopted in this study demonstrates that even with heterogeneous sample distributions, model robustness can be achieved through a combination of architectural optimization and careful dataset preparation.
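One common way to offset unequal sample sizes is to weight each class inversely to its frequency during training. The sketch below is illustrative only: the weighting formula, the function name, and the patch counts are assumptions, not the balancing procedure or data of this study.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def inverse_frequency_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1,
    so every class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, deliberately imbalanced patch labels (not the study's counts).
labels = (
    ["AOT"] * 10
    + ["ameloblastoma"] * 60
    + ["ameloblastic carcinoma"] * 30
)
weights = inverse_frequency_weights(labels)
```

With these toy counts the rarest class receives the largest weight, and the weighted class totals are equal, which is the property a class-weighted loss relies on.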

The clinical implications of applying artificial intelligence to the diagnosis of odontogenic tumors are substantial. By reducing subjectivity and interobserver variability, AI-assisted systems can increase diagnostic consistency and confidence, particularly in borderline or ambiguous cases. Moreover, the ability to rapidly process large volumes of histological data enables pathologists to focus their expertise on complex or uncertain diagnoses, while automated models handle more routine classifications. Such approaches also hold promise for use in resource-limited settings, where access to specialized oral pathologists is often scarce, thereby contributing to the democratization of accurate and timely diagnosis.

The findings of this study illustrate how artificial intelligence can complement the expertise of oral and maxillofacial pathologists, helping to overcome human limitations in visual interpretation, reduce diagnostic time, and support more precise and patient-centered decision-making. Beyond quantifying model performance, this work represents an incremental step in the ongoing development of AI-based diagnostic frameworks for oral pathology. By documenting and validating methodological advances, it contributes to the accumulation of reproducible knowledge that can guide subsequent studies and improve future diagnostic architectures. Such continuity is essential to sustain collaborative, data-driven progress toward clinically reliable and explainable systems, integrating ensemble strategies, multimodal approaches, and interpretability tools as key components of the next generation of medical AI.

The study is limited by its cohort size (n = 64), a consequence of tumor rarity; however, this was mitigated by a large-scale dataset of 445,107 patches, which provided variability for feature learning, combined with strict patient-wise splitting. Although the slides were scanned at 20×, accuracy was preserved through high-zoom digital annotation. Class imbalance was addressed via balanced metrics. While full resections and immunohistochemistry (IHC) were not available for all cases, diagnoses strictly followed WHO criteria and were confirmed by expert pathologists. Finally, as the study focused on quantitative architectural benchmarking, qualitative interpretability analyses (e.g., Grad-CAM) were outside the current scope and are reserved for dedicated future investigation.
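Two of the safeguards mentioned above, patient-wise splitting and balanced metrics, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the patient identifiers, the 25% test fraction, and the use of balanced accuracy as the balanced metric are all hypothetical and do not reproduce the study's protocol.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def patient_wise_split(
    patch_patients: Sequence[str], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split patch indices so that no patient appears in both train and test.

    patch_patients[i] is the patient ID of patch i. Patients, not patches,
    are shuffled and assigned, which prevents patches from one case leaking
    across the split. This simple version does not stratify by class.
    """
    patients = sorted(set(patch_patients))
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(patch_patients) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patch_patients) if p in test_patients]
    return train_idx, test_idx


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical patient IDs for seven patches from four patients.
patch_patients = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]
train_idx, test_idx = patient_wise_split(patch_patients)

# A classifier that always predicts the majority class "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.8
bal_acc = balanced_accuracy(y_true, y_pred)  # 0.5
```

The toy predictions show why a balanced metric matters under class imbalance: always predicting the majority class scores 0.8 on plain accuracy but only 0.5 on balanced accuracy.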

Although the models demonstrated promising diagnostic performance, this study remains experimental. The approach has not yet been tested in clinical workflows, prospective settings, or external multi-institutional deployments. Therefore, the findings should not be interpreted as evidence of clinical readiness. Instead, they support the feasibility of applying CNN-based methods to odontogenic tumor classification and highlight methodological considerations for future development.
