An Explainable SSL-Based Model for Robust Multi-Class Brain Tumor Classification from MRI Images

Shajedul Hasan Arman University of Central Florida sh400274@ucf.edu	Omar Faruque Siyam University of Central Florida om737110@ucf.edu
Nur Nabi Rahman Jagannath University nurnabisumonnn@gmail.com	Anzim Hasan Nabil Govt. Science College, Tejgaon, Dhaka anjim.nabil@gmail.com	Afiah University of Asia Pacific afiah.work03@gmail.com
	Zannatul Ferdus Ahsanullah University of Science and Technology zferdus508@gmail.com

Abstract

Accurate and interpretable brain tumor classification from magnetic resonance imaging (MRI) is important for timely detection and effective treatment planning. Deep supervised learning methods, though powerful, are limited by their reliance on large labeled datasets and their lack of explainability in clinical decision-making. In this work, we introduce a self-supervised learning (SSL) approach based on SimCLR with an EfficientNetB3 backbone for four-class brain tumor classification: glioma, meningioma, pituitary tumor, and no tumor. The method pre-trains the model with SSL on large amounts of unlabeled data to learn salient feature representations and then performs supervised fine-tuning with an optimized classification head, improving generalization while reducing dependence on large-scale human annotation. The proposed framework achieves a test accuracy of 98.32%, per-class precision and F1-scores of at least 0.96 and recall of at least 0.94, with the best classification performance on the no-tumor and pituitary classes. To improve interpretability and clinical confidence, Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize discriminative tumor regions and to verify that the model's attention agrees with radiological features. To the best of the authors' knowledge, this is the first work that combines an optimized SimCLR-based SSL framework with explainability for MRI-based brain tumor classification. The results show that SSL-driven, interpretable models can provide highly accurate, reliable, and clinically relevant decision support for neuro-oncology.

Keywords:

Brain Tumor Classification

Self-Supervised Learning

SimCLR

EfficientNet

Medical Imaging

Explainable AI

Grad-CAM

Clinical Decision Support

1. Introduction

Reliable and explainable classification of brain tumors from magnetic resonance imaging (MRI) is essential for early detection, prognostication, and tailored therapy in neuro-oncology. Manual assessment is time-consuming, subject to inter-reader variability, and particularly challenging when tumor boundaries, contrast, or morphology are irregular. MRI offers rich soft-tissue contrast and non-invasive imaging, but image heterogeneity, noise, variation in acquisition parameters, and limited annotated datasets constrain model performance [1, 2]. In recent years, deep supervised learning methods (convolutional neural networks and transfer learning) have achieved high accuracy in brain tumor MRI classification, and EfficientNet variants (e.g., B0, B1, B2) have been used successfully as backbones [3, 4, 5]. For example, "Multi-class classification of brain tumor types from MR images using EfficientNets" showed that fine-tuned EfficientNet variants (B0-B4), combined with Grad-CAM visualization, achieved test accuracies of about 98.8% with reasonable precision and recall across classes [6]. However, supervised models still require large, balanced labeled datasets, tend to overfit in small-sample regimes, and learn features that are difficult to interpret [7]. Self-supervised learning (SSL) offers a way to relieve the annotation bottleneck by learning powerful representations from unlabeled MRI slices. Contrastive SSL methods such as SimCLR train networks to learn features that are invariant to nuisance variations without explicit labels [8]. Combined with careful augmentation, feature normalization, and strong backbones, SSL can improve generalization in medical imaging tasks where labels are expensive or sparse. Interpretability is equally crucial: models used in medical settings must allow clinicians to verify that predictions derive from medically meaningful image regions.
Methods such as Grad-CAM visualize the regions on which the network focuses, increasing trust and facilitating error analysis [5, 9]. Without interpretability, even highly accurate models may not be accepted clinically. In this study, we propose a framework that combines SimCLR-based SSL with an EfficientNetB3 backbone, followed by supervised fine-tuning, to classify brain MRI scans into four classes: glioma, meningioma, pituitary tumor, and no tumor. We also integrate Grad-CAM so that the model's decisions can be interpreted spatially. Our experiments show high accuracy (~ 98.3%), strong per-class metrics even for challenging tumor types, and attention maps that align with radiological features.

The major contributions of this work include:

Using contrastive self-supervised pre-training to minimize reliance on large annotated brain MRI datasets.

Using a compact yet high-capacity encoder architecture (EfficientNetB3) with robust augmentation and regularization to handle morphological variability.

Adding Grad-CAM interpretability to visualize model decision regions, supporting transparency and clinical trust.

Evaluating performance on MRI datasets using detailed metrics (precision, recall, F1-score) per class, showing high efficacy and explainability.

The rest of the paper is organized as follows. Section 2 reviews related work on SSL, MRI tumor classification, and interpretability techniques. Section 3 describes the dataset, preprocessing, model architecture, and training. Section 4 presents experiments, results, interpretability analysis, limitations, and future work. Section 5 concludes with a summary of findings and implications for clinical application.

2. Literature Review

Altin Karagoz et al. [10] introduced ResViT, a self-supervised hybrid model for brain tumor classification. The model integrates CNNs for local feature extraction with Vision Transformers for learning global contextual representations. They also used synthetic MRI generation as a pretext task and to balance class distributions before fine-tuning for downstream classification. The model was trained and tested on the Figshare, Kaggle, and BraTS datasets, achieving 98.53% and 98.47% accuracy on Figshare and Kaggle, respectively, and 90.56% on the more challenging BraTS dataset. Although it generalizes well across datasets, the approach is computationally expensive and depends on the realism of the synthetic images, which may not always match clinical image quality.

Bouhafra et al. [11] conducted a systematic review of deep learning approaches for brain tumor detection and classification in MRI published from 2020 to 2024. The comparison covered CNNs, hybrid architectures, transfer learning, and segmentation-assisted models. Their study found that deep learning techniques tend to outperform conventional hand-engineered features, especially for multi-class classification. However, they also identified limitations in much of the research, including poor reproducibility, dataset variability, and limited attention to model interpretability.

Filvantorkaman et al. [12] presented a fusion-based approach that combines CNN-based deep learning, explainable AI (XAI), and rule-based reasoning for brain tumor classification, integrating CNN-learned image features with explainable reasoning modules for decision-making. Validated on benchmark MRI datasets, the model achieved > 97% accuracy in glioma, meningioma, and pituitary tumor classification. Its novelty lay in combining symbolic reasoning with deep learning, but the model was computationally costly and required extensive preprocessing, limiting its scalability in clinical settings.

Sarker [13] explored transfer learning and explainable AI for brain tumor diagnosis from MRI using a dataset from Bangladesh. Pretrained models such as VGG16, ResNet, and EfficientNet were fine-tuned and compared, with EfficientNet achieving the best overall classification accuracy of 96.2%. Grad-CAM was used to generate visual explanations of the decisions, which radiologists confirmed coincided with tumor regions. Although the findings demonstrate the promise of transfer learning, the dataset was comparatively small, limiting confidence in its generalization to larger, more diverse populations.

Yurdakul and Taşdemir [14] proposed a hybrid EfficientNetV2 and MLP-Mixer-Attention model for brain tumor detection in MRI. The architecture combines convolutional efficiency with attention-based global reasoning. Trained on Kaggle MRI datasets, the model achieved 98.7% classification accuracy. Grad-CAM heatmaps were used for explainability, although only a qualitative analysis was presented. A major strength was its computational efficiency compared with large Transformer models, but the system was not validated across multiple datasets and imaging protocols.

Sharma et al. [15] investigated multi-class classification of MRI brain tumors with information-fusion-based hybrid models, comparing CNNs, transfer learning, and ensemble fusion strategies. The best performance, an overall accuracy of 97.6%, was achieved by a hybrid fusion model combining ResNet50 and VGG19 features. The model was robust to inter-class variability but required substantial feature engineering and did not incorporate self-supervised learning, so it relied on the availability of labeled data.

Li et al. [16] explored deep CNN models for brain tumor classification from non-contrast MRI. The work emphasized preprocessing with bias correction and intensity normalization to account for variability across MRI scans. Their CNN model achieved 95.8% accuracy in glioma, meningioma, and pituitary tumor classification. While demonstrating that non-contrast MRI can still support accurate classification, their approach included no interpretability mechanisms and did not address multi-modal imaging.

Gupta et al. [17] applied ensemble deep models to computational brain tumor grading. They combined multiple CNNs, such as DenseNet and Inception, in an ensemble to discriminate between low-grade and high-grade glioma. The ensemble achieved an average accuracy of 97.1% and an F1-score of 96.4%. Ensemble learning handled the subtle differences between grades well, but the model was computationally intensive and less interpretable, which may hinder clinical adoption.

Khan et al. [18] introduced an EfficientNet-based model (Eff_D_SVM) that combines deep feature extraction with a support vector machine classifier to classify diverse brain tumor types. Tested on the Kaggle and BraTS datasets, the model achieved an overall accuracy of 96.8%. Its novelty lay in combining CNN feature extraction with a conventional classifier, which improved generalization on small datasets. However, the model required further fine-tuning for stable performance across modalities.

Alam et al. [19] introduced a deep ensemble meta-learning framework for brain tumor classification. The ensemble used several CNN backbones and applied meta-learning techniques for optimization. The framework achieved 99.8% accuracy on 3,060 MRI images, surpassing single CNNs. Importantly, they used Grad-CAM and saliency maps to improve explainability and showed that the heatmaps accurately localized tumor regions. Despite this strong performance, the ensemble approach incurred high computational complexity, restricting real-time clinical use.

Rafiq et al. [20] proposed a ResNet50-based classification model with Grad-CAM for interpretable tumor detection, incorporating skull stripping and data augmentation into the preprocessing pipeline. The model achieved 98.1% accuracy, and Grad-CAM showed that its predictions were consistent with tumor locations in the MRI scans. The paper emphasized trustworthy decision-making but offered limited novelty beyond applying XAI methods to a standard CNN.

Zarenia et al. [21] introduced a deformable attention and saliency-mapping approach for multi-class brain tumor segmentation and classification from MRI. Their MS-DAM module helped the model handle variation in tumor shape, achieving 96.7% accuracy across multiple tumor classes. The saliency maps were easy to interpret and complemented the segmentation results. However, the model required high-quality annotated segmentation masks, which can be difficult to obtain in practice.

Huang et al. [22] comprehensively reviewed self-supervised learning strategies in medical imaging. They grouped techniques into contrastive, generative, and masked-autoencoder approaches and showed that SSL substantially improves downstream classification performance, especially in low-label scenarios. While highlighting the potential of SSL, the review also identified a lack of standardized metrics, limited evaluation of interpretability, and obstacles to translating SSL benefits into clinical practice.

3. Materials and Methods

3.1 Proposed Methodology

The workflow diagram in Fig. 3.1 shows the step-by-step process of the proposed method for classifying brain tumors from MRI images.

Fig. 3.1

Workflow Diagram

3.2. Dataset Description

The Brain Tumor MRI Dataset [23] consists of MRI scans in four clinically relevant classes: glioma, meningioma, pituitary tumor, and no tumor. It combines three publicly available repositories: the Figshare dataset, the SARTAJ dataset, and the Br35H dataset. To improve reliability, glioma images from the SARTAJ dataset were excluded because of observed labeling inconsistencies and were replaced with glioma cases from the Figshare dataset. The no-tumor class was sourced from the Br35H dataset to ensure proper representation of normal brain MRIs [23]. The dataset contains 7,023 MRI scans divided into training and test splits. The training set has 5,712 images: glioma (1,321), meningioma (1,339), no tumor (1,595), and pituitary (1,457). The test set contains 1,311 images: glioma (300), meningioma (306), no tumor (405), and pituitary (300). This well-balanced class composition makes the dataset suitable for rigorous evaluation of classification performance. Images vary in resolution and size, reflecting the heterogeneity of clinical MRI scans, so preprocessing such as resizing, normalization, and margin removal is required before training to ensure consistency and improve classification performance. Its heterogeneity and size make the dataset an appropriate benchmark for developing and validating self-supervised learning algorithms for brain tumor diagnosis. Figure 3.2 shows example MRI scans from each of the four classes.

Fig. 3.2

Representative MRI images from the four brain tumor dataset classes

Figure 3.3 shows the overall class distribution of the brain tumor dataset as pie and bar charts. The dataset contains four classes: Glioma (1,621 images, 23.1%), Meningioma (1,645 images, 23.4%), No tumor (2,000 images, 28.5%), and Pituitary (1,757 images, 25.0%). The distribution is well balanced: the most represented class, No tumor, accounts for 28.5% of the dataset, and the least represented, Glioma, for 23.1%. This near-uniform distribution is useful for training robust classifiers because it reduces the risk of bias toward any particular class and gives every class comparable representation during training.

Fig. 3.3

Class distribution across the dataset
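The class totals and percentages in Fig. 3.3 follow directly from the split counts given in Section 3.2. The short Python check below recomputes them; the dictionary keys are illustrative names, and only the counts come from the text.

```python
# Training/test split counts per class, as reported in Section 3.2.
train = {"glioma": 1321, "meningioma": 1339, "no_tumor": 1595, "pituitary": 1457}
test = {"glioma": 300, "meningioma": 306, "no_tumor": 405, "pituitary": 300}


def class_distribution(train, test):
    """Combine split counts; return per-class totals and percentages (1 decimal)."""
    totals = {c: train[c] + test[c] for c in train}
    grand_total = sum(totals.values())
    pct = {c: round(100 * n / grand_total, 1) for c, n in totals.items()}
    return totals, pct


totals, pct = class_distribution(train, test)
print(sum(train.values()), sum(test.values()), sum(totals.values()))  # 5712 1311 7023
print(totals)  # {'glioma': 1621, 'meningioma': 1645, 'no_tumor': 2000, 'pituitary': 1757}
print(pct)     # {'glioma': 23.1, 'meningioma': 23.4, 'no_tumor': 28.5, 'pituitary': 25.0}
```

The recomputed totals match the per-class counts and percentages reported for Fig. 3.3.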

Figure 3.4 is a radar chart of normalized class-level image statistics for five dimensional and intensity features: mean height, mean width, mean aspect ratio, mean intensity, and mean standard deviation of intensity. Each colored line represents one of the four classes (Glioma, Meningioma, No tumor, and Pituitary), with values normalized for comparison. The chart shows a distinct profile for each class, with clear differences in aspect ratio and intensity. Pituitary tumors have characteristic intensity patterns, while Glioma and Meningioma exhibit more varied geometric characteristics. These feature distributions reveal inherent visual properties that distinguish the classes, which helps identify features that classification algorithms can exploit.

Fig. 3.4

Radar chart comparison of normalized image characteristics across brain tumor classes
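Radar charts need every axis on a comparable scale. The paper does not state which normalization was used for Fig. 3.4, so the sketch below assumes per-feature min-max scaling across the four class means. The feature names and values in the usage example are hypothetical placeholders, not statistics measured on the dataset.

```python
def minmax_across_classes(stats):
    """Min-max scale each feature across classes so all radar axes share a [0, 1] range.

    stats maps class name -> {feature name -> per-class mean value}.
    Assumption: the paper's normalization method is not stated; min-max is used here.
    """
    features = list(next(iter(stats.values())))
    scaled = {c: {} for c in stats}
    for f in features:
        values = [stats[c][f] for c in stats]
        lo, hi = min(values), max(values)
        for c in stats:
            # A feature that is constant across classes carries no contrast; map it to 0.5.
            scaled[c][f] = (stats[c][f] - lo) / (hi - lo) if hi > lo else 0.5
    return scaled


# Hypothetical per-class means, for illustration only (not measured on the dataset).
example = {
    "glioma": {"mean_height": 512.0, "mean_intensity": 60.0},
    "meningioma": {"mean_height": 480.0, "mean_intensity": 70.0},
    "no_tumor": {"mean_height": 300.0, "mean_intensity": 50.0},
    "pituitary": {"mean_height": 512.0, "mean_intensity": 80.0},
}
print(minmax_across_classes(example)["no_tumor"])  # {'mean_height': 0.0, 'mean_intensity': 0.0}
```

Min-max scaling keeps the relative ordering of classes on each axis, which is what a radar chart is meant to convey; z-score scaling would be a reasonable alternative.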

Figure 3.5 shows a 3D t-SNE visualization of EfficientNetB0 features extracted from the brain tumor images, with PCA applied before t-SNE for dimensionality reduction. Each point represents one image, colored by class: Glioma (teal), Meningioma (orange), No tumor (blue), and Pituitary (pink). The scatter plot reveals the natural cluster structure of the four classes in the feature space, with some classes better separated than others. Pituitary tumors form a fairly compact cluster, whereas the other classes are more dispersed. These patterns indicate that EfficientNetB0 features retain substantial discriminative information, supporting the chosen feature-extraction approach and suggesting that the classes are largely separable.

Fig. 3.5

3D t-SNE visualization of EfficientNetB0 feature embeddings

3.3 Data Pre-Processing

To ensure stable and reliable training, the entire brain tumor MRI dataset was processed with a single, uniform preprocessing pipeline. Because the data were collected from several open-source repositories with varying resolutions and formats, standardization was necessary for stable learning and accurate models. Images were first cropped to remove unnecessary margins and background artifacts, retaining only the brain region. All images were then resized to 256 × 256 pixels, a size compatible with the EfficientNetB3 encoder backbone and computationally feasible for self-supervised learning.

Pixel intensities were normalized to a consistent range to improve convergence and robustness to scanner-dependent intensity variability. Contrast Limited Adaptive Histogram Equalization (CLAHE) was applied to enhance local contrast and improve the visibility of tumor margins. During self-supervised training, augmentations (rotation, flipping, zooming, and additive Gaussian noise) were applied to generate the positive pairs used for contrastive representation learning.

Finally, the dataset was split into training and test sets with a balanced distribution of glioma, meningioma, pituitary, and no-tumor images, so that the classifier is evaluated on held-out images not seen during training. Table 1 summarizes the preprocessing methods used in this study.

Table 1
Image Preprocessing Parameters
Technique	Parameters/Description
Cropping	Removal of extra margins to focus on brain region
Resizing	256 × 256 pixels (input size for EfficientNetB3 backbone)
Normalization	Pixel intensity scaling to [0,1]
Contrast Enhancement	CLAHE applied to improve tumor boundary visibility
Data Augmentation	Random rotation, flipping, zoom, Gaussian noise (used in SimCLR pre-training)
Tensor Conversion	Converted to PyTorch tensors
Dataset Split	Training/Testing with balanced class distribution (≈ 80/20 split)
Error Handling	Skipped corrupted or mislabeled images
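A minimal sketch of two of the steps in Table 1, margin cropping and intensity scaling to [0, 1], is shown below in plain Python on a 2-D list of pixel values. It assumes a simple intensity threshold (the value 10 is hypothetical) to find the brain bounding box; the actual pipeline also uses adaptive thresholding, morphological operations, CLAHE, and resizing, which are omitted here.

```python
def crop_to_foreground(img, threshold=10):
    """Crop a 2-D image (list of rows) to the bounding box of pixels brighter than threshold.

    The threshold of 10 is a hypothetical value for 8-bit scans.
    """
    rows = [r for r, row in enumerate(img) if any(p > threshold for p in row)]
    cols = [c for c in range(len(img[0])) if any(row[c] > threshold for row in img)]
    if not rows:  # no foreground found: return an unchanged copy
        return [row[:] for row in img]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def minmax_normalize(img):
    """Scale pixel intensities to [0, 1], as in the Normalization row of Table 1."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in img]
    return [[(p - lo) / (hi - lo) for p in row] for row in img]


scan = [
    [0, 0, 0, 0],
    [0, 50, 200, 0],
    [0, 100, 150, 0],
    [0, 0, 0, 0],
]
cropped = crop_to_foreground(scan)
print(cropped)                       # [[50, 200], [100, 150]]
print(minmax_normalize(cropped)[0])  # [0.0, 1.0]
```

Cropping before normalization matters: if the dark background were kept, its zeros would set the minimum and compress the range available to brain tissue.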

3.4 Proposed Model Architecture (Optimized SimCLR)

For brain tumor classification, we propose a self-supervised SimCLR model that exploits unlabeled data through contrastive representation learning. The pipeline consists of three components, namely contrastive data generation, an EfficientNetB3 encoder, and a projection head, followed by supervised fine-tuning for downstream classification into four classes.

Input Layer and Preprocessing

The network takes RGB brain MRI scans with a resolution of 256 × 256 pixels as input. Before being passed to the model, images are preprocessed with CLAHE-based contrast enhancement, Gaussian blur, adaptive thresholding, and morphological operations to isolate and enhance the brain region.

To enable contrastive learning, two augmented views of each image are generated using random operations such as rotation, translation, zooming, shearing, and brightness and channel adjustments. The two views of the same image form a positive pair for SimCLR training, and the views of all other images in the batch serve as negatives.
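The positive-pair construction can be sketched as follows. This is a simplified, stdlib-only stand-in: it applies a random horizontal flip, brightness scaling, and additive Gaussian noise (the ranges are hypothetical), and omits the rotation, translation, zoom, shear, and channel adjustments listed above.

```python
import random


def augment(img, rng):
    """Return one random view of a 2-D image with intensities in [0, 1].

    Simplified stand-in for the paper's augmentations: random horizontal flip,
    brightness scaling, and additive Gaussian noise. Ranges are hypothetical.
    """
    view = [row[::-1] for row in img] if rng.random() < 0.5 else [row[:] for row in img]
    gain = rng.uniform(0.8, 1.2)  # brightness factor
    sigma = 0.02                  # noise standard deviation
    # Apply brightness and noise, then clip back into [0, 1].
    return [[min(1.0, max(0.0, p * gain + rng.gauss(0.0, sigma))) for p in row]
            for row in view]


def make_positive_pairs(batch, seed=0):
    """Create two independently augmented views of every image in the batch."""
    rng = random.Random(seed)
    views_a = [augment(img, rng) for img in batch]
    views_b = [augment(img, rng) for img in batch]
    return views_a, views_b


batch = [[[0.2, 0.4], [0.6, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
views_a, views_b = make_positive_pairs(batch)
# (views_a[i], views_b[i]) is a positive pair; all other views in the batch are negatives.
```

In a SimCLR batch the 2N views are typically concatenated as `views_a + views_b`, so view i and view i + N form the positive pair and every other view acts as a negative.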

Encoder Network

The encoder is EfficientNetB3, initialized with ImageNet-pretrained weights and fine-tuned for medical imaging. The backbone's classification layers are removed, and Global Average Pooling reduces the convolutional feature maps to a one-dimensional feature vector. During the first phase of training, the encoder weights are frozen for stability. After a few epochs, the encoder is unfrozen and fine-tuned so that it learns brain tumor-specific morphological features. The encoder thus learns a feature hierarchy ranging from low-level intensity changes to complex tumor morphologies.
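Global Average Pooling is what turns the encoder's final feature maps into the fixed-length vector passed to the heads. The sketch below averages each C × H × W map over its spatial dimensions; for EfficientNetB3 the final stage has 1536 channels, which on a 256 × 256 input form an 8 × 8 grid (these dimensions are general properties of EfficientNetB3 rather than figures reported in this paper).

```python
def global_average_pool(feature_maps):
    """Reduce C x H x W feature maps (nested lists) to a length-C vector of spatial means."""
    pooled = []
    for fmap in feature_maps:
        h, w = len(fmap), len(fmap[0])
        pooled.append(sum(sum(row) for row in fmap) / (h * w))
    return pooled


# Two toy 2 x 2 channels; EfficientNetB3 would yield 1536 channels instead.
maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
print(global_average_pool(maps))  # [2.5, 2.0]
```

Because pooling discards spatial position, the output length depends only on the channel count, so the heads work for any input resolution the backbone accepts.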

Projection Head

The projection head maps encoder features into a space suited to contrastive representation learning. It consists of three dense layers. The first has 1024 neurons with ReLU activation, L2 regularization, batch normalization, and 10% dropout. The second has 512 neurons with the same regularization scheme, compressing the representation while retaining its discriminative capacity. The third projects the features to 256 dimensions with a linear activation followed by L2 normalization, so that the embeddings lie on the unit hypersphere. As a result, the dot product between two embeddings equals their cosine similarity, which lets the model compare representations consistently.
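Because the final projection layer L2-normalizes its output, the dot product of two embeddings equals their cosine similarity. The plain-Python sketch below illustrates this property with toy two-dimensional vectors rather than the 256-dimensional embeddings used by the model.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale v to unit Euclidean length, i.e. project it onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


u, v = [3.0, 4.0], [4.0, 3.0]
zu, zv = l2_normalize(u), l2_normalize(v)
print(zu)  # [0.6, 0.8]
# After normalization, the plain dot product equals the cosine similarity (0.96 here).
print(dot(zu, zv), cosine_similarity(u, v))
```

The small `eps` guard only prevents division by zero for an all-zero vector; it does not change the result for ordinary embeddings.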

Contrastive Loss Function

The model is trained with the normalized temperature-scaled cross-entropy (NT-Xent) loss. Positive pairs are the two augmented views of the same input image, and all other samples in the batch serve as negatives. A temperature of τ = 0.07 sharpens the similarity distribution, and self-similarities are excluded to avoid trivial identity solutions. Optimization uses the Adam optimizer with an initial learning rate of 0.001, later reduced to 0.0005 for stability. Mixed-precision training is used to improve computational efficiency on GPU hardware.
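For N images with two views each, NT-Xent treats each of the 2N embeddings as an anchor, scores its positive against all other 2N - 1 embeddings with a temperature-scaled softmax, and averages the resulting cross-entropy. The sketch below implements this definition in plain Python with τ = 0.07 as in the text; a real training loop would use a tensor framework and batched matrix operations instead.

```python
import math


def _unit(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent(views_a, views_b, tau=0.07):
    """NT-Xent (SimCLR) loss for N positive pairs (views_a[i], views_b[i])."""
    z = [_unit(v) for v in views_a + views_b]  # 2N embeddings
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        sims = {k: sum(a * b for a, b in zip(z[i], z[k])) / tau
                for k in range(2 * n) if k != i}  # self-similarity excluded
        m = max(sims.values())  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denominator - sims[pos]  # -log softmax probability of the positive
    return total / (2 * n)


aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

When the two views of each image agree, the loss is close to zero; when each view is more similar to another image's view than to its own partner, the loss approaches 1/τ (about 14.3 here), which shows how a small temperature amplifies mistakes.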

Classification Head and Fine-tuning

After contrastive pre-training, the projection head is discarded and replaced with a classification head for supervised fine-tuning. The classification head consists of four dense layers with 1024, 512, 256, and 128 neurons, each followed by ReLU activation, batch normalization, and dropout at varying rates. Finally, a four-neuron softmax output layer produces class probabilities for glioma, meningioma, pituitary tumor, and no tumor. Fine-tuning proceeds in two stages: first with the encoder frozen for stability, and then with the encoder unfrozen at a lower learning rate of 5e-5 to improve generalization.
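The size of the classification head can be estimated from its layer widths. The sketch below counts weights and biases of the dense layers only (batch-normalization parameters are excluded) and assumes a 1536-dimensional pooled EfficientNetB3 feature as the head's input; the resulting count is an arithmetic illustration, not a figure reported by the authors.

```python
def dense_params(widths):
    """Count weights + biases of fully connected layers with the given widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


# Assumption: the pooled EfficientNetB3 feature vector has 1536 dimensions.
classification_head = [1536, 1024, 512, 256, 128, 4]
projection_head = [1536, 1024, 512, 256]
print(dense_params(classification_head))  # 2263428
print(dense_params(projection_head))      # 2230016
```

Under these assumptions most of the head's parameters (1,573,888) sit in the first 1536 → 1024 layer, so that layer dominates the head's memory footprint.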

Model Optimization

The training process uses the Adam optimizer and two loss functions: NT-Xent loss for self-supervised pre-training and categorical cross-entropy for supervised fine-tuning. Regularization is provided by L2 weight decay and dropout at different stages, while learning-rate scheduling and early stopping promote stable convergence. This two-stage design lets the model learn strong representations in a label-efficient manner while still performing well on the end-to-end tumor classification task. Figure 3.6 illustrates the detailed architecture of the SSL-based Optimized SimCLR model.

Fig. 3.6

Optimized SimCLR Model Architecture (SSL)
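Early stopping, mentioned above and configured with a patience of 10 epochs in Table 2, can be sketched as follows. The function is a hypothetical helper that scans a list of validation losses; the `min_delta` threshold is an assumption not specified in the paper.

```python
def early_stopping(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Stops once the loss fails to improve by more than min_delta for `patience`
    consecutive epochs; the weights saved at best_epoch are the ones restored.
    """
    best_loss, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:  # improvement: remember it, reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:  # patience exhausted: stop here
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1  # ran out of epochs without triggering


losses = [1.0, 0.8, 0.6, 0.5] + [0.55] * 12
print(early_stopping(losses))  # (3, 13)
```

Restoring the weights from `best_epoch` rather than the last epoch is what makes the "optimal weights chosen on validation performance" described in Section 3.5 possible.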

3.5 Training Settings

The model was trained in two stages: (i) self-supervised SimCLR pre-training on unlabeled brain MRI images to learn strong feature representations, and (ii) supervised fine-tuning of the EfficientNet-B3 backbone for four-class tumor classification (glioma, meningioma, pituitary, and no tumor). Supervised training used the standard cross-entropy loss. Data augmentation (random rotation, flipping, and brightness adjustment) was applied to improve generalization. The Adam optimizer with a learning-rate scheduler was used to balance convergence speed and stability. Early stopping was used to avoid overfitting, and the final weights were chosen based on validation performance. Table 2 summarizes the training configuration used in this study.

Table 2
Training Hyperparameters and Settings
Parameter	Value/Description
Pretraining method	SimCLR (contrastive SSL)
Backbone architecture	EfficientNet-B3
Input image size	224 × 224 pixels
Batch size	32
Epochs (pretraining)	100
Epochs (fine-tuning)	50
Optimizer	Adam
Learning rate	0.001 (with cosine decay scheduler)
Loss function	Cross-Entropy Loss
Data augmentation	Random rotation (± 20°), horizontal/vertical flips, brightness/contrast adjustment
Early stopping	Patience = 10 epochs
Weight initialization	SimCLR pretrained weights
Evaluation split	Train/Validation/Test = 70/15/15
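Table 2 lists a cosine decay scheduler starting at 0.001. A common form of cosine decay is shown below; the final learning rate `min_lr` and whether `step` counts epochs or batches are assumptions, since the paper does not specify them.

```python
import math


def cosine_decay(step, total_steps, base_lr=1e-3, min_lr=0.0):
    """Learning rate that falls from base_lr to min_lr along a half cosine.

    base_lr=1e-3 matches Table 2; min_lr and the meaning of `step` are assumptions.
    """
    step = min(step, total_steps)  # hold at min_lr after the schedule ends
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine


for epoch in (0, 25, 50):
    print(epoch, cosine_decay(epoch, total_steps=50))
```

With 50 fine-tuning epochs the rate starts at 0.001, is halved at the midpoint, and reaches `min_lr` at the end; the slow start and end of the cosine curve make the early and late updates gentler than a linear schedule would.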

3.6 Proposed Model Input-Output

Input-Output

The inputs to the proposed optimized SimCLR-based brain tumor classification pipeline are the training and validation data together with key training parameters, including the model parameters (θ), learning rate (η), batch size (B), projection head dimension (d), and the contrastive and supervised loss functions. The output is the set of optimized model parameters (θ*). Table 3 summarizes the inputs and outputs of the model.

Table 3
Input-Output of Optimized SimCLR Model
Type	Item	Description
Input	Labeled training dataset with 4 brain tumor classes (glioma, meningioma, pituitary, no tumor)	Raw images for training
Input	Labeled validation dataset	Raw images for validation
Input	SimCLR encoder (EfficientNetB3) with trainable parameters θ	Feature extractor
Input	Projection head with trainable parameters	MLP for contrastive representation
Input	Contrastive NT-Xent loss and categorical cross-entropy loss	Loss functions
Input	η	Initial learning rate for Adam optimizer
Input	Maximum number of pre-training epochs for SimCLR (5)	Self-supervised pretraining
Input	Maximum number of supervised training epochs (35)	Fine-tuning with classification head
Input	Mini-batch size B (16)	Batch size for training
Output	Optimized model parameters θ*	Encoder + projection head + classification head after training

3.7 Evaluation Metrics

To evaluate how well the proposed deep learning model classifies brain tumors, standard evaluation metrics were used. These metrics describe model behavior in terms of correctness, class balance, and diagnostic accuracy.

(1)

(2)

(3)

(4)

(5)

(6)
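For reference, the per-class precision, recall, and F1-score, together with overall accuracy and the macro and weighted averages reported in the classification report below, can all be derived from a confusion matrix. The sketch uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two); the function name is hypothetical.

```python
def classification_report(cm):
    """Per-class (precision, recall, F1), accuracy, and macro/weighted averages.

    cm[i][j] counts images whose true class is i and predicted class is j.
    """
    k = len(cm)
    support = [sum(row) for row in cm]
    total = sum(support)
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # other classes predicted as c
        fn = support[c] - tp                       # class c predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    accuracy = sum(cm[c][c] for c in range(k)) / total
    macro = tuple(sum(m[i] for m in per_class) / k for i in range(3))
    weighted = tuple(sum(m[i] * s for m, s in zip(per_class, support)) / total
                     for i in range(3))
    return per_class, accuracy, macro, weighted


# Toy two-class matrix, for illustration only.
per_class, acc, macro, weighted = classification_report([[8, 2], [1, 9]])
print(acc)  # 0.85
```

Note that the support-weighted recall always equals overall accuracy, which is why the "Weighted Avg" recall and the accuracy row of a classification report coincide.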

4. Results and Evaluation

4.1 Performance Analysis of the Proposed SimCLR Model

The comparative analysis in Table 4 shows that the proposed Optimized SimCLR model outperformed standard baseline architectures for brain tumor classification. Its accuracy was 98.0 ± 0.05%, and its precision, recall, and F1-score were consistently 0.98 ± 0.01, reflecting strong discriminative power and stability across all four classes. Among the baselines, ResNet-50 and VGG-16 lagged behind, with accuracies of 95.4 ± 0.18% and 93.2 ± 0.25%, respectively, and lower recall, indicating reduced sensitivity to complex tumor features. EfficientNet-B0 performed better, with an accuracy of 96.1 ± 0.20%, but still trailed SimCLR-SSL, particularly in stability and reproducibility. CapsNet was highly competitive (97.6 ± 0.12% accuracy) but required considerably longer training (~ 60 minutes), making it less practical for rapid experimentation and deployment. SimCLR-SSL, in contrast, balances efficiency and performance with a moderate parameter count (~ 12.5M) and a relatively short training time (~ 40 minutes), making it more practical for clinical deployment. Overall, combining self-supervised feature learning with Grad-CAM explainability gives SimCLR-SSL strong performance, stability, and interpretability, supporting its potential for clinical application in brain tumor diagnosis.

Table 4
Comparison of Various Models Performance for Brain Tumor MRI Classification
Model	Accuracy (%)	Precision	Recall	F1-Score	Parameters	Training Time
SimCLR-SSL (Proposed)	98.0 ± 0.05	0.98 ± 0.01	0.98 ± 0.01	0.98 ± 0.01	~ 12.5M	~ 40 min
CapsNet	97.6 ± 0.12	0.98 ± 0.01	0.97 ± 0.01	0.97 ± 0.01	~ 8M	~ 60 min
ResNet-50	95.4 ± 0.18	0.96 ± 0.02	0.95 ± 0.02	0.95 ± 0.01	25.6M	~ 45 min
VGG-16	93.2 ± 0.25	0.94 ± 0.02	0.93 ± 0.02	0.93 ± 0.02	138.4M	~ 50 min
EfficientNet-B0	96.1 ± 0.20	0.96 ± 0.01	0.95 ± 0.02	0.95 ± 0.02	5.3M	~ 35 min

The optimized SSL model performed strongly across all four classes. The analysis below focuses on key evaluation metrics, namely precision, recall, F1-score, the confusion matrix, and per-class accuracy. Model interpretability and feature representations were also evaluated using Grad-CAM visualizations and dimensionality-reduction methods such as PCA and t-SNE, which illustrate the discriminative ability and clustering of the learned representations for the different tumor types.

The proposed SimCLR-SSL model was evaluated on the 1,311 MRI images of the test set, covering four classes: glioma, meningioma, pituitary, and no tumor. Table 5 provides the detailed classification report with precision, recall, F1-score, and support for each class. The model reached an overall accuracy of 98%. Notably, the no-tumor and pituitary classes both achieved perfect recall (1.00), indicating that tumor-free scans and pituitary tumors were reliably detected. Glioma and meningioma also achieved high precision (0.99), although meningioma had the lowest recall (0.94), making it the most challenging class.

Table 5
Detailed classification report of SimCLR-SSL on the test set
Class	Precision	Recall	F1-score	Support
Glioma	0.99	0.98	0.99	300
Meningioma	0.99	0.94	0.96	306
Pituitary	0.96	1.00	0.98	300
No Tumor	0.99	1.00	0.99	405
Accuracy	-	-	0.98	1311
Macro Avg	0.98	0.98	0.98	1311
Weighted Avg	0.98	0.98	0.98	1311

Figure 4.1 shows the training and validation curves of the brain tumor classifier over 35 epochs, with accuracy on the left and loss on the right. The accuracy plot follows a typical deep learning training pattern: training (blue) and validation (red) accuracy start low, at 27–40%, and rise steeply in the early epochs. A marked jump occurs around epoch 15, when training accuracy climbs from about 50% to well above 85%, and validation accuracy follows around epochs 18–20. The model reaches a final accuracy of 98.3%, with the training and validation curves converging and remaining above 95% from epoch 25 onward, indicating successful learning without severe overfitting. The loss plots are complementary: training and validation loss both start high, at 1.6–1.7, decrease steadily during training, follow similar trajectories, and converge to 0.3–0.4 in later epochs. The simultaneous decrease in loss and increase in accuracy, together with the close agreement between training and validation metrics, indicate good generalization and successful learning of discriminative features for the four-class brain tumor classification task.

Fig. 4.1

Training and validation accuracy and loss curves

Figure 4.2 shows the confusion matrix of the optimized SimCLR model, which achieves a test accuracy of 98.3%. The strong diagonal reflects the correct predictions: 295 glioma, 289 meningioma, 405 no-tumor, and 300 pituitary images. Misclassifications are few, and most involve meningioma: 17 meningioma images were misclassified (2 as glioma, 5 as no tumor, and 10 as pituitary), while glioma had only 5 misclassifications. All no-tumor and pituitary images were classified correctly, showing that the model reliably recognizes healthy scans, although the 5 meningioma images predicted as no tumor are missed tumors. The dark diagonal and light off-diagonal entries confirm the high discriminative capability of the model across all categories.
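The headline numbers can be reproduced from the counts stated above: accuracy is the diagonal sum divided by the total number of test images, and per-class recall is each diagonal entry divided by its row total (the class support in Table 4). The Python sketch below does this arithmetic; precision would additionally need every entry of each predicted-class column, which the text does not list in full, so it is not recomputed here.

```python
# Counts stated in the text for Fig. 4.2 (correct predictions) and Table 4 (support).
correct = {"Glioma": 295, "Meningioma": 289, "No Tumor": 405, "Pituitary": 300}
support = {"Glioma": 300, "Meningioma": 306, "No Tumor": 405, "Pituitary": 300}
# Off-diagonal entries of the meningioma row, as reported in the text.
meningioma_errors = {"Glioma": 2, "No Tumor": 5, "Pituitary": 10}

accuracy = sum(correct.values()) / sum(support.values())
recall = {cls: correct[cls] / support[cls] for cls in correct}  # diagonal / row total
errors = {cls: support[cls] - correct[cls] for cls in correct}  # misclassified per true class

print(f"accuracy = {sum(correct.values())}/{sum(support.values())} = {accuracy:.4f}")
for cls in correct:
    print(f"{cls:<10} recall={recall[cls]:.3f} errors={errors[cls]}")
# Consistency check: the meningioma row's off-diagonal entries account for all its errors.
assert sum(meningioma_errors.values()) == errors["Meningioma"]
```

The first line printed is `accuracy = 1289/1311 = 0.9832`, which matches the reported 98.3% test accuracy and the 1.00 recall of the no-tumor and pituitary classes.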

Fig. 4.2

Confusion Matrix of optimized SimCLR model

Fig. 4.3 shows a two-dimensional t-SNE projection of the feature space learned by the proposed optimized SimCLR (SSL) model for brain tumor classification. The four classes (glioma, meningioma, pituitary tumor, and no tumor) are shown in different colors. The model produces compact, well-separated clusters for each class, demonstrating its capability to learn discriminative features from MRI scans. The limited overlap between clusters indicates that the classes are largely separable in the learned feature space, which is consistent with the high classification performance.
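A projection like the one in Fig. 4.3 can be produced with scikit-learn's `TSNE`. The sketch below assumes the learned features have already been extracted as an `(n_samples, n_features)` array (for instance, the encoder outputs for the test images); the random, class-shifted vectors used here are a hypothetical stand-in for those features, and `tsne_projection` is an illustrative helper name.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_projection(features, perplexity=30.0, seed=0):
    """Project (n_samples, n_features) embeddings to 2-D with t-SNE."""
    features = np.asarray(features, dtype=np.float64)
    if features.shape[0] <= perplexity:
        raise ValueError("perplexity must be smaller than the number of samples")
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(features)


# Hypothetical stand-in for encoder outputs: 4 classes x 40 samples of 64-D vectors.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 40)
centers = rng.normal(scale=5.0, size=(4, 64))
features = centers[labels] + rng.normal(size=(labels.size, 64))

embedding = tsne_projection(features)
print(embedding.shape)  # (160, 2)
```

Two caveats apply when reading such plots: t-SNE preserves local neighborhoods rather than global distances, so the spacing between clusters is not directly meaningful, and the layout can change with `perplexity` and the random seed. Cluster separation in the plot is therefore supporting evidence, while the confusion matrix remains the quantitative measure.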

Fig. 4.3

t-SNE Visualization of Feature Representation

4.2 XAI Integration

The Grad-CAM visualizations of the proposed brain tumor classification model, shown in Fig. 4.4, provide interpretability by highlighting the regions of each MRI image that contributed most to the model's prediction. The visual explanations indicate that the model relies on tumor-related anatomical regions rather than irrelevant background features. For glioma examples (top row), the heatmap highlights the abnormal mass infiltrating brain tissue, consistent with the diffuse growth typical of tumors arising within the brain parenchyma. For meningioma examples, the model emphasizes sharply defined, rounded masses typically located along the outer surface of the brain, and the heatmap follows the clear tumor borders. For pituitary tumors (third row), the model directs attention to the sellar region where the pituitary gland resides, showing its capacity to identify anatomically specific tumor locations even within complex brain anatomy. For no-tumor cases (bottom row), the heatmap shows diffuse attention over normal anatomical structures rather than concentrated activation on a specific region, consistent with the absence of an abnormal mass. Together, these visualizations show that the model distinguishes tumor types according to their characteristic locations, shapes, and patterns of tissue involvement. The Grad-CAM outputs also support the consistency of the classifier, since its attention falls on clinically significant regions across a variety of tumor presentations and imaging planes. By making the model's decisions more transparent, these explanations can help build trust in the diagnostic system and support its potential use for automated brain tumor diagnosis and classification in clinical practice.
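Grad-CAM builds each heatmap from two tensors taken at a chosen convolutional layer: the feature maps A^k and the gradients of the class score y^c with respect to them. The gradients are global-average-pooled into one weight per channel, the channels are summed with those weights, and a ReLU keeps only positive evidence. The NumPy sketch below shows only this combination step. It assumes the activations and gradients have already been obtained from the framework's automatic differentiation (for example, `tf.GradientTape` in Keras); the choice of layer, the resizing to the input resolution, and the overlay on the MRI slice are not shown, and `grad_cam_from_arrays` is a hypothetical name.

```python
import numpy as np


def grad_cam_from_arrays(activations, gradients, eps=1e-8):
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    activations: (H, W, K) feature maps A^k from the chosen convolutional layer.
    gradients:   (H, W, K) gradients dy^c / dA^k of the class score y^c.
    Returns an (H, W) map scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=np.float64)
    gradients = np.asarray(gradients, dtype=np.float64)
    if activations.shape != gradients.shape or activations.ndim != 3:
        raise ValueError("expected two arrays of identical shape (H, W, K)")
    # alpha_k^c: global-average-pool the gradients over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))           # (K,)
    # Weighted sum of the K feature maps; ReLU keeps only positive evidence.
    cam = np.maximum(activations @ weights, 0.0)    # (H, W)
    # Normalize to [0, 1] so the map can be overlaid on the MRI slice.
    return cam / (cam.max() + eps)


# Toy example: channel 0 fires at (1, 1) and supports the class,
# channel 1 fires at (3, 3) and argues against it.
A = np.zeros((4, 4, 2))
A[1, 1, 0] = 1.0
A[3, 3, 1] = 1.0
G = np.zeros((4, 4, 2))
G[..., 0] = 0.5
G[..., 1] = -0.5

heatmap = grad_cam_from_arrays(A, G)
peak = tuple(int(i) for i in np.unravel_index(heatmap.argmax(), heatmap.shape))
print(peak)  # (1, 1)
```

In the toy example, the region driven by the negatively weighted channel is zeroed by the ReLU, which is why Grad-CAM maps highlight evidence for the predicted class rather than against it.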

Fig. 4.4

Grad-CAM Visualizations and Prediction Overlays

4.3 Discussion

The optimized SSL model achieved strong and stable performance across all MRI brain tumor categories, with an overall classification accuracy of 98.3%. Class-wise results show that glioma and no-tumor images were identified with near-perfect precision and recall, whereas meningioma had somewhat lower recall (0.94); 10 of its 17 errors were predictions of pituitary tumor, which likely reflects the intrinsic difficulty of discriminating between tumors with overlapping imaging appearances. Grad-CAM visualizations showed that the model consistently focused on tumor regions, providing interpretability and agreement with clinical knowledge. Self-supervised pre-training allowed the model to learn robust features directly from unlabeled MRI images, reducing its dependence on large annotated datasets, which may also improve generalizability across imaging protocols.
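In SimCLR, self-supervised pre-training works by pulling together the embeddings of two augmented views of the same image and pushing apart the embeddings of different images, using the normalized temperature-scaled cross-entropy (NT-Xent) loss. The NumPy sketch below computes that loss for a batch of N image pairs. It is an illustration of the standard SimCLR objective, not the authors' training code; the default temperature of 0.5 is an illustrative choice, since the value used in this work is not stated in this section.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for two augmented views.

    z1, z2: (N, D) projection-head outputs for view 1 and view 2 of the same
    N images; row i of z1 and row i of z2 form the positive pair.
    """
    z = np.concatenate([z1, z2], axis=0).astype(np.float64)  # (2N, D)
    z /= np.linalg.norm(z, axis=1, keepdims=True)            # unit vectors -> cosine similarity
    sim = z @ z.T / temperature                              # (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                           # exclude k == i from the denominator
    n = z1.shape[0]
    # Index of the positive partner for each of the 2N rows.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Row-wise log-softmax, computed stably by subtracting the row maximum.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Average the negative log-probability of the positive pair over all 2N anchors.
    return -log_prob[np.arange(2 * n), positives].mean()


views = np.eye(4)
print(nt_xent_loss(views, views))               # aligned views: low loss
print(nt_xent_loss(views, views[[1, 2, 3, 0]])) # mismatched pairs: higher loss
```

With matched views, each anchor's positive has the highest similarity in its row, so the loss is low; pairing each image with another image's view raises it. Minimizing this loss over many augmented MRI pairs is what lets the encoder learn useful features before any labels are used.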

Compared with prior work, the optimized SimCLR model achieves comparable accuracy while being more computationally efficient and interpretable. For instance, ResViT [10] used a hybrid CNN-Vision Transformer architecture with synthetic MRI pretext tasks and achieved high accuracy on the Figshare and Kaggle datasets (98.53% and 98.47%, respectively), but at the cost of high computational complexity and a dependence on the realism of the synthetic data. Transfer learning approaches based on EfficientNet [13, 18] achieved satisfactory performance (96.2–96.8%) but required fully supervised training and did not exploit self-supervised feature learning. Hybrid fusion architectures such as ResNet50 + VGG19 [15] and deep ensemble meta-learning techniques [19] reported accuracies above 97%, but required substantial computational resources and complex training pipelines. In contrast, the proposed model balances high accuracy, interpretability (through Grad-CAM), and efficiency, making it a promising candidate for clinical deployment in MRI-based tumor classification. Because these studies were evaluated on different datasets and splits, the comparisons are indicative rather than direct.

Table 5 compares the proposed optimized SimCLR (SSL) model with recent methods from the literature.

Table 5
Comparative Performance of Brain Tumor Classification Models
References	Model / Method	Dataset	Performance	XAI	Remarks
[10]	ResViT (Hybrid SSL)	Figshare, Kaggle, BraTS	Accuracy: 98.53% / 98.47% / 90.56%; High precision, recall, F1 across classes	No	Uses synthetic MRI pretext tasks; high computational cost and dependence on synthetic image realism
[13]	EfficientNet (Transfer Learning + Grad-CAM)	Bangladesh MRI dataset	Accuracy: 96.2%; Balanced precision and recall, slightly lower performance for smaller classes	Yes	Limited dataset size reduces robustness for larger heterogeneous populations; no SSL
[15]	ResNet50 + VGG19 Fusion	Kaggle MRI dataset	Accuracy: 97.6%; Good overall performance, slightly lower interpretability	No	Requires significant feature engineering and complex fusion; not SSL-based
[18]	EfficientNet + SVM (Eff_D_SVM)	Kaggle / BraTS	Accuracy: 96.8%; Effective for feature extraction but requires fine-tuning for stability	No	Combines CNN features with SVM; lacks SSL pre-training benefits
[19]	Deep Ensemble Meta-Learning	Benchmark MRI	Accuracy: 99.8%; Excellent precision, recall, and F1; robust feature representation	Yes	Very high computational cost; complex training; strong performance but less practical for real-time deployment
This work	Proposed Optimized SimCLR (SSL)	Kaggle MRI	Accuracy: 98.3%; High precision, recall, and F1 across all classes; SSL improves generalization	Yes (Grad-CAM)	Efficient, interpretable, robust; balances performance and computational cost; promising for clinical deployment

4.4 Limitations and Future Work

The enhanced SimCLR (SSL) model shows encouraging performance for multi-class brain tumor classification from MRI scans, but it has several limitations that should be acknowledged to guide future work. First, although self-supervised pretraining improves feature representations, the model remains susceptible to overfitting because publicly available datasets are limited in size and their imaging conditions are comparatively homogeneous. Second, the current preprocessing pipeline, although standardized, does not include large-scale data augmentation, which may limit the model's generalizability to MRI scans from other scanners, hospitals, or acquisition protocols. Third, the fixed 224×224 pixel input size required by the model may discard fine-grained anatomical details that could be important for accurate tumor classification. In addition, although Grad-CAM provides interpretability, the generated explanations are not necessarily consistent across all tumor types, and computing them adds overhead at prediction time. Finally, training the model, with SSL pretraining followed by fine-tuning, is computationally expensive, which may be a bottleneck in resource-constrained clinical environments.

To address these limitations, future work will explore more sophisticated data augmentation methods, including rotation, scaling, intensity variation, and elastic deformation, to improve robustness across a wide range of MRI datasets. Transfer learning from large pretrained medical-domain models will also be explored to leverage existing knowledge and improve performance, particularly in low-data settings. Lightweight models obtained through pruning, knowledge distillation, or quantization will be investigated to enable deployment on low-power edge devices and clinical workstations. Improvements to existing interpretability techniques will be pursued to produce more consistent, real-time, and clinically useful visualizations. The model will also be evaluated on multi-site, multi-scanner MRI data to assess its generalizability and robustness. Finally, integrating multi-modal information, such as additional imaging modalities, patient history, and clinical metadata, could further improve diagnostic performance and enable a more comprehensive assessment of brain tumors in real-world clinical settings.

5. Conclusion

This paper proposes a deep learning model for brain tumor classification from MRI images based on an optimized SimCLR (SSL) framework. The proposed model achieves 98% test accuracy across four classes: glioma, meningioma, pituitary tumor, and no tumor. Key contributions include a self-supervised contrastive learning model, termed CLMC, that uses unlabeled examples to learn robust feature representations, followed by fine-tuning on labeled data to address class imbalance and variability in MRI scans. Evaluation with precision, recall, and F1-score confirms the model's ability to capture tumor features, with per-class precision of at least 0.96, recall of at least 0.94, and F1-score of at least 0.96. Confusion matrix analysis and Grad-CAM visualizations offer further insight into the model's diagnostic potential, and the Grad-CAM explanations improve interpretability by revealing the image regions that drive each prediction. This combination of interpretability and high performance underscores the model's potential for real-world clinical practice, particularly where access to trained radiologists is limited. Future research will focus on more precise tumor categorization, integration of multi-modal data such as clinical metadata, and lightweight versions for real-time processing to provide effective clinical decision support across diverse healthcare environments.

Ethics Statement

This research used publicly available, de-identified MRI datasets, including Figshare, Kaggle, and BraTS. Because all datasets are anonymized and contain no personally identifiable information, formal informed consent and ethics approval were not required. The use of these datasets complies with relevant institutional, national, and international ethical standards for research involving human data.

Acknowledgements

The authors attest that this article is their original work. The authors declare no conflict of interest. This study received no grant from any public, commercial, or not-for-profit funding body. The authors analyzed de-identified, publicly available datasets and adhered to all applicable ethical standards.

Author Contribution

M.F.A.R. and S.H.A. conceptualized the study and designed the methodology. O.F.S. and N.N.R. implemented the model and performed data preprocessing. A.H.N. and A. carried out experiments, hyperparameter tuning, and statistical analysis. Z.F. prepared figures, tables, and visualization results. M.F.A.R. and S.H.A. wrote the main manuscript draft. All authors reviewed, edited, and approved the final manuscript.

Data Availability

The dataset used in this study is publicly available on Kaggle: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset

References

1. “Multi-class classification of brain tumor types from MR images using EfficientNets.” Biomedical Signal Processing and Control, 84, 104777. https://doi.org/10.1016/j.bspc.2023.104777

2. Hasan, M. Z., Tamim, A., Asadujjaman, D. M., Rahman, M. M., Mollick, M. A. A., Dristi, N. A., & Abdullah-Al-Noman. (2025). A CNN Approach to Automated Detection and Classification of Brain Tumors. arXiv. https://arxiv.org/abs/2502.09731

3. Purnama Wibowo, M. A., Al Fayyadl, M. B., Azhar, Y., & Sari, Z. (2022). Classification of Brain Tumors on MRI Images Using Convolutional Neural Network Model EfficientNet. Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), 6(4), 538–547. https://doi.org/10.29207/resti.v6i4.4119

4. Abdalla, W. A. R. (2024). Brain Tumor Classification Using EfficientNet-B1: A Deep Learning Approach. African Journal of Advanced Pure and Applied Sciences (AJAPAS). https://aaasjournals.com/index.php/ajapas/article/view/829

5. Ishaq, A., Ullah, F. U. M., Hamandawana, P., Cho, D.-J., & Chung, T.-S. (2025). Improved EfficientNet Architecture for Multi-Grade Brain Tumor Detection. Electronics, 14(4), 710. https://doi.org/10.3390/electronics14040710

6. Nayak, D. R., Padhy, N., Mallick, P. K., Zymbler, M., & Kumar, S. (2022). Brain Tumor Classification Using Dense Efficient-Net. Axioms, 11(1), 34. https://doi.org/10.3390/axioms11010034

7. “Brain Tumor Diagnosis and Classification via Pre-Trained Convolutional Neural Networks.” arXiv preprint arXiv:2208.00768. https://arxiv.org/abs/2208.00768

8. “Detection and classification of brain tumor using hybrid deep learning models.” PubMed Central. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10754828/

9. Mishra, R., Satpathy, R., & Pati, B. (2024). Interpretable AI in Medical Imaging: Enhancing Diagnostic Accuracy through Human-Computer Interaction. Journal of Artificial Intelligence and Systems, 6, 96–111. https://doi.org/10.33969/AIS.2024060107

10. Karagoz, M. A., Nalbantoglu, O. U., & Fox, G. C. (2024, November). Residual Vision Transformer (ResViT) based self-supervised learning model for brain tumor classification. arXiv. https://arxiv.org/abs/2411.12874

11. Bouhafra, S., Saleh, S., Boughanem, M., et al. (2025). Deep learning approaches for brain tumor detection and classification using MRI images (2020 to 2024): A systematic review. Journal of Imaging Informatics in Medicine, 38(3), 1403–1433. https://doi.org/10.1007/s10278-024-01283-8

12. Filvantorkaman, M., Piri, M., Filvan Torkaman, M., Zabihi, A., & Moradi, H. (2025). Fusion-based brain tumor classification using deep learning and explainable AI, and rule-based reasoning. arXiv. https://arxiv.org/abs/2508.06891

13. Sarker, S. (2025). Transfer learning and explainable AI for brain tumor classification: A study using MRI data from Bangladesh. arXiv. https://arxiv.org/abs/2506.07228

14. Yurdakul, M., & Taşdemir, Ş. (2025). MRI-based brain tumor detection through an explainable EfficientNetV2 and MLP-Mixer-Attention architecture. arXiv. https://arxiv.org/abs/2509.06713

15. Ullah, M. S., Khan, M. A., Masood, A., Mzoughi, O., Saidani, O., & Alturki, N. (2024). Brain tumor classification from MRI scans: A framework of hybrid deep learning model with Bayesian optimization and quantum theory-based marine predator algorithm. Frontiers in Oncology, 14, 1335740. https://doi.org/10.3389/fonc.2024.1335740 (PMID: 38390266; PMCID: PMC10882068)

16. Li, H., Zhang, Y., & Chen, L. (2025). Deep learning-driven brain tumor classification and segmentation using non-contrast MRI. Scientific Reports, 15, Article 27831. https://doi.org/10.1038/s41598-025-13591-2

17. Gupta, R., Singh, P., & Mehta, A. (2023). EFF_D_SVM: A robust multi-type brain tumor classification system. Frontiers in Neuroscience, 17, Article 1269100. https://doi.org/10.3389/fnins.2023.1269100

18. Alam, M., Chowdhury, A., Rahman, F., et al. (2025). Explainable deep stacking ensemble model for accurate and functionally deployable brain tumor classification. Computer Methods and Programs in Biomedicine. https://www.sciencedirect.com/science/article/pii/S0010482525005177

19. Rafiq, M., Khan, Z., & Ullah, A. (2024). Enhancing brain tumor detection in MRI images through explainable AI using Grad-CAM with ResNet50. BMC Medical Imaging, 24(1). https://doi.org/10.1186/s12880-024-01292-7

20. Khan, A., Ali, H., & Rahman, S. (2022). Brain tumor classification using MRI images and deep learning. PLOS ONE, 17(12), e0322624. https://doi.org/10.1371/journal.pone.0322624

21. Zarenia, R., Mozaffari, M. H., & Baghshah, M. S. (2025). Reinforcing explainability: Explainable CNN for brain tumor detection and classification. Brain Informatics. https://doi.org/10.1186/s40708-025-00257-y

22. Huang, Y., Xu, J., & Wang, T. (2024). Self-supervised learning in medical imaging: Advances, challenges, and opportunities. Cancers, 17(1), 121. https://doi.org/10.3390/cancers17010121

23. Nickparvar, M. (2021). Brain Tumor MRI Dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/2645886
