Corresponding author: Nouran Reda Ragab
City, Country: Mansoura, Egypt
Abstract
Generative AI is significantly transforming medical imaging by helping to overcome major obstacles such as data scarcity, the high cost of image annotation, and privacy concerns. This research explores the use of Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) for tasks including image generation, denoising, scan reconstruction, and the segmentation of anatomical structures. Using the publicly available BraTS2020 dataset, we implemented models such as GANs, VAEs, and the U-Net architecture. Performance was evaluated using metrics including the Structural Similarity Index (SSIM), the Dice coefficient, and the Fréchet Inception Distance (FID). The results demonstrated a significant improvement in image quality and segmentation accuracy. A primary application was the automation of brain tumor segmentation from MRI scans, a task that is traditionally time-consuming, labor-intensive, and subject to inter-observer variability. This research presents an automated method using a U-Net convolutional neural network, an architecture specifically designed for biomedical image segmentation. The process involved data preparation, implementation of the U-Net model, and a thorough training phase. The model's performance was assessed using the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU). The U-Net model achieved promising results in accurately segmenting brain tumor regions, demonstrating high potential as an effective tool for automated tumor outlining. On the BraTS2020 test data, the model achieved a pixel-wise accuracy of 99.18%, indicating high efficiency and superior performance compared to current state-of-the-art research. This work contributes to the growing field of deep learning in medical imaging by providing a detailed framework and reproducible results for brain tumor segmentation.
Keywords:
Brain tumor
generative AI
deep learning
U-Net architecture
data segmentation
1. Introduction
Overview of Generative AI in Medical Imaging
The advent of generative artificial intelligence (AI) has significantly transformed the field of medical imaging [1]. As a subfield of AI focused on creating novel content, generative AI has unlocked a wide range of possibilities for healthcare. It provides medical professionals with powerful tools to enhance diagnostic accuracy, develop personalized treatment plans, and ultimately improve patient outcomes [2]. By leveraging sophisticated deep-learning algorithms, generative AI has fundamentally changed how medical images are analyzed, interpreted, and used in clinical settings [3]. Generative AI encompasses techniques that enable machines to analyze existing data and generate new, valuable content [4]. In medical imaging, models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have attracted considerable interest [5]. GANs operate using two competing neural networks: a generator that creates synthetic images and a discriminator that tries to distinguish them from real images [6]. In contrast, VAEs learn a compressed, low-dimensional representation of the input data, allowing them to generate new images by sampling from this learned latent space. Despite these notable advancements, traditional methods still grapple with challenges such as data scarcity, high annotation costs [7], privacy constraints, and algorithmic bias.
The primary goals of this study are to:
1. Develop a generative AI system for segmenting, enhancing, and synthesizing medical images [8].
2. Evaluate the model's performance through qualitative assessments by physicians and quantitative metrics, including the Dice coefficient and the Fréchet Inception Distance (FID).
3. Establish ethical guidelines for the application of generative AI in clinical environments.
Brain tumors represent a serious global health challenge, with some types being highly aggressive and difficult to treat [9].
Fast and accurate diagnosis is critical for determining the most effective treatment strategies [10]. Magnetic Resonance Imaging (MRI) is essential in this process, providing detailed, non-invasive views of the brain. By combining different MRI sequences, such as T1, contrast-enhanced T1 (T1CE), T2, and FLAIR, clinicians can obtain a comprehensive picture of the tumor, including its core, active regions, and surrounding edema. However, manual segmentation of these tumors is a slow, labor-intensive process that suffers from inter-observer variability: even experts may disagree on tumor boundaries. This inconsistency can directly affect treatment planning. Consequently, there is a growing need for automated tools that can segment tumors rapidly and reliably, minimizing human error and saving valuable time.
Deep learning has emerged as a transformative technology in medical imaging. Among its many successful architectures, the U-Net model is particularly prominent for segmentation tasks. Originally designed for biomedical image analysis, U-Net uses an encoder-decoder structure with skip connections that effectively combines contextual and localization information, enabling it to identify complex structures even with limited training data.
To advance this field, competitions such as the Brain Tumor Segmentation (BraTS) challenge provide standardized datasets of MRI scans with expert-annotated tumor labels. This study uses the BraTS2020 dataset, a well-established benchmark, to train and evaluate a U-Net-based model for brain tumor segmentation. The dataset includes four MRI sequences for each patient: T1-weighted, T1 contrast-enhanced (T1CE), T2-weighted, and FLAIR. Example images from the dataset are shown in Fig. 1 [11].
In this paper, we walk through our entire process, from preparing the MRI data to building, training, and evaluating our model. Our goal is to create a tool that accurately maps brain tumors, helping doctors diagnose and treat patients more effectively. We have kept the method clear and reproducible to show how effective U-Net can be for this task.
2. Related Work
The Multimodal Brain Tumor Image Segmentation Benchmark (BraTS) [12]
This seminal paper presents the BraTS benchmark, a foundational dataset and challenge for brain tumor segmentation. It outlines the dataset's key features, the methodology for performance evaluation, and the inherent difficulties in segmenting tumors from multimodal MRI scans. Over the years, the BraTS challenge has been instrumental in driving progress by providing a standardized platform for comparing algorithms. Key Benefits: Standardized Dataset: Serves as a common reference for developing and validating brain tumor segmentation methods. Multimodal Data: Includes multiple MRI sequences (T1, T1CE, T2, FLAIR), providing richer information for accurate segmentation. Expert-Validated Annotations: The ground truth segmentations are meticulously labeled by neuroradiologists, ensuring high reliability. Community-Driven Progress: Fosters collaboration and competition, accelerating innovation in brain tumor analysis. Limitations & Challenges: Class Imbalance: Tumor regions are often much smaller than healthy tissue, which can impede algorithmic learning. Annotation Variability: Despite expert labeling, slight inter-observer variability can exist. Benchmark Focus: The paper's primary contribution is the introduction of the benchmark itself, not a novel segmentation technique.
U-Net: Convolutional Networks for Biomedical Image Segmentation [13]
This is the original paper that introduced U-Net, a groundbreaking convolutional neural network designed for biomedical image segmentation. The key innovation is its U-shaped structure, whose skip connections merge fine-grained details from the early layers with broader context from the deeper layers. This combination allows for sharp, precise segmentation even when training data is limited, thanks to heavy use of data augmentation. Advantages: Built for Medical Imaging: Tailored for biomedical tasks, it delivers strong results even with small datasets. Pinpoint Accuracy: Skip connections help retain spatial details, making segmentation boundaries crisper. Data-Efficient: Thanks to aggressive augmentation, it works well without massive amounts of labeled data. End-to-End Segmentation: Outputs a full pixel-wise segmentation map in a single forward pass. Drawbacks & Challenges: 2D Limitation: The original version works on 2D slices, which is not ideal for 3D medical scans (the later 3D U-Net addresses this). Heavy on Compute: Can be resource-hungry, especially for high-resolution images or 3D versions. Tuning Required: Needs careful tuning of its settings to perform at its best on different datasets. How It Connects to Our Project: Same Core Design: Our project explicitly uses U-Net as its backbone, reflecting how foundational this architecture is. Customized for Brain Tumors: We adapt U-Net to BraTS data, showing how flexible it is for real-world medical tasks. Technical Tweaks: We detail specifics (such as the TensorFlow/Keras setup, layer choices, and loss functions) while staying true to U-Net's core idea. Original Results: The paper showed U-Net surpassing previous benchmarks on tasks such as segmenting neuronal structures in electron microscopy images and living cells in light microscopy images, demonstrating its value for precise biomedical segmentation.
nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation [14]
nnU-Net: The Self-Driving U-Net for Medical Imaging. What Makes nnU-Net Special? Imagine a U-Net that automatically configures itself for any medical segmentation task, with no tedious manual tuning. That is nnU-Net. It configures its own architecture, preprocesses data automatically, and selects its own training setup, all while topping leaderboards across different medical imaging challenges. Advantages: No More Guesswork: Hate tuning hyperparameters? nnU-Net does it for you, making it both beginner-friendly and battle-tested. Benchmark Slayer: It's not just good; it's often the best, topping competitions left and right. One-Size-Fits-All: Works across MRI, CT, and other modalities, whether the target is lungs, tumors, or tiny cells. Replicable Science: No more "works on my machine" issues; it standardizes the whole pipeline. But It's Not Perfect: Heavy GPU Diet: All that automation comes at a cost; you need serious compute power. Mystery Box: It works remarkably well, but even experts cannot always explain why a given configuration wins. Steep Learning Curve: "Automated" does not mean "effortless"; you still need to know what you are doing. How It Relates to Our Brain Tumor Project: The Gold Standard: To judge how competitive our model is, nnU-Net's BraTS performance is the benchmark to beat. Specialist vs. Generalist: We focus solely on brain tumors, but nnU-Net's principles (smart preprocessing, ensembling) remain valuable. Track Record: nnU-Net has won or placed near the top in many medical imaging competitions, often outperforming custom-built solutions, which shows how competitive a well-configured, auto-tuned U-Net can be.
Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images [15]
A Deep Dive into CNN-Based Brain Tumor Segmentation. This paper tackles brain tumor segmentation using CNNs with small 3×3 kernels, which makes it possible to build deeper networks that learn richer, more complex features. The authors also stress two key ingredients for success: intensity normalization (to make MRI scans consistent) and data augmentation (to keep the model from overfitting). Advantages: Small Kernels, Big Depth: Small 3×3 filters allow more layers to be stacked, capturing both fine details and coarse patterns better than shallow networks. Normalization = Stability: By standardizing MRI intensities, the model works reliably across different scanners and protocols. Augmentation for Generalization: Flipping, rotating, and otherwise perturbing the training data helps the model handle real-world variability. Solid BraTS Results: Showed that CNNs could hold their own in brain tumor segmentation, paving the way for later advances. Limitations: Stuck in 2D: Like our project, it processes 3D MRI volumes as 2D slices, missing some 3D context. Heavy on Compute: More layers mean more GPU hours, which is not always practical. Simple Architecture: It is a straightforward CNN, without the skip connections or multi-scale design that U-Net uses. How It Stacks Up Against Our Work: Shared Foundations: Both use CNNs, preprocess MRI data, and rely on augmentation. Great minds think alike! U-Net vs. Plain CNN: Our project uses U-Net's encoder-decoder with skip connections, while this paper opts for a classic deep CNN. Trade-offs: U-Net excels at precise localization; their method prioritizes hierarchical feature learning. Progress Over Time: This was an early proof of concept; our work builds on later refinements and on the wide adoption of encoder-decoder designs such as U-Net.
Brain tumor segmentation with deep neural networks [16]
This paper introduces a fully automatic brain tumor segmentation system that works like a radiologist's double-check process: first, it produces a rough draft of the tumor boundaries; then, it refines the details using both zoomed-in (local) and big-picture (global) information. Why It's Clever: Hands-Free Operation: No manual tweaking needed; MRI in, segmentation out. Two-Stage Magic: The cascaded design captures both fine edges and overall tumor shape better than single-step methods. Multimodal Mastery: Uses multiple MRI modalities (such as T1, T2, and FLAIR) to increase accuracy. BraTS-Proven: Held its own against other top methods of its era. The Trade-Offs: GPU Hungry: All that deep learning horsepower comes with heavy computational costs. Complex Setup: More stages mean more moving parts that could go wrong. Specialized Skills: Works well on BraTS data but might need tuning for other hospitals' scanners. How It Stacks Up Against Our U-Net Project: One Net vs. Two-Stage: We use a single U-Net; they use a cascaded system, which suggests accuracy could improve by adding refinement steps. Context Matters: Both methods use context, but theirs explicitly combines local and global views, a trick we might borrow. Benchmark Buddy: Their BraTS 2013/2015 scores give us a performance target to aim for (or beat). The Bottom Line: Their results were top-tier for their time, accurately segmenting the whole tumor, the tumor core, and the active regions with high Dice scores. While newer models (such as nnU-Net) have since raised the bar, this paper showed how strategic architecture design (cascades plus context) could push boundaries in medical AI.
Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation [17]
The Breakthrough: This research uses a two-part approach to brain lesion segmentation: a 3D CNN that analyzes scans at multiple scales, like examining a brain scan through different magnifying glasses to catch both tiny and large lesions, and a fully connected CRF "polishing step" that corrects errors and smooths out jagged boundaries, much like an editor tidying up a draft. Why It's Impressive: Sees the Big and Small Picture: The multi-scale design catches lesions whether they are tiny spots or sprawling regions. True 3D Vision: Processes 3D context directly (unlike 2D slice-by-slice methods). Boundary Makeover: The CRF refinement makes segmentations look more natural and precise. Handles Messy Reality: Works even when lesions vary widely in shape, size, and appearance. The Trade-Offs: Heavy Lifting Required: All that 3D processing and CRF refinement demands serious computing power. Patience Needed: Training this sophisticated model takes time. Tuning Challenges: The CRF introduces additional hyperparameters to tune, adding complexity. How It Compares to Our 2D U-Net Project: 3D vs. 2D: We work with 2D slices; they process 3D volumes, which can capture more spatial context. Fancy Finishing Touch: Our U-Net stops at the raw network output; a CRF step could potentially clean up our results. Complexity vs. Simplicity: Our 2D U-Net is lighter and faster than their more complex approach. Did It Deliver? Yes: it set new state-of-the-art results on brain lesion benchmarks, showing that combining multi-scale 3D analysis with CRF refinement produces highly accurate segmentations.
3D U-Net Learning Dense Volumetric Segmentation from Sparse Annotation [18]
This paper extends the classic 2D U-Net into a full 3D network, giving medical imaging AI the ability to analyze scans in three dimensions, the way doctors do. It can also learn from sparsely annotated volumes, which is valuable because full 3D annotations are often impractical to obtain. Why Is It a Game-Changer: True 3D Vision: Processes MRI/CT volumes in 3D, capturing spatial relationships that 2D methods miss. Smoother, More Consistent Results: No more "jumpy" segmentations between slices; results stay anatomically plausible. Works with Real-World Data: Does not need every voxel labeled to learn effectively (a major benefit for time-strapped clinicians). Proven Performance: Achieved excellent results on Xenopus kidney segmentation in confocal microscopy volumes and became a standard reference for 3D medical image segmentation. The Reality Check: GPU Hunger Games: 3D convolutions consume memory like Pac-Man eats dots. Still Needs Decent Data: Sparse labels work, but you can't train it on three scans and expect miracles. Tuning Required: Like all deep learning models, it needs some tuning to shine. How It Makes Our 2D U-Net Look Quaint: Flat vs. Full 3D: We analyze brain tumors slice by slice like a flipbook; they work with the whole 3D movie. Future-Proofing: Our project's roadmap includes moving to 3D, and this paper is effectively an instruction manual for that step. BraTS Potential: For complex 3D tumors, their approach would likely outperform our current 2D model.
3. Proposed Methodology
As shown in Fig. 2, the flow diagram illustrates the brain tumor segmentation process. This work tackles brain tumor segmentation using MRI scans, leveraging the BraTS 2020 dataset and a U-Net deep learning model. The pipeline starts with preprocessing steps: extracting 2D slices, resizing them to 128x128, normalizing intensities, and focusing on FLAIR and T1ce channels for optimal contrast. A custom data generator handles on-the-fly loading, one-hot encoding masks, and shuffling for robust training. The U-Net architecture, with its encoder-decoder structure and bottleneck features, is trained using carefully configured callbacks and evaluated on standard metrics like Dice score and IoU. Results show strong performance in segmenting tumor regions, with both quantitative metrics and visual predictions validating the approach. To make the tool practical, a Streamlit app wraps everything into an interactive interface, enabling users to visualize results and estimate tumor volumes effortlessly.
Following the pipeline in Fig. 2, the remainder of the paper is structured as follows: Section 3 details our dataset, preprocessing, data generator, U-Net model architecture, training process, and evaluation. Section 4 presents our experimental results, including quantitative data and visual examples. Section 5 demonstrates the Streamlit application. Section 6 discusses the impact of hyperparameter tuning on accuracy. Finally, Section 7 concludes the paper.
3.1. Dataset
For this project, we use the BraTS 2020 dataset, often considered the "Olympics" of brain tumor imaging research. Since 2012, researchers have used this annual challenge to test and compare their latest AI models for identifying brain tumors in scans. We chose this dataset because it offers high-quality, standardized multimodal data and is a recent, widely used release in this area. It contains MRI scans from 369 patients, covering both fast-growing (high-grade) and slower-growing (low-grade) brain tumors. The data was divided into training (70%), validation (15%), and test (15%) sets, with each subset containing a representative mix of tumor types. Figure 3 illustrates this data breakdown. This setup provides a solid foundation for training and testing our model and helps make the results meaningful for real-world medical applications.
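A patient-level 70/15/15 split like the one described above can be sketched as follows. This is a minimal illustration, not the exact procedure used in this work: the function name, the fixed seed, and the `BraTS20_Training_###` ID format are assumptions, and unlike the split described in the text, this sketch does not explicitly stratify by tumor grade. Splitting by patient rather than by slice keeps all slices of one scan in the same subset, so no test slice comes from a patient seen during training.

```python
import numpy as np

def split_patients(patient_ids, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle patient IDs and split them into train/validation/test lists.

    Splitting by patient (not by slice) keeps every slice of a scan in one
    subset, so no test slice comes from a patient seen during training.
    """
    ids = np.array(sorted(patient_ids))
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    n_train = int(train_frac * len(ids))
    n_val = int(val_frac * len(ids))
    train = ids[:n_train].tolist()
    val = ids[n_train:n_train + n_val].tolist()
    test = ids[n_train + n_val:].tolist()  # the remainder goes to the test set
    return train, val, test

# Hypothetical ID format for the 369 BraTS 2020 training cases
ids = [f"BraTS20_Training_{i:03d}" for i in range(1, 370)]
train, val, test = split_patients(ids)
print(len(train), len(val), len(test))  # 258 55 56
```

With 369 IDs, integer truncation yields 258/55/56 patients. Stratifying by grade would additionally require a per-patient high-grade/low-grade label to split each group separately.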
Each patient's scans include four different types of MRI images, each showing distinct aspects of the tumor. The four types of MRI images are:
T1-weighted (T1): Aids in the identification of anatomical structures and offers good contrast between gray and white matter.
T1-weighted with contrast enhancement (T1CE): Highlights areas where the blood-brain barrier has been disrupted, which is common in active tumor regions.
T2-weighted (T2): Shows edema and infiltration, appearing as bright regions around the tumor.
Fluid Attenuated Inversion Recovery (FLAIR): Suppresses cerebrospinal fluid signals, making it easier to detect edema and infiltration at the tumor boundaries.
All scans have been carefully preprocessed: they are co-registered to the same anatomical template, skull-stripped, and resampled to a uniform 1 mm³ isotropic resolution (each 3D volume measures 240×240×155 voxels). As shown in Fig. 4, each case comprises four image types: FLAIR, T1, T2, and T1CE.
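Because every volume is resampled to 1 mm³ voxels, a tumor volume estimate (such as the one the Streamlit tool in the methodology overview is described as providing) reduces to counting labeled voxels. The helper below is a hypothetical sketch, not the application's actual code; it assumes that any non-zero label (1, 2, or 4) counts as tumor.

```python
import numpy as np

def tumor_volume_ml(seg, voxel_spacing_mm=(1.0, 1.0, 1.0)):
    """Estimate tumor volume in millilitres from a label volume.

    Every non-zero voxel (BraTS labels 1, 2 and 4) is counted as tumor.
    1 mL = 1000 mm^3.
    """
    voxel_mm3 = float(np.prod(voxel_spacing_mm))
    return np.count_nonzero(seg) * voxel_mm3 / 1000.0

seg = np.zeros((240, 240, 155), dtype=np.uint8)
seg[100:120, 100:120, 70:80] = 2      # toy 20 x 20 x 10 "edema" block
print(tumor_volume_ml(seg))           # 4000 voxels * 1 mm^3 -> 4.0 mL
```

If predicted masks are produced on slices resized from 240×240 to 128×128 without cropping, each in-plane pixel covers 240/128 = 1.875 mm, so `voxel_spacing_mm=(1.875, 1.875, 1.0)` would be the matching spacing (an assumption about how the resizing is done).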
3.2. Data Preprocessing
Effective preprocessing of MRI data is crucial for optimal model performance. This involves several key steps:
Data Loading: Each MRI modality and its corresponding segmentation mask were loaded using libraries such as NiBabel [19].
Normalization: To standardize intensity ranges, each modality of every patient was normalized. Common techniques include z-score normalization (subtracting the mean and dividing by the standard deviation of non-zero voxels) or scaling intensities to a fixed range (e.g., 0 to 1 after clipping outliers) [20, 21].
Cropping/Resizing: To focus on the brain region and reduce computational load, images can be cropped to a region of interest containing the brain or resized to a uniform input dimension (e.g., 128×128 or 240×240 pixels for 2D slices) [22]; in this work, slices were resized to 128×128.
Slice Selection/Patch Extraction: For 2D U-Net implementations, 3D MRI volumes are processed slice by slice, and slices containing tumor regions are often preferentially selected. Alternatively, a patch-based approach can be used, in which smaller 2D or 3D patches are extracted from the volumes for training.
Channel Combination: Preprocessed slices from different MRI modalities are stacked to form multi-channel input images for the U-Net model; in this work, the FLAIR and T1ce slices were stacked to form two-channel inputs.
Data Augmentation: To increase the diversity of the training set and improve model generalization, data augmentation techniques were likely applied. These could include random rotations, flips, scaling, elastic deformations, and intensity shifts.
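The normalization, slice selection, resizing, and channel-stacking steps above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: it uses z-score normalization over non-zero voxels (one of the two options listed), nearest-neighbor resizing (the interpolation method is not specified in the text), and hypothetical function names. With real data, each volume would first be loaded with NiBabel, for example via `nib.load(path).get_fdata()`.

```python
import numpy as np

def zscore_nonzero(volume):
    """Z-score normalize a 3D volume using only its non-zero (brain) voxels."""
    volume = volume.astype(np.float32)
    out = np.zeros_like(volume)
    brain = volume > 0
    if brain.any():
        vals = volume[brain]
        std = vals.std()
        out[brain] = (vals - vals.mean()) / (std if std > 0 else 1.0)
    return out  # background voxels stay at 0

def resize_nearest(slice2d, size=128):
    """Nearest-neighbor resize of a 2D array to (size, size)."""
    h, w = slice2d.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return slice2d[np.ix_(rows, cols)]

def tumor_slice_indices(seg):
    """Axial slice indices (last axis) whose mask contains any tumor label."""
    return np.flatnonzero(seg.reshape(-1, seg.shape[-1]).any(axis=0))

def make_inputs(flair, t1ce, slice_indices, size=128):
    """Stack normalized FLAIR and T1ce slices into an (N, size, size, 2) array."""
    flair_n, t1ce_n = zscore_nonzero(flair), zscore_nonzero(t1ce)
    return np.stack([
        np.stack([resize_nearest(flair_n[:, :, z], size),
                  resize_nearest(t1ce_n[:, :, z], size)], axis=-1)
        for z in slice_indices
    ])

# Toy volumes standing in for NiBabel-loaded scans (240 x 240 in-plane, 16 slices)
rng = np.random.default_rng(0)
flair = np.zeros((240, 240, 16), dtype=np.float32)
flair[60:180, 60:180, :] = rng.random((120, 120, 16)) + 1.0
t1ce = 2.0 * flair
seg = np.zeros(flair.shape, dtype=np.uint8)
seg[100:120, 100:120, 5:8] = 4
x = make_inputs(flair, t1ce, tumor_slice_indices(seg))
print(x.shape)  # (3, 128, 128, 2)
```

Computing the statistics only over non-zero voxels keeps the large skull-stripped background from dragging the mean toward zero, so brain tissue ends up with roughly zero mean and unit variance in every scan.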
3.3. Data Generator
A custom data generator based on the Keras Sequence class was implemented to load and preprocess data efficiently during training. The generator:
Loads batches of MRI slices and their matching segmentation masks on the fly.
Applies the preprocessing steps to each slice.
Converts segmentation masks to a one-hot encoded format for multi-class classification.
Shuffles the data between epochs.
The generator is designed to work with the BraTS dataset structure and manages the mapping between subject IDs and file paths.
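A minimal version of such a generator might look like the following. It is a sketch, not the project's generator: the class name, the `load_slice` callable (a hypothetical stand-in for the subject-ID-to-file-path mapping and file reading), and the remapping of BraTS label 4 to channel 3 are assumptions. It subclasses `tensorflow.keras.utils.Sequence` when TensorFlow is installed and falls back to a plain class otherwise, so the batching logic can be exercised on its own.

```python
import numpy as np

try:  # use the Keras base class when TensorFlow is installed
    from tensorflow.keras.utils import Sequence
except ImportError:  # plain-Python fallback so the sketch also runs without TensorFlow
    Sequence = object

N_CLASSES = 4  # background, NCR/NET, edema, enhancing tumor

def one_hot_brats(mask):
    """One-hot encode a BraTS mask, mapping label 4 (enhancing) to channel 3."""
    remapped = np.where(mask == 4, 3, mask).astype(int)
    return np.eye(N_CLASSES, dtype=np.float32)[remapped]

class BratsSliceGenerator(Sequence):
    """Yields (images, one-hot masks) batches and reshuffles after every epoch."""

    def __init__(self, slice_ids, load_slice, batch_size=2, shuffle=True, seed=0):
        super().__init__()
        self.slice_ids = list(slice_ids)
        self.load_slice = load_slice  # hypothetical: id -> (image HxWxC, mask HxW)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.on_epoch_end()

    def __len__(self):
        return int(np.ceil(len(self.slice_ids) / self.batch_size))

    def __getitem__(self, idx):
        batch = self.order[idx * self.batch_size:(idx + 1) * self.batch_size]
        pairs = [self.load_slice(self.slice_ids[i]) for i in batch]
        images = np.stack([img for img, _ in pairs]).astype(np.float32)
        masks = np.stack([one_hot_brats(m) for _, m in pairs])
        return images, masks

    def on_epoch_end(self):
        self.order = np.arange(len(self.slice_ids))
        if self.shuffle:
            self.rng.shuffle(self.order)

def toy_loader(slice_id):  # hypothetical stand-in for reading one slice from disk
    image = np.full((128, 128, 2), float(slice_id), dtype=np.float32)
    mask = np.zeros((128, 128), dtype=np.uint8)
    mask[:10, :10] = 4
    return image, mask

gen = BratsSliceGenerator(range(5), toy_loader, batch_size=2)
x, y = gen[0]
print(len(gen), x.shape, y.shape)  # 3 (2, 128, 128, 2) (2, 128, 128, 4)
```

Keras calls `__len__` to determine the number of batches per epoch, `__getitem__` to fetch each batch, and `on_epoch_end` between epochs, which is where the reshuffling happens.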
3.4. Model Architecture
The brain tumor segmentation model is built upon the U-Net architecture, a proven solution for medical image analysis. Originally designed for biomedical imaging, U-Net’s symmetric encoder-decoder structure with skip connections makes it particularly effective for precise segmentation tasks.
3.4.1. The U-Net model architecture consists of:
Decoder Path (Expanding Path): The decoder path reconstructs the segmentation map using 2×2 transposed convolutions to upsample feature maps and skip connections that merge encoder features for spatial precision. It also includes two 3×3 convolutional layers with ReLU activation.
Output Layer: A final 1×1 convolution with softmax activation produces a probability map for each of the four classes: background (label 0), necrotic/non-enhancing tumor core (label 1), peritumoral edema (label 2), and enhancing tumor (label 4 in the BraTS annotations, i.e., output class 3). An overview of our U-Net-based model is shown in Fig. 5.
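To make the encoder-decoder bookkeeping concrete, the following sketch traces feature-map shapes through a generic U-Net with the input and output sizes used in this work (128×128×2 in, 128×128×4 out). The depth of 4 and the base width of 32 filters are hypothetical choices, not the exact configuration of our model. The sketch assumes "same" padding, two 3×3 convolutions per level, 2×2 max-pooling in the encoder, and 2×2 transposed convolutions in the decoder.

```python
def unet_shapes(size=128, in_channels=2, n_classes=4, base_filters=32, depth=4):
    """Trace (H, W, C) through a 'same'-padded U-Net (hypothetical widths)."""
    trace = [("input", (size, size, in_channels))]
    skips = []
    h = size
    for level in range(depth):
        c = base_filters * 2 ** level          # two 3x3 conv + ReLU set the width
        trace.append((f"encoder_{level}", (h, h, c)))
        skips.append((h, c))                   # kept for the skip connection
        h //= 2                                # 2x2 max-pooling halves H and W
    trace.append(("bottleneck", (h, h, base_filters * 2 ** depth)))
    for level in reversed(range(depth)):
        skip_h, skip_c = skips[level]
        h *= 2                                 # 2x2 transposed conv doubles H and W
        if h != skip_h:
            raise ValueError("input size must be divisible by 2**depth")
        # transposed conv outputs skip_c channels, concatenated with the skip
        trace.append((f"concat_{level}", (h, h, 2 * skip_c)))
        trace.append((f"decoder_{level}", (h, h, skip_c)))  # two 3x3 conv + ReLU
    trace.append(("output_1x1_softmax", (h, h, n_classes)))
    return trace

for name, shape in unet_shapes():
    print(f"{name:>20}: {shape}")
```

The trace makes one constraint visible: with "same" padding, the input size must be divisible by 2^depth so that each upsampled map lines up with its skip connection, which 128 = 2^7 satisfies.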
3.5. Training Process
The training process for the brain tumor segmentation model involves several steps and considerations to ensure optimal performance and convergence.
3.5.1. Dataset Splitting
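To train and evaluate the model fairly, the BraTS 2020 dataset was divided into three parts:
Training set (70%): the primary data used to teach the model.
Validation set (15%): used to monitor training and to select the best model.
Test set (15%): held out for the final evaluation.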
This split ensures the model learns effectively while being tested on fresh data, mimicking real-world usage.
3.5.2. Training Configuration
The model was trained with the following configuration:
Batch size: 2 per GPU (scaled according to the number of available GPUs)
Number of epochs: 25
Optimizer: Adam with an initial learning rate of 0.001
Loss function: Categorical cross-entropy
Input size: 128×128×2 (height × width × channels)
Output size: 128×128×4 (height × width × classes)
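For reference, the pixel-wise categorical cross-entropy used as the loss can be written out in NumPy. This sketch computes the standard formula averaged over all pixels in a batch. Keras's `categorical_crossentropy` behaves similarly for softmax outputs, but this is not its exact implementation, and the clipping constant `eps` is an assumption.

```python
import numpy as np

def pixelwise_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over every pixel in a batch.

    y_true: one-hot masks, shape (B, H, W, C)
    y_pred: softmax probabilities with the same shape
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)               # avoid log(0)
    per_pixel = -np.sum(y_true * np.log(y_pred), axis=-1)  # shape (B, H, W)
    return float(per_pixel.mean())

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=(2, 128, 128))      # toy label maps
y_true = np.eye(4)[labels]                           # (2, 128, 128, 4)
uniform = np.full_like(y_true, 0.25)                 # "know-nothing" prediction
print(pixelwise_categorical_crossentropy(y_true, uniform))  # approximately ln(4) = 1.386
```

Because every pixel contributes equally, background pixels dominate this loss on BraTS slices, which is one reason Dice-based metrics are reported alongside it.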
3.5.3. Training Callbacks
To optimize training, automated checkpoints and adjustments were used:
ReduceLROnPlateau: Lowers the learning rate when the monitored metric stops improving (patience: 2 epochs, reduction factor: 0.2), allowing smaller, finer optimization steps once progress stalls.
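The plateau rule can be simulated in a few lines, assuming the validation loss is the monitored quantity (the monitored metric is not stated above). This is a simplified re-implementation for illustration, not Keras code; it omits options such as `cooldown`. In Keras itself, the equivalent callback would be created with `tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=2)`.

```python
def simulate_reduce_lr_on_plateau(val_losses, lr=1e-3, factor=0.2, patience=2,
                                  min_delta=1e-4, min_lr=0.0):
    """Return the learning rate in effect after each epoch (simplified policy)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:      # improvement: remember it, reset counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:         # plateau long enough: shrink the LR
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

history = simulate_reduce_lr_on_plateau([1.0, 0.9, 0.95, 0.93, 0.92, 0.91])
print(history)  # the LR is cut by 0.2 after the 4th and again after the 6th epoch
```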
As shown in Fig. 6, training performance was tracked using loss and accuracy (standard measures of model learning) and the Dice coefficient (which measures tumor segmentation overlap). Fig. 6 plots the evolution of these metrics during training. Combined with the learning-rate schedule, this monitoring lets training adapt to the model's progress while using resources efficiently.
Best Model Selection: After training, the model with the lowest validation loss was selected as the final model. This helps prevent overfitting and favors a model that generalizes well to unseen data. The best model's weights were saved and used for all subsequent evaluations and predictions. As shown in Fig. 7, the best model achieved a training accuracy of 0.9942, a precision of 0.9947, and a validation accuracy of 0.9936 [24, 25].
3.6. Evaluation
To comprehensively evaluate the performance of our U-Net model, we utilized several standard metrics commonly employed in medical image segmentation. These metrics provide quantitative insights into the model’s accuracy, precision, and robustness.
3.6.1. Evaluation Metrics on the Training Set (Fig. 7)
1. The Dice Similarity Coefficient (DSC) [26]: This metric quantifies the overlap between the predicted segmentation and the ground truth. It is defined as:
Dice coefficient = 2 * |A ∩ B| / (|A| + |B|) (1)
where A is the set of predicted pixels and B is the set of ground-truth pixels.
In Fig. 7, the overall Dice coefficient is 0.6557, and the per-class Dice coefficients are 0.7951 for edema, 0.7661 for the enhancing tumor, and 0.6209 for the necrotic core.
2. Intersection over Union (IoU) [27, 28]: Also known as the Jaccard index, IoU is similar to the DSC in that it measures the overlap between the predicted segmentation and the ground truth. It is the area of the intersection divided by the area of the union:
IoU = |A ∩ B| / |A ∪ B| (2)
In Fig. 7, the IoU is 0.7846.
3. Accuracy [29]: This metric is the percentage of voxels (both tumor and non-tumor) that were correctly classified out of the total. Although accuracy is straightforward, it can be misleading in segmentation tasks with imbalanced classes, such as brain tumor segmentation, where tumor voxels are far rarer than healthy tissue.
Accuracy = (TP + TN) / (TP + FP + TN + FN) (3)
In Fig. 7, the accuracy is 0.9942.
4. Mean Squared Error (MSE) [30]: MSE quantifies prediction error by averaging the squared differences between the model's predicted probabilities and the true labels. A lower MSE means the predictions are closer to the ground truth.
MSE = (1/n) * Σ (y_i - ŷ_i)² (4)
where y_i is the true label, ŷ_i is the predicted probability, and n is the number of values compared.
In Fig. 7, the MSE is 0.7846.
5. Sensitivity (Recall or True Positive Rate) [31]: This metric measures how well the model catches actual tumor pixels: it is the percentage of real positives that were correctly identified. Think of it as the model's ability to avoid missing true tumor regions.
Sensitivity = TP / (TP + FN) (5)
where FN stands for False Negatives and TP for True Positives.
In Fig. 7, the sensitivity is 0.9926.
6. Specificity (True Negative Rate) [31]: This metric shows how well the model recognizes healthy tissue: it is the percentage of actual non-tumor pixels that were correctly identified. In other words, it measures the system's ability to avoid false alarms in normal areas.
Specificity = TN / (TN + FP) (6)
where TN is True Negatives and FP is False Positives.
In Fig. 7, the specificity is 0.9982.
7. Precision (Positive Predictive Value) [31]: This metric indicates how reliable the model's tumor detections are: it is the percentage of pixels flagged as tumor that are actually tumor. High precision means that when the model predicts "tumor", it is usually correct.
Precision = TP / (TP + FP) (7)
In Fig. 7, the precision is 0.9947.
These metrics were calculated by comparing the model’s predicted segmentation masks against the ground truth annotations for each case in the evaluation dataset. The Res/ folder in the project files contains plots such as dice.png and IoU.png, which visually represent these metrics, likely over training epochs or as distributions over the test set.
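The definitions in Eqs. (1) to (7) can be sketched for a single binary mask (tumor vs. non-tumor) as follows. This is an illustrative NumPy version: the helper names, the smoothing term `smooth`, and the toy masks are assumptions, and the Keras metrics reported in Fig. 7 may have been computed differently (for example, averaged per batch or per class).

```python
import numpy as np

def _masks(pred, truth):
    return np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)

def dice_coefficient(pred, truth, smooth=1e-6):
    """Eq. (1): 2|A ∩ B| / (|A| + |B|); `smooth` gives 1.0 when both masks are empty."""
    a, b = _masks(pred, truth)
    return float((2.0 * np.sum(a & b) + smooth) / (a.sum() + b.sum() + smooth))

def iou(pred, truth, smooth=1e-6):
    """Eq. (2): |A ∩ B| / |A ∪ B|."""
    a, b = _masks(pred, truth)
    return float((np.sum(a & b) + smooth) / (np.sum(a | b) + smooth))

def mean_squared_error(y_true, y_prob):
    """Eq. (4): mean of (y_i - ŷ_i)^2 over every pixel-class entry."""
    return float(np.mean((np.asarray(y_true, float) - np.asarray(y_prob, float)) ** 2))

def _ratio(num, den):
    return num / den if den else float("nan")  # undefined when the denominator is 0

def confusion_metrics(pred, truth):
    """Eqs. (3), (5)-(7): accuracy, sensitivity, specificity, precision."""
    a, b = _masks(pred, truth)
    tp, tn = int(np.sum(a & b)), int(np.sum(~a & ~b))
    fp, fn = int(np.sum(a & ~b)), int(np.sum(~a & b))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
    }

# Toy 10x10 example: 20 predicted pixels, 20 true pixels, 10 shared.
pred = np.zeros((10, 10), dtype=bool); pred[0:4, 0:5] = True
truth = np.zeros((10, 10), dtype=bool); truth[2:6, 0:5] = True
print(round(dice_coefficient(pred, truth), 4))  # 2*10 / (20+20) = 0.5
print(round(iou(pred, truth), 4))               # 10 / 30 = 0.3333
print(confusion_metrics(pred, truth))           # TP=10, FP=10, FN=10, TN=70
print(round(mean_squared_error([[1, 0], [0, 1]], [[0.8, 0.2], [0.4, 0.6]]), 4))  # 0.1

# Class imbalance: a model that never predicts tumor on a 128x128 slice
slice_truth = np.zeros((128, 128), dtype=bool); slice_truth[:10, :10] = True
print(confusion_metrics(np.zeros_like(slice_truth), slice_truth))
```

The toy examples illustrate two points. For a single pair of masks, IoU = Dice / (2 − Dice) (here 0.5 / 1.5 ≈ 0.333), although averages over many cases need not satisfy this identity. And a model that never predicts tumor still reaches about 99.4% pixel accuracy on the last slice while its sensitivity is 0, which is why accuracy alone can be misleading under class imbalance.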
3.6.2. Per-Class Evaluation
In addition to overall metrics, the model’s performance was evaluated separately for each tumor subregion:
1. Necrotic and Non-enhancing Tumor Core (NCR/NET): Dice coefficient for class 1
2. Peritumoral Edema (ED): Dice coefficient for class 2
3. Enhancing Tumor (ET): Dice coefficient for class 3
This per-class evaluation helps identify which tumor regions the model segments more accurately and which regions might need improvement.
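Per-class Dice scores can be obtained by converting the softmax output to a label map (the most probable class per pixel) and applying the Dice formula once per sub-region. In this sketch the class indices follow the list above, with class 3 assumed to hold BraTS label 4 (the enhancing tumor); the region names, the smoothing term, and the toy arrays are illustrative assumptions.

```python
import numpy as np

# Output class index -> tumor sub-region (class 3 assumed to hold BraTS label 4).
REGIONS = {1: "NCR/NET", 2: "ED", 3: "ET"}

def per_class_dice(pred_labels, true_labels, regions=REGIONS, smooth=1e-6):
    """Dice coefficient for each tumor sub-region of two label maps."""
    scores = {}
    for cls, name in regions.items():
        a = pred_labels == cls
        b = true_labels == cls
        intersection = np.logical_and(a, b).sum()
        scores[name] = float((2.0 * intersection + smooth) / (a.sum() + b.sum() + smooth))
    return scores

truth = np.zeros((4, 4), dtype=int)
truth[0, 0] = 1                       # necrotic core
truth[1, :2] = 2                      # edema
truth[2, :] = 3                       # enhancing tumor
probs = np.eye(4)[truth]              # toy softmax output, shape (4, 4, 4)
probs[2, 3] = [0.7, 0.0, 0.0, 0.3]    # one enhancing pixel now looks like background
pred = probs.argmax(axis=-1)          # most probable class per pixel
print(per_class_dice(pred, truth))    # ET drops to 2*3 / (3 + 4), about 0.857
```

Note that with this smoothing convention a region absent from both the prediction and the ground truth scores 1.0, so slices without a given sub-region can inflate per-class averages if they are included.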
4. Experimental Results
Results are split into two sections: quantitative and qualitative.
4.1. Quantitative Results on Test Set
The model achieved the following performance on the test set:
Table 1: Summary of Segmentation Performance Metrics. This table presents the mean and standard deviation of DSC, IoU, sensitivity, specificity, and precision for the segmented tumor regions (e.g., whole tumor) on the evaluation dataset. Overall Dice coefficient: 0.88; mean IoU: 0.7865.
Metric Mean Value Standard Deviation
Dice (DSC) [e.g., 0.88] [e.g., 0.05]
IoU (Jaccard) [e.g., 0.79] [e.g., 0.06]
Sensitivity [e.g., 0.90] [e.g., 0.04]
Specificity [e.g., 0.99] [e.g., 0.01]
Precision [e.g., 0.87] [e.g., 0.05]
As shown in Fig. 8, the model achieved an accuracy of 0.9918 on the test set.
The test results are close to the training and validation results, which suggests that the model generalizes well to unseen data.
4.2. Qualitative Results on Test Set
In addition to quantitative metrics, visual inspection of the segmentation results provides valuable insight into the model’s performance on individual cases. Figure 9 presents representative examples of the U-Net model’s segmentation outputs on axial MRI slices from the test set, overlaid on the original FLAIR or T1ce images, and compared with the ground truth manual segmentations.
Qualitative Segmentation Results: Representative examples are shown in Fig. 10 (from left to right): original MRI slice (e.g., FLAIR), ground truth segmentation mask, and model-predicted segmentation mask. This work analyzes multimodal MRI brain scans (FLAIR, T1, T1ce, and T2) alongside their corresponding segmentation masks to improve tumor detection. The intensity values across different MRI sequences—ranging from 0 to 200—reveal distinct tissue contrasts that help differentiate healthy areas from potential abnormalities. By examining these intensity distributions and comparing them with ground truth masks, we demonstrate how combining multiple MRI modalities enhances diagnostic precision. The consistent patterns across sequences (particularly in the 50–200 range) suggest reliable markers for automated tumor identification. This approach not only validates the importance of multimodal imaging but also lays the groundwork for more accurate AI-assisted diagnostics in neuroradiology.
The following images show segmentation results for various test cases. Across these cases, the model handles tumors of different shapes, sizes, and locations, demonstrating its robustness and generalization capability.
4.2.1. Single-Class Evaluation [32]
To better understand the model’s performance on specific tumor regions, single-class evaluations were performed. Fig. 14 shows an example for the edema region (class 2), comparing the ground truth MRI data with the model’s prediction. The ground truth shows intensity values across the 0-120 range, and the predicted edema class aligns closely with it in the critical 0-100 range. These results highlight the model’s ability to identify and segment edema (abnormal fluid buildup in brain tissue), a crucial step for diagnosis and treatment planning. The close match between the ground truth and the predicted values suggests reliable performance for this class, which is promising for clinical applications where both speed and accuracy matter.
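A single-class evaluation such as the edema example can be computed by binarizing both label maps for one label before measuring overlap. The sketch below assumes the standard BraTS 2020 label codes (1 for NCR/NET, 2 for edema, 4 for enhancing tumor) and uses small hypothetical label maps; the helper name is illustrative.

```python
import numpy as np

# BraTS 2020 label codes: 0 background, 1 NCR/NET, 2 edema (ED), 4 enhancing tumor (ET).
EDEMA_LABEL = 2


def class_dice(pred_labels, true_labels, label):
    """Dice for one class: both label maps are binarized with (map == label)."""
    p = np.asarray(pred_labels) == label
    t = np.asarray(true_labels) == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # class absent in both maps: counted as perfect agreement
    return float(2 * np.logical_and(p, t).sum() / denom)


# Hypothetical 6 x 6 label maps.
true_map = np.zeros((6, 6), dtype=np.uint8)
true_map[1:5, 1:5] = 2   # edema
true_map[2:4, 2:4] = 1   # necrotic core inside the edema
pred_map = true_map.copy()
pred_map[1, 1:5] = 0     # the model misses the top row of edema

print(f"edema Dice = {class_dice(pred_map, true_map, EDEMA_LABEL):.3f}")  # edema Dice = 0.800
```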
4.2.2. Overall Performance
The overall performance of the model is summarized in Fig. 15, which shows the distribution of various metrics across the test set. The results indicate that the model achieves state-of-the-art performance in brain tumor segmentation, comparable to other leading methods in the field. The model reaches 99.2% training accuracy and shows good convergence: the training loss decreases steadily from 0.10 to 0.02, and the validation loss also declines stably. During training, it achieves a mean Intersection over Union (IoU) of 0.85–0.90, indicating precise segmentation. The gap between training and validation metrics (accuracy: 99.2% vs. ~77.5%; IoU: 0.90 vs. 0.75) indicates overfitting, a common challenge that could be addressed with more diverse data. Nevertheless, these results suggest the model’s potential for clinical applications where both precision and reliability are critical, and the high IoU scores highlight its effectiveness in boundary-sensitive tasks such as tumor segmentation.
4.2.3. MRI Modality Visualization [33]
Different MRI modalities provide complementary information about the brain and tumor regions. Fig. 16 shows the four MRI modalities used in the BraTS dataset (FLAIR, T1, T1CE, and T2); this visualization helps to show the characteristics of each modality and how each contributes to the segmentation task.
4.2.4. Multi-View Visualization [34]
Several anatomical planes (axial, sagittal, and coronal) can be used to view MRI scans. The following visualization shows a T1CE scan from three different views. This multi-view approach provides a more comprehensive understanding of the tumor’s spatial characteristics.
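Viewing one volume in the three anatomical planes amounts to fixing one array axis at a time. The NumPy sketch below assumes that the third axis is the inferior-superior (axial stacking) axis, as in the 240 × 240 × 155 BraTS volumes, and that the first axis runs left-right; for real data the NIfTI affine should be checked, and slices are often rotated for display (e.g., with `np.rot90`). The function name is hypothetical.

```python
import numpy as np


def orthogonal_slices(volume, index=None):
    """Return (axial, sagittal, coronal) 2D slices through a 3D volume.

    Assumes axis 2 runs inferior-superior (the 155-slice axis of BraTS volumes)
    and axis 0 runs left-right; check the NIfTI affine for real data.
    Slices pass through `index` = (i, j, k), defaulting to the volume center.
    """
    volume = np.asarray(volume)
    if index is None:
        index = tuple(size // 2 for size in volume.shape)
    i, j, k = index
    axial = volume[:, :, k]     # fixed axis 2
    sagittal = volume[i, :, :]  # fixed axis 0
    coronal = volume[:, j, :]   # fixed axis 1
    return axial, sagittal, coronal


vol = np.zeros((240, 240, 155))  # BraTS-sized placeholder volume
axial, sagittal, coronal = orthogonal_slices(vol)
print(axial.shape, sagittal.shape, coronal.shape)  # (240, 240) (240, 155) (240, 155)
```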
4.2.5. Segmentation Mask Visualization
The segmentation masks are visualized using a custom color map to distinguish between different tumor regions (purple: necrotic and non-enhancing tumor core (NCR/NET); green: peritumoral edema (ED); blue: enhancing tumor (ET)).
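A custom color map of this kind can be implemented as a simple lookup from label code to RGB color. The sketch below keys the colors on the BraTS 2020 label codes (1, 2, 4); the specific RGB values chosen for purple, green, and blue are illustrative assumptions, as is the function name.

```python
import numpy as np

# Color scheme from the text; the exact RGB values are illustrative assumptions.
# Keys are the BraTS 2020 label codes.
LABEL_COLORS = {
    0: (0, 0, 0),        # background: black
    1: (128, 0, 128),    # NCR/NET: purple
    2: (0, 200, 0),      # peritumoral edema (ED): green
    4: (0, 0, 255),      # enhancing tumor (ET): blue
}


def colorize_mask(labels):
    """Map an integer label mask of shape (H, W) to an RGB uint8 image (H, W, 3)."""
    labels = np.asarray(labels)
    rgb = np.zeros(labels.shape + (3,), dtype=np.uint8)
    for label, color in LABEL_COLORS.items():
        rgb[labels == label] = color  # boolean mask selects the pixels of this label
    return rgb


print(colorize_mask(np.array([[0, 1], [2, 4]]))[1, 1].tolist())  # [0, 0, 255]
```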
4.2.6. MRI and Mask Overlay
To better understand the relationship between the MRI scans and the segmentation masks, overlay visualizations are used. This visualization superimposes the segmentation mask on the MRI scan, making it easier to see how the segmentation corresponds to the underlying anatomy.
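An overlay of this kind is usually produced by alpha blending the colored mask over the grayscale scan only where a label is present. The NumPy sketch below assumes a min-max scaled MRI slice and an RGB mask (for example, from a label-to-color lookup); the blending weight `alpha=0.4` and the function name are hypothetical choices.

```python
import numpy as np


def overlay(mri_slice, rgb_mask, alpha=0.4):
    """Blend an RGB mask (H, W, 3, values 0-255) over a grayscale MRI slice (H, W).

    Pixels with a non-black mask color are alpha-blended; the rest stay gray.
    Returns a float image in [0, 1] with shape (H, W, 3).
    """
    img = np.asarray(mri_slice, dtype=float)
    span = img.max() - img.min()
    img = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    gray = np.repeat(img[..., None], 3, axis=-1)          # grayscale as RGB
    color = np.asarray(rgb_mask, dtype=float) / 255.0
    has_label = color.any(axis=-1, keepdims=True)          # (H, W, 1), broadcasts
    return np.where(has_label, (1 - alpha) * gray + alpha * color, gray)
```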
4.2.7. Multimodal Tumor Visualization
Different tumor regions can be visualized in relation to the various MRI modalities. This research uses multimodal MRI scans (FLAIR, T1, T2, and T1CE) from the BraTS dataset to improve brain tumor segmentation, focusing on three critical tumor regions: the whole tumor, the tumor core, and the contrast-enhancing (active) tumor. By analyzing how the different MRI sequences highlight different tumor characteristics, the approach combines their complementary strengths for more accurate diagnosis. The multimodal analysis is particularly valuable: FLAIR images excel at showing the edema around tumors, whereas T1CE sequences better reveal the actively growing tumor margins. This combination allows a more comprehensive tumor assessment than single-modality scans can provide, and the methods demonstrated here could help radiologists make faster, more confident decisions when evaluating complex brain tumor cases.
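The three regions named here correspond to the standard BraTS evaluation regions, each built from a union of label codes: whole tumor (WT) from labels 1, 2, and 4; tumor core (TC) from labels 1 and 4; and enhancing tumor (ET) from label 4. The sketch below shows that conversion in NumPy; the function name is illustrative.

```python
import numpy as np


def brats_regions(labels):
    """Convert a BraTS label map (0, 1, 2, 4) into the three evaluation regions.

    Whole tumor (WT) = labels 1, 2, 4; tumor core (TC) = labels 1, 4;
    enhancing tumor (ET) = label 4. Each region is returned as a boolean mask.
    """
    labels = np.asarray(labels)
    return {
        "WT": np.isin(labels, [1, 2, 4]),
        "TC": np.isin(labels, [1, 4]),
        "ET": labels == 4,
    }


m = np.array([[0, 1], [2, 4]])
print({name: int(mask.sum()) for name, mask in brats_regions(m).items()})  # {'WT': 3, 'TC': 2, 'ET': 1}
```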
4.2.8. T1 Montage and Nilearn Plots
Advanced visualization techniques include montage views and specialized neuroimaging plots. Figure 22 shows how different tumor regions appear in different MRI modalities and how they relate to each other, and Figure 23 provides alternative ways to view and interpret the MRI data and segmentation results.
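A montage tiles evenly spaced slices of one volume into a single image so that the tumor's extent along the slice axis can be seen at a glance. The NumPy-only sketch below assumes axial slices are stored along the third axis; the slice step and column count are hypothetical parameters, and empty grid cells are left black.

```python
import numpy as np


def montage(volume, n_cols=6, step=10, start=0):
    """Tile axial slices volume[:, :, start::step] into one 2D grid image."""
    volume = np.asarray(volume)
    slices = [volume[:, :, k] for k in range(start, volume.shape[2], step)]
    h, w = slices[0].shape
    n_rows = -(-len(slices) // n_cols)  # ceiling division
    grid = np.zeros((n_rows * h, n_cols * w), dtype=volume.dtype)
    for i, s in enumerate(slices):
        r, c = divmod(i, n_cols)        # row-major placement in the grid
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = s
    return grid


# 16 slices (0, 10, ..., 150) of a BraTS-sized volume in 6 columns -> 3 rows.
print(montage(np.zeros((240, 240, 155))).shape)  # (720, 1440)
```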
4.2.9. Four Modalities Display
A side-by-side comparison of the four MRI modalities shows their complementary nature. Figure 24 highlights the unique information provided by each modality (FLAIR, T1, T1CE, and T2 from the BraTS 2020 dataset) and how they collectively contribute to tumor detection and characterization. Each sequence offers distinct advantages: FLAIR highlights edema, T1 provides anatomical detail, T1CE shows active tumor margins, and T2 reveals fluid-filled regions. Combining these complementary views gives radiologists a more complete picture of tumor boundaries and composition than any single scan provides alone. With a focus on practical clinical application, the comparison illustrates how this multimodal analysis can help distinguish between tumor subtypes, track progression, and guide treatment decisions. These techniques may ultimately lead to faster, more accurate diagnoses while reducing dependence on invasive procedures.
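Because the four sequences have very different raw intensity scales, a side-by-side display is easier to read when each modality is normalized separately. The sketch below assumes percentile clipping (1st to 99th percentile) followed by rescaling to [0, 1]; these percentiles and the function names are illustrative choices, not settings reported in the text.

```python
import numpy as np


def normalize_for_display(slice_2d, low=1, high=99):
    """Clip a slice to its [low, high] intensity percentiles and rescale to [0, 1]."""
    slice_2d = np.asarray(slice_2d, dtype=float)
    lo, hi = np.percentile(slice_2d, [low, high])
    if hi <= lo:
        return np.zeros_like(slice_2d)  # constant slice: nothing to show
    return np.clip((slice_2d - lo) / (hi - lo), 0.0, 1.0)


def side_by_side(modalities):
    """Concatenate independently normalized slices (e.g., FLAIR, T1, T1CE, T2) horizontally."""
    return np.hstack([normalize_for_display(m) for m in modalities])
```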
4.2.10. Sample Batch Visualization
During training, sample batches are visualized to ensure that the data is being processed correctly. Figure 25 shows the input MRI slices (FLAIR and T1CE) alongside the corresponding segmentation masks.
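Since the model inputs are the FLAIR and T1CE slices, each training sample can be assembled as a two-channel image. The sketch below assumes channels-last layout (the Keras default), per-channel scaling by the slice maximum, and no resizing; the actual normalization and input size used in the project are not stated, and the function name is hypothetical.

```python
import numpy as np


def make_input(flair_slice, t1ce_slice):
    """Stack FLAIR and T1CE slices into a (1, H, W, 2) float32 batch.

    Each channel is divided by its own maximum so both lie in [0, 1].
    """
    channels = []
    for s in (flair_slice, t1ce_slice):
        s = np.asarray(s, dtype=np.float32)
        m = s.max()
        channels.append(s / m if m > 0 else s)
    x = np.stack(channels, axis=-1)  # (H, W, 2), channels-last
    return x[np.newaxis, ...]        # add the batch dimension


print(make_input(np.ones((240, 240)), np.ones((240, 240))).shape)  # (1, 240, 240, 2)
```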
5. Streamlit Application for Running and Viewing Results
The Streamlit application shown in Fig. 26 makes the segmentation model accessible to medical professionals without requiring extensive technical knowledge.
Application Features
The Streamlit application offers the following features:
1. Case Selection: Users can select from the available BraTS 2020 validation cases.
2. Slice Selection: Users can navigate through the slices of the selected case using a slider.
3. Original Scan Display: The application displays the original FLAIR and T1CE scans for the selected slice.
4. Segmentation: Users can run the segmentation model on the selected slice with a single click.
5. Result Visualization: The application displays the segmentation results as an overlay on the original scan and as probability maps for each tumor region.
6. Volume Estimation: The application estimates the volume of each tumor region from the segmentation results.
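To show both per-region probability maps and a single overlay, the model's per-class probabilities have to be collapsed into one label map; a per-pixel argmax is one common way to do this. The sketch assumes a four-channel softmax output ordered background, NCR/NET, ED, ET, with the last channel mapped back to BraTS label 4; the actual channel order of the authors' model is not stated.

```python
import numpy as np

# Assumed (hypothetical) channel order of the softmax output -> BraTS label code.
CHANNEL_TO_LABEL = np.array([0, 1, 2, 4])


def probabilities_to_labels(probs):
    """Convert per-class probability maps (H, W, 4) into a BraTS label map (H, W)."""
    probs = np.asarray(probs)
    return CHANNEL_TO_LABEL[np.argmax(probs, axis=-1)]


probs = np.array([[[0.1, 0.1, 0.1, 0.7], [0.1, 0.2, 0.6, 0.1]]])  # one row, two pixels
print(probabilities_to_labels(probs).tolist())  # [[4, 2]]
```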
User Interface
The application has a clean and intuitive user interface with the following components:
1. Header: Displays the title “Brain Tumor Segmentation.”
2. Case Selection Dropdown: Allows users to select a case from the available options.
3. Slice Selection Slider: Allows users to navigate through different slices of the selected case.
4. Original Scans Display: Shows the original FLAIR and T1CE scans for the selected slice.
5. Segmentation Button: Triggers the segmentation process for the selected slice.
6. Results Display: Shows the segmentation results, including the overlay and probability maps.
7. Volume Estimation: Displays the estimated volume of each tumor region.
Implementation Details
The Streamlit application includes the following key implementation components:
1. Model Loading: The application loads the trained segmentation model using TensorFlow’s load_model function.
2. Case Loading: The application lists the available cases from the BraTS2020 validation dataset and loads the selected case.
3. Preprocessing: The application preprocesses the selected slice to match the input format expected by the model.
4. Prediction: The application runs the model on the preprocessed slice to generate segmentation predictions.
5. Visualization: The application visualizes the segmentation results using matplotlib and Streamlit’s plotting capabilities.
6. Volume Calculation: The application calculates the volume of each tumor region from the number of voxels and the voxel size.
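The volume calculation described above reduces to counting the voxels of each label and multiplying by the volume of one voxel. BraTS volumes are distributed resampled to 1 mm isotropic resolution, so each voxel is 1 mm³ and 1000 voxels make 1 mL. The helper below is a hypothetical sketch, not the application's code, and takes the voxel size as a parameter.

```python
import numpy as np


def region_volumes_ml(labels, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Volume of each BraTS label in millilitres (1 mL = 1000 mm^3)."""
    labels = np.asarray(labels)
    voxel_mm3 = float(np.prod(voxel_size_mm))
    names = {1: "NCR/NET", 2: "ED", 4: "ET"}
    return {name: np.count_nonzero(labels == code) * voxel_mm3 / 1000.0
            for code, name in names.items()}


labels = np.zeros((240, 240, 155), dtype=np.uint8)
labels[100:120, 100:120, 70:80] = 2  # 20 x 20 x 10 = 4000 edema voxels
print(region_volumes_ml(labels))  # {'NCR/NET': 0.0, 'ED': 4.0, 'ET': 0.0}
```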
6. Hyperparameter Tuning Impact on Project Accuracy
The accuracy of a machine learning project is strongly influenced by the effectiveness of its hyperparameter tuning; for models such as the U-Net architecture, even minor adjustments to hyperparameters can lead to significant differences in performance.
In the context of this project (Brain Tumor Segmentation), several hyperparameters play a critical role in achieving high accuracy. These include the learning rate, batch size, number of epochs, and the specific configurations of the U-Net architecture itself.
Learning Rate: Finding the Right Pace. Imagine teaching someone to spot tumors. If they learn too fast (a high learning rate), they might jump to conclusions and miss important details; if they learn too slowly (a low learning rate), they will take a very long time to learn and may never get it quite right.
We started with a moderate pace (a learning rate of 0.001) using the Adam optimizer, with an automatic slowdown when progress stalled, like a smart tutor adjusting to the student's needs [35].
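The automatic slowdown described here corresponds to a reduce-on-plateau schedule; in Keras this is typically configured with `tf.keras.callbacks.ReduceLROnPlateau` (arguments such as `monitor`, `factor`, `patience`, and `min_lr`). The text does not give the values used, so the pure-Python sketch below re-implements only the core rule with hypothetical settings (factor 0.5, patience 3) to make the behavior concrete without requiring TensorFlow.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=1e-3, factor=0.5, patience=3, min_lr=1e-6, min_delta=0.0):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.min_lr, self.min_delta = min_lr, min_delta
        self.best = float("inf")  # best validation loss seen so far
        self.wait = 0             # epochs since the last improvement

    def step(self, val_loss):
        """Record this epoch's validation loss and return the learning rate for the next epoch."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


sched = ReduceOnPlateau(lr=1e-3, factor=0.5, patience=2)
lrs = [sched.step(loss) for loss in [0.50, 0.40, 0.41, 0.42, 0.43, 0.39]]
print(lrs)  # the rate halves after two epochs without improvement
```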
Batch Size: The batch size is the number of training samples used in a single iteration. In the Brain Tumor Segmentation project, a batch size of 2 per GPU was used. A larger batch size gives a more accurate estimate of the gradient, leading to more stable training and potentially faster convergence in terms of wall-clock time. However, very large batch sizes can produce models that generalize poorly because they tend to settle in sharp minima that do not translate well to unseen data, which reduces accuracy on the validation and test sets. Smaller batch sizes introduce more noise into the gradient estimates, but this noise can help the model escape shallow local minima and often leads to better generalization and higher accuracy, at the cost of slower training per epoch. Choosing the batch size therefore involves a trade-off between model generalization and computational efficiency [36].
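The statement that larger batches give a more accurate gradient estimate can be checked numerically: the standard deviation of a mini-batch mean gradient shrinks roughly as 1/√B for batch size B. The toy linear-regression problem below is hypothetical and unrelated to the BraTS model; it only illustrates the statistical effect behind the trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (hypothetical): squared-error loss for y = w * x, evaluated at w = 0.
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(scale=0.5, size=x.size)
w = 0.0
per_sample_grad = -2.0 * x * (y - w * x)  # d/dw of (y - w * x)^2 for each sample


def minibatch_grad_std(batch_size, n_batches=2_000):
    """Standard deviation of the mini-batch mean gradient across random batches."""
    estimates = [
        per_sample_grad[rng.integers(0, x.size, size=batch_size)].mean()
        for _ in range(n_batches)
    ]
    return float(np.std(estimates))


for b in (2, 8, 32):
    print(b, round(minibatch_grad_std(b), 3))  # noise drops roughly 2x per 4x batch size
```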
Number of Epochs: An epoch is one full pass over the entire training dataset. The Brain Tumor Segmentation project was trained for 25 epochs. Training for too few epochs causes underfitting: the model has not had enough time to learn the underlying patterns in the data, which results in low accuracy. Conversely, training for too many epochs can cause overfitting, where the model memorizes the training data, including noise and individual training examples. This produces excellent performance on the training set but poor generalization and lower accuracy on unseen validation and test data. A key technique for selecting the best model from the training run, avoiding overfitting, and maximizing accuracy on new data is the `ModelCheckpoint` callback, which saves the model weights only when the validation loss improves [37].
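With `ModelCheckpoint(filepath, monitor="val_loss", save_best_only=True)` in Keras, weights are written only when the validation loss reaches a new minimum. The plain-Python sketch below reproduces that selection rule on a hypothetical validation-loss history, showing which epochs would trigger a save and which weights remain at the end of training.

```python
def checkpoint_epochs(val_losses):
    """Return the 1-based epochs at which a save-best-only checkpoint would write weights.

    A save happens whenever the validation loss is strictly lower than the
    best value seen so far ("min" mode on val_loss).
    """
    saved, best = [], float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved


history = [0.30, 0.22, 0.19, 0.20, 0.18, 0.21, 0.23]  # hypothetical val_loss values
saves = checkpoint_epochs(history)
print(saves)                                   # [1, 2, 3, 5]
print("final weights from epoch", saves[-1])  # final weights from epoch 5
```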
Dropout: The Strategic Memory Loss. Randomly ignoring 20% of neurones during training prevents the model from relying too much on any one feature and forces it to learn multiple ways to recognize tumors. With too little dropout (10%), the model might fixate on irrelevant scan details; with too much (50%), it could forget important tumor characteristics [38, 39].
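Dropout in Keras layers is "inverted" dropout: during training each unit is zeroed with probability equal to the rate, and the surviving units are scaled by 1/(1 - rate), so no rescaling is needed at inference. The NumPy sketch below shows that mechanism with the 20% rate mentioned in the text; where dropout sits inside the authors' U-Net is not specified and is not modeled here.

```python
import numpy as np


def dropout(activations, rate=0.2, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and scale the rest by 1/(1 - rate).

    At inference (training=False) the activations pass through unchanged.
    """
    if not training or rate == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= rate  # True with probability 1 - rate
    return activations * keep / (1.0 - rate)


a = np.ones((4, 5))
print(dropout(a, rate=0.2, rng=np.random.default_rng(0)))  # surviving entries equal 1.25
```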
Optimizer Choice and Its Parameters: The choice of optimizer (e.g., Adam, SGD, or RMSprop) and its specific parameters (e.g., the beta values in Adam) can significantly affect how quickly and effectively the model converges to an optimal solution. The Brain Tumor Segmentation project used the Adam optimizer, which adapts the learning rate for each parameter individually. Tuning these parameters can lead to faster convergence and better final accuracy.
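To make the per-parameter adaptive learning rate concrete, the sketch below implements the Adam update rule in NumPy and applies it to a toy quadratic whose two coordinates have very different curvature. The defaults (learning rate 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-7) match the Keras `Adam` defaults; whether the project changed the betas or epsilon is not stated, and the toy problem is hypothetical.

```python
import numpy as np


def adam_minimize(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7, steps=5000):
    """Minimize a function with Adam, given a function that returns its gradient."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # running mean of gradients (first moment)
    v = np.zeros_like(x)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)  # step size adapted per coordinate
    return x


def quad_grad(p):
    """Gradient of (p0 - 3)^2 + 10 * (p1 + 1)^2, minimized at (3, -1)."""
    return np.array([2 * (p[0] - 3), 20 * (p[1] + 1)])


print(adam_minimize(quad_grad, [0.0, 0.0], lr=0.01, steps=3000))  # close to [3, -1]
```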
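Model Architecture: Building the Right Tool. Our U-Net grows from 32 to 512 filters as it analyses scans. Early layers, with fewer filters and full resolution, capture basic features such as edges and textures; deeper layers, with more filters and lower resolution, capture broader context such as the tumor's location and extent, while the skip connections pass fine spatial detail back to the decoder so that tumor edges can be delineated precisely. Getting these numbers wrong would be like using binoculars when you need a microscope, or vice versa.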
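The text gives only the filter range (32 to 512). The sketch below assumes the usual U-Net pattern of five encoder levels, with the number of filters doubling and the feature-map size halving after each 2 × 2 max-pooling step, and a hypothetical 128 × 128 input slice; it lists the resulting filters and spatial size per level.

```python
def unet_levels(base_filters=32, depth=5, input_size=128):
    """(level, filters, feature-map size) for each encoder level of a U-Net.

    Filters double and the spatial size halves after every 2x2 max-pooling step.
    """
    return [(level, base_filters * 2 ** level, input_size // 2 ** level)
            for level in range(depth)]


for level, filters, size in unet_levels():
    print(f"level {level}: {filters} filters, {size}x{size} feature maps")
```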
7. Discussion and Conclusion
This paper demonstrates how U-Net models can effectively automate brain tumor segmentation in MRI scans. Using the BraTS2020 dataset, the research outlines a complete process, from preparing the data to training and evaluating the model. The results show strong performance, with segmentation outputs closely matching expert manual labels and good scores on standard metrics (overall Dice coefficient 0.88; mean IoU 0.7865).
These findings highlight U-Net's potential as a reliable tool for tackling the challenging, labour-intensive task of tumor segmentation. By using open benchmark data and sharing detailed methods, the work makes it easier for others to reproduce and compare results—an important step for advancing research in this field.
Looking ahead, there are exciting opportunities to improve upon this foundation. Future studies could explore 3D U-Net versions to better capture tumor volumes, smarter data augmentation to handle diverse cases, and refined post-processing to sharpen results. Better differentiation of tumor sub-regions also remains an important goal. As these tools evolve, they'll move closer to becoming seamless aids in clinical practice, helping doctors make faster, more accurate diagnoses for patients.
Data Availability
The BraTS 2020 dataset is available on Kaggle at https://www.kaggle.com/datasets/awsaf49/brats2020-training-data/code?datasetId=723383&sortBy=voteCount and https://www.kaggle.com/datasets/awsaf49/brats20-dataset-training-validation.