Deep Learning-Based Segmentation of the Maxillary Sinus on Panoramic Radiographs Using MedSAM and DeepLabv3+
Yong Chan Park1, Sang Jun Lee2, Wan Lee1, Han-Gyeol Yeom1 and Byung-Do Lee1*
1Department of Oral and Maxillofacial Radiology and Wonkwang Dental Research Institute, College of Dentistry, Wonkwang University, Iksan, South Korea
2Division of Electronic Engineering, Jeonbuk National University, Jeonju, South Korea
*Corresponding author
Byung-Do Lee, DDS, PhD
Department of Oral and Maxillofacial Radiology and Wonkwang Dental Research Institute, College of Dentistry, Wonkwang University, Iksan, South Korea
Tel: 82-63-859-2912
Fax: 82-63-857-4002
Email: eebydo@gmail.com, eebydo@wku.ac.kr
Abstract
Background
Panoramic radiographs are widely used in dental practice because of their low radiation dose, patient comfort, and rapid acquisition. However, segmentation of the maxillary sinus remains difficult because of superimposed anatomical structures. We evaluated the performance of MedSAM and DeepLabv3+, two advanced segmentation models, in delineating the maxillary sinus on panoramic radiographs.
Methods
A total of 1,046 panoramic radiographs were retrospectively collected from a dental hospital and a private clinic. Maxillary sinus boundaries were manually annotated by two oral and maxillofacial radiologists and one general dentist using the VGG Image Annotator. The dataset was randomly divided into training, validation, and test sets in a 60:20:20 ratio. Binary masks were generated and segmentation was performed using MedSAM and DeepLabv3+ in Python. Model performance was evaluated using the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), precision, recall, F1-score, Normalized Surface Distance (NSD), and 95th percentile Hausdorff Distance (HD95).
Results
Both models achieved high segmentation accuracy. MedSAM and DeepLabv3+ recorded DSCs of 0.9570 and 0.9534 and IoUs of 0.9183 and 0.9124, respectively. The NSD was 0.931 for MedSAM and 0.928 for DeepLabv3+, with HD95 < 0.02 for both models. MedSAM showed slightly higher accuracy but demanded substantially greater computational complexity (1,487.95 vs. 593.97 GFLOPs) and more parameters (90.49M vs. 26.68M) than DeepLabv3+.
Conclusions
MedSAM and DeepLabv3+ provided robust and reliable segmentation of the maxillary sinus on panoramic radiographs. These findings support the clinical feasibility of advanced deep learning models for automated sinus segmentation, particularly when three-dimensional imaging is unavailable.
Keywords:
deep learning
panoramic radiograph
maxillary sinus
segmentation
MedSAM
DeepLabv3+
Background
Panoramic radiography is a commonly employed dental diagnostic imaging modality owing to its broad anatomical coverage, low radiation dose, and patient comfort. It provides a comprehensive two-dimensional view of the maxillofacial region, enabling the rapid assessment of teeth, dentoalveolar structures, and sinus cavities. Therefore, panoramic radiographs are routinely used in initial dental evaluations, presurgical planning, and identification of pathological conditions. However, they provide a compressed two-dimensional projection of the complex three-dimensional anatomical structures of the jaw. This simplification introduces geometric distortions, anatomical overlaps, ghost images, and variations in magnification [1]. These limitations are particularly evident in regions with a high anatomical complexity, such as the maxillary sinus.
The maxillary sinus plays a critical role in various dental and surgical procedures, such as sinus lifts, implant placements, and tooth extractions [2]. The accurate delineation of sinus boundaries is essential to avoid surgical complications and improve clinical outcomes. However, sinus anatomy exhibits significant inter-individual variability in size, pneumatization, mucosal thickness, and the presence of lesions [3]. Moreover, its proximity to adjacent structures, such as the nasal cavity, ethmoid sinuses, and orbital floor, further complicates its identification on two-dimensional images. Although cone-beam computed tomography (CBCT) offers a more accurate, three-dimensional view of the maxillary sinus, its routine use is limited by higher radiation exposure, greater cost, and limited availability in everyday dental practice [4]. Therefore, clinical interest in enhancing the diagnostic capability of panoramic radiographs using automated segmentation technologies is growing.
In recent years, deep learning, particularly convolutional neural networks (CNNs), has revolutionized medical image analysis. CNNs can automatically learn hierarchical representations from image data without manual feature engineering [5]. Their layered structure enables the extraction of low-, mid-, and high-level features, which can be utilized for detection [6], classification [7], and segmentation [8]. In dentistry, CNN-based models have been applied successfully in tooth segmentation [9], periodontal disease prediction [10], identification of radiolucent lesions [11], and age estimation from dental radiographs [12].
CNN models have also demonstrated high performance in feature extraction and classification for various medical images [13, 14]. In 2021, a panoptic segmentation method was proposed to extract semantic and instance details from dental panoramic radiographs, including the maxillary sinus [15]. U-Net and YOLO architectures have also been used for similar purposes [16, 17].
Among CNN architectures, U-Net has gained popularity in biomedical image segmentation because of its encoder-decoder structure and skip connections, which preserve spatial information across layers [18]. More advanced architectures, such as DeepLabv3+, incorporate Atrous Spatial Pyramid Pooling (ASPP) to extract multi-scale contextual information and improve segmentation robustness across varying object sizes. DeepLabv3+ has demonstrated remarkable performance in semantic segmentation tasks involving natural images and radiographs [19]. It is known for its stability and ability to handle variable resolutions and noisy input data, making it suitable for challenging dental image segmentation tasks, such as those of the maxillary sinus.
Attention mechanisms have also been introduced to improve the focus of CNNs on relevant spatial and channel features. For example, the Convolutional Block Attention Module has been used to refine CNN outputs by enhancing the salient features and suppressing irrelevant background information [20]. Recently, transformer-based models have been adopted in medical imaging, building on their success in natural language processing [21].
The application of large-scale foundation models and prompt-driven learning is an emerging paradigm in medical image segmentation. The Segment Anything Model (SAM), developed for universal segmentation tasks, introduced a new approach that uses prompts (points, boxes, and text) to guide segmentation in a zero-shot manner [22]. MedSAM, a medical adaptation of SAM, employs a Vision Transformer encoder, a prompt encoder, and a transformer-based mask decoder. Trained on large datasets, MedSAM can generalize across diverse anatomical contexts without task-specific retraining [23]. However, despite their capabilities, SAM-based architectures remain underexplored in dental imaging [24]. A notable application by He et al. showed the feasibility of SAM in segmenting impacted teeth on panoramic radiographs [25], suggesting its potential application for segmentation of other dental structures, including the maxillary sinus.
Despite considerable progress in deep learning-based medical image analysis, automated segmentation of the maxillary sinus in panoramic radiographs remains challenging because of the anatomical variability, low-contrast boundaries, and presence of overlapping structures. Variations in image acquisition settings, patient positioning, and presence of artefacts further exacerbate these difficulties [1]. Accordingly, robust and generalizable artificial intelligence (AI)-based models are required to delineate the maxillary sinus accurately in real-world clinical practice.
This study aimed to evaluate the segmentation performance of two advanced deep learning models, DeepLabv3+ and MedSAM, in delineating the maxillary sinus on panoramic radiographs. By applying and validating these models using a large dataset, we sought to determine their feasibility and clinical utility for automated sinus analysis in dental imaging.
Methods
Study population and panoramic radiograph preparation
The dataset comprised panoramic radiographs of patients who visited a university dental hospital and a private dental clinic between March 2016 and February 2025. The patients were randomly selected from those who had undergone various dental treatments, including third molar extraction, orthodontic consultation, and management of dental pain. A total of 1,046 patients (300 males and 746 females) with ages ranging from 16 to 83 years were included in the study.
The inclusion criteria were the absence of any prior maxillofacial surgical history and the presence of either no mucosal thickening or only mild mucosal thickening in the maxillary sinus on panoramic radiographs. The exclusion criteria included developmental jaw anomalies or history of maxillofacial trauma that could interfere with accurate radiographic interpretation.
Panoramic images were acquired using three digital imaging systems: Promax® (Planmeca OY, Helsinki, Finland), PCH-2500® (Vatech, Hwaseong, Korea), and PHT-30LF0® (Vatech).
The exposure conditions for each unit were standardized as follows: Promax® at 72 kVp, 12 mA for 16 s; PCH-2500® at 72 kVp, 10 mA for 13.5 s; and PHT-30LF0® at 60 kVp, 9 mA for 16 s. During image acquisition, all patients were positioned according to standard radiographic protocols, aligning the vertical midline of the face with the machine's vertical reference line and ensuring that the Frankfurt horizontal plane was parallel to the floor. All images were processed using each manufacturer's proprietary software and exported in bitmap format for further analysis.
Labelling and dataset preparation
Manual labelling of the maxillary sinus boundaries was performed on all 1,046 panoramic radiographs to establish the ground truth for segmentation. Of these, 808 images, obtained from a private dental clinic, were initially annotated by a general dentist and subsequently reviewed and confirmed by two board-certified oral and maxillofacial radiologists. The remaining 238 images were annotated by the same radiologists in consensus.
The annotation was conducted using VIA™ (VGG Image Annotator, Visual Geometry Group, University of Oxford, UK), with the polygonal tool to manually outline the maxillary sinus region (Fig. 1). Anatomical references guided the delineation of boundaries: the superior boundary was determined based on the inferior orbital rim, which is the most consistently visible feature on panoramic radiographs, although it does not correspond to the true anatomical roof of the sinus. The inferior boundary was drawn along the well-defined cortical floor of the sinus. The medial border followed the anteromedial wall of the sinus, which can vary depending on individual anatomy and patient positioning, whereas the lateral boundary was demarcated as the posterior wall of the maxillary sinus.
Labelled data were exported in the comma-separated values (CSV) format using the VIA tool. Each CSV file contained the polygon coordinate values corresponding to the annotated regions in each image and was structured to maintain a one-to-one correspondence with the original panoramic radiograph. To facilitate the training of the segmentation models, these coordinate values were used to generate binary mask images with a Python-based script, allowing the labelled regions to serve as the ground truth during the deep learning process (Fig. 2). The dataset was divided into training (60%), validation (20%), and test (20%) subsets by random allocation. The training set was used for model learning, the validation set for hyperparameter tuning and overfitting monitoring, and the test set for the final performance evaluation. All images were used at their original resolution without resizing.
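For illustration, the mask-generation and split steps could be sketched in Python as follows. This is a minimal sketch, not the study's actual script: it assumes VIA's default CSV export, in which each row stores one polygon as a JSON string in the "region_shape_attributes" column; the file path, fixed image size, and random seed are placeholders.

```python
# A minimal sketch of the mask-generation step, assuming VIA's default CSV
# export format; paths, image size, and seed are illustrative.
import csv
import json

import cv2
import numpy as np


def masks_from_via_csv(csv_path, image_shape):
    """Rasterise VIA polygon annotations into one binary mask per image."""
    masks = {}  # filename -> uint8 mask (sinus = 1, background = 0)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            shape = json.loads(row["region_shape_attributes"] or "{}")
            if shape.get("name") != "polygon":
                continue  # skip rows without a polygon annotation
            pts = np.stack([shape["all_points_x"], shape["all_points_y"]], axis=1)
            mask = masks.setdefault(row["filename"], np.zeros(image_shape, np.uint8))
            cv2.fillPoly(mask, [pts.astype(np.int32)], 1)  # left and right sinuses
    return masks


# Random 60/20/20 split into training, validation, and test subsets.
masks = masks_from_via_csv("via_annotations.csv", image_shape=(1316, 2832))
files = np.random.default_rng(seed=42).permutation(sorted(masks))
train, val, test = np.split(files, [int(0.6 * len(files)), int(0.8 * len(files))])
```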
Model architecture and training setup
Two deep learning models were employed for segmentation: MedSAM, a transformer-based model, and DeepLabv3+, a CNN-based architecture. Model training and inference were implemented in Python 3.9.13 using PyTorch 2.0, with CUDA 12.1. All computations were performed on a workstation equipped with an NVIDIA GeForce RTX 3090 GPU. The training configurations are summarized in Table 1 and the model architectures are illustrated in Fig. 3 (MedSAM) and Fig. 4 (DeepLabv3+).
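As an illustrative sketch, the two models could be instantiated as follows, assuming the segmentation_models_pytorch package for DeepLabv3+ and the segment-anything package (on which MedSAM is built) for MedSAM; the encoder backbone and checkpoint path are assumptions, as they are not reported here.

```python
# An illustrative setup, not the exact configuration used in this study.
import segmentation_models_pytorch as smp
from segment_anything import sam_model_registry

# DeepLabv3+ with a single output channel (maxillary sinus vs. background).
# The ResNet-50 encoder is an assumption; the paper does not name a backbone.
deeplab = smp.DeepLabV3Plus(
    encoder_name="resnet50",
    in_channels=1,   # grayscale panoramic radiographs
    classes=1,       # binary mask logits
)

# MedSAM reuses SAM's ViT-B image encoder, so its released checkpoint can be
# loaded through the standard SAM model registry (path is illustrative).
medsam = sam_model_registry["vit_b"](checkpoint="medsam_vit_b.pth")
```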
Table 1
Training configuration parameters for the MedSAM and DeepLabv3+ models used in this study, including optimizer settings, learning rate, weight decay, batch size, number of epochs, data augmentation, and loss function.

Parameter           MedSAM        DeepLabv3+
Optimizer           AdamW         Adam
Learning rate       1e-4          4e-5
Weight decay        0.01          0.001
Batch size          2             8
Epochs              10            40
Data augmentation   None          None
Loss function       Dice + BCE    Dice + BCE

BCE: Binary Cross-Entropy
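For illustration, the Dice + BCE loss listed in Table 1 can be sketched as follows; the equal weighting of the two terms is an assumption, as the exact weighting is not specified.

```python
import torch
import torch.nn.functional as F


def dice_bce_loss(logits, target, eps=1e-6):
    """Combined Dice and binary cross-entropy loss on raw mask logits.

    The 1:1 weighting of the two terms is an assumption.
    """
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return bce + (1 - dice)
```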
Evaluation metrics
In this study, we used nine metrics to evaluate the performance of MedSAM and DeepLabv3+. The Dice Similarity Coefficient (DSC) measures the overlap between the predicted and ground truth masks. Intersection over Union (IoU) measures the ratio of the intersection to the union of the two masks. Recall (sensitivity) indicates the ability of the model to correctly detect positive pixels. Precision (positive predictive value) reflects the proportion of predicted positives that are truly positive. The F1-score is the harmonic mean of precision and recall. Normalized Surface Distance (NSD) measures the agreement between the predicted and ground truth boundaries within a fixed tolerance. The 95th percentile Hausdorff Distance (HD95) measures the 95th percentile of the distances between predicted and ground truth boundary points, making it more robust to outliers than the maximum Hausdorff distance. Params (M) is the number of model parameters, expressed in millions. FLOPs (G) represent the computational complexity of the model, measured in billions of floating-point operations.
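The overlap-based metrics can be computed directly from the confusion counts of a pair of binary masks, as in the following sketch (NumPy arrays of 0/1; NSD and HD95 additionally require boundary extraction and are omitted for brevity). Note that for binary masks the DSC and the F1-score are algebraically identical, which is why the two rows coincide in Table 2.

```python
import numpy as np


def overlap_metrics(pred, gt):
    """DSC, IoU, precision, recall, and F1 for one pair of binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # true positive pixels
    fp = np.logical_and(pred, ~gt).sum()  # false positives
    fn = np.logical_and(~pred, gt).sum()  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),  # identical to F1 for binary masks
        "IoU": tp / (tp + fp + fn),
        "Precision": precision,
        "Recall": recall,
        "F1": 2 * precision * recall / (precision + recall),
    }
```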
Results
Both MedSAM and DeepLabv3+ demonstrated high segmentation performance in maxillary sinus delineation on panoramic radiographs. Performance was evaluated using the DSC and IoU metrics.
MedSAM achieved a DSC of 0.9570 and an IoU of 0.9183, whereas DeepLabv3+ achieved a DSC of 0.9534 and an IoU of 0.9124. These results indicate that both models effectively captured the target region with high spatial precision. Although MedSAM slightly outperformed DeepLabv3+ in both DSC and IoU, the difference was marginal.
In terms of computational efficiency, DeepLabv3+ exhibited a substantially lower parameter count and fewer FLOPs than MedSAM. This implies that DeepLabv3+ offers a lighter architecture with reduced resource consumption, making it more suitable for deployment on systems with limited computational power. The detailed model performance statistics are listed in Table 2. Representative segmentation outcomes are illustrated in Fig. 5 for MedSAM and Fig. 6 for DeepLabv3+.
Table 2
Quantitative evaluation metrics for the segmentation performance of the MedSAM and DeepLabv3+ models.

Metric        MedSAM     DeepLabv3+
DSC           0.9570     0.9534
IoU           0.9183     0.9124
Precision     0.9507     0.9556
Recall        0.9644     0.9529
F1-score      0.9570     0.9534
NSD           0.9316     0.9283
HD95          0.0161     0.0199
Params (M)    90.49      26.68
FLOPs (G)     1487.95    593.97

DSC: Dice Similarity Coefficient; IoU: Intersection over Union; NSD: Normalized Surface Distance; HD95: 95th percentile Hausdorff Distance; FLOPs: Floating Point Operations
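As a rough illustration, the complexity figures in Table 2 could be reproduced along the following lines. This is a sketch assuming the fvcore package; the input shape is a placeholder, and FLOP-counting conventions differ between tools, so absolute numbers may vary.

```python
import torch
from fvcore.nn import FlopCountAnalysis


def complexity(model, input_shape):
    """Parameter count (millions) and FLOPs (billions) for one forward pass."""
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    flops_g = FlopCountAnalysis(model, torch.zeros(input_shape)).total() / 1e9
    return params_m, flops_g
```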
Discussion
This study comprehensively evaluated the performance of two advanced deep learning models, MedSAM and DeepLabv3+, for segmenting the anatomical boundaries of the maxillary sinus on panoramic radiographs. Both models demonstrated strong segmentation performance, effectively addressing common challenges in panoramic imaging, such as low-contrast boundaries and overlapping anatomical structures.
DeepLabv3+, a CNN-based semantic segmentation model, exhibited robust and stable performance [19]. The ASPP module was particularly effective in capturing multi-scale contextual features, an essential capability given the anatomical variability of the maxillary sinus. The decoder path of the model also plays a key role in refining boundary details [26], which is crucial for accurately outlining the curved and complex shape of the sinus. Compared with panoptic segmentation methods, which, while more granular, require greater computational resources and more detailed annotations, DeepLabv3+ offers a more practical and efficient solution for clinical applications focused on precise regional segmentation.
Nonetheless, MedSAM, a recent foundation model designed for medical image segmentation, also showed excellent performance in segmenting the maxillary sinus. One of its primary strengths lies in its generalizability, derived from pre-training on more than 1.5 million medical images across various modalities and conditions [23]. This broad training allows MedSAM to perform zero-shot segmentation, demonstrating its ability to adapt to new anatomical structures, such as the maxillary sinus, without additional retraining [27]. Its transformer-based architecture effectively captures long-range dependencies and the global anatomical context, making it well suited to complex sinus shapes, even in low-contrast or artefact-prone panoramic images.
While DeepLabv3+ operates in a fully automated manner after training, MedSAM relies on external prompts (for example, points or bounding boxes) to guide segmentation. Although this requirement may seem to be a limitation, it allows MedSAM to adapt flexibly to various target structures with minimal additional labelling [28]. Interestingly, MedSAM often outperformed DeepLabv3+ in boundary clarity, particularly in cases with ambiguous or poorly defined borders. Nevertheless, DeepLabv3+ was advantageous in computational efficiency and localization precision.
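As an illustration of this prompt-driven workflow, a box-guided prediction might look as follows. This is a minimal sketch assuming the SamPredictor interface from the segment-anything package, which MedSAM inherits; the file path and box coordinates are placeholders.

```python
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

medsam = sam_model_registry["vit_b"](checkpoint="medsam_vit_b.pth")
predictor = SamPredictor(medsam)

# Grayscale panoramic radiograph converted to the HxWx3 uint8 input SAM expects.
image = np.array(Image.open("panoramic.bmp").convert("RGB"))
predictor.set_image(image)

box = np.array([420, 310, 980, 760])  # illustrative XYXY box around one sinus
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
sinus_mask = masks[0]  # binary mask conditioned on the box prompt
```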
Quantitatively, both models showed high consistency with the ground truth masks. MedSAM achieved an IoU of 0.918 and an F1-score of 0.957, whereas DeepLabv3+ achieved an IoU of 0.912 and an F1-score of 0.953. The high F1-scores reflect balanced precision and recall, indicating a low false detection rate and balanced segmentation quality. Boundary performance was further confirmed by NSD values of 0.931 for MedSAM and 0.928 for DeepLabv3+, indicating close alignment with the reference boundaries. HD95 values of 0.016 (MedSAM) and 0.019 (DeepLabv3+) suggest that the majority of segmented boundary points lay within a 1-pixel margin of the ground truth, highlighting exceptional spatial accuracy. These results are encouraging and comparable to those of previous sinus segmentation studies performed using other modalities and architectures; for example, studies using V-Net or U-Net on CT and CBCT datasets reported Dice scores of 0.91–0.94 and IoU values of up to 0.927 [29–31].
Architectures such as TransUNet and UNETR incorporate self-attention to model global dependencies and structural relationships, which is particularly beneficial for segmenting complex anatomical regions [21]. A-UNETR, for instance, has demonstrated impressive segmentation performance on sinus structures, achieving Dice scores of 0.93–0.94 and IoU values as high as 0.88 on panoramic radiograph datasets [32]. This demonstrates that panoramic-radiograph-based segmentation, when powered by advanced deep learning, can offer comparable accuracy despite modality differences.
Accurate segmentation of the maxillary sinus is clinically important for early diagnosis, treatment planning, and monitoring disease progression. Although manual segmentation is considered the gold standard, it is labor intensive and prone to inter-observer variability [33]. Automated segmentation offers consistency, efficiency, and scalability, making it suitable for real-world clinical workflows.
This study had some limitations. The dataset mainly consisted of healthy sinus cases, which may have contributed to the high accuracy scores. Previous CBCT studies involving inflamed sinuses have reported significantly lower Dice scores (~ 0.75–0.77) compared to normal sinuses (~ 0.92–0.93) [34]. Thus, future research should investigate the model performance in both healthy and pathological cases, ideally using CBCT images as a reference standard.
Overfitting is a critical concern in medical imaging owing to the limited number of annotated datasets. Model robustness can be improved by incorporating diverse multi-institutional datasets [35]. In this study, 1,046 images from three different imaging systems were used to mitigate overfitting and enhance generalization.
Future trends in AI-driven segmentation of the maxillary sinus in panoramic images may incorporate several advancements. These include the integration of foundation models with dental-specific fine-tuning, enabling a more robust performance across anatomical variations and imaging artefacts. Multimodal learning, which combines CBCT, panoramic radiography, and clinical data, can enhance model contextualization and diagnostic relevance.
Ultimately, combining prompt-based segmentation with personalized AI systems may redefine the diagnostic workflow in dental and maxillofacial imaging.
This study demonstrated that both MedSAM and DeepLabv3+ achieved excellent performance in automated maxillary sinus segmentation on panoramic radiographs, with DSC values exceeding 0.95. Although MedSAM exhibited marginally superior accuracy metrics, DeepLabv3+ offered significant computational advantages, making it more suitable for clinical deployment. These findings suggest that deep-learning-based automated segmentation could serve as a valuable tool for enhancing diagnostic capabilities and clinical workflow efficiency in dental practice, particularly when CBCT is not readily available or clinically indicated.
Future research should focus on validating these models across diverse populations and pathological conditions, as well as evaluating their integration in clinical practice through randomized controlled trials to assess their impact on treatment outcomes and clinical decision-making processes.
Declarations
Ethics approval and consent to participate
This retrospective study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Wonkwang University Dental Hospital (IRB No.: WKDIRB202107-01).
The requirement for written informed consent was waived due to the retrospective nature of the study and the use of anonymised radiographic data.
Consent for publication
Not applicable
Data Availability
The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request, subject to appropriate ethical approval and data sharing agreements.
Competing interests
The authors declare no competing or conflicts of interest related to this study.
Funding
This study was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HI23C0544).
Author Contribution
Lee BD contributed to the conception, design, data acquisition, and interpretation, and drafted and critically revised the manuscript. Park YC and Yeom HG contributed to the design, data acquisition, and interpretation, and drafted and critically revised the manuscript. Lee SJ contributed to the data interpretation and drafted and critically revised the manuscript. Lee W contributed to the conception and critical revision of the manuscript. All authors gave their final approval for the version to be published and agreed to be accountable for all aspects of this work.
Acknowledgements
Not applicable
References
1.
Mori M, Ariji Y, Katsumata A, Kawai T, Araki K, Kobayashi K, et al. A deep transfer learning approach for the detection and diagnosis of maxillary sinusitis on panoramic radiographs. Odontology. 2021;109:941–8.
2.
Bayrakdar IS, Elfayome NS, Hussien RA, Gulsen IT, Kuran A, Gunes I, et al. Artificial intelligence system for automatic maxillary sinus segmentation on cone beam computed tomography images. Dento Maxillo Fac Radiol. 2024;53:256–66.
3.
Ohba T, Ogawa Y, Shinohara Y, Hiromatsu T, Uchida A, Toyoda Y. Limitations of panoramic radiography in the detection of bone defects in the posterior wall of the maxillary sinus: An experimental study. Dento Maxillo Fac Radiol. 1994;23:149–53.
4.
Fischborn AR, Andreis JD, Wambier LM, Pedroso CM, Claudino M, Franco GCN. Performance of panoramic radiography compared with computed tomography in the evaluation of pathological changes in the maxillary sinuses: A systematic review and meta-analysis. Dento Maxillo Fac Radiol. 2023;52:20230067.
5.
Yamashita R, Nishio M, Do RKG, Togashi K. Convolutional neural networks: An overview and application in radiology. Insights Imaging. 2018;9:611–29.
6.
Lakhani P, Sundaram B. Deep learning at chest radiography: Automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology. 2017;284:574–82.
7.
Yasaka K, Akai H, Abe O, Kiryu S. Deep learning with convolutional neural network for differentiation of liver masses at dynamic contrast-enhanced CT: A preliminary study. Radiology. 2018;286:887–96.
8.
Christ PF, Elshaer MEA, Ettlinger F, Tatavarty S, Bickel M, Bilic P, et al. Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. In: Ourselin S, Joskowicz L, Sabuncu MR, Unal G, Wells W, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. Springer; 2016. pp. 415–23.
9.
Duan W, Chen Y, Zhang Q, Lin X, Yang X. Refined tooth and pulp segmentation using U-Net in CBCT image. Dento Maxillo Fac Radiol. 2021;50:20200251.
10.
Chatzopoulos GS, Koidou VP, Tsalikis L, Kaklamanos EG. Artificial intelligence for detection and classification of furcation defects using radiographic imaging: A systematic review. Imaging Sci Dent. 2025;55:e32.
11.
Ariji Y, Yanashita Y, Kutsuna S, Muramatsu C, Fukuda M, Kise Y, et al. Automatic detection and classification of radiolucent lesions in the mandible on panoramic radiographs using a deep learning object detection technique. Oral Surg Oral Med Oral Pathol Oral Radiol. 2019;128:424–30.
12.
Yeom HG, Lee BD, Lee W, Lee T, Yun JP. Estimating chronological age through learning local and global features of panoramic radiographs in the Korean population. Sci Rep. 2023;13:21857.
13.
Shalbaf A, Bagherzadeh S, Maghsoudi A. Transfer learning with deep convolutional neural network for automated detection of schizophrenia from EEG signals. Phys Eng Sci Med. 2020;43:1229–39.
14.
Yao W, Bai J, Liao W, Chen Y, Liu M, Xie Y. From CNN to transformer: A review of medical image segmentation models. J Imaging Inf Med. 2024;37:1529–47.
15.
Cha JY, Yoon HI, Yeo IS, Huh KH, Han JS. Panoptic segmentation on panoramic radiographs: Deep learning-based segmentation of various structures including maxillary sinus and mandibular canal. J Clin Med. 2021;10:2577.
16.
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical Image Computing and Computer-Assisted Intervention. Springer; 2015. pp. 234–41.
17.
Aşantoğrol F, Çiftçi BT. Analytical comparison of maxillary sinus segmentation performance in panoramic radiographs utilizing various YOLO versions. Eur J Ther. 2023;29:748–58.
18.
Zannah R, Bashar M, Mushfiq RB, Chakrabarty A, Hossain S, Jung YJ. Semantic segmentation on panoramic dental X-ray images using U-net architectures. IEEE Access. 2024;12:44598–612.
19.
Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. Encoder–decoder with atrous separable convolution for semantic image segmentation. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, editors. Computer Vision – ECCV 2018. Springer; 2018. pp. 833–51.
20.
Woo S, Park J, Lee JY, Kweon IS. CBAM: convolutional block attention module. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, editors. Computer Vision – ECCV 2018. Cham: Springer International Publishing; 2018. pp. 3–19.
21.
Zhang C, Deng X, Ling SH. Next-gen medical imaging: U-Net evolution and the rise of transformers. Sensors (Basel). 2024;24:4668.
22.
Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L et al. Segment anything. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE; 2023. pp. 3992–4003.
23.
Ma J, He Y, Li F, Han L, You C, Wang B. Segment anything in medical images. Nat Commun. 2024;15:654.
24.
Wang P, Gu H, Sun Y. Tooth segmentation on multimodal images using adapted segment anything model. Sci Rep. 2025;15:13874.
25.
He Z, Wang Y, Li X. Deep learning-based detection of impacted teeth on panoramic radiographs. Biomed Eng Comput Biol. 2024;15:11795972241288319.
26.
Ketenci Çay F, Yeşil Ç, Çay O, Yılmaz BG, Özçini FH, İlgüy D. DeepLabv3+ method for detecting and segmenting apical lesions on panoramic radiography. Clin Oral Investig. 2025;29:101.
27.
Liu Y, Li W, Wang C, Chen H, Yuan Y. When 3D partial points meets SAM: Tooth point cloud segmentation with sparse labels. In: Linguraru MG, Dou Q, Feragen A, Giannarou S, Glocker B, Lekadir K, et al. editors. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. Cham: Springer Nature Switzerland; 2024. pp. 778–88.
28.
Liao J, Wang H, Gu H, Cai Y. PPA-SAM: Plug-and-play adversarial segment anything model for 3D tooth segmentation. Appl Sci. 2024;14:3259.
29.
Choi H, Jeon KJ, Kim YH, Ha EG, Lee C, Han SS. Deep learning-based fully automatic segmentation of the maxillary sinus on cone-beam computed tomographic images. Sci Rep. 2022;12:14009.
30.
Xu J, Wang S, Zhou Z, Liu J, Jiang X, Chen X. Automatic CT image segmentation of maxillary sinus based on VGG network and improved V-Net. Int J Comput Assist Radiol Surg. 2020;15:1457–65.
31.
Ozturk B, Taspinar YS, Koklu M, Tassoker M. Automatic segmentation of the maxillary sinus on cone beam computed tomographic images with U-Net deep learning model. Eur Arch Otorhinolaryngol. 2024;281:6111–21.
32.
Park JH, Choi J, Yun JP, Yeom HG, Lee BD, Lee SJ. Attention-based U-Net Transformer for segmentation of maxillary sinus regions in panoramic X-ray images. IEMEK J Embed Syst Appl. 2025;20:83–9.
33.
Tingelhoff K, Eichhorn KWG, Wagner I, Kunkel ME, Moral AI, Rilk ME, et al. Analysis of manual segmentation in paranasal CT images. Eur Arch Otorhinolaryngol. 2008;265:1061–70.
34.
Jung SK, Lim HK, Lee S, Cho Y, Song IS. Deep active learning for automatic segmentation of maxillary sinus lesions using a convolutional neural network. Diagnostics (Basel). 2021;11:688.
35.
Ding H, Wu J, Zhao W, Matinlinna JP, Burrow MF, Tsoi JKH. Artificial intelligence in dentistry – A review. Front Dent Med. 2023;4:1085251.
Figure Legends
Fig. 1
An example of the labelling procedure using VIA™ (VGG Image Annotator). The boundary of the maxillary sinus is formed by a curved line connecting multiple points.
Fig. 2
An example of image annotation and binary mask generation. (a) Original panoramic radiograph. (b) Labelled image annotated using the VIA tool, with polygon coordinates defining the maxillary sinus boundary. (c) Binary mask automatically generated from the coordinate values, in which the labelled region is marked as white (sinus: 1) and the background as black (0), used for deep learning model training.
Fig. 3
Workflow of the MedSAM architecture utilised in this study. The input panoramic radiograph is initially processed by an image encoder to generate image embeddings. A bounding box prompt is introduced into the prompt encoder, which conditions the subsequent mask decoder. The decoder then produces a segmentation mask corresponding to the region specified by the prompt. The final output highlights the maxillary sinus with high anatomical fidelity based on the given prompt information.
Fig. 4
Architecture of the DeepLabv3+ model employed in this study. The encoder utilises a convolutional backbone with Atrous Spatial Pyramid Pooling (ASPP) to capture multi-scale contextual information. ASPP applies parallel atrous (dilated) convolutions with rates of 6, 12, and 18, along with a 1×1 convolution and image-level pooling. These high-level features are concatenated and processed through a 1×1 convolution. In the decoder, low-level features from earlier layers are combined with the upsampled encoder output via concatenation, followed by successive 3×3 convolutions and bilinear upsampling to generate a high-resolution semantic segmentation map of the maxillary sinus.
Fig. 5
Segmentation performance of MedSAM on panoramic radiographs. (a) An example of under-segmentation, where certain areas of the maxillary sinus boundary were not captured by the model. (b) An example of over-segmentation, where the model erroneously predicted regions outside the actual maxillary sinus boundary. (c) An example of accurate segmentation, showing high concordance between the ground truth (yellow) and the predicted mask (red-orange).
Fig. 6
Segmentation performance of DeepLabv3+ on panoramic radiographs. Ground truth masks are shown in yellow, and predicted masks are overlaid in red-orange. (a) The model failed to capture portions of the maxillary sinus boundary. (b) Accurate segmentation of the right maxillary sinus, with incomplete delineation on the left side. (c) Both maxillary sinuses exhibit poor boundary identification. (d) The right maxillary sinus boundary is well segmented, whereas minor omissions are noted on the left side.