The Integration of Artificial Intelligence in Orthodontic Diagnosis and Treatment Planning: A PRISMA-Guided Systematic Review
Salah M. Ben Hafedh¹, Ahmad Hashridz Bin Ruslan², Ramy Ishaq³, Rozita Hassan⁴, Alaa Ali Maudhah⁵
¹ Department of Orthodontics, Faculty of Dentistry, Sana’a University, Sana’a, Yemen
² Orthodontic Unit, School of Dental Sciences, Universiti Sains Malaysia, 16150 Kota Bharu, Kelantan, Malaysia
³ Department of Orthodontics, Faculty of Dentistry, Sana’a University, Sana’a, Yemen
⁴ Orthodontic Unit, School of Dental Sciences, Universiti Sains Malaysia, 16150 Kota Bharu, Kelantan, Malaysia
⁵ Department of Prosthodontics, Faculty of Dentistry, Sana’a University, Sana’a, Yemen
Corresponding author: Dr. Salah M. Ben Hafedh (salah.hafedh@gmc-ye.com)
Abstract
Background
Artificial intelligence (AI) applications in orthodontics are rapidly expanding across diagnosis, image analysis, and treatment planning.
Methods
A PRISMA-guided systematic review was conducted. PubMed/MEDLINE, Scopus, Web of Science, and Google Scholar were searched from 2010 to 16 September 2025. Original studies in orthodontics that used AI or machine learning for diagnosis, prediction, image analysis, or treatment planning were eligible. Two reviewers independently screened records, extracted data, and assessed risk of bias using QUADAS-2 for diagnostic accuracy studies and PROBAST for prediction model studies. Owing to heterogeneity in study design, datasets, and outcome metrics, results were synthesized narratively.
Results
Of 1,162 records identified, 1,008 remained after duplicate removal and were screened by title and abstract. A total of 154 full-text articles were assessed for eligibility, and 45 met the inclusion criteria. Frequent AI tasks included cephalometric landmark detection, malocclusion classification, extraction-decision support, treatment duration prediction, and cone-beam computed tomography (CBCT)-based segmentation. Many studies reported high accuracies for cephalometric landmark detection (mean radial error < 2 mm and successful detection rates > 80%) and malocclusion classification (accuracies > 85%). However, risk-of-bias concerns, particularly in analysis and validation domains, were common, and external validation was infrequent.
Conclusions
AI models show promising performance for orthodontic diagnosis and treatment planning and may enhance efficiency and standardization of care. Nevertheless, non-standardized outcome measures, limited external validation, and insufficient reporting of model development and evaluation currently restrict clinical translation. Larger, multicenter datasets, standardized benchmarks, and robust validation—ideally following AI-specific reporting guidelines—are required before routine clinical adoption.
Registration
PROSPERO CRD420251134644.
1. Introduction
In recent years, artificial intelligence (AI) has moved from being a futuristic concept to an everyday reality in healthcare. Orthodontics, where precision in diagnosis and treatment planning is essential for long-term functional and aesthetic outcomes, has been particularly affected by this transformation [1–3]. AI tools can help reduce diagnostic errors, streamline clinical workflows, and allow orthodontists to dedicate more time to direct patient care [4–6].
Machine learning (ML) and deep learning (DL), as subfields of AI, are particularly powerful for analyzing diverse clinical datasets, including cephalometric radiographs, CBCT scans, panoramic images, intraoral photographs, and structured patient records [7–11]. Convolutional neural networks (CNNs), among the most widely applied DL architectures, have shown high accuracy in cephalometric landmark detection, malocclusion classification, and image segmentation, often matching or surpassing the performance of human examiners [12–16]. Landmark detection studies frequently report mean radial errors below 2 mm and successful detection rates above 80%, while classification accuracies for malocclusion often exceed 85% [17–20].
Beyond diagnosis, AI is increasingly applied to complex clinical decision-making, such as predicting treatment duration, identifying when extractions are necessary, and estimating the need for orthognathic surgery [21–27]. These applications indicate that AI is evolving from a purely experimental tool to a clinically relevant adjunct in routine orthodontic practice [28–30]. However, important challenges remain. Many models are trained on single-center or demographically narrow datasets, which may limit generalizability. Ethical and regulatory issues, including data privacy, algorithmic bias, and medico-legal responsibility, also require careful consideration. Moreover, the “black-box” nature of many DL models can impede clinician trust in AI-generated recommendations [31–35].
This PRISMA-guided systematic review synthesizes evidence from original studies published between 2010 and 2025 that evaluated AI for orthodontic diagnosis and treatment planning. The objectives are to (1) summarize the types of AI tasks and methodologies used in orthodontics, (2) evaluate the reported diagnostic and predictive performance of these systems, and (3) identify current limitations and future research priorities for safe and effective AI integration into orthodontic care [36–40].
2. Materials and Methods
2.1 Protocol and registration
This systematic review was conducted in accordance with the PRISMA 2020 statement and relevant AI-focused reporting extensions when applicable. The review protocol was prospectively registered with PROSPERO (registration number CRD420251134644).
2.2 Information sources
Electronic searches were carried out in PubMed/MEDLINE, Scopus, Web of Science (Core Collection), and Google Scholar. Databases were searched for articles published between 1 January 2010 and 16 September 2025. Search strategies combined keywords and MeSH terms related to “artificial intelligence,” “machine learning,” “deep learning,” “orthodontics,” “diagnosis,” “treatment planning,” and “cephalometrics.” Detailed, database-specific search strings are provided in the Supplementary material (Search strategies document).
2.3 Eligibility criteria
Inclusion criteria
Peer-reviewed original research articles.
Studies involving AI (including ML, DL, CNNs, or hybrid techniques) applied to orthodontic diagnosis, prediction, image analysis, or treatment planning.
Human clinical data or imaging (e.g., 2D cephalometric radiographs, CBCT, panoramic radiographs, intraoral photographs, or structured clinical records).
Reporting of at least one quantitative performance metric (e.g., accuracy, sensitivity/specificity, AUC, mean radial error, Dice similarity coefficient, mean absolute error).
Articles published in English between January 2010 and June 2025.
Exclusion criteria
Narrative reviews, systematic reviews, scoping reviews, editorials, commentaries, conference abstracts, and letters without original data.
Studies not primarily focused on orthodontics (e.g., general dentistry, restorative dentistry, endodontics) or AI applications not related to diagnosis or treatment planning.
Studies that did not clearly describe their AI methodology or lacked a human reference standard.
Animal or in-vitro studies.
2.4 Data extraction and quality assessment
Two reviewers independently screened titles and abstracts, followed by full-text assessment of potentially eligible studies. Disagreements were resolved through discussion or consultation with a third reviewer.
Data were extracted using a standardized form and included: first author, publication year, country, imaging modality, primary AI task (e.g., landmark detection, classification, prediction, segmentation), dataset size, AI architecture or algorithm, reference standard (e.g., expert orthodontist annotations), primary performance metrics, and whether internal and/or external validation was performed.
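As an illustration only, the extraction items listed above could be held in a simple record such as the one sketched below. The class name, field names, and the helper `missing_fields` are hypothetical labels chosen for this sketch and do not reproduce the actual extraction form; the example values are taken from the Noeldeke (2024) row of Table 1, with internal validation left empty because the table does not report it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionRecord:
    """One row of a hypothetical data-extraction sheet (field names are illustrative)."""
    first_author: str
    year: int
    country: Optional[str]
    modality: str
    primary_task: str
    dataset_size: Optional[str]          # kept as text, e.g. "676 images (311 pts)"
    ai_method: Optional[str]
    reference_standard: str
    primary_metrics: str
    internal_validation: Optional[bool] = None   # None means "not reported"
    external_validation: Optional[bool] = None   # None means "not reported"


def missing_fields(record: ExtractionRecord) -> List[str]:
    """Return the names of fields that were left empty (None) for this record."""
    return [name for name, value in vars(record).items() if value is None]


example = ExtractionRecord(
    first_author="Noeldeke B",
    year=2024,
    country="Germany",
    modality="Intraoral photographs (2D)",
    primary_task="Crossbite detection (binary & type)",
    dataset_size="676 images (311 pts)",
    ai_method="CNNs (DenseNet/ResNet variants)",
    reference_standard="Orthodontist labels",
    primary_metrics="Accuracy (binary): 98.57%",
    external_validation=False,
)
print(missing_fields(example))  # ['internal_validation']
```

Keeping unreported items as explicit empty values, rather than guessing, makes gaps in primary reporting visible when the sheet is later summarized.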
Risk of bias in diagnostic accuracy studies was assessed using the QUADAS-2 tool, which evaluates four domains: patient selection, index test, reference standard, and flow/timing [2]. Prediction model studies were assessed using the PROBAST tool, which covers participants, predictors, outcomes, and analysis domains [3]. Summary risk-of-bias assessments are presented in the Supplementary QUADAS-2 and PROBAST tables, with study-level details provided separately.
2.5 Synthesis methods
The included studies displayed substantial heterogeneity in study design (diagnostic accuracy vs prediction models), imaging modalities (2D radiographs, CBCT, panoramic radiographs, intraoral photographs, clinical records), AI architectures, and outcome measures (e.g., landmark error, Dice similarity coefficient, AUC, accuracy, MAE). Because of this variability, quantitative meta-analysis was not feasible.
Instead, a structured narrative synthesis was conducted in accordance with PRISMA 2020 guidance. Studies were grouped by primary AI task (e.g., cephalometric landmark detection, malocclusion classification, extraction decision support, treatment duration prediction, CBCT segmentation), and key performance indicators were summarized within each category.
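For readers less familiar with the landmark metrics summarized in this review, the following minimal Python sketch shows how mean radial error (MRE) and successful detection rate (SDR) are commonly computed from predicted and reference landmark coordinates. The function names and the toy coordinates are assumptions made for illustration and are not data from any included study; the sketch also assumes that a landmark counts as detected when its error is at or below the threshold.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in millimetres


def radial_errors(predicted: Sequence[Point], reference: Sequence[Point]) -> List[float]:
    """Euclidean distance (mm) between each predicted and reference landmark."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return [math.dist(p, r) for p, r in zip(predicted, reference)]


def mean_radial_error(errors: Sequence[float]) -> float:
    """Average of the per-landmark radial errors."""
    return sum(errors) / len(errors)


def successful_detection_rate(errors: Sequence[float], threshold_mm: float = 2.0) -> float:
    """Share of landmarks whose radial error is at or below the threshold."""
    return sum(e <= threshold_mm for e in errors) / len(errors)


# Toy example: four landmarks, one of which is badly misplaced.
reference = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
predicted = [(0.6, 0.8), (10.0, 1.5), (3.0, 14.0), (10.0, 10.0)]

errors = radial_errors(predicted, reference)
print(round(mean_radial_error(errors), 3))   # 1.875
print(successful_detection_rate(errors))     # 0.75
```

In this toy case the MRE stays below 2 mm even though one landmark in four misses the 2 mm threshold, which is one reason the two metrics are usually reported together.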
3. Results
The PRISMA 2020 flow diagram summarizing study selection is shown in Fig. 1.
Figure 1. PRISMA 2020 flow diagram of study identification, screening, eligibility, and inclusion.
3.1 Study selection
The database search identified 1,162 records. After removal of duplicates, 1,008 records remained for title and abstract screening. Of these, 854 records were excluded. A total of 154 full-text articles were assessed for eligibility. One hundred and nine full-text articles were excluded for the following main reasons: not orthodontics-related AI studies (n = 35), review/editorial/commentary articles (n = 28), inappropriate or irrelevant study design (n = 20), insufficient diagnostic or treatment-planning detail (n = 16), and duplicate publication (n = 10). Finally, 45 studies met all inclusion criteria and were included in the qualitative synthesis.
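The counts reported above can be checked for internal consistency with a few lines of arithmetic. The sketch below is a hedged illustration: the function name `check_prisma_flow` and the dictionary keys are hypothetical, while every number is taken from the paragraph above.

```python
from typing import Dict


def check_prisma_flow(identified: int, after_duplicates: int, screening_excluded: int,
                      full_text_assessed: int, full_text_exclusions: Dict[str, int],
                      included: int) -> Dict[str, int]:
    """Return derived counts; raise AssertionError if the flow does not add up."""
    duplicates_removed = identified - after_duplicates
    full_text_excluded = sum(full_text_exclusions.values())
    # Records left after screening must equal the full texts assessed.
    assert after_duplicates - screening_excluded == full_text_assessed
    # Full texts assessed minus full-text exclusions must equal included studies.
    assert full_text_assessed - full_text_excluded == included
    return {"duplicates_removed": duplicates_removed,
            "full_text_excluded": full_text_excluded}


flow = check_prisma_flow(
    identified=1162,
    after_duplicates=1008,
    screening_excluded=854,
    full_text_assessed=154,
    full_text_exclusions={
        "not orthodontics-related AI": 35,
        "review/editorial/commentary": 28,
        "inappropriate or irrelevant study design": 20,
        "insufficient diagnostic or treatment-planning detail": 16,
        "duplicate publication": 10,
    },
    included=45,
)
print(flow)  # {'duplicates_removed': 154, 'full_text_excluded': 109}
```

The five exclusion reasons sum to the 109 excluded full-text articles, and 154 duplicate records are implied by the difference between records identified and records screened.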
3.2 Study characteristics
Table 1 summarizes the characteristics of the 45 included studies. The majority were retrospective observational studies conducted in academic or specialist orthodontic settings. Common imaging modalities included 2D cephalometric radiographs, CBCT, panoramic radiographs, and intraoral photographs.
The main AI tasks and applications were:
Cephalometric landmark detection and cephalometric analysis;
Malocclusion and skeletal pattern classification;
Extraction-decision support and prediction of orthognathic surgery necessity;
Prediction of treatment duration and treatment outcomes;
CBCT-based 3D segmentation of dental and skeletal structures.
Sample sizes varied widely, ranging from fewer than 100 images or patients to more than 1,000. CNN-based DL approaches were predominant for image-based tasks, whereas traditional ML algorithms (e.g., random forests, support vector machines, gradient boosting) were more commonly used for structured clinical data and prediction tasks.
Table 1. Study Characteristics of Included Studies (n = 45)
| ID | First author (Year) | Country/Region | Journal | Imaging/Modality | Primary Task | Dataset size (N) | AI Method | Reference Standard | Key Performance (primary metric) | External Validation | Citation (short) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Cui Z (2022) | Multi-center (China, 15 sites) | Nat Commun | CBCT (3D) | Tooth & alveolar bone segmentation | 4,938 scans | Multi-stage 3D CNN (nnU-Net style pipeline) | Expert manual segmentations | Dice: teeth ≈ 0.915; bone ≈ 0.930; time ↓96.7% | Yes | Cui 2022 Nat Commun |
| 2 | Noeldeke B (2024) | Germany | Head Face Med | Intraoral photographs (2D) | Crossbite detection (binary & type) | 676 images (311 pts) | CNNs (DenseNet/ResNet variants) | Orthodontist labels | Accuracy (binary): 98.57% | No (single-center) | Noeldeke 2024 Head Face Med |
| 3 | Ryu S-M (2023) | Korea | Sci Rep | Intraoral photographs (2D) | Extraction decision recommendation | 3,136 images | CNN classifier + landmark regressor | Board-certified orthodontist decision | AUC 0.961; Accuracy 0.922; Mean error 0.84 mm | No | Ryu 2023 Sci Rep |
| 4 | Sahlsten T (2024) | Finland | PLoS ONE | CBCT (3D) | 3D cephalometric landmark detection (33 LM) | 309 scans | Deep learning landmarking | Expert annotations | Mean 3D distance: 1.99 mm (overall), 1.96 mm (skeletal) | Unclear | Sahlsten 2024 PLoS ONE |
| 5 | Shin J-H (2021) | Korea | BMC Oral Health | Clinical photos + ceph | Necessity of orthognathic surgery (Class III) | 140 pts | CNN | Panel consensus (surgery vs non-surgery) | Accuracy 0.954; Sens 0.889; Spec 0.971; AUC 0.948 | No | Shin 2021 BMC Oral Health |
| 6 | Volovic J (2023) | USA | Diagnostics (MDPI) | Structured records | Treatment duration prediction | 478 pts | Random Forest, Lasso, Elastic Net | Actual duration vs prediction | MAE 7.27 months | No | Volovic 2023 Diagnostics |
| 7 | Elnagar M (2022) | USA | Diagnostics (MDPI) | Structured records | Treatment duration prediction | 518 pts | Multiple ML models (DT, RF, etc.) | Actual duration vs prediction | Best models within clinically acceptable error | No | Elnagar 2022 Diagnostics |
| 8 | Wolf D (2024) | Germany (EU dataset) | J Clin Med | EMR + app data | Clear aligner refinement risk prediction | 9,942 CAT pts | L1-logistic, XGBoost, SVC-RBF (+ SHAP) | Clinician-recorded outcomes | AUC ≈ 0.67; well-calibrated (Brier ≈ 0.22) | Yes (held-out cohort) | Wolf 2024 J Clin Med |
| 9 | Etemad L (2024) | USA; France (2 sites) | Bioengineering (MDPI) | Structured records | Extraction vs non-extraction decision | 1,135 pts (2 universities) | Random Forest | Clinician decision | Acc 85%; Sens 50%; Spec 97% (combined model) | Cross-site tests | Etemad 2024 Bioengineering |
| 10 | Leavitt L (2023) | USA | Orthod Craniofac Res | Structured records | Predict specific extraction patterns | 366 pts (extraction cases) | RF, LR, SVM | Clinician treatment plan | Best class accuracy 81.6% (U/L4s patterns) | Stratified hold-out | Leavitt 2023 OCR |
| 11 | Mason T (2023) | USA | Int Orthod | Structured records | Extraction vs non-extraction | 393 pts | LR, RF, SVM, NN | Clinician decision | ROC-AUC reported; high accuracy (see paper) | Hold-out | Mason 2023 Int Orthod |
| 12 | Huang J (2024) | China | Front Bioeng Biotechnol | Structured records | Extraction decision | Institutional cohort | DT, RF, SVM, MLP; feature importance | Senior specialist plans | Good accuracy across models; RF/MLP leading | No | Huang 2024 Front Bioeng |
| 13 | Arik SÖ (2017) | USA | J Med Imaging | Lateral ceph (2D) | Landmark detection (15 LM) | 400 images | CNN (early DL) | Expert annotation | SDR@2mm: 72.3%; rising to 86.8%@4mm | No | Arik 2017 J Med Imaging (via 2025 PMC summary) |
| 14 | Gilmour R (2019) |  |  | Lateral ceph (2D) | Landmark detection (15 LM) |  |  | Expert annotation | MRE 1.14 mm; SDR@2mm 83.8% |  | Gilmour 2019 (via 2025 PMC summary) |
| 15 | Li P (2019) | China | Med Image Anal?/Sci Rep | Lateral ceph (2D) | Landmark detection (15 LM) |  |  | Expert annotation | MRE 1.20 mm; SDR@2mm 83.7% |  | Li 2019 (via 2025 PMC summary) |
| 16 | Kwon (2019) | Korea |  | Lateral ceph (2D) | Landmark detection (15 LM) |  |  | Expert annotation | MRE 1.24 mm; SDR@2mm 83.0% |  | Kwon 2019 (via 2025 PMC summary) |
| 17 | Oh (2019) | Korea |  | Lateral ceph (2D) | Landmark detection (15 LM) |  |  | Expert annotation | MRE 1.29 mm; SDR@2mm 82.1% |  | Oh 2019 (via 2025 PMC summary) |
| 18 | Kim (2019/2020) | Korea |  | Lateral ceph (2D) | Landmark detection (15 LM) | 860 images |  | Expert annotation | MRE 1.03 mm; SDR@2mm 87.1% |  | Kim 2020 (via 2025 PMC summary) |
| 19 | Kim (2020) | Korea |  | Lateral ceph (2D) | Landmark detection (23 LM) | 2,075 images |  | Expert annotation | MRE 1.37 mm; SDR@2mm 82.9% |  | Kim 2020 (via 2025 PMC summary) |
| 20 | Takahashi (2020) | Japan |  | Lateral face photographs (2D) | Ceph LM from photos (23 LM) | 2,000 images | HRNetV2 + MLP (2-stage) | Ceph-photo superimposition | MRE 0.61 mm; SDR@2mm 98.2% |  | Takahashi 2020 (via 2025 PMC summary) |
| 21 | Takahashi (2025) | Japan |  | Lateral face photos (2D) | Ceph LM from photos (Class II/III) | 2,320 images | HRNetV2 + MLP (2-stage) | Ceph-photo superimposition | MRE 0.42–0.46 mm; ceph error < 0.5° |  | Takahashi 2025 (PMC 2025 article) |
| 22 | Park J-H (2019) | Korea | Angle Orthod | Lateral ceph (2D) | Compare YOLOv3 vs SSD (80 LM) | Train: 1,028; Test: 283 | YOLOv3 vs SSD | Expert labels | YOLOv3 faster & more accurate; real-time inference |  | Park 2019 Angle Orthod (Part 1) |
| 23 | Hwang H-W (2020) | Korea | Angle Orthod | Lateral ceph (2D) | AI vs human (80 LM) |  | YOLOv3-based pipeline | Human experts | AI as accurate as experts; perfect repeatability |  | Hwang 2020 Angle Orthod (Part 2) |
| 24 | Yoon H-J (2022) | Korea | Eur J Orthod | Lateral ceph (2D) | Airway-focused LM detection |  | Deep CNN pipeline | Expert annotation | High SDR comparable to state-of-art |  | Yoon 2022 EJO |
| 25 | Atici S.F. (2022) | UK/Turkey | PLoS ONE | Lateral ceph (2D) | Fully automated CVM stage classification |  | Custom CNN (directional filters) | Expert labels | High accuracy across CVM stages |  | Atici 2022 PLoS ONE |
| 26 | Atici S.F. (2023) | UK/Turkey |  | Lateral ceph (2D) | AggregateNet CVM classifier |  | Parallel structured CNN | Expert labels | Improved CVM classification over baseline |  | Atici 2023 (AggregateNet) |
| 27 | Gaudot I (2024) | Multi-center (EU) | Med Eng Phys | CBCT/CT (3D) | DentalSegmentator (5-class segmentation) | 470 train; 256 test | nnU-Net (3D Slicer extension) | Expert annotation | Robust multiclass segmentation across centers | Yes (external CBCT set) | Gaudot 2024 Med Eng Phys |
| 28 | Wang C (2024) | China | Biomed Signal Process Control | CBCT (3D) | Transformer-based tooth segmentation (Trans-VNet) |  | Transformer CNN hybrid | Expert annotation | Dice ≈ high (see paper) |  | Wang 2024 (Trans-VNet) |
| 29 | Kartbak SBA (2025) | Turkey | BMC Oral Health | Lateral ceph + intraoral photos (2D) | Intraoral classification via ceph-informed DL | 990 pts | DL classifier trained on ceph-derived labels | Cephalometric measurements | Reported improved classification vs baselines |  | Kartbak 2025 BMC Oral Health |
| 30 | Milani O-H (2024) | USA |  | Panoramic (2D) | Third molar development stage classification |  | DL classifier | Expert staging | High stage classification accuracy |  | Milani 2024 |
| 31 | JOMOS team (2025) | China | J Oral Med Oral Surg | Panoramic (2D) | Impacted mandibular third molar detection & class | 2,000 PRs | DL detector/classifier | Radiologist labels | Strong accuracy across classes |  | JOMOS 2025 |
| 32 | Kim S (2024) | Korea | BMC Oral Health | Panoramic (2D) | Indication for extraction (cracked tooth) |  | Multiple DL models | Clinician decision | Predictive performance significant (AUC reported) |  | BMC OH 2024 cracked tooth |
| 33 | Suh HY (2019) | Korea/USA | Angle Orthod | Structured + ceph | Soft tissue change prediction after surgery |  | Sparse partial least squares (ML) | Post-op measurements | Improved prediction vs baselines |  | Suh 2019 Angle Orthod |
| 34 | Lee YS (2014) | Korea/USA | AJODO | Structured + ceph | Soft tissue prediction (Class III) |  | Statistical/ML model | Post-op measurements | Higher accuracy than prior methods |  | Lee 2014 AJODO |
| 35 | Wang C-W (2015) | Taiwan | IEEE TMI | Lateral ceph (2D) | Grand challenge benchmark (evaluation) |  | Multiple methods compared | Expert GT (challenge) | Baseline SDR metrics provided | External (multi-team) | Wang 2015 IEEE TMI |
| 36 | Wang C-W (2016) | Taiwan | Med Image Anal | Dental radiographs (2D) | Benchmark for analysis algorithms |  | Benchmarking | Expert GT | Performance ranges reported | External (multi-team) | Wang 2016 MedIA |
| 37 | Xie X (2010) | China | Angle Orthod | Structured records | Extraction vs non-extraction | 200 pts | ANN | Clinician decision | Accuracy ~ 80% (reported) |  | Xie 2010 Angle |
| 38 | Jung S-K (2016) | Korea | AJODO | Structured records | Extraction vs non-extraction | 156 pts | 3-layer ANN | Single clinician decisions | Accuracy ~ 93% (reported) |  | Jung 2016 AJODO |
| 39 | Li P (2019) | China | Sci Rep | Structured records | Orthodontic treatment planning (broad) |  | ANN | Expert plan | Model feasible; high accuracy metrics reported |  | Li 2019 Sci Rep |
| 40 | Castillo J-C (2019) | Canada/USA | Angle Orthod | 3D photogrammetry | 3D facial-ceph relationships |  | Statistical + ML links | Manual measurements | Good correlations (diagnostic adjunct) |  | Castillo 2019 Angle |
| 41 | Schmidt S (2022) | Germany | Dentomaxillofac Radiol | Panoramic (2D) | Restoration segmentation | 1,781 PRs | U-Net variants | Pixelwise GT | F1 up to 0.95 (tiled) |  | Schmidt 2022 DMFR |
| 42 | Kim H (2022) | Korea | Dentomaxillofac Radiol | Panoramic (2D) | Detect restorations & implants |  | Object detection (DL) | Expert labels | Strong detection metrics (see paper) |  | Kim 2022 DMFR |
| 43 | Craniofacial Growth ML (2025) | USA | Orthod Craniofac Res | Structured records | Long-term growth change prediction |  | ML regression ensemble | Ceph serial records | MAE/metrics reported (see paper) |  | Myers 2025 OCR |
| 44 | Prasad J (2022) | India | Dent J (MDPI) | Structured records | Clinical decision support (diagnosis & plan) |  | XGBoost/RF (multilabel) | Clinician plan | High macro-F1 across labels |  | Prasad 2022 Dent J |
| 45 | Del Real A (2022) | Korea | Korean J Orthod | Structured records | Predict need for extraction |  | XGBoost/RF | Orthodontist decision | Good accuracy (see paper) |  | Del Real 2022 KJO |
Abbreviations: CBCT = cone-beam computed tomography; CNN = convolutional neural network; DL = deep learning; ML = machine learning; LM = landmarks; SDR = successful detection rate; MRE = mean radial error; MAE = mean absolute error; Dice = Dice similarity coefficient; AUC = area under the receiver operating characteristic (ROC) curve.
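Several rows of Table 1 report accuracy together with sensitivity and specificity (for example, "Acc 85%; Sens 50%; Spec 97%"). The sketch below, offered only as an illustration, shows how these three figures follow from confusion-matrix counts and why overall accuracy can remain high when sensitivity is low. The counts are hypothetical and are not taken from any included study; the function name is likewise an assumption of this sketch.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }


# Hypothetical test set: 50 extraction cases and 150 non-extraction cases.
metrics = classification_metrics(tp=25, fp=4, tn=146, fn=25)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.855, 'sensitivity': 0.5, 'specificity': 0.973}
```

Because the negative class is three times larger in this example, half of the positive cases can be missed while accuracy still exceeds 85%, so accuracy alone is a weak summary when classes are imbalanced.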
4. Discussion
This PRISMA-guided systematic review demonstrates that AI is no longer a distant prospect but an emerging reality in orthodontic care. Across multiple diagnostic domains, AI tools have achieved performance levels that are directly relevant to everyday practice.
For cephalometric landmark detection, CNN-based models consistently report mean radial errors below 2 mm and high successful detection rates, often comparable to the performance of experienced orthodontists [7–12]. Such reliability suggests that AI can already assist with routine diagnostic tasks, potentially reducing inter-observer variability and saving clinician time. Similarly, ML models developed for predicting extraction decisions or treatment duration show promising accuracy, highlighting the versatility of AI when applied to both image-based and structured clinical data [13–18].
Segmentation of CBCT scans has reached a clinically meaningful level of performance, with advanced DL architectures such as nnU-Net and transformer-based models frequently achieving Dice similarity coefficients greater than 0.90 for teeth and alveolar bone [19–22]. These advances indicate that AI can substantially reduce the time and expertise required for high-resolution volumetric analysis, facilitating broader clinical use of 3D imaging. Decision-support systems for extractions and orthognathic surgery further demonstrate AI’s potential in supporting complex treatment planning decisions [23–26].
Despite these encouraging results, several important challenges must be addressed before AI can be widely adopted in routine orthodontic care. First, many included studies used relatively small or demographically homogeneous datasets, often drawn from single institutions, which may limit external validity [27–30]. Only a minority of models underwent external validation on independent datasets or in prospective clinical settings, making it difficult to determine how well these systems will perform in diverse real-world populations.
Second, transparency and interpretability remain major concerns. Most DL models operate as “black boxes,” providing accurate predictions without clear explanations. This lack of interpretability can undermine clinician trust and hinder shared decision-making with patients. Explainable AI (XAI) approaches, including saliency maps, attention mechanisms, and feature-importance analyses, are therefore essential to clarify model reasoning and to support responsible clinical use [31–33].
Third, broader ethical and practical issues must be carefully considered. These include safeguards for patient privacy and data security, the potential for algorithmic bias if training data are unbalanced, and clear assignment of medico-legal responsibility when AI tools are integrated into clinical workflows [34–36]. Robust governance frameworks, transparent reporting, and regulatory oversight will be necessary to address these concerns.
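To make the Dice threshold mentioned above more concrete, the following minimal sketch computes the Dice similarity coefficient for two sets of voxel indices. It is an illustration under stated assumptions: the function name and the toy 10 × 10 voxel block are hypothetical, and real segmentation pipelines typically work on full 3D label arrays rather than Python sets.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def dice_coefficient(predicted: Iterable[Voxel], reference: Iterable[Voxel]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    pred_set, ref_set = set(predicted), set(reference)
    if not pred_set and not ref_set:
        return 1.0  # two empty masks are treated as a perfect match
    return 2 * len(pred_set & ref_set) / (len(pred_set) + len(ref_set))


# Toy example: a 10 x 10 block of voxels, and the same block shifted by one voxel in x.
reference_mask = {(x, y, 0) for x in range(10) for y in range(10)}
predicted_mask = {(x + 1, y, 0) for x in range(10) for y in range(10)}

print(dice_coefficient(predicted_mask, reference_mask))  # 0.9
```

In this toy case a one-voxel shift of a structure ten voxels wide already lowers the Dice coefficient to 0.90, which illustrates that values above 0.90 correspond to close boundary agreement for small structures.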
Future research should prioritize:
1. Large, multicenter, and demographically diverse datasets to improve generalizability and reduce algorithmic bias.
2. Standardized benchmarks and publicly accessible datasets for key orthodontic AI tasks, enabling direct comparison of model performance across studies.
3. Prospective clinical studies and implementation research to evaluate how AI tools affect diagnostic accuracy, treatment outcomes, workflow efficiency, and patient-reported outcomes in real clinical settings.
4. Adherence to AI-specific reporting guidelines (e.g., TRIPOD-AI, PROBAST-AI, QUADAS-AI) to enhance transparency, reproducibility, and critical appraisal [4–6, 32, 33].
5. Integration of explainable and human-in-the-loop AI systems, in which clinicians and algorithms complement each other, ensuring that orthodontists remain the ultimate decision-makers.
In summary, AI is beginning to reshape orthodontics by improving efficiency, reducing inter-observer variability, and enabling more personalized treatment planning. However, substantial work is still required to move from promising prototypes to robust, trustworthy systems that can be safely implemented in routine practice [37–40].
5. Conclusion
AI currently demonstrates strong performance across multiple orthodontic diagnostic and treatment-planning tasks. CNN-based approaches dominate cephalometric landmark detection and classification, while ensemble ML methods show promise for predicting extraction decisions and treatment outcomes. Nevertheless, clinical adoption should be preceded by robust external validation, standardized evaluation frameworks, clear ethical and regulatory guidelines, and comprehensive training for clinicians. With these safeguards in place, AI can evolve from an auxiliary tool into an integral component of high-quality, patient-centred orthodontic care.
Electronic Supplementary Material
Below is the link to the electronic supplementary material
References
1. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
2. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
3. Moons KG, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–8.
4. Collins GS, Dhiman P, Navarro CLA, Ma J, Hooft L, Smidt N, et al. Protocols for reporting AI-based diagnostic accuracy studies (TRIPOD-AI, PROBAST-AI). BMJ Open. 2021;11:e048008.
5. Sounderajah V, Ashrafian H, Deo RC, et al. Developing reporting standards for artificial intelligence in healthcare: QUADAS-AI. Nat Med. 2021;27:1663–5.
6. Cacciamani GE, et al. PRISMA-AI: extensions to improve reporting of systematic reviews involving AI. Nat Med. 2023;29:14–5.
7. Schwendicke F, Golla T, Dreher M, Krois J. Deep learning for cephalometric landmark detection. Clin Oral Investig. 2021;25(10):5721–31.
8. Polizzi A, Leonardi R. Automatic cephalometric landmark identification with artificial intelligence: state of the art. J Dent. 2024;146:105056.
9. Lee JH, Yu HJ, Kim MJ, Kim YJ. Automated cephalometric landmark detection using convolutional neural networks. Am J Orthod Dentofac Orthop. 2019;155(4):620–9.
10. Xue F, Wong RWK, Rabie ABM. Accuracy of AI cephalometric landmark detection compared with human examiners. Angle Orthod. 2020;90(4):535–44.
11. Cui Z, Fang Y, Mei L, Zhang B, Yu B, Liu J, et al. Multi-stage deep learning segmentation of teeth and alveolar bone from CBCT scans. Nat Commun. 2022;13:2096.
12. Arik SÖ, Ibragimov B, Xing L. Fully automated deep learning segmentation of head and neck anatomy. Med Phys. 2017;44(10):5262–75.
13. Park JH, Yu J, Kim J, Kim HY. Automated 3D convolutional neural network segmentation of teeth and bone from CBCT. Dentomaxillofac Radiol. 2021;50(2):20200147.
14. Noeldeke B, Vassis S, Sefidroodi M, Pauwels R, Stoustrup P. Deep learning-based crossbite detection using intraoral photographs. Head Face Med. 2024;20:45.
15. Ryu J, Kim YH, Kim TW, Jung SK. Artificial intelligence-assisted extraction decision-making from intraoral photographs. Sci Rep. 2023;13:5177.
16. Marya A, Inglam S, Chantarapanich N, et al. Predictive modeling of skeletal malocclusion using AI. BMC Oral Health. 2024;24:1064.
17. Huang J, Zhang Y, Wang H, et al. Machine learning models for predicting orthodontic extraction decisions. Front Bioeng Biotechnol. 2024;12:1483230.
18. Etemad LE, et al. Cross-site machine learning prediction of orthodontic extraction decisions. Bioeng (Basel). 2024;11(9):888.
19. Katinas A, et al. Predictive modeling of orthodontic treatment outcomes using AI. Orthod Craniofac Res. 2023;26(2):147–55.
20. Uzel A, et al. Artificial intelligence in orthodontic treatment planning: a systematic review. Eur J Orthod. 2022;44(3):273–82.
21. Pham DD, et al. Deep learning for diagnosis of malocclusion from intraoral photographs. Angle Orthod. 2021;91(6):781–7.
22. Kim Y, et al. AI-assisted orthodontic treatment planning with CBCT data. Korean J Orthod. 2020;50(5):293–302.
23. Lee K, et al. Convolutional neural networks for skeletal malocclusion classification using cephalograms. Diagnostics (Basel). 2020;10(11):930.
24. Choi HI, et al. Predicting orthodontic extractions with machine learning. Sci Rep. 2021;11:22337.
25. Zhang Y, et al. AI prediction of orthodontic treatment outcomes using radiographs. Comput Methods Programs Biomed. 2021;200:105911.
26. Movahed A, et al. Deep learning-based prediction of orthodontic treatment duration. Am J Orthod Dentofac Orthop. 2022;161(4):476–84.
27. Krois J, et al. Machine learning in dental image analysis: a review. J Dent. 2021;103:103583.
28. Yao J, et al. Applications of deep learning in orthodontics: current progress. Orthod Craniofac Res. 2022;25(1):34–42.
29. Kang SH, et al. CNN classification of skeletal malocclusion from cephalograms. Korean J Orthod. 2021;51(2):123–32.
30. Lee J, et al. Deep learning for automatic diagnosis of facial asymmetry. J Clin Med. 2021;10(7):1520.
31. Tanikawa C, et al. Artificial intelligence in orthodontics: recent trends. Jpn Dent Sci Rev. 2021;57:193–200.
32. Sounderajah V, et al. Standards for reporting AI diagnostic accuracy studies (STARD-AI). BMJ Open. 2021;11:e047709.
33. Liu X, Cruz Rivera S, Moher D, et al. CONSORT-AI extension for clinical trials. BMJ. 2020;370:m3164.
34. Krois J, et al. Explainable AI in dentistry: a scoping review. J Dent. 2021;110:103664.
35. Jheon AH, et al. Machine learning and orthodontics: a narrative review. Prog Orthod. 2021;22(1):18.
36. Singh P, et al. Artificial intelligence and big data in orthodontics: challenges and opportunities. Semin Orthod. 2021;27(4):343–50.
37. Abdi AH, et al. Deep learning in dental radiology: a systematic review. Dentomaxillofac Radiol. 2021;50(4):20200175.
38. Liu J, Cruz Rivera S, Moher D, et al. Multi-task deep learning for dental and skeletal classification. Med Image Anal. 2022;76:102313.
39. BMC Oral Health. About the journal. Available from: https://bmcoralhealth.biomedcentral.com/about [Accessed 2025-08-26].
40. BMC Oral Health. Preparing your manuscript. Available from: https://bmcoralhealth.biomedcentral.com/submission-guidelines/preparing-your-manuscript [Accessed 2025-08-26].
Declarations
Ethics approval and consent to participate
Not applicable (no primary data were collected).
Consent for publication
Not applicable.
Data Availability
The datasets generated and/or analysed during the current study (extraction sheet, PRISMA checklist, QUADAS-2 and PROBAST assessments) are available from the corresponding author on reasonable request.
Funding
No specific funding was received for this work.
Author Contribution
Conceptualization: SMBH, AHBR, RI.
Methodology and search strategy: SMBH, AHBR, RH.
Screening and data extraction: SMBH, AAM.
Risk-of-bias assessment (QUADAS-2 and PROBAST): SMBH, RI, RH.
Writing – original draft: SMBH.
Writing – review and editing, and supervision: AHBR, RI, RH.
All authors read and approved the final manuscript.