Phenotype-guided post-triage of combined screening–positive pregnancies using an interpretable decision-tree model

Paula Idalia Szenejko1,*, Szymon Plotka2, Marcin Wiechec3,4, and Filip Dabrowski5

1 Doctoral School of Translational Medicine, Centre of Postgraduate Medical Education, Warsaw, Poland

2 Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland

3 Department of Gynecology and Obstetrics, Jagiellonian University, Kraków, Poland

4 MWU DOBRE USG Centre of Ultrasound Diagnostics, Kraków, Poland

5 Department of Gynecology and Gynecological Oncology, Centre of Postgraduate Medical Education CMKP, Warsaw, Poland

*paula.szenejko@gmail.com

Keywords:

Prenatal screening

Fetal aneuploidy

First-trimester screening

Interpretable machine learning

Phenotypic profiling

False-positive reduction

Abstract

The first-trimester combined screening test (CST) identifies pregnancies at increased risk of fetal aneuploidy but carries a substantial false-positive burden, with most CST-positive pregnancies ultimately proving euploid. Optimizing decision-making within this already risk-enriched population remains a major clinical challenge. This retrospective observational study evaluated a phenotype-guided post-triage (PGT) model applied exclusively to CST-positive pregnancies using routinely collected maternal, biochemical, and ultrasound markers. An interpretable classification and regression tree (CART) model was developed to distinguish true-positive aneuploid pregnancies from false-positive euploid pregnancies. The dataset was split into training, validation, and independent test cohorts, and model performance was assessed in the test cohort. A prespecified positive predictive value–priority operating point was selected to minimize false-positive classification while maintaining acceptable aneuploidy detection. Among the 5,015 CST-positive pregnancies included, aneuploidy prevalence in the test cohort was 4.0%. The PGT model showed strong discrimination, comparable to that of baseline CST risk (area under the receiver operating characteristic curve, 0.93 for both), indicating that global discrimination was unchanged by post-triage refinement. At the conventional CST cutoff of 1:300, 52% of euploid pregnancies were classified as screen-positive; applying the PGT model reduced false-positive classification to 3%. False-positive classification was reduced across guideline-defined CST risk bands, and all aneuploid pregnancies misclassified as negative had conditions detectable by cell-free DNA testing. Phenotype-guided post-triage substantially reduces false-positive classification among CST-positive pregnancies without altering global discrimination, supporting a decision-focused evaluation paradigm for post-screening tools and more efficient use of downstream prenatal testing.

Introduction

The first-trimester combined screening test (CST) integrates maternal age, fetal nuchal translucency, and maternal serum biochemistry to estimate individualized risk for fetal aneuploidy and has been widely implemented in population-based screening programs. Early evaluations indicated that approximately 5% of screened pregnancies would be classified as screen-positive at conventional risk cutoffs, enabling targeted referral for confirmatory testing while maintaining acceptable detection rates [1]. However, subsequent real-world analyses have demonstrated substantially higher proportions of CST-positive results in contemporary practice, with most screen-positive pregnancies ultimately proving euploid, resulting in a high false-positive burden [2, 3].

Cell-free DNA (cfDNA) testing has improved secondary screening performance and is frequently used as a reflex test following a positive CST result. Nevertheless, universal reflex cfDNA testing for all CST-positive pregnancies remains challenging in many healthcare systems because of financial, organizational, and reimbursement constraints [4, 5]. As a result, contingent and stepwise screening strategies have been proposed to optimize downstream testing while preserving safety for pregnancies at genuinely increased risk [6, 7]. In this context, the principal challenge is not to improve primary screening discrimination but rather to refine decision-making within the already risk-enriched CST-positive subgroup, where the yield of additional testing varies substantially despite similar CST-derived risk estimates.

Accumulating evidence indicates that CST-positive pregnancies are phenotypically heterogeneous. Sonographic markers such as an absent or hypoplastic nasal bone and abnormal ductus venosus flow are consistently associated with aneuploidy [8, 9], whereas isolated biochemical deviations may generate elevated CST risks despite otherwise reassuring ultrasound findings [2, 3]. These observations suggest that aggregated CST risk does not fully capture biologically meaningful differences among screen-positive pregnancies.

Recent clinical evidence further supports phenotype-aware interpretation. Kosiński et al. reported that markedly elevated pregnancy-associated plasma protein A (PAPP-A) or free β-human chorionic gonadotropin (free β-hCG) levels in the first trimester are not associated with increased rates of aneuploidy or adverse outcomes when fetal anatomy is normal [10]. This finding underscores that extreme biochemical values, when occurring in isolation, may represent benign biological variation rather than pathological risk, motivating the development of post-screening approaches that explicitly account for the phenotypic context.

Interpretable machine learning methods, such as classification and regression trees (CARTs), provide a transparent framework for formalizing phenotypic patterns into rule-based decision structures that can be readily evaluated by clinicians [11–14]. When applied as post-triage tools within CST-positive populations, such models may enable more precise identification of pregnancies with low expected yields from reflex cfDNA or invasive testing. The aim of this study was therefore to develop and evaluate an interpretable, phenotype-guided CART-based post-triage model applied exclusively among CST-positive pregnancies, with the objective of reducing false-positive classification at a prespecified operating point while maintaining alignment with guideline-defined risk strata.

Results

Study population

Among the CST-positive pregnancies, 5,015 met the eligibility criteria and were randomly split into training (n = 3,009), validation (n = 1,003), and test (n = 1,003) cohorts. The prevalence of aneuploidy was 4.1% in the training cohort and 4.0% in both the validation and test cohorts. Baseline characteristics were comparable across cohorts (Table 1). Participant inclusion, exclusions, and dataset splitting are reported in a STARD-compliant flow diagram (Fig. 1).

Table 1

Baseline characteristics of CST-positive pregnancies in the phenotype-guided post-triage (PGT) modeling cohort
Characteristic	Train (n = 3,009)	Validation (n = 1,003)	Test (n = 1,003)
Outcome
Euploid, n (%)	2,887 (95.9)	963 (96.0)	963 (96.0)
Aneuploid, n (%)	122 (4.1)	40 (4.0)	40 (4.0)
Continuous markers, mean ± SD
Maternal age, y	35.90 ± 4.66	35.86 ± 4.82	35.81 ± 4.79
Nuchal translucency, mm	2.10 ± 0.87	2.09 ± 0.94	2.10 ± 0.96
Crown–rump length, mm	66.05 ± 8.24	66.01 ± 8.36	65.87 ± 8.13
Fetal heart rate, bpm	159.12 ± 7.81	159.09 ± 7.89	159.59 ± 7.59
PAPP-A, MoM	0.77 ± 0.46	0.78 ± 0.46	0.75 ± 0.45
Free β-hCG, MoM	1.69 ± 1.20	1.69 ± 1.18	1.68 ± 1.21
Ultrasound markers, n (%) abnormal
Abnormal nasal bone	115 (3.8)	33 (3.3)	29 (2.9)
Abnormal ductus venosus flow	187 (6.2)	62 (6.2)	79 (7.9)
Tricuspid regurgitation	97 (3.2)	32 (3.2)	35 (3.5)
Single umbilical artery	51 (1.7)	24 (2.4)	23 (2.3)

Fig. 1

Patient flow and dataset construction for phenotype-guided post-triage analysis.

This flow diagram illustrates the stepwise inclusion and exclusion of pregnancies undergoing first-trimester combined screening (CST). Pregnancies were excluded because of missing pregnancy outcome, crown–rump length (CRL) outside the valid first-trimester range (45–84 mm), missing required continuous predictors, missing CST risk estimates, or CST-negative classification (minimum risk < 1:1000). The final CST-positive cohort (N = 5,015) was randomly divided into training (60%), validation (20%), and test (20%) subsets with stratification by outcome.

Fig. 2

Refined phenotype-guided post-triage (PGT) decision tree model applied to CST-positive pregnancies.

Classification and regression tree (CART) illustrating the phenotype-guided post-triage model developed to distinguish true-positive aneuploid pregnancies from benign CST false-positive results. Internal nodes represent decision rules based on routinely collected sonographic and biochemical markers. Terminal nodes define triage-positive and triage-negative classifications according to the expected yield of downstream testing, supporting decision-focused refinement within the CST-positive population.

Model discrimination

In the CST-positive test cohort, the phenotype-guided post-triage (PGT) model demonstrated strong discrimination (AUC = 0.93, 95% CI 0.91–0.95), comparable to the baseline CST risk (AUC = 0.93, 95% CI 0.91–0.95). There was no statistically significant difference between AUCs (DeLong test P = 1.00), indicating unchanged global discrimination despite post-triage refinement (Fig. 3, Table 2).

Table 2

Pretriage and post-triage performance at defined operating points in the CST-positive test cohort (n = 1,003)
Model	Interpretation stage	Operating point	FP/Euploid	FP risk	RR (95% CI)	OR (95% CI)	AUC
CST (min_risk)	Pretriage	1:300	498/963	0.52	Reference	Reference	0.93
PGT model	Post-triage	Prespecified PPV-priority operating point (threshold = 0.91)	27/963	0.03	0.05 (0.04–0.08)	0.03 (0.02–0.04)	0.93

False-positive reduction at operating points

At the conventional CST cutoff of 1:300, 52% of euploid pregnancies in the test cohort were classified as screen-positive. Applying the PGT model at the prespecified PPV-priority operating threshold reduced false-positive classification to 3%. This corresponded to a relative risk of 0.05 (95% CI 0.04–0.08) and an odds ratio of 0.03 (95% CI 0.02–0.04) for false-positive classification (Table 2).

Performance across CST risk bands

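When stratified by Polish Society of Gynecologists and Obstetricians (PTGiP) risk bands, false-positive classification was reduced in every stratum (Table 3). Sensitivity remained high: 37 of 40 aneuploid pregnancies were classified as triage-positive, and all three misclassified aneuploid cases represented conditions detectable by cfDNA.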

Table 3

Performance of the phenotype-guided post-triage model across PTGiP risk bands among CST-positive pregnancies
PTGiP risk band	Total N	Aneuploid (n)	Euploid (n)	Aneuploid classified as triage-positive, n (%)	Euploid classified as triage-negative, n (%)	FP reduction (%)
Intermediate (1:300–1:1000)	472	0	472	–	458 (97.0)	97.0
High (1:100–1:300)	244	3	241	3 (100)	224 (92.9)	92.9
Very high (> 1:100)	287	37	250	34 (91.9)	161 (64.4)	64.4
High + very high combined	531	40	491	37 (92.5)	385 (78.4)	78.4

In the very high-risk band (> 1:100; n = 287), 34 of 37 aneuploid pregnancies were classified as positive by the model. Three aneuploid cases (trisomy 21, trisomy 18, and monosomy X) were classified as negative; all three represent conditions detectable by cfDNA. Among the 250 euploid pregnancies in this band, 161 (64.4%) were classified as negative.

In the high-risk band (1:100–1:300; n = 244), all three aneuploid pregnancies were classified as positive, whereas 224 of the 241 euploid pregnancies (92.9%) were classified as negative.

In the intermediate-risk band (1:300–1:1000; n = 472), no aneuploid pregnancies occurred, and 458 of the 472 euploid pregnancies (97.0%) were classified as negative.

Across the combined very high- and high-risk bands (n = 531), the model classified 385 of the 491 euploid pregnancies (78.4%) as negative. All aneuploid pregnancies misclassified as negative in these bands had conditions detectable by cfDNA, supporting use of the model as a step-down tool rather than as grounds for omitting downstream testing. The proposed clinical workflow is illustrated in Fig. 4.

Fig. 4

Clinical workflow simulation illustrating phenotype-guided post-triage decision pathways.

Flow diagram illustrating the application of the phenotype-guided post-triage (PGT) model across guideline-defined CST risk bands. CST-positive pregnancies were stratified into intermediate (1:300–1:1000), high (1:100–1:300), and very high (> 1:100) risk groups and processed by the CART algorithm. Triage-positive classifications indicate high expected yield and recommendation for reflex cell-free DNA or invasive testing, whereas triage-negative classifications indicate low expected yield and support step-down management. All aneuploid pregnancies misclassified as triage-negative represent conditions detectable by cell-free DNA testing.

Phenotype profiles

Distinct phenotypic profiles differentiated benign CST false-positive outcomes from true-positive aneuploid outcomes (Table 4). Benign profiles were characterized by isolated or modest marker deviations, whereas aneuploid profiles were associated with clustered abnormalities, including increased nuchal translucency, very low PAPP-A, and abnormal ductus venosus flow. These phenotype patterns were reproducible across cohorts and corresponded to terminal nodes of the CART model (Fig. 2).

Table 4

Phenotypic profiles distinguishing CST false-positive (benign) and true-positive (aneuploid) outcomes identified by the phenotype-guided post-triage model
Clinical feature	Benign phenotype (CST false positive/euploid)	High-risk phenotype (CST true positive/aneuploid)
Nuchal translucency (NT)	Typically ≤ 2.5 mm; mild elevations (2.5–2.9 mm) benign when isolated	Markedly increased (> 2.95 mm) or > 2.5 mm with concurrent abnormalities
Nasal bone	Present (normal)	Absent or hypoplastic
Ductus venosus flow	Normal A-wave	Abnormal waveform (reversed or absent A-wave)
PAPP-A, MoM	Near median (~ 1.0 MoM) or moderately reduced (> 0.25 MoM)	Very low (≤ 0.25 MoM)
Free β-hCG, MoM	Near population median or isolated elevations	Often elevated (> 2.0 MoM) in trisomy 21; frequently reduced in trisomy 18/13
Crown–rump length (CRL)	Appropriate for gestational age	Small for gestational age (commonly in trisomy 18/13)
Typical CART leaf assignment	Low predicted probability of aneuploidy	Enriched probability of aneuploidy

Fig. 3

Discrimination of baseline CST risk and the phenotype-guided post-triage model in the CST-positive test cohort.

Receiver operating characteristic (ROC) curves comparing baseline CST risk and the phenotype-guided post-triage (PGT) model in the independent test cohort (n = 1,003). Both approaches demonstrated identical global discrimination (AUC = 0.93; DeLong test P = 1.00). The prespecified positive predictive value–priority operating point illustrates substantial reduction in false-positive classification without loss of overall discrimination.

Discussion

Recent work by Kosiński et al. demonstrated substantial phenotypic heterogeneity within CST-positive populations, showing that isolated extreme biochemical marker values—particularly elevated PAPP-A or free β-hCG—can be associated with favorable pregnancy outcomes when ultrasound anatomy is otherwise normal [10]. These findings challenge the assumption that all CST-positive pregnancies represent biologically equivalent risk states and underscore the limitations of risk scores derived solely from aggregated markers.

The present study builds directly on this biological insight by formalizing phenotype-guided post-triage within an interpretable machine-learning framework and by quantifying its impact with operating-point–specific effect sizes rather than global discrimination metrics. Whereas Kosiński et al. described phenotypic heterogeneity at a descriptive level, the current analysis translates this heterogeneity into an explicit, rule-based decision model applied at the clinical juncture where reflex testing decisions are made. As shown in Table 3, false-positive classification was reduced across CST risk bands, indicating that phenotype-guided refinement is applicable beyond narrowly defined intermediate-risk groups.

The phenotypic patterns identified by the model lend biological plausibility to this effect. Benign CST false-positive profiles were characterized by isolated or modest marker deviations, whereas true-positive aneuploid profiles were associated with clustered abnormalities, including increased nuchal translucency, very low PAPP-A, and abnormal ductus venosus flow. These patterns are consistent with established associations between marker combinations and aneuploidy risk [15–17] while extending prior work by embedding phenotypic interpretation within a reproducible, thresholded decision framework.

Reducing false-positive classifications has important clinical implications. High false-positive rates among CST-positive pregnancies contribute to unnecessary reflex cfDNA testing, invasive diagnostic procedures, and psychological burden [18–20]. By markedly reducing false-positive classification at a prespecified operating point, phenotype-guided post-triage offers a pathway to more efficient allocation of downstream testing while preserving confidence in identifying truly aneuploid pregnancies. This approach is particularly relevant in healthcare systems where universal cfDNA testing is not feasible and contingent or stepwise strategies are required [21, 22].

Several limitations warrant consideration. This was a retrospective analysis conducted within a single screening framework, and external validation in independent populations is needed [23]. The model was evaluated exclusively among CST-positive pregnancies and is not intended for use as a primary screening tool. Although decision curve analysis was performed as a supportive assessment, conclusions regarding clinical value are based primarily on operating-point–specific effect sizes rather than decision curve metrics [24].

In conclusion, the application of phenotype-guided post-triage to CST-positive pregnancies can substantially reduce false-positive classification without altering global discrimination. Together with prior evidence demonstrating biological heterogeneity among CST-positive risk profiles [10], these findings support a decision-focused evaluation paradigm for post-screening models and suggest that systematic use of routinely collected phenotypic information may improve the efficiency of prenatal screening pathways [25].

Methods

Study design and data sources

Data were prospectively collected from pregnant individuals at five centers in Central Europe between January 2019 and December 2023. The present study constitutes a retrospective secondary analysis of these prospectively collected data. The retrospective analysis was restricted a priori to pregnancies classified as CST-positive according to Polish Society of Gynecologists and Obstetricians (PTGiP) risk thresholds. The study objective was to evaluate a phenotype-guided post-triage model applied exclusively within this CST-positive population.

All data were fully anonymized prior to analysis. The study was conducted in accordance with institutional and national ethical standards and was approved by the Bioethics Committee of the District Medical Chamber in Kraków (approval numbers: 77/KBL/OIL/2012 and 156/KBL/OIL/2017). Informed consent was not required for the secondary analysis of anonymized clinical data, in accordance with Polish national regulations, including the Act of 5 December 1996 on the Medical Profession and the Act of 6 November 2008 on Patients’ Rights and the Patient Ombudsman.

Study population and eligibility criteria

Pregnancies were eligible for inclusion if the following data were available: CST-derived risk estimates, first-trimester ultrasound measurements, maternal serum biochemical markers, and confirmed pregnancy outcomes. Pregnancies classified as low risk (< 1:1000) were excluded, as the model was designed exclusively for post-screening triage among screen-positive cases.

CST-positive pregnancies were categorized into PTGiP risk bands as follows:

Very high risk: >1:100

High risk: 1:100–1:300

Intermediate risk: 1:300–1:1000
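The band boundaries above can be expressed as a small lookup. The sketch below is illustrative only: it assumes that CST risk is stored as the denominator n of a 1:n risk (higher risk means smaller n), and it assumes that the boundary values 1:100 and 1:300 fall in the high band and 1:1000 in the intermediate band, because the tie-handling used in the study is not stated. The function name ptgip_risk_band is hypothetical.

```python
def ptgip_risk_band(n: float) -> str:
    """Return the PTGiP risk band for a CST risk of 1:n (e.g. n=250 for 1:250).

    Boundary handling is an assumption: 1:100 and 1:300 are placed in the
    high band, 1:1000 in the intermediate band, and anything below 1:1000
    is CST-negative (excluded from the post-triage cohort).
    """
    if n <= 0:
        raise ValueError("risk denominator must be positive")
    if n < 100:
        return "very high"      # > 1:100
    if n <= 300:
        return "high"           # 1:100-1:300
    if n <= 1000:
        return "intermediate"   # 1:300-1:1000
    return "CST-negative"       # < 1:1000


bands = [ptgip_risk_band(n) for n in (40, 250, 800, 2500)]
```

Working with the denominator avoids floating-point comparisons of small probabilities such as 1/300 at the band edges.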

The outcome status (aneuploid vs euploid) was established on the basis of invasive diagnostic testing, postnatal karyotype, or documented clinical follow-up confirming chromosomal normality.

Predictors

The candidate predictors consisted of routinely collected CST variables:

Maternal age (years)

Fetal nuchal translucency (NT; mm)

Crown–rump length (CRL; mm), restricted to 45–84 mm

Fetal heart rate (beats per minute)

Pregnancy-associated plasma protein A (PAPP-A; multiples of the median, MoM)

Free β-human chorionic gonadotropin (free β-hCG; MoM)

Ultrasound markers were included as categorical variables:

Nasal bone (normal vs absent/hypoplastic)

Ductus venosus flow (normal vs abnormal or absent)

Tricuspid regurgitation (absent vs present)

Single umbilical artery (absent vs present)

Definitions of abnormal findings followed established first-trimester screening standards.

Missing values in continuous predictors resulted in exclusion of the affected record. Missing categorical ultrasound markers were retained and treated as nonabnormal, reflecting real-world reporting practices and preserving model applicability.
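A minimal pandas sketch of these eligibility and missing-data rules follows. All column names (for example crl_mm, cst_min_risk_n, outcome_aneuploid) are hypothetical, ultrasound markers are assumed to be coded 1 = abnormal and 0 = normal, and CST risk is assumed to be stored as the denominator n of a 1:n risk; this illustrates the stated rules and is not the study's own preprocessing code.

```python
import pandas as pd

# Hypothetical column names; markers coded 1 = abnormal, 0 = normal.
CONTINUOUS = ["maternal_age", "nt_mm", "crl_mm", "fhr_bpm", "pappa_mom", "fbhcg_mom"]
MARKERS = ["nasal_bone_abn", "dv_abn", "tr_present", "sua_present"]


def prepare_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the inclusion and missing-data rules described in the text (illustrative)."""
    # Exclude records missing the outcome, the CST risk or any continuous predictor.
    out = df.dropna(subset=["outcome_aneuploid", "cst_min_risk_n", *CONTINUOUS])
    # Keep CRL within the valid first-trimester range (45-84 mm, inclusive).
    out = out[out["crl_mm"].between(45, 84)]
    # Keep CST-positive pregnancies only: risk of 1:1000 or higher (n <= 1000).
    out = out[out["cst_min_risk_n"] <= 1000].copy()
    # Missing ultrasound markers are retained and treated as non-abnormal.
    out[MARKERS] = out[MARKERS].fillna(0).astype(int)
    return out
```

Treating a missing marker as 0 keeps the record in the cohort, which matches the stated aim of preserving applicability to routinely reported data.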

Model development

A classification and regression tree (CART) model was developed to distinguish true-positive aneuploid pregnancies from false-positive euploid pregnancies within the CST-positive cohort. The model was explicitly designed as a post-triage decision-support tool, not as a primary screening classifier.

The final CST-positive cohort was randomly divided into training, validation, and test subsets (60/20/20), with stratification by outcome, and the test set was held out for final performance evaluation. Tree complexity was controlled using cost-complexity pruning to prevent overfitting, with tuning performed in the validation cohort. Model interpretability was preserved by limiting tree depth and the number of terminal nodes.
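The split-and-prune procedure can be sketched with scikit-learn's DecisionTreeClassifier, whose cost_complexity_pruning_path method returns the candidate pruning strengths (ccp_alphas). The sketch assumes that the pruning strength is chosen by maximizing validation AUC and uses hypothetical complexity limits (max_depth=4, max_leaf_nodes=12) on synthetic data; the study's actual tuning criterion and limits are not reported here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def split_60_20_20(X, y, seed=0):
    """Outcome-stratified 60/20/20 split into training, validation and test sets."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


def fit_pruned_cart(X_train, y_train, X_val, y_val,
                    max_depth=4, max_leaf_nodes=12, seed=0):
    """Grow a size-limited tree, then keep the ccp_alpha with the best validation AUC."""
    params = dict(max_depth=max_depth, max_leaf_nodes=max_leaf_nodes, random_state=seed)
    path = DecisionTreeClassifier(**params).cost_complexity_pruning_path(X_train, y_train)
    best_tree, best_auc = None, -np.inf
    # Clip guards against tiny negative alphas from floating-point error.
    for alpha in np.clip(path.ccp_alphas, 0.0, None):
        tree = DecisionTreeClassifier(ccp_alpha=alpha, **params).fit(X_train, y_train)
        auc = roc_auc_score(y_val, tree.predict_proba(X_val)[:, 1])
        if auc > best_auc:  # ties keep the smaller alpha (less pruning)
            best_tree, best_auc = tree, auc
    return best_tree, best_auc


# Synthetic stand-in for the cohort: 5,015 records, roughly 4% positive class.
X, y = make_classification(n_samples=5015, n_features=10, weights=[0.96],
                           random_state=0)
X_train, X_val, X_test, y_train, y_val, y_test = split_60_20_20(X, y)
tree, val_auc = fit_pruned_cart(X_train, y_train, X_val, y_val)
```

The test arrays are untouched during tuning, so they remain available for a single final evaluation.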

Selection of the operating point

Rather than optimizing global discrimination, the operating threshold was prespecified to prioritize the reduction of false-positive classifications while maintaining acceptable sensitivity for aneuploidy detection. A positive predictive value (PPV)–priority operating point was selected in the validation cohort and then fixed for evaluation in the independent test cohort. This approach reflects the intended clinical use of the model for guiding reflex testing decisions among CST-positive pregnancies.
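One way to formalize a PPV-priority rule is to scan candidate thresholds on the validation cohort, discard those whose sensitivity falls below a floor, and keep the threshold with the highest PPV. The sketch below assumes this rule; the sensitivity floor (min_sensitivity) and the function name are hypothetical, since the selection criterion is not specified beyond PPV priority. The returned threshold would then be frozen and applied unchanged to the test cohort.

```python
import numpy as np


def ppv_priority_threshold(y_true, scores, min_sensitivity=0.9):
    """Pick the threshold with the highest PPV among those meeting a sensitivity floor.

    A case is triage-positive when its score is >= the threshold. Ties in PPV
    are broken in favor of higher sensitivity. min_sensitivity is hypothetical.
    """
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    best = None  # (ppv, sensitivity, threshold)
    for t in np.unique(s):  # each candidate flags at least one case
        flagged = s >= t
        tp = np.sum(flagged & y)
        sens = tp / y.sum()
        if sens < min_sensitivity:
            continue
        ppv = tp / flagged.sum()
        if best is None or (ppv, sens) > best[:2]:
            best = (ppv, sens, t)
    if best is None:
        raise ValueError("no threshold reaches the sensitivity floor")
    return float(best[2])
```

For a CART model the scores are leaf probabilities, so only a handful of distinct thresholds exist and the scan is cheap.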

Statistical analysis

Model performance was evaluated in the independent test cohort. Discrimination was assessed using the area under the receiver operating characteristic curve (AUC). Comparisons between baseline CST risk and the post-triage model were performed using the DeLong test.
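For readers reimplementing the comparison, the sketch below computes the DeLong et al. (1988) test for two correlated AUCs estimated on the same pregnancies, using the structural-component (placement value) form. It is a plain O(mn) NumPy/SciPy version written for clarity rather than the study's code, and the function names are hypothetical. When both score sets produce identical placement values, the variance of the AUC difference is zero and the sketch returns P = 1.0 by convention.

```python
import numpy as np
from scipy.stats import norm


def _placement_values(pos_scores, neg_scores):
    """Structural components of the Mann-Whitney AUC (ties count one half)."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 per positive, V01 per negative


def delong_paired_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, p)."""
    y = np.asarray(y_true).astype(bool)
    v10, v01, aucs = [], [], []
    for scores in (scores_a, scores_b):
        s = np.asarray(scores, dtype=float)
        pos_part, neg_part = _placement_values(s[y], s[~y])
        v10.append(pos_part)
        v01.append(neg_part)
        aucs.append(pos_part.mean())
    m, n = y.sum(), (~y).sum()
    # Covariance matrix of the two AUC estimates (one row per model).
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 0:  # identical placement values: no detectable difference
        return aucs[0], aucs[1], 1.0
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    return aucs[0], aucs[1], float(2 * norm.sf(abs(z)))
```

For cohorts of a few thousand pregnancies the pairwise matrix stays small enough for this direct form; faster sorting-based variants compute the same quantities.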

Diagnostic accuracy metrics included sensitivity, specificity, positive predictive value, negative predictive value, and balanced accuracy. False-positive classification rates were compared between baseline CST risk (using the conventional 1:300 cutoff) and the post-triage model at the prespecified operating point. The CST risk estimate was treated as part of the reference screening framework and not as an index test under evaluation. Effect sizes are reported as relative risks and odds ratios with 95% confidence intervals, which were calculated using the Wald method.
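The accuracy metrics and effect sizes described above reduce to simple arithmetic on 2 × 2 counts. The sketch below assumes log-scale Wald intervals for both the relative risk and the odds ratio and treats the pretriage and post-triage false-positive proportions as independent (it ignores the pairing of the same 963 euploid pregnancies); the function names are hypothetical. With the Table 2 counts (27/963 post-triage vs 498/963 pretriage false positives) it gives RR ≈ 0.05 (0.04–0.08) and OR ≈ 0.03 (0.02–0.04), matching the reported values after rounding.

```python
import math

Z95 = 1.959963984540054  # standard normal 97.5th percentile


def accuracy_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV and balanced accuracy from 2x2 counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sens + spec) / 2,
    }


def _wald_ci(estimate, se_log, z=Z95):
    """Back-transform a log-scale Wald interval."""
    return estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)


def fp_effect_sizes(fp_new, n_new, fp_ref, n_ref):
    """RR and OR of false-positive classification (new vs reference) with Wald 95% CIs."""
    rr = (fp_new / n_new) / (fp_ref / n_ref)
    se_log_rr = math.sqrt(1 / fp_new - 1 / n_new + 1 / fp_ref - 1 / n_ref)
    odds_ratio = (fp_new / (n_new - fp_new)) / (fp_ref / (n_ref - fp_ref))
    se_log_or = math.sqrt(1 / fp_new + 1 / (n_new - fp_new)
                          + 1 / fp_ref + 1 / (n_ref - fp_ref))
    return {"RR": (rr, *_wald_ci(rr, se_log_rr)),
            "OR": (odds_ratio, *_wald_ci(odds_ratio, se_log_or))}


effects = fp_effect_sizes(fp_new=27, n_new=963, fp_ref=498, n_ref=963)
```

Because both classifications are applied to the same euploid pregnancies, a paired analysis would be a reasonable sensitivity check on these unpaired intervals.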

All analyses were performed using Python (version 3.11) with scikit-learn and associated scientific libraries. Statistical significance was defined as a two-sided P value < 0.05.

Reporting standards

This study adheres to the STARD 2015 guidelines for diagnostic accuracy studies and the TRIPOD-AI recommendations for prediction model development and evaluation. The model is reported with an explicit description of intended use, population scope, operating point selection, and limitations to support transparent interpretation and reproducibility. The model was not recalibrated in the test set.

Data availability

The data that support the findings of this study are not publicly available owing to national data protection regulations and institutional restrictions governing the use of clinical prenatal screening data. Deidentified data are available from the corresponding author upon reasonable request and subject to appropriate ethical approval and data-sharing agreements in accordance with institutional policies.

Code availability

The custom Python code used to develop and evaluate the classification and regression tree (CART) post-triage model will be shared with editors and reviewers during peer review and is available from the corresponding author upon reasonable request. Upon acceptance, the code will be deposited in a public, DOI-minting repository and made openly available to enable independent verification and reuse.

Ethical considerations

This study represents a retrospective secondary analysis of prospectively collected clinical data. All data were fully anonymized prior to analysis. The study was conducted in accordance with institutional and national ethical standards and was approved by the Bioethics Committee of the District Medical Chamber in Kraków (approval numbers: 77/KBL/OIL/2012 and 156/KBL/OIL/2017).

Informed consent

Informed consent was not required for the secondary analysis of anonymized clinical data, in accordance with Polish national regulations, including the Act of 5 December 1996 on the Medical Profession and the Act of 6 November 2008 on Patients’ Rights and the Patient Ombudsman.


References

1. Wald, N. J., Hackshaw, A. K. & Frost, C. Prenatal screening for Down syndrome. Lancet 350, 829–835 (1997).

2. Wright, D., Syngelaki, A., Bradbury, I., Akolekar, R. & Nicolaides, K. H. First-trimester screening for trisomies 21, 18 and 13 by ultrasound and biochemical testing. Ultrasound Obstet. Gynecol. 35, 118–126 (2010).

3. Nicolaides, K. H. Screening for fetal aneuploidies at 11–13 weeks. Prenat. Diagn. 31, 7–15 (2011).

4. Benn, P. et al. Aneuploidy screening: a position statement of the Chromosome Abnormality Screening Committee. Prenat. Diagn. 32, 1–10 (2012).

5. Gil, M. M., Quezada, M. S., Revello, R., Akolekar, R. & Nicolaides, K. H. Analysis of cell-free DNA in maternal blood in screening for fetal aneuploidies: updated meta-analysis. Ultrasound Obstet. Gynecol. 45, 249–266 (2015).

6. Malone, F. D. et al. First-trimester or second-trimester screening, or both, for Down’s syndrome. N. Engl. J. Med. 353, 2001–2011 (2005).

7. American College of Obstetricians and Gynecologists. Screening for fetal chromosomal abnormalities. Practice Bulletin 226. Obstet. Gynecol. 136, e48–e69 (2020).

8. Cicero, S., Curcio, P., Papageorghiou, A., Sonek, J. & Nicolaides, K. H. Absence of nasal bone in fetuses with trisomy 21 at 11–14 weeks. Ultrasound Obstet. Gynecol. 18, 623–626 (2001).

9. Matias, A. et al. Ductus venosus blood flow velocity waveforms in chromosomally abnormal fetuses at 10–14 weeks. Ultrasound Obstet. Gynecol. 11, 332–339 (1998).

10. Kosiński, P. et al. Clinical consequences of maternal serum PAPP-A and free β-hCG levels above 2.0 multiples of the median in first-trimester screening. Eur. J. Obstet. Gynecol. Reprod. Biol. 282, 101–104 (2023). https://doi.org/10.1016/j.ejogrb.2023.01.016

11. Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees (Wadsworth, 1984).

12. Lemon, S. C., Roy, J., Clark, M. A., Friedmann, P. D. & Rakowski, W. Classification and regression tree analysis in public health. Annu. Rev. Public Health 24, 341–365 (2003).

13. Austin, P. C. & Tu, J. V. Automated variable selection methods for logistic regression produce unstable models. J. Clin. Epidemiol. 57, 1138–1146 (2004).

14. Steyerberg, E. W. Clinical Prediction Models (Springer, 2009).

15. Snijders, R. J. M., Noble, P., Sebire, N. & Souka, A. UK multicenter project on risk assessment for trisomy 21. Lancet 352, 343–346 (1998).

16. Spencer, K. et al. Maternal serum PAPP-A and free β-hCG in trisomy 21 pregnancies. Prenat. Diagn. 20, 525–530 (2000).

17. Wright, D. & Nicolaides, K. H. Screening for trisomy 21 by fetal nuchal translucency and maternal serum biochemistry. Prenat. Diagn. 22, 877–886 (2002).

18. Kuppermann, M. et al. Psychological outcomes after prenatal screening and diagnostic testing. Am. J. Obstet. Gynecol. 194, 140–146 (2006).

19. Marteau, T. M., Kidd, J. & Cook, R. Psychological effects of false-positive results in prenatal screening. BMJ 307, 1469–1472 (1993).

20. Benn, P. et al. Uncertainties in prenatal screening and the role of diagnostic testing. Prenat. Diagn. 32, 1–7 (2012).

21. Norton, M. E., Jacobsson, B. & Swamy, G. K. Cell-free DNA analysis for noninvasive examination of trisomy. N. Engl. J. Med. 372, 1589–1597 (2015).

22. Cuckle, H., Benn, P. & Pergament, E. Cell-free DNA screening for fetal aneuploidy as a contingent test. Prenat. Diagn. 35, 539–543 (2015).

23. Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD). Ann. Intern. Med. 162, 55–63 (2015).

24. Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating prediction models. Med. Decis. Making 26, 565–574 (2006).

25. Ansbacher-Feldman, M. et al. Risk-adapted strategies in first-trimester prenatal screening. Ultrasound Obstet. Gynecol. 60, 547–555 (2022).

Author Contributions

P.I.S. conceived the study, designed the analysis, performed the statistical and machine-learning analyses, and drafted the manuscript. S.P. independently reviewed the study methodology and verified the analytical code and results. M.W. contributed to data acquisition and provided scientific supervision during the primary phase of the project. F.D. provided institutional supervision and critically reviewed the manuscript.

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Competing Interests

We wish to inform the editors that one of the Guest Editors of this Collection, Dr. Paweł Jan Stanirowski, is a former professional colleague of the corresponding author. There are no current financial, supervisory, or dependent relationships. This disclosure is provided in the interest of transparency, and the authors fully respect Springer Nature’s policy of editorial independence.

Acknowledgement

The authors gratefully acknowledge Agnieszka Nocuń, MD, PhD; Magdalena Dudzik, MD; Anna Matyszkiewicz, MD; Dominika Stettner-Kołodziejska, MD, PhD; Marcin Pasternok, MD; and Tomasz Góra, MD, PhD for their contributions to clinical data collection and acquisition of first-trimester screening examinations used in this study. Their work in patient care and ultrasound diagnostics made this analysis possible.

Additional information

Correspondence and requests for materials

Correspondence and requests for materials should be addressed to P.I.S. (paula.szenejko@gmail.com).

Table Legends

Table 1. Baseline characteristics of CST-positive pregnancies in the phenotype-guided post-triage (PGT) modeling cohort.

The characteristics included maternal age, fetal biometry (nuchal translucency, crown–rump length, and fetal heart rate), and biochemical and sonographic markers. Abnormality is defined as nasal bone absence or hypoplasia, abnormal or absent ductus venosus waveform, the presence of tricuspid regurgitation, or the presence of a single umbilical artery. For ultrasound markers, missing values were retained in the denominators and treated as nonabnormal, reflecting routine clinical reporting and preserving model applicability.

Table 2. Pretriage and post-triage performance at defined operating points in the CST-positive test cohort (n = 1,003).

False positives (FPs) denote euploid pregnancies incorrectly classified as positive. Relative risks (RRs) and 95% confidence intervals (CIs) were calculated using log-scale Wald methods; odds ratios (ORs) and 95% CIs were estimated using exact methods. Areas under the receiver operating characteristic curve (AUCs) were compared using the DeLong test for paired ROC curves (P = 1.00). PGT, phenotype-guided post-triage.

Table 3. Performance of the phenotype-guided post-triage model across PTGiP risk bands among CST-positive pregnancies.

Triage positivity indicates classification for reflex cell-free DNA testing or invasive diagnostic testing under the PGT model, whereas triage negativity indicates a low expected yield. Risk bands are defined per PTGiP guidelines: very high risk (> 1:100), high risk (1:100–1:300), and intermediate risk (1:300–1:1000). False-positive (FP) reduction is the proportion of CST screen-positive euploid pregnancies reclassified as triage-negative.

Table 4. Phenotypic profiles distinguishing CST false-positive (benign) and true-positive (aneuploid) outcomes identified by the phenotype-guided post-triage model.

Phenotypic profiles are derived from terminal nodes of the PGT decision tree (Fig. 2). Leaves labeled as benign corresponded to nodes with low aneuploidy prevalence (< 5%) in the training cohort, whereas high-risk leaves demonstrated enriched aneuploidy prevalence characterized by combinations of increased nuchal translucency, low PAPP-A, and abnormal ductus venosus flow. CST, combined screening test; MoM, multiples of the median; CRL, crown–rump length.

Abbreviations:

CST

combined screening test

CRL

crown–rump length

NT

nuchal translucency

DV

ductus venosus

MoM

multiples of the median
