An Explainable Machine Learning Model for Predicting Necrotizing Enterocolitis in Neonates Based on Complete Blood Count Parameters

Meng Yang^1,2,3†, Yuxiang Wang^4†, Yan Lin^1†, Na Fan^1, Hongwei Guo^1, Anding Zhang^1, Haiyan Wang^1, Zhibo Gao^1, Hefang Wu^1, Qiao Zheng^3, Yuancui Meng^2*, and Xun Jiang^1*

†Meng Yang, Yuxiang Wang and Yan Lin contributed equally to this article.

*Correspondence:

Xun Jiang, Tel: +8618202983023, E-mail: 863756276@qq.com

Yuancui Meng, Tel: +8613669136302, E-mail: 49656406@qq.com

Author details

^1 Department of Pediatrics, Tangdu Hospital, Air Force Medical University, Xinsi Road, Baqiao District, Xi’an 710038, Shaanxi, China

^2 Neonatal Intensive Care Unit, The Second Affiliated Hospital of Xi’an Medical University, Fangzheng Street, Baqiao District, Xi’an 710038, Shaanxi, China

^3 Xi’an Medical University, No. 1 Xinwang Road, Weiyang District, Xi’an 710021, Shaanxi Province, China

^4 School of Computer Science and Technology, Xidian University, No. 266 Xinglong Section, Xifeng Road, Xi’an 710126, Shaanxi Province, China

Abstract

Background

Necrotising enterocolitis (NEC) is a major cause of morbidity and mortality in neonates, particularly among preterm infants. Identifying effective methods for early prediction is crucial for developing personalised treatment strategies and improving patient outcomes. Given the significant limitations of existing scoring systems and prediction models, there remains a pressing need to establish novel models for assessing NEC risk. To this end, we developed and validated an interpretable machine learning (ML) model and deployed a web-based calculator for the early prediction of NEC onset in the neonatal intensive care unit (NICU).

Methods

We collected complete blood count parameters obtained within the first 24 hours after birth and during the second postnatal week from 116 infants with NEC and 233 non-NEC infants admitted to the NICUs of the Second Affiliated Hospital of Air Force Medical University and the Second Affiliated Hospital of Xi’an Medical University between January 2012 and January 2025, and calculated the mean of the two measurements for each parameter. Six different ML algorithms were applied to construct classification models for predicting NEC. We quantified model performance using metrics including the area under the receiver operating characteristic curve (AUC). The final model was interpreted using SHapley Additive exPlanations (SHAP), which also quantified feature importance.

Results

Among the six ML models evaluated, the XGBoost algorithm demonstrated superior performance. It achieved an AUC of 0.917 (95% CI: 0.858–0.977), an accuracy of 0.8952 (95% CI: 0.8203–0.9465), and a no information rate of 0.6667. Additional performance metrics included a sensitivity of 0.7429, specificity of 0.9714, positive predictive value (PPV) of 0.9286, negative predictive value (NPV) of 0.8831, precision of 0.9286, recall of 0.7429, and an F1-score of 0.8254. The calibration curve indicated a strong agreement between predicted probabilities and observed outcomes. SHAP analysis was employed to identify and rank the contribution of key features to the model's predictions. Furthermore, we developed a user-friendly, web-based calculator based on the final XGBoost model, accessible to clinicians at https://nec.yujincheng.cn/. This final model incorporated seven hematological parameters: mean platelet volume (MPV), red cell distribution width coefficient of variation (RDW-CV), mean corpuscular hemoglobin (MCH), white blood cell count (WBC), neutrophil percentage (NEUT%), platelet distribution width (PDW), and mean corpuscular volume (MCV).

Conclusion

Leveraging hematological parameters from the first two postnatal weeks, we developed and validated a robust and interpretable XGBoost model for predicting the risk of NEC. This tool facilitates early identification of high-risk neonates by clinicians and provides a foundation for personalizing therapeutic strategies. Furthermore, this study provides substantial digital support for advancing NEC prevention and management towards a more precise, personalized, and proactive paradigm.

Keywords

Necrotizing Enterocolitis

Machine Learning

Prediction Model

Interpretable Artificial Intelligence

Clinical Decision Support

Introduction

Necrotising enterocolitis (NEC) is a leading cause of devastating gastrointestinal emergencies in neonates, with a reported prevalence of 1.8–8.8% in neonatal intensive care units (NICUs)^[1]. Notwithstanding progress in neonatal intensive care, NEC remains a formidable clinical challenge and a major contributor to mortality and long-term morbidity, including intestinal strictures, short-bowel syndrome, and neurodevelopmental impairment^[2]. In extremely low birth weight (ELBW) infants, incidence ranges from 2% to 7%, with mortality exceeding 30% overall and rising above 80% among those requiring surgical intervention^[3]. Even among survivors, the burden of chronic complications underscores NEC as an urgent and unresolved challenge in perinatal medicine^[4].

The difficulty of addressing NEC lies in both its incompletely understood pathogenesis and the lack of reliable tools for early identification^[5][6]. Current clinical diagnosis relies largely on the Bell staging criteria, which integrate non-specific clinical, laboratory, and radiographic features^[7]. However, these criteria are insufficient for early recognition and perform poorly in risk stratification, particularly for fulminant NEC, which comprises only one-fifth of cases but accounts for the majority of NEC-related deaths^[8]. Abdominal radiography, though widely used, detects only late-stage findings such as pneumatosis and free air^[8]. Abdominal ultrasonography offers dynamic evaluation of bowel perfusion and peristalsis and shows greater sensitivity for ischaemic changes, but its utility remains limited by operator dependence and lack of standardisation^[9]. Similarly, routine biomarkers such as C-reactive protein and platelet count are widely accessible but non-specific^[10][11], whereas novel candidates—including intestinal fatty acid-binding protein (I-FABP; AUC up to 0.96) and cytokine-based signatures (AUC ≈ 0.94)—remain confined to research settings due to assay complexity and lack of clinical integration^[12][13].

In parallel, several scoring systems^[14–16] and predictive models^[17]-[26] have been proposed; however, their predictive performance has remained modest and inadequate to guide timely clinical intervention. Recent machine learning (ML)-based models offer improved accuracy, but most remain static, relying on single time-point data rather than capturing the dynamic evolution of disease^[17]-[23]. This limitation is critical, as NEC pathogenesis is inherently progressive, with risk and severity fluctuating over time^[5][6]. The absence of dynamic prediction tools driven by serial data not only constrains early warning capacity but also hinders precision in surgical decision-making, where timely identification of irreversible intestinal necrosis remains a major unmet clinical need.

Taken together, NEC continues to impose a disproportionate burden of mortality and long-term morbidity in preterm populations^[4]. Existing diagnostic frameworks, biomarkers, and predictive models fall short in providing reliable, early, and individualised risk assessment^[8]. There is therefore a pressing need for dynamic, high-resolution predictive approaches that can integrate serial clinical and laboratory measurements to more accurately capture disease trajectories, facilitate timely intervention, and ultimately improve outcomes in this vulnerable population. In this study, we retrospectively analysed complete blood count parameters obtained within the first 24 hours after birth and during the second postnatal week from 116 neonates with NEC and 233 without NEC admitted to the Department of Neonatology, Second Affiliated Hospital of Air Force Medical University and the Second Affiliated Hospital of Xi’an Medical University between January 2012 and January 2025. Using the XGBoost machine learning algorithm, we developed a highly predictive model for NEC risk assessment. The SHapley Additive exPlanations (SHAP) framework was applied to interpret model predictions and elucidate feature importance, thereby enhancing clinical relevance. Finally, we implemented a user-friendly, web-based calculator (https://nec.yujincheng.cn/) that enables health-care providers to estimate individual NEC risk from routine haematological parameters, with the aim of supporting clinical decision-making in neonatal intensive care units.

Methods

An overview of the study design is presented in Fig. 1.

Fig. 1

The schematic illustrates the key steps of the analytical workflow. NEC, necrotising enterocolitis; NICU, neonatal intensive care unit; DCA, decision curve analysis; LASSO, Least Absolute Shrinkage and Selection Operator; RF, Random Forest; XGB, extreme gradient boosting; KNN, K-Nearest Neighbor; SVM, Support Vector Machine; LR, logistic regression; DT, decision tree.

Study Population

We retrospectively analyzed 32 features, including routine blood parameters and blood gas analysis levels, from NEC and non-NEC neonates in the NICU of the Second Affiliated Hospital of Air Force Medical University and the Second Affiliated Hospital of Xi’an Medical University between January 2012 and January 2025. Birth history information was also collected for all patients. Inclusion criteria for the NEC group were^[7]: (1) diagnosis confirmed by two experienced neonatologists according to the modified Bell's staging criteria, stage IA, IB, IIA, or IIB; (2) neonatal age < 28 days. Patients with severe systemic diseases such as immunodeficiency, genetic metabolic diseases, pulmonary hemorrhage, congenital intestinal malformations, congenital diaphragmatic hernia, intestinal malrotation, meconium ileus, or simple intestinal perforation were excluded. Patients with incomplete medical records were also excluded. Non-NEC neonates hospitalized during the same period were selected as the control group, matched by admission time and gestational age. Exclusion criteria were the same as for the NEC group. Additionally, control neonates had to meet the following criteria^[27]: (1) admitted to the neonatology department shortly after birth and received complete support and treatment; (2) availability of routine blood test results from the first 24 hours and the second week after birth; (3) no evidence of intestinal perforation during hospitalization, including no intra-abdominal air on abdominal X-rays. Further exclusion criteria for non-NEC neonates were: (1) rapid deterioration leading to death within 72 hours; (2) diagnosis of gastrointestinal malformations during hospitalization; (3) severe or life-threatening congenital extra-intestinal malformations; (4) severe pulmonary abnormalities; and (5) incomplete data.

Clinical Characteristics and Data Collection

Clinical characteristics of the enrolled neonates included sex, birth weight, gestational age, Apgar scores at 1 and 5 minutes, and mode of delivery.

For neonates in both groups, the initial complete blood count and blood gas analysis obtained within the first 24 hours after birth, as well as the complete blood count parameters from the second postnatal week, were collected. The average value for each hematological parameter across these two time points was calculated and used for subsequent analyses.
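The averaging step can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name `average_time_points` and the example values are hypothetical, and the sketch assumes that only parameters measured at both time points are averaged.

```python
from statistics import mean

def average_time_points(day1, week2):
    """Average each CBC parameter over the two sampling windows.

    Both arguments map parameter name -> value. Only parameters present
    at both time points are averaged (an assumption of this sketch).
    """
    shared = day1.keys() & week2.keys()
    return {name: mean([day1[name], week2[name]]) for name in sorted(shared)}

# Hypothetical values for a single neonate (not taken from the study data).
first_24h = {"MPV": 10.4, "PDW": 13.0, "WBC": 12.0}
second_week = {"MPV": 10.8, "PDW": 15.0, "WBC": 9.0}

averaged = average_time_points(first_24h, second_week)
print(averaged)
```

Averaging compresses the two measurements into one value per parameter, so the resulting feature reflects the overall level across the first two postnatal weeks rather than the change between the two time points.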

The laboratory parameters collected were categorized as follows:

Arterial blood gas (ABG) analysis: pH, partial pressure of carbon dioxide (PaCO₂, mmHg), partial pressure of oxygen (PaO₂, mmHg), and bicarbonate (HCO₃⁻, mmol/L).

White blood cell parameters: white blood cell count (WBC, ×10⁹/L), absolute neutrophil count (NEUT, ×10⁹/L), neutrophil percentage (NEUT%), absolute lymphocyte count (LYMPH, ×10⁹/L), lymphocyte percentage (LYMPH%), absolute monocyte count (MONO, ×10⁹/L), monocyte percentage (MONO%), absolute eosinophil count (EOS, ×10⁹/L), eosinophil percentage (EOS%), absolute basophil count (BASO, ×10⁹/L), and basophil percentage (BASO%).

Red blood cell parameters: red blood cell count (RBC, ×10¹²/L), hemoglobin (Hb, g/L), hematocrit (HCT, %), mean corpuscular volume (MCV, fL), mean corpuscular hemoglobin (MCH, pg), mean corpuscular hemoglobin concentration (MCHC, g/L), red cell distribution width–coefficient of variation (RDW-CV, %), and red cell distribution width–standard deviation (RDW-SD, fL).

Platelet parameters: platelet count (PLT, ×10⁹/L), mean platelet volume (MPV, fL), platelet distribution width (PDW, fL), and plateletcrit (PCT, %).

All predictor variables were derived from objective data extracted from the electronic medical records.

Analytical Methods and R Packages

All statistical analyses were conducted using R software (version 4.4.2). To ensure data integrity, variables with more than 30% missing values were excluded. The remaining missing data were imputed using the "mice" package (version 3.16.0) with 20 imputations.
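The missing-data filter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the helper `drop_sparse_columns` and the toy records are hypothetical, missing values are represented as `None`, and the subsequent multiple imputation step is not shown.

```python
def drop_sparse_columns(rows, max_missing=0.30):
    """Return the column names whose missing fraction is <= max_missing.

    rows: list of dicts mapping column name -> value, with None for missing.
    """
    columns = sorted({name for row in rows for name in row})
    kept = []
    for name in columns:
        missing = sum(1 for row in rows if row.get(name) is None)
        if missing / len(rows) <= max_missing:
            kept.append(name)
    return kept

# Hypothetical toy records: "HCO3" is missing in 2 of 4 rows (50%),
# "MPV" in 1 of 4 rows (25%), "WBC" in none.
records = [
    {"WBC": 10.1, "MPV": 10.5, "HCO3": None},
    {"WBC": 9.8, "MPV": None, "HCO3": 21.0},
    {"WBC": 12.3, "MPV": 10.9, "HCO3": None},
    {"WBC": 8.7, "MPV": 11.2, "HCO3": 22.5},
]
kept_columns = drop_sparse_columns(records)
print(kept_columns)
```

In this toy example only the column exceeding the 30% limit is dropped; the remaining gap in "MPV" would be left for the imputation step.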

The dataset was subsequently partitioned into training and testing sets at a 7:3 ratio using stratified random sampling, implemented with the "caret" package (version 6.0-93).
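A stratified 7:3 split can be sketched as follows. This is a plain-Python illustration of the idea, not a reimplementation of the caret routine used in the study: the function `stratified_split`, the seed, and the rounding rule are assumptions, so the exact set sizes may differ slightly from those reported in the Results.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Split sample indices into train/test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Rounding rule is an assumption; other tools may round differently.
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Class sizes from the study: 116 NEC (label 1) and 233 non-NEC (label 0).
labels = [1] * 116 + [0] * 233
train_idx, test_idx = stratified_split(labels)
train_nec = sum(labels[i] for i in train_idx)
test_nec = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), train_nec, test_nec)
```

Splitting within each class keeps the NEC proportion close to one third in both subsets, which matters when the outcome is the minority class.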

Baseline characteristics and comparative analyses were conducted using the CBCgrps package. Univariate logistic regression analysis on the training set was performed using the stats package (v4.2.2). The following methods and corresponding packages were employed: LASSO regression with glmnet, multivariable logistic regression using glm {stats}, recursive feature elimination (RFE) with caret, decision tree with rpart, random forest using randomForest, XGBoost via xgboost, naive Bayes and SVM with e1071, and k-nearest neighbors using kknn.

Model Evaluation and Visualization

Model performance was rigorously evaluated across key dimensions using established R packages. Discrimination was quantified with the pROC package for the receiver operating characteristic (ROC) analysis and the PRROC package for precision-recall (PR) curves. Calibration was assessed using calibration curves (generated with the riskRegression package), alongside the val.prob and calibrate functions from the rms package, and the Hosmer-Lemeshow test (ResourceSelection package). Clinical utility was evaluated via decision curve analysis and net reduction curves, generated using the dca.R function. All forest plots were created with the forestplot package.

Variable Selection

In this study, we employed a sequential feature selection approach utilizing the Boruta algorithm^[28], Recursive Feature Elimination (RFE)^[29], and Least Absolute Shrinkage and Selection Operator (LASSO) regression^[30]. This was followed by multivariate logistic regression analysis to identify the most predictive variables, thereby enhancing model performance and stability. To support the robustness of the selection process and the generalizability of the models, ten rounds of ten-fold cross-validation were applied throughout the feature screening phase.

Model Construction and Hyperparameter Tuning

Six distinct machine learning algorithms were employed to construct predictive models: Decision Tree (DT), k-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), and Logistic Regression^[31]. The model training workflow consisted of four sequential stages: dataset construction, data preprocessing, training validation, and prediction. A randomized grid search was conducted for hyperparameter tuning, with the primary objective of maximizing the Area Under the Receiver Operating Characteristic Curve (AUC)^[32–34].
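Because the AUC is the tuning objective, it may help to recall what it measures: the probability that a randomly chosen NEC case receives a higher predicted score than a randomly chosen non-NEC case. The sketch below computes it by direct pairwise comparison (the Mann–Whitney formulation). The function `roc_auc` and the six example scores are hypothetical, and this is not the pROC implementation used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Tied scores count as one half of a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities for six neonates (1 = NEC).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(y_true, y_prob)
print(round(auc, 3))
```

Unlike accuracy or sensitivity, this quantity does not depend on a probability threshold, which is one reason it is a convenient objective during hyperparameter search.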

Cross-Validation

For each tuned model, seven performance metrics were derived from the test dataset. The mean AUC value, along with other evaluation metrics, was calculated based on 10 independent repetitions of the model training process to ensure reliability. The preprocessed dataset was subjected to stratified 10-fold cross-validation for model training and optimization. The efficacy of the selected hyperparameters was assessed by comparing the performance of the baseline trained models against the tuned models^[32–34].

The stratified 10-fold cross-validation procedure involved randomly partitioning the training set of 245 subjects into 10 approximately equally sized subsets, while preserving the class distribution in each fold. In each iteration, nine subsets were used as the training set, and the remaining one was held out as the test set^[35]. This process was repeated 10 times, ensuring each subset served as the test set exactly once. The average predictive performance across all 10 iterations was reported as the final performance of each algorithm. This repeated, stratified cross-validation approach provided a more robust estimate of model performance and kept the class distribution consistent in every repetition^[35].
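The fold construction can be sketched as follows. This is a minimal Python illustration, not the caret code used in the study: the function `stratified_k_folds` and the seed are hypothetical, and the class sizes are the training-set counts reported in the Results.

```python
import random

def stratified_k_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a similar class share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Training-set class sizes reported in the Results: 81 NEC, 164 non-NEC.
labels = [1] * 81 + [0] * 164
folds = stratified_k_folds(labels)

splits = []
for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # A model would be fitted on train_idx and scored on test_idx here.
    splits.append((train_idx, test_idx))

nec_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(nec_per_fold)
```

Each fold holds 8 or 9 NEC cases, so every held-out subset contains enough positives for sensitivity to be estimated.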

Model Performance Evaluation

Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, F1-score, and Brier score^[32–34]. The AUC served as the primary indicator of discriminative ability, with higher values signifying superior classification performance^[32–34]. A probability threshold of 0.5 was applied to define binary outcomes. Accuracy represented the proportion of correctly classified cases among all observations. Specificity indicated the proportion of true negatives correctly identified, reflecting a lower false-positive rate at higher values. Sensitivity denoted the proportion of true positives accurately detected, with higher values corresponding to a lower false-negative rate. The PPV reflected the proportion of predicted positives that were truly positive, whereas the NPV represented the proportion of predicted negatives that were truly negative. The F1-score, defined as the harmonic mean of precision and recall, provided a balanced measure of both ^[32–34].
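The threshold-dependent metrics defined above all follow from the four cells of a confusion matrix. The Python sketch below makes the definitions concrete. The function `classification_metrics` is a hypothetical helper, and the counts are the validation-set values reported with Fig. 7C in the Results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Validation-set counts as reported with Fig. 7C in the Results.
metrics = classification_metrics(tp=26, fp=9, tn=68, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because PPV and NPV depend on the proportion of NEC cases in the sample, they would shift in a population with a different NEC prevalence, whereas sensitivity and specificity would not.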

Model calibration, representing the agreement between predicted probabilities and observed outcomes, was assessed using the Hosmer-Lemeshow test and visualized via calibration curves^[36]. The logarithmic loss (Log-Loss) metric was employed to quantify the accuracy of the predicted probabilities by measuring the discrepancy between these probabilities and the actual labels^[36]. Statistical comparison of AUC values between different models was performed using DeLong's test^[36]. The clinical utility of the models across various probability thresholds was evaluated using decision curve analysis (DCA), which estimates the net benefit. The optimal predictive model was selected based on its performance across the aforementioned metrics in both the training and test sets^[37].
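Decision curve analysis plots the net benefit against the threshold probability. The standard definition is net benefit = TP/N - (FP/N) × pt/(1 - pt), where pt is the threshold probability. The Python sketch below evaluates it at a single threshold and is not the dca.R code used in the study. The function names are hypothetical, and the counts are the validation-set values reported with Fig. 7C at the 0.5 threshold.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of acting on model-positive patients at one threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    # Odds of the threshold: the weight given to one false positive
    # relative to one true positive.
    weight = threshold / (1 - threshold)
    return tp / n - (fp / n) * weight

def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy in which every patient is treated as positive."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Validation-set counts reported with Fig. 7C (105 neonates, 28 with NEC).
model_nb = net_benefit(tp=26, fp=9, n=105, threshold=0.5)
treat_all_nb = net_benefit_treat_all(prevalence=28 / 105, threshold=0.5)
print(round(model_nb, 4), round(treat_all_nb, 4))
```

A full decision curve repeats this calculation over a grid of thresholds, recomputing TP and FP at each one, and compares the model with the treat-all and treat-none (net benefit 0) strategies.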

Model Interpretation

The decision-making process of machine learning models often functions as a "black box," obscuring the rationale behind individual predictions^[38]. SHAP (SHapley Additive exPlanations) addresses this limitation by leveraging concepts from cooperative game theory, specifically Shapley values, which ensure an equitable distribution of feature attributions based on their marginal contributions across all possible feature subsets^[39]. This approach guarantees that explanations at the sample level remain faithful to the model's predictions, thereby enabling users to develop trust in the contribution of each feature to a specific outcome ^[40–43].
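The Shapley value of a feature is its marginal contribution averaged over all subsets of the other features, weighted by subset size. The Python sketch below computes exact Shapley values by brute-force enumeration for a toy three-feature score. It is an illustration only: `toy_value` is a hypothetical function, not the study's XGBoost model, and SHAP libraries use faster tree-specific algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every subset of the other features.

    value_fn(frozenset) -> model output when only those features are present.
    """
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        result[feature] = total
    return result

def toy_value(present):
    """Hypothetical risk score with one PDW x MPV interaction term."""
    score = 0.10                      # baseline with no features present
    if "PDW" in present:
        score += 0.30
    if "MPV" in present:
        score += 0.20
    if "PDW" in present and "MPV" in present:
        score += 0.10                 # interaction, split equally by Shapley
    if "WBC" in present:
        score += 0.05
    return score

phi = shapley_values(toy_value, ["PDW", "MPV", "WBC"])
print({name: round(value, 3) for name, value in phi.items()})
```

The attributions sum to the difference between the full-model output and the baseline, which is the additivity property that lets SHAP explain an individual prediction feature by feature.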

Web Calculator

To facilitate clinical translation, the final prediction model was operationalized as a user-friendly web application. This platform calculates and displays the probability of NEC onset when the values of the relevant features in the final model are provided.

Statistical Analyses

Continuous variables were summarized according to their distribution. Variables with a normal distribution are expressed as mean ± standard deviation and compared using the independent-samples t test. Non-normally distributed variables are presented as median (interquartile range) and compared with the Mann–Whitney U test. Categorical variables are reported as counts and percentages, and group differences were assessed using the χ² test. A two-sided p value of less than 0.05 was considered statistically significant.

Results

Baseline characteristics of the NEC and non-NEC groups

This study included 233 non-NEC infants and 116 NEC infants. Among the 245 patients in the training set, 81 (33%) were diagnosed with NEC. In the test set of 104 patients, 35 (33.6%) developed NEC. Table 1 presents the baseline demographic and clinical characteristics of the patient cohort. The distributions of all variables were comparable between the training and test sets (all p-values > 0.05), as shown in Table 2. Furthermore, several clinical characteristics showed significant differences between the NEC and non-NEC groups. For example, compared to the non-NEC group, the NEC group had a smaller gestational age (31.60 [29.70; 33.40] vs. 35.00 [32.00; 37.60], p < 0.001), lower birth weight (1470.00 [1230.00; 1850.00] vs. 2000.00 [1515.00; 2870.00], p < 0.001), and a higher pH value within the first 24 hours (7.33 [7.29; 7.38] vs. 7.29 [7.22; 7.34], p < 0.001).

Table 1
Demographic and Clinical Profile of the NEC vs. Non-NEC Cohorts
Variables	Total (n = 349)	Non-NEC group (n = 233)	NEC group (n = 116)	p
Male, n (%)	204(58.5%)	126(54.1%)	78(67.2%)	0.021ᵇ
Gestational age (weeks)	32(30.1,34.9)	31.6(29.7,33.4)	35(32,37.6)	< 0.001ᵃ
BW (g)	1600(1300,2160)	1470(1230,1850)	2000(1515,2870)	< 0.001ᵃ
Apgar score at 1 minute	9(7,10)	9(8,10)	9(6.75,10)	0.606ᵃ
Apgar score at 5 minutes	10(9,10)	10(9,10)	10(9,10)	0.869ᵃ
Delivery mode, n (%)				0.002ᵇ
Vaginal delivery	177(50.7%)	132(56.7%)	45(38.8%)
Cesarean section	172(49.3%)	101(43.3%)	71(61.2%)
pH	7.3(7.24,7.35)	7.29(7.22,7.34)	7.33(7.29,7.38)	< 0.001ᵃ
PaCO₂ (mmHg)	42.5(35.6,49.3)	43.3(37.9,51.8)	38.65(29.85,44.02)	< 0.001ᵃ
PaO₂ (mmHg)	72(60,85.3)	72(56.3,84.2)	72.6(63.48,86)	0.127ᵃ
WBC (×10⁹/L)	10(8.04,12.42)	10.02(8.07,12.19)	9.92(7.84,12.59)	0.709ᵃ
NEUT% (%)	46.54 ± 12.99	44.15 ± 11.62	51.32 ± 14.26	< 0.001ᶜ
LYMPH% (%)	39.25(32.1,48.25)	40.65(34.8,49.45)	34.65(26.91,43.26)	< 0.001ᵃ
MONO% (%)	9.78(7.95,12.3)	10.15(8.45,12.45)	8.62(6.69,11.4)	< 0.001ᵃ
EOS% (%)	2.2(1.5,3.25)	2.2(1.6,3.45)	2.12(1.19,2.98)	0.058ᵃ
BASO% (%)	0.45(0.3,0.7)	0.5(0.35,0.85)	0.35(0.25,0.5)	< 0.001ᵃ
NEUT (×10⁹/L)	4.53(3.2,6.16)	4.53(3.19,5.3)	4.91(3.33,7)	0.013ᵃ
LYMPH (×10⁹/L)	3.6(2.9,4.43)	3.62(3.17,4.59)	3.27(2.39,4.08)	< 0.001ᵃ
MONO (×10⁹/L)	0.98(0.76,1.25)	0.98(0.81,1.27)	0.9(0.62,1.18)	0.007ᵃ
EOS (×10⁹/L)	0.21(0.13,0.34)	0.21(0.14,0.36)	0.21(0.1,0.31)	0.043ᵃ
BASO (×10⁹/L)	0.04(0.03,0.08)	0.05(0.04,0.09)	0.03(0.02,0.04)	< 0.001ᵃ
RBC (×10¹²/L)	4.12 ± 0.51	4.1 ± 0.48	4.16 ± 0.55	0.294ᶜ
Hb (g/L)	151.75(140,163)	152.5(140,164)	150.5(140.25,162.12)	0.859ᵃ
HCT (%)	45(41.8,48.8)	44.75(41.85,48.4)	45.4(41.69,49.16)	0.45ᵃ
MCV (fL)	109.15(105.35,113.8)	109.15(105.4,113.8)	108.88(105.35,113.76)	0.73ᵃ
MCH (pg)	36.75(35.55,38.35)	37.1(35.6,38.45)	36.4(35.29,37.8)	0.03ᵃ
MCHC (g/L)	338(329.5,344.5)	338.5(332,344.5)	334.75(324.38,343.62)	0.006ᵃ
RDW-CV (%)	16.25(15.4,17.3)	16.4(15.75,17.45)	15.75(14.6,16.8)	< 0.001ᵃ
RDW-SD (fL)	64.8(60.1,70.15)	65.3(61,70.7)	62.92(58.24,67.91)	0.002ᵃ
PLT (×10⁹/L)	243(190,304.5)	241.5(200,297)	243.5(170.25,317.88)	0.861ᵃ
MPV (fL)	10.6(10.1,11.2)	10.8(10.5,11.45)	10.05(9.25,10.6)	< 0.001ᵃ
PDW (fL)	13.75(12,16.1)	13.15(11.65,14.15)	16.35(13.75,16.75)	< 0.001ᵃ
PCT (%)	0.27(0.22,0.32)	0.27(0.23,0.33)	0.26(0.19,0.31)	0.009ᵃ

Table 2
Comparison of demographic characteristics and clinical characteristics between training and test sets.
Variables	Total (n = 349)	Test (n = 104)	Train (n = 245)	p
Male, n (%)	204(58.5%)	62(59.6%)	142(58.0%)	0.841ᵇ
Gestational age (weeks)	32(30.1,34.9)	32.6(29.98,34.9)	32(30.4,35)	0.866ᵃ
BW (g)	1600(1300,2160)	1510(1242.5,1985)	1620(1300,2200)	0.129ᵃ
Apgar score at 1 minute	9(7,10)	9(7.75,10)	9(7,10)	0.167ᵃ
Apgar score at 5 minutes	10(9,10)	10(9,10)	10(9,10)	0.496ᵃ
Mode of delivery, n (%)				0.784ᵇ
Vaginal delivery	177(50.7%)	53(51.0%)	124(50.6%)
Cesarean section	172(49.3%)	51(49.0%)	121(49.4%)
pH	7.3(7.24,7.35)	7.32(7.25,7.36)	7.3(7.23,7.35)	0.099ᵃ
PaCO₂ (mmHg)	42.5(35.6,49.3)	41.8(34.58,47.88)	42.5(35.9,49.5)	0.413ᵃ
PaO₂ (mmHg)	72(60,85.3)	72(55.13,86.3)	72(61.6,85)	0.818ᵃ
WBC (×10⁹/L)	10(8.04,12.42)	9.7(7.9,12.15)	10.12(8.1,12.57)	0.265ᵃ
NEUT% (%)	46.54 ± 12.99	47.11 ± 13.57	46.29 ± 12.75	0.598ᶜ
LYMPH% (%)	39.25(32.1,48.25)	39.97(32.52,47.17)	38.95(32.1,48.9)	0.746ᵃ
MONO% (%)	9.78(7.95,12.3)	9.95(7.82,12.26)	9.65(8,12.35)	0.949ᵃ
EOS% (%)	2.2(1.5,3.25)	2.11(1.44,3.16)	2.2(1.5,3.4)	0.383ᵃ
BASO% (%)	0.45(0.3,0.7)	0.45(0.29,0.85)	0.45(0.3,0.65)	0.674ᵃ
NEUT (×10⁹/L)	4.53(3.2,6.16)	4.53(3.2,5.95)	4.53(3.29,6.2)	0.868ᵃ
LYMPH (×10⁹/L)	3.6(2.9,4.43)	3.46(2.67,4.25)	3.6(2.96,4.47)	0.137ᵃ
MONO (×10⁹/L)	0.98(0.76,1.25)	0.98(0.72,1.26)	0.98(0.76,1.25)	0.817ᵃ
EOS (×10⁹/L)	0.21(0.13,0.34)	0.21(0.09,0.37)	0.21(0.14,0.32)	0.668ᵃ
BASO (×10⁹/L)	0.04(0.03,0.08)	0.04(0.03,0.09)	0.04(0.03,0.07)	0.354ᵃ
RBC (×10¹²/L)	4.12 ± 0.51	4.06 ± 0.47	4.15 ± 0.52	0.101ᶜ
Hb (g/L)	151.75(140,163)	150.75(139.5,162.12)	152(140.5,163.5)	0.613ᵃ
HCT (%)	45(41.8,48.8)	45.15(41.8,48.36)	45(41.85,48.9)	0.72ᵃ
MCV (fL)	109.15(105.35,113.8)	108.8(105.84,112.42)	109.25(105.15,113.85)	0.813ᵃ
MCH (pg)	36.75(35.55,38.35)	37.1(35.74,38.88)	36.7(35.45,38.05)	0.081ᵃ
MCHC (g/L)	338(329.5,344.5)	338.5(328,344.12)	338(329.5,344.5)	0.981ᵃ
RDW-CV (%)	16.25(15.4,17.3)	16.17(15.4,17.1)	16.25(15.45,17.45)	0.435ᵃ
RDW-SD (fL)	64.8(60.1,70.15)	64.97(60.49,70.6)	64.8(60.1,69.9)	0.862ᵃ
PLT (×10⁹/L)	243(190,304.5)	236(192,298.62)	243(190,308.5)	0.597ᵃ
MPV (fL)	10.6(10.1,11.2)	10.6(9.89,11.1)	10.6(10.2,11.25)	0.31ᵃ
PDW (fL)	13.75(12,16.1)	13.75(12.28,16.25)	13.75(11.9,16.1)	0.25ᵃ
PCT (%)	0.27(0.22,0.32)	0.27(0.23,0.32)	0.27(0.22,0.32)	0.94ᵃ
Data are presented as mean ± SD, median (IQR), or n (%), as appropriate.
ᵃ Mann–Whitney U test; ᵇ Chi-square test; ᶜ Student’s t test.
Abbreviations: BW, birth weight; VD, vaginal delivery; CS, cesarean section; WBC, white blood cell count; NEUT%, neutrophil percentage; LYMPH%, lymphocyte percentage; MONO%, monocyte percentage; EOS%, eosinophil percentage; BASO%, basophil percentage; NEUT, neutrophil count; LYMPH, lymphocyte count; MONO, monocyte count; EOS, eosinophil count; BASO, basophil count; RBC, red blood cell count; Hb, hemoglobin; HCT, hematocrit; MCV, mean corpuscular volume; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; RDW-CV, red cell distribution width–coefficient of variation; RDW-SD, red cell distribution width–standard deviation; PLT, platelet count; MPV, mean platelet volume; PDW, platelet distribution width; PCT, plateletcrit.

Selection of Predictor Variables

First, Boruta feature selection identified the top 15 features, RFE selected the top 15 predictor variables, and LASSO regression identified the top 16 predictor variables. Predictors identified by all three methods were entered into the multivariate logistic regression analysis, which identified seven features with p < 0.05 as the optimal predictors (Fig. 2, Fig. 3, Table 3).
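The consensus step, keeping only predictors chosen by all three selectors, amounts to a set intersection. The Python sketch below illustrates it with hypothetical, shortened selector outputs. The actual feature lists are those shown in Fig. 2 and are not reproduced here.

```python
# Hypothetical, shortened selector outputs for illustration only.
boruta_selected = {"PDW", "MPV", "RDW-CV", "MCH", "NEUT%", "BASO%"}
rfe_selected = {"PDW", "MPV", "RDW-CV", "MCH", "NEUT%", "LYMPH%"}
lasso_selected = {"PDW", "MPV", "RDW-CV", "MCH", "NEUT%", "PCT"}

# Keep only the variables that every method agrees on.
consensus = sorted(boruta_selected & rfe_selected & lasso_selected)
print(consensus)
```

Requiring agreement across methods is a conservative choice: a variable favored by only one selector is dropped before the regression step.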

Table 3
Results of the Multivariate Logistic Regression Analysis
Variable	B	SE	OR	95% CI	Z	p-value
PDW	0.34	0.08421	1.406	1.192–1.658	4.043	< 0.001
MPV	-1.112	0.18727	0.329	0.228–0.475	-5.941	< 0.001
RDW-CV	-0.275	0.11741	0.759	0.603–0.956	-2.344	0.019
MCH	-0.467	0.15914	0.627	0.459–0.857	-2.932	0.003
NEUT%	0.044	0.01687	1.045	1.011–1.080	2.61	0.009
MCV	0.124	0.05584	1.132	1.015–1.263	2.227	0.026
MONO(×10⁹/L)	0.647	0.43371	1.91	0.817–4.470	1.493	0.136
WBC	-0.107	0.05341	0.898	0.809–0.997	-2.009	0.045
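The odds ratios and confidence intervals in Table 3 follow from the coefficients and standard errors through OR = exp(B) and 95% CI = exp(B ± 1.96 × SE). The Python sketch below checks this for the MPV row. The function `odds_ratio_with_ci` is a hypothetical helper that assumes Wald intervals, and small discrepancies in the last digit are expected because the printed B values are rounded.

```python
from math import exp

def odds_ratio_with_ci(beta, se, z_crit=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its SE."""
    return exp(beta), exp(beta - z_crit * se), exp(beta + z_crit * se)

# Coefficient and standard error for MPV as printed in Table 3.
or_mpv, lower, upper = odds_ratio_with_ci(-1.112, 0.18727)
print(f"OR = {or_mpv:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

An odds ratio below 1, as for MPV, means that higher values of the parameter are associated with lower odds of NEC after adjustment for the other variables in the model.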

Table 4
Performance of ML models
Model	Accuracy (95% CI)	Sensitivity	Specificity	Precision	Recall	F1 Score
KNN	0.8942 (0.8186–0.946)	0.7714	0.9565	0.9000	0.7714	0.8308
XGB	0.8846 (0.8071–0.9389)	0.8571	0.8986	0.8108	0.8571	0.8333
DT	0.8846 (0.8071–0.9389)	1.0000	0.8519	0.6571	1.0000	0.7931
RF	0.8654 (0.7845–0.9244)	0.7714	0.9130	0.8182	0.7714	0.7941
LR	0.8462 (0.7622–0.9094)	0.6286	0.9565	0.8800	0.6286	0.7333
SVM	0.1442 (0.0830–0.2267)	0.3714	0.0290	0.1625	0.3714	0.2261

Fig. 2

Variable selection. (A) Feature importance assessment using the Boruta algorithm. (B) Distribution of feature importance scores across classifier runs in the Boruta algorithm. (C) Recursive Feature Elimination (RFE) with cross-validation. (D) Importance ranking of predictive variables generated by RFE. (E) LASSO coefficient distribution plot. (F) LASSO regression regularization parameter plot. (G) Hematological parameters selected by LASSO and their corresponding coefficients. (H) Waterfall plot of LASSO-derived risk scores.

Fig. 3

Forest plot of the multivariable logistic regression analysis.

Model Performance Comparison and Optimal Model Selection

The performance of the multivariate logistic regression and machine learning models was assessed using the area under the receiver operating characteristic curve (AUC) (Fig. 5A, Fig. 6E), accuracy, specificity, sensitivity, and the F1-score. Their average performance on stratified 10-fold cross-validation is summarized in Table 4 and Fig. 4.

The XGBoost model had the highest AUC in the validation set (0.917), followed by the RF model (0.910) and the SVM model (0.873). The XGBoost model also performed well in accuracy (0.8846), specificity (0.8986), sensitivity (0.8571), and F1-score (0.8333), indicating that it distinguishes well between the NEC and non-NEC groups and is a suitable basis for an NEC prediction model. Figure 5 compares the evaluation results of all models. Although KNN achieved slightly higher accuracy and specificity (Table 4), XGBoost combined the highest AUC with the highest F1-score, indicating that it is well suited for clinical prediction.

Fig. 6 shows the discrimination ability of individual CBC parameters and the XGBoost model in the training (Fig. 6A) and validation (Fig. 6B) cohorts. The XGBoost model exhibited higher area under the curve (AUC) values than any single predictor, indicating enhanced predictive accuracy.

Decision curve analysis (DCA) for the training (Fig. 6C) and validation (Fig. 6D) cohorts demonstrates that the XGBoost model provides greater net clinical benefit than individual CBC indicators across a wide range of threshold probabilities, suggesting improved clinical utility for NEC risk stratification.
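For readers unfamiliar with decision curve analysis, the quantity plotted on a decision curve is the net benefit at a threshold probability p_t, conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t). The following is a minimal sketch of that standard formula, not the authors' implementation; the function names and the toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy cohort: 3 NEC cases (1) and 7 non-NEC cases (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.2f}, treat-all: {nb_all:.2f}, treat-none: 0.00")
```

A decision curve repeats this calculation over a grid of thresholds; a model is clinically useful at a given threshold when its net benefit exceeds both the treat-all and treat-none (zero) strategies.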

In summary, the XGBoost model demonstrated superior performance on both the training and validation sets and was therefore identified as the optimal predictor of NEC risk in this cohort, with the Random Forest (RF) model being the next best alternative.

Fig. 4

Performance of the six machine learning models. Evaluation using seven metrics across various dimensions facilitates a comprehensive assessment of diagnostic efficacy and the selection of the optimal approach.

Fig. 5

Performance evaluation results of the different ML models. (A) Receiver operating characteristic (ROC) curves. (B) Calibration curve for the training set. (C) Calibration curve for the validation set. (D, E) Decision curve analysis (DCA) for the training and validation sets, respectively.

Fig. 6

XGBoost model based on CBC parameters for predicting NEC. (A, B) The XGBoost model outperformed individual CBC parameters in both the training and validation cohorts. (C, D) DCA curves show that the model provided greater net clinical benefit than single CBC predictors across a wide range of threshold probabilities. (E) Bar chart comparing the AUC of the different models.

SHAP-driven interpretability analysis of the XGBoost model for NEC prediction

As shown in Fig. 7A, we quantified the contribution of each complete blood count (CBC) parameter to the XGBoost model's prediction of NEC using the Gain metric and ranked the features by importance. PDW dominated, with a gain value markedly higher than that of any other parameter, suggesting that PDW is the central driver of the model's ability to differentiate NEC from non-NEC cases. MPV, RDW-CV, MCH and the remaining parameters followed in importance, highlighting their auxiliary role in the prediction. Figure 7B shows the confusion matrix for the training set: true positives (TP) = 78, true negatives (TN) = 162, false positives (FP) = 3, and false negatives (FN) = 1. The accuracy is (162 + 78)/(162 + 78 + 3 + 1) ≈ 98.4%, demonstrating excellent discriminative ability on the training data. The sensitivity (recall) is 78/(78 + 1) ≈ 98.7%, indicating a very low risk of missed NEC cases and meeting the clinical need for early NEC warning. The specificity is 162/(162 + 3) ≈ 98.2%. Figure 7C presents the confusion matrix for the validation set, in which the numbers indicate sample counts: true negatives (TN) = 68, false positives (FP) = 9, false negatives (FN) = 2, and true positives (TP) = 26. The corresponding performance metrics were: accuracy 89.5%, sensitivity 92.9%, specificity 88.3%, positive predictive value 74.3%, negative predictive value 97.1%, and F1 score 82.5%. The high sensitivity and negative predictive value indicate that negative predictions are reliable, making the model suitable as an exclusionary (rule-out) diagnostic tool.
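The metrics quoted for Fig. 7B and Fig. 7C follow directly from the four confusion-matrix counts. The sketch below recomputes them from the counts reported above, treating NEC as the positive class; the function name `confusion_metrics` is hypothetical, and this is an arithmetic check rather than the authors' evaluation code.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # recall for the NEC class
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),              # precision for the NEC class
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts as reported in the text for the training and validation sets.
training = confusion_metrics(tp=78, tn=162, fp=3, fn=1)
validation = confusion_metrics(tp=26, tn=68, fp=9, fn=2)

for name, metrics in (("training", training), ("validation", validation)):
    summary = ", ".join(f"{key} {value:.1%}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Which class is treated as positive matters: swapping the roles of NEC and non-NEC exchanges sensitivity with specificity and PPV with NPV, so the positive class should be stated whenever these metrics are reported.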

Figure 7D shows the mean contribution of each hematologic parameter to the model prediction. PDW and MPV had the largest average impact, followed by NEUT%, RDW-CV, MCH, WBC, and MCV.

Figure 7E is a SHAP summary (beeswarm) plot showing per-sample contributions of hematologic features to the NEC prediction. Each point represents one sample's SHAP value for the corresponding feature (horizontal axis); color indicates the original feature value (low→high). Positive SHAP values increase the predicted probability of NEC. Platelet indices (PDW and MPV) exhibited the largest average contributions, with higher values predominantly pushing predictions toward NEC; neutrophil percentage showed a mixed directional effect. RDW-CV and MCH contributed moderately, whereas WBC and MCV had smaller overall effects with occasional outliers.

Figure 8 shows the SHAP dependence plots. Each panel shows the relationship between the raw feature value (x-axis) and the feature's SHAP value (y-axis); positive SHAP values increase the predicted probability of NEC. Platelet indices (PDW, MPV) show the largest and most nonlinear effects, whereas neutrophil percentage (NEUT%) demonstrates a non-monotonic (U-shaped) relationship. RDW-CV and MCH have moderate influences; MCV and WBC show minimal overall effects with some outliers. SHAP values are reported on the model output scale (log-odds).

Figure 7F shows a SHAP force plot for an individual prediction (case 6; baseline E[f(x)] = −0.837; baseline probability ≈ 30.2%). Feature contributions (SHAP, on the log-odds scale) were: PDW −2.50, MPV +0.926, NEUT% +0.298, RDW-CV −0.249, MCH +0.201 (WBC and MCV negligible). Adding these contributions to the baseline yields f(x) ≈ −2.17, corresponding to a predicted NEC probability of ≈ 10.3%. Negative SHAP values reduce the predicted NEC probability; positive values increase it.
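The arithmetic behind the case 6 explanation can be reproduced from SHAP's additivity property: the model output on the log-odds scale equals the baseline plus the sum of the per-feature SHAP values, and the logistic function converts it to a probability. The sketch below uses only the values quoted in the text; the variable names are hypothetical, and the small gap between the sum of the five listed contributions (about −2.16) and the reported f(x) ≈ −2.17 is assumed to come from the WBC and MCV terms described as negligible.

```python
import math


def sigmoid(log_odds):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Baseline (expected model output) and SHAP values for case 6, as quoted in the text.
base_value = -0.837
contributions = {
    "PDW": -2.50,
    "MPV": 0.926,
    "NEUT%": 0.298,
    "RDW-CV": -0.249,
    "MCH": 0.201,
    # WBC and MCV are reported as negligible and are omitted here.
}

log_odds = base_value + sum(contributions.values())
probability = sigmoid(log_odds)
baseline_probability = sigmoid(base_value)

print(f"baseline probability: {baseline_probability:.1%}")
print(f"f(x) = {log_odds:.3f}, predicted NEC probability: {probability:.1%}")
```

Because the contributions are additive only on the log-odds scale, they should be summed before the logistic transform rather than converted to probabilities one feature at a time.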

In conclusion, this decision plot facilitates individualized predictive interpretation. Clinicians can track the risk progression of infants by observing dynamic changes in parameters, such as the increase in MPV from baseline to peak, which can serve as an early indicator of disease progression.

Implementation of the Web Calculator

The final prediction model was deployed as a web-based calculator (Fig. 9). This tool allows users to input the values of the seven key features to automatically compute an individual's predicted risk of developing NEC. The calculator is publicly accessible at: https://nec.yujincheng.cn/.

Fig. 7

(A) Variable importance ranking plot of the XGBoost model for predicting NEC. (B) Confusion matrix of the training set. (C) Confusion matrix of the validation set. (D) Dependency plot based on SHAP values. (E) Beeswarm plot of SHAP values. (F) SHAP decision plot for Case 6.

Fig. 8

SHAP dependence plots of key predictors in the XGBoost model.

Fig. 9

Web application screenshot

Discussion

The absence of a clinical tool for the early and accurate prediction of NEC has been a long-standing challenge^[1–3]. Although several machine learning (ML) models have emerged recently, most are constrained by their reliance on static data from a single timepoint and fail to capture the dynamic progression of NEC^[17–26]. Our study addresses this gap by integrating hematological parameters from both the first 24 hours and the second week of life, using their average to approximate a simplified "dynamic monitoring" concept. Within an interpretable ML framework, we developed and validated a high-performance, transparent model for early NEC risk prediction. Our key findings not only confirm the considerable potential of ML in predicting pediatric critical illness but also shed light on the pivotal alterations in platelet and red blood cell parameters during the pre-symptomatic phase of NEC, offering novel insights into its early pathophysiology.

Specifically, the XGBoost algorithm performed best among the six models, with an AUC of 0.917 (95% CI: 0.858–0.977), accuracy of 0.8952, specificity of 97.1%, and sensitivity of 74.3%. This performance indicates that the model distinguishes between NEC and non-NEC cases with high precision (the ROC curves are shown in Fig. 5). The scientific significance lies in the fact that this model, for the first time, incorporates dynamic indicators (such as the mean changes in MPV and RDW-CV) into the prediction system, revealing the potential of platelet activation (increased MPV) and red blood cell morphological heterogeneity (increased RDW-CV) as early biomarkers, which are directly related to the pathophysiology of NEC (such as intestinal microthrombus formation and the inflammatory response)^[5][6][44]. The occurrence and development of NEC is a temporal process, and its pathophysiological changes, such as inflammatory activation, platelet consumption, and red blood cell destruction, are gradually reflected in changes in routine peripheral blood parameters^[5][6]. By averaging each parameter over two key time points, the early postnatal period and the second week of life, this study captured this dynamic trend. Compared with static models that use only the initial laboratory values at admission, this approach better reflects the pathophysiological evolution of the body and thereby provides richer predictive information.
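The two-time-point averaging described above amounts to a per-parameter mean of the first-24-hour and second-week measurements. A minimal sketch under that reading follows; the function name `two_point_mean` and the example values are hypothetical, and how the study handled infants with a missing measurement is not stated here, so the sketch simply rejects mismatched inputs.

```python
def two_point_mean(first_24h, second_week):
    """Average each CBC parameter over the two sampling windows."""
    mismatched = set(first_24h) ^ set(second_week)
    if mismatched:
        raise ValueError(f"parameters missing at one time point: {sorted(mismatched)}")
    return {name: (first_24h[name] + second_week[name]) / 2 for name in first_24h}


# Hypothetical values for one infant (illustrative only, not study data).
first_24h = {"MPV": 10.0, "PDW": 12.0, "RDW-CV": 16.0}
second_week = {"MPV": 11.0, "PDW": 16.0, "RDW-CV": 17.0}

features = two_point_mean(first_24h, second_week)
print(features)
```

Note that a mean preserves the overall level of a parameter but not the direction of change between the two windows; a difference or ratio feature would be needed to encode the trend explicitly.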

This study combines algorithmic strength with interpretability. The XGBoost algorithm handles complex nonlinear relationships and feature interactions well, making it well suited to medical data^[40]. More importantly, we did not stop at a "black box" model but applied the SHAP framework to interpret it^[43]. Identifying key predictors provided global interpretability, and explaining the basis of individual predictions provided local interpretability, which should enhance clinicians' trust in and acceptance of the model^[43]. SHAP analysis showed that PDW is the most important variable for predicting NEC, with a contribution far exceeding that of the other features. Within the complex pathogenesis of NEC, this finding highlights the role of platelets. The increase in PDW is not an isolated laboratory phenomenon but reflects a series of complex pathophysiological changes during the pre-onset and early stages of NEC^[44][45]. First, increased PDW is closely related to intestinal mucosal barrier injury. Second, increased PDW is a sensitive marker of the systemic inflammatory response. Third, our findings echo and extend existing knowledge in this field. Although traditional clinical focus and most biomarker research have centered on inflammatory markers such as white blood cell count and C-reactive protein (CRP)^[11], some recent pioneering studies have begun to examine changes in platelet parameters in NEC^[44]. Using a data-driven machine learning approach, our study identified PDW as the most important predictor from a large candidate feature set, providing clinical data support for the hypothesis that "platelets play a key role in the pathophysiology of NEC"^[44]. Moreover, our SHAP dependence analysis revealed a nonlinear "threshold effect" between PDW and NEC risk. This finding has potential clinical significance because it may provide clinicians with an objective, quantifiable warning threshold.

The second most important predictor was red cell distribution width-coefficient of variation (RDW-CV)^[46]. RDW-CV quantifies the heterogeneity of red blood cell size, and its elevation is often associated with pathological states such as malnutrition, inflammation, and oxidative stress. In neonates, elevated RDW-CV may reflect erythropoietin stress release, abnormal erythropoiesis, or shortened red blood cell lifespan. In the context of NEC, its significance may lie in: 1) involvement in inflammation and oxidative stress; and 2) reflection of nutritional status. Mean corpuscular hemoglobin (MCH) and mean corpuscular hemoglobin concentration (MCHC) were also selected into the final model, further emphasizing the importance of red blood cell parameters.

It is noteworthy that NEUT%, although selected by the model, ranked relatively low in importance. This seems contrary to conventional understanding, since NEC is essentially an inflammatory disease^[5][6]. One possible explanation is that in the early stage before the clear onset of NEC (Bell stage II or above), systemic inflammation may not yet have reached a level that causes a significant increase in neutrophils, whereas changes in platelets and red blood cells may occur earlier and be more sensitive. Another possibility is that neutrophil changes have high inter-individual variability, and their "dynamic trend" might be more predictive than a single absolute value, which is a direction for future in-depth exploration.

In recent years, studies using ML to predict NEC have gradually increased. For example, Mou et al. (2025) developed prediction models based on databases from two medical centers^[18], and Song et al. (2022) constructed a nomogram for predicting Bell stage II/III NEC^[19]. These studies provide important references, but our work advances the field in several respects: 1) Data dimension: we introduced the "two-time-point average" as a simplified dynamic concept, whereas most previous studies relied on a single admission value or the value closest to onset^[43][44]. 2) Model performance: our model's AUC reached 0.917, superior to many reported models (common AUC range 0.85–0.90). 3) Model transparency and interpretability: we conducted a systematic SHAP analysis, providing not only a feature importance ranking but also the direction and dose-response relationship between features and prediction outcomes, making the model's decision-making process transparent, understandable, and trustworthy. This aligns closely with the current emphasis on "Trustworthy AI" in medical AI. 4) Immediate applicability: we deployed the final model as a free online web calculator. Clinicians only need to input 7 routine, easily obtainable blood parameters to obtain an individualized NEC risk probability in real time, greatly facilitating the translation of research results into clinical practice.

Limitations

This study has several limitations that must be acknowledged. First, the retrospective design is an inherent limitation. Although internal validation showed good robustness, the model's generalizability across different regions, hospital levels, and populations needs confirmation through large-scale, multi-center, prospective external validation. The retrospective design also cannot avoid missing data and potential selection bias. Second, the scope of feature selection was limited. We focused on routine blood and blood gas parameters, which are the most routine and easily available tests in the NICU, but inevitably ignored other potentially information-rich data modalities. Future integration of multi-omics and multi-modal data is expected to yield more powerful and accurate prediction models. Third, the dynamic processing was simplified. Using the average of two time points to approximate a dynamic process, although practical, cannot match the information content of denser time-series monitoring data. With the improvement of electronic medical record systems and the development of real-time data stream processing technology, building truly real-time dynamic prediction models will become possible.

Conclusion

In summary, this study develops and validates an interpretable machine-learning model for the early prediction of NEC in neonates, leveraging dynamic routine blood parameters. Beyond serving as an actionable clinical tool, the model's interpretability analyses provide novel insights into the potential pathophysiological roles of metrics like MPV and RDW-CV in early NEC development. This tool can assist clinicians in stratifying high-risk patients for targeted management, thereby supporting personalized intervention and potentially improving outcomes. Collectively, our work provides a digital framework to advance NEC care towards more precise and pre-emptive strategies.

Abbreviations

NEC Necrotising enterocolitis

NICU Neonatal intensive care unit

ML Machine learning

LR Logistic regression

SVM Support vector machine

KNN K-nearest neighbors

XGBoost EXtreme Gradient Boosting

SHAP SHapley Additive exPlanations

AUC Area under the receiver operating characteristic curve

Declarations

Ethics approval and consent to participate

This study was reviewed and approved by the Ethics Committee of the Second Affiliated Hospital of Air Force Medical University (approval number: 202504-02) and the Science and Technology Ethics Committee of Xi’an Medical University (approval number: XYYJSLS2025004) and was conducted in accordance with the Declaration of Helsinki.

As a retrospective study utilizing exclusively anonymized data, the requirement for informed consent was waived.

All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Data Availability

All data in this study are available from the corresponding authors upon request.

Competing interests

The authors declare that they have no competing interests.

Funding

The authors acknowledge financial support for this investigation, including research execution, authorship activities, and publication costs. This work was supported by the National Natural Science Foundation of China (Grant Nos. 82270563 and 82000522) and the Fundamental Research Funds for the Central Universities (Grant No. GK202205010).

Author Contribution

MY drafted the initial manuscript and contributed to the revisions. Y-XW executed the statistical analyses. NF, A-DZ, H-WG, Z-BG, H-YW, H-FW and QZ conducted data acquisition. Y-CM, YL and XJ contributed to the development of the methodological framework. YL, Y-CM and XJ conceived and designed the study, supervised the conceptualization process, and critically revised the manuscript for intellectual content. All authors rigorously reviewed the final draft and provided unanimous approval for submission.

Acknowledgement

We gratefully acknowledge the NICU of the Second Affiliated Hospital of Air Force Medical University and the Second Affiliated Hospital of Xi’an Medical University for providing patient information and data support.

References

1. Wontae K, Jeong MS. Necrotizing Enterocolitis. N Engl J Med. 2020;383:2461.

2. Patricia W, Lin BJ. Stoll. Necrotising enterocolitis. Lancet. 2006;368:1271–83.

3. Marion CW, Henry RL. Moss. Necrotizing enterocolitis. Annu Rev Med. 2008;60:111–24.

4. Allison L, Speer KP, Lally CP, et al. Surgical Necrotizing Enterocolitis and Spontaneous Intestinal Perforation Lead to Severe Growth Failure in Infants. Ann Surg. 2024;280:432–43.

5. David JH, Chhinder PS. Bench to bedside - new insights into the pathogenesis of necrotizing enterocolitis. Nat Rev Gastroenterol Hepatol. 2022;19:468–79.

6. Bo L, Mina Y, Dorothy L, et al. Exploring the Complex Pathophysiology of Necrotizing Enterocolitis in Preterm Neonates. Annu Rev Pathol. 2025;21.

7. George SB, Ian HJ, Cheryl B, et al. Methods of identifying surgical Necrotizing Enterocolitis-a systematic review and meta-analysis. Pediatr Res. 2024;97:45–55.

8. Gephart SM, Gordon PV, Penn AH, et al. Changing the paradigm of defining, detecting, and diagnosing NEC: Perspectives on Bell's stages and biomarkers for NEC. Semin Pediatr Surg. 2017;27:3–10.

9. Misun H, Luis OTG, Rebecca AD, et al. The role of ultrasound in necrotizing enterocolitis. Pediatr Radiol. 2021;52:702–15.

10. Guo HY, Li YZ, Wang LL. Assessment of inflammatory biomarkers to identify surgical/death necrotizing enterocolitis in preterm infants without pneumoperitoneum. Pediatr Surg Int. 2024;40(1):191.

11. Ghauri AA, Shaukat Z, Chattha AA, et al. C-reactive Protein/Albumin Ratio as a Prognostic Indicator for Predicting Surgical Intervention in Neonates With Necrotizing Enterocolitis: A Prospective Cohort Study. Cureus. 2025;17(7):e87308.

12. Reisinger KW, Kramer BW, Van Z, David C, et al. Non-invasive serum amyloid A (SAA) measurement and plasma platelets for accurate prediction of surgical intervention in severe necrotizing enterocolitis (NEC). PLoS ONE. 2014;9(3):17.

13. Sbragia L, Gualberto IJN, Xia J, et al. Intestinal Fatty Acid-binding Protein as a Marker of Necrotizing Enterocolitis Incidence and Severity: a Scoping Review. J Surg Res. 2024;303:613–27.

14. Dipak M, Rupa RS, Nisha KB, et al. Neonatal mortality risk assessment using SNAPPE-II score in a neonatal intensive care unit. BMC Pediatr. 2019;19(1):279.

15. Wu PL, Lee WT, Lee PL, et al. Predictive power of serial neonatal therapeutic intervention scoring system scores for short-term mortality in very-low-birth-weight infants. Pediatr Neonatol. 2014;56(2):108–13.

16. Fijas M, Vega M, Xie XH, Kim M, et al. SNAPPE-II and MDAS scores as predictors for surgical intervention in very low birth weight neonates with necrotizing enterocolitis. J Matern Fetal Neonatal Med. 2022;36:2148096.

17. Chen XY, Li YQ, Liu YH, et al. Fulminant necrotizing enterocolitis: clinical features and a predictive model. BMC Pediatr. 2025;25(1):1–9.

18. Mou YL, Li JH, Wang JJ, et al. The early prediction of neonatal necrotizing enterocolitis in high-risk newborns based on two medical center clinical databases. J Matern Fetal Neonatal Med. 2025;38(1):2521798.

19. Song ST, Zhang J, Zhao YW, et al. Development and Validation of a Nomogram for Predicting the Risk of Bell's Stage II/III Necrotizing Enterocolitis in Neonates Compared to Bell's Stage I. Front Pediatr. 2022;10:863719.

20. Huang P, Luo ND, Shi XQ, et al. Risk factor analysis and nomogram prediction model construction for NEC complicated by intestinal perforation. BMC Pediatr. 2024;24(1):143.

21. Li XT, Zhang LT, Gao HJ, et al. The prediction models for the optimal timing of surgical intervention for necrotizing enterocolitis: nomogram vs. five machine learning models. Pediatr Surg Int. 2025;41(1):260.

22. Sonja D, Lea EB, Julia M, et al. Prediction of High Bell Stages of Necrotizing Enterocolitis Using a Mathematic Formula for Risk Determination. Children (Basel). 2022;9(5):604.

23. Min T, Ling Y, Yu L, et al. Development and validation of a nomogram model for predicting the occurrence of necrotizing enterocolitis in premature infants with late-onset sepsis. Eur J Med Res. 2025;30(1):595.

24. Cui C, Qiu L, Li L, et al. A time series algorithm to predict surgery in neonatal necrotizing enterocolitis. BMC Med Inf Decis Mak. 2024;24(1):304.

25. Lin YC, Salleb-Aouissi A, Hooven TA. Interpretable prediction of necrotizing enterocolitis from machine learning analysis of premature infant stool microbiota. BMC Bioinformatics. 2022;23(1):104.

26. Masi AC, Embleton ND, Lamb CA, et al. Human milk oligosaccharide DSLNT and gut microbiome in preterm infants predicts necrotising enterocolitis. Gut. 2021;70(12):2273–82.

27. Committee on Fetus and Newborn. Hospital discharge of the high-risk neonate. Pediatrics. 2008;122(5):1119–26.

28. Kursa MB, Rudnicki WR. Feature Selection with the Boruta Package. J Stat Softw. 2010;36(11):1–13.

29. Ravishankar H, Radhika M, Mullick R, et al. Recursive feature elimination for biomarker discovery in resting-state functional connectivity. 2016 Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

30. O'Brien CM. Statistical Learning with Sparsity: The Lasso and Generalizations. Int Stat Rev. 2016;84(1):156–7.

31. Kelly C, Okada K, et al. Variable interaction measures with random forest classifiers. 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI).

32. Obuchowski NA, Bullen JA. Receiver operating characteristic (ROC) curves: review of methods with applications in diagnostic medicine. Phys Med Biol. 2018;63(7):07TR01.

33. Liu R, Zhu, et al. On the consistent estimation of optimal Receiver Operating Characteristic (ROC) curve. 2022. 36th Conference on Neural Information Processing Systems (NeurIPS).

34. Yu J, Yang L, et al. Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application. Stat Med. 2016;35(13):2251–82.

35. Felix M, Jan N, Van R. Fast and Informative Model Selection Using Learning Curve Cross-Validation. IEEE Trans Pattern Anal Mach Intell. 2023;45:9669–80.

36. Andrew AK, Jack EZ. Assessing the calibration of mortality benchmarks in critical care: The Hosmer-Lemeshow test revisited. Crit Care Med. 2007;35(9):2052–6.

37. Kerr KF, Wang ZY, Janes H, et al. Net Reclassification Indices for Evaluating Risk Prediction Instruments: A Critical Review. Epidemiology. 2014;25(1):114–21.

38. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures. Epidemiology. 2010;21(1):128–38.

39. Lundberg S, Lee S, et al. A Unified Approach to Interpreting Model Predictions. 2017 NIPS.

40. Goecks J, Jalili V, Heiser LM, et al. How Machine Learning Will Transform Biomedicine. Cell. 2020;181(1):92–101.

41. Goldstein BA, Scales CD, Cerullo M, Mureebe L, et al. Development and Performance of a Clinical Decision Support Tool to Inform Resource Utilization for Elective Operations. JAMA Netw Open. 2020;3(11):e2023547.

42. Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems; 2017.

43. Michalski A, Duraj K, Kupcewicz B. Leukocyte deep learning classification assessment using Shapley additive explanations algorithm. Int J Lab Hematol. 2023;45:297–302.

44. Kasirer Y, Shchors I, Hammerman C, Bin A, et al. Platelet Indices: Universally Available Clinical Adjunct for Diagnosing Necrotizing Enterocolitis. Am J Perinatol. 2023;41:e1575–80.

45. Jabeen J, Jha S, Garg V, Datta S, et al. Normative Data on Platelet Count, Mean Platelet Volume, Platelet Distribution Width, Platelet-Large Cell Ratio, and Plateletcrit in Neonates. Cureus. 2025;17:e89293.

46. Salas AA, Gunn E, Carlo WA, Bell EF, et al. Timing of Red Blood Cell Transfusions and Occurrence of Necrotizing Enterocolitis: A Secondary Analysis of a Randomized Clinical Trial. JAMA Netw Open. 2024;7:e249643.
