Comparative Evaluation of Ensemble Machine Learning Models for Predicting Antimicrobial Resistance from Electronic Health Records

Zohoor Almalki, Amjad Althagafi and Sarah Al-Shareef

Computer Science and Artificial Intelligence, Umm Al-Qura University, Makkah, Saudi Arabia.

*Corresponding author(s). E-mail(s): s44680217@uqu.edu.sa; Contributing authors: s44680033@uqu.edu.sa; saashareef@uqu.edu.sa;

Abstract

Antimicrobial resistance (AMR), the ability of microbes to survive exposure to drugs intended to eliminate them, is a critical global health concern exacerbated by the overuse and misuse of antibiotics. In this study, we use machine learning to predict AMR and evaluate the performance of several supervised algorithms on the Antibiotic Resistance Microbiology Dataset (ARMD), a detailed electronic health record (EHR) dataset containing clinical, demographic, microbiological, and treatment data from Stanford Healthcare [1, 2].

Our approach applies data preprocessing and then predicts whether a patient's bacterial isolate is resistant or susceptible to a specific antibiotic, based on clinical characteristics, microbiological findings, treatment history, and demographic information. We compare XGBoost, LightGBM, Random Forest, and HistGradientBoostingClassifier in building predictive models of antibiotic susceptibility. By benchmarking these models on a large real-world dataset, this research identifies effective predictive strategies that can support antimicrobial stewardship, enhance clinical decision-making, and contribute to addressing the growing challenge of AMR.

Keywords: antimicrobial resistance, electronic health records, machine learning, gradient boosting, random forest, clinical decision support

1 Introduction

Antimicrobial resistance (AMR) is a major worldwide health concern, made worse by the overuse and misuse of antibiotics.

In this study, we use machine learning techniques to predict AMR and systematically compare the performance of multiple supervised algorithms. We use the Antibiotic Resistance Microbiology Dataset (ARMD), a detailed electronic health record (EHR) dataset that encompasses extensive clinical, demographic, microbiological, and treatment data from Stanford Healthcare [1, 2].

We compare the predictive performance of XGBoost, LightGBM, Random Forest, and HistGradientBoostingClassifier to build predictive models that classify bacterial susceptibility to different antibiotics. The methodology involves selecting a sample of 1,000 patients (500 with susceptible infections and 500 with resistant infections) and including the clinical, microbiological, and treatment information from all of their relevant hospital encounters.

By benchmarking on a large-scale real-world dataset, this study identifies the most effective machine learning approaches for predicting antimicrobial resistance. These findings have the potential to support antimicrobial stewardship efforts, enhance clinical decision-making, and contribute to the global fight against AMR.

2 Related Work

Recent advances in machine learning (ML) have shown significant promise in improving antimicrobial resistance (AMR) prediction and guiding the clinical use of antibiotics. Numerous studies have explored various types of data and ML algorithms to enhance the accuracy and speed of resistance detection.

Talamantes-Becerra et al. (2024) developed ML models based on genomic data to forecast ciprofloxacin resistance in Escherichia coli, using AMRFinderPlus gene annotations and k-mer frequencies as key features. Their research demonstrated that integrating biologically annotated and reference-free genomic features through ensemble models such as XGBoost resulted in accuracy exceeding 90%, highlighting the potential of high-dimensional genomic data for AMR prediction. However, this approach depends on whole-genome sequencing, which may not be readily available in clinical settings [3]. In contrast, our study uses the Antibiotic Resistance Microbiology Dataset (ARMD), a comprehensive electronic health record (EHR) dataset that includes clinical, demographic, microbiological, and treatment data [1, 2]. This enables a more practical and scalable framework for real-time AMR prediction using data commonly available in hospital environments.

Tejeda et al. (2024) validated iAST, a machine learning-based decision support tool designed to guide both empirical and organism-targeted antibiotic therapy across a network of hospitals in Spain. By using patient demographic information and historical antibiogram data, iAST achieved empirical therapy success rates exceeding 91%. Unlike many model development studies, this work focused on clinical deployment and demonstrated the real-world impact of ML tools on antibiotic stewardship [4]. While Tejeda et al.'s study highlights deployment and clinical effectiveness, our work centers on a systematic evaluation of multiple state-of-the-art ML algorithms applied to a comprehensive clinical dataset [1, 2]. Our goal is to identify the most robust models for potential integration into decision support systems.

Smak Gregoor et al. (2023) evaluated a mobile health application powered by AI for the detection of skin cancer, illustrating the broader applicability of AI in healthcare. Although not directly related to AMR, their findings on the risks of overdiagnosis and increased resource burden are highly relevant to AI-driven AMR tools, where false positives can lead to unnecessary treatments and clinical strain [5]. Our study addresses these concerns through rigorous model validation and balanced sampling strategies, aiming to reduce false predictions and enhance clinical reliability.

Babirye et al. (2024) predicted resistance to key anti-tuberculosis drugs by integrating whole-genome sequencing data with clinical variables from Mycobacterium tuberculosis isolates in Uganda. They employed logistic regression, SVM, and XGBoost, observing modest improvements in predictive accuracy when clinical features were included [6]. In contrast to their pathogen-specific approach, our study evaluates Random Forest and multiple gradient-boosting algorithms across a wide range of antibiotics and organisms, using a multi-pathogen EHR dataset [1, 2]. This broadens the applicability of our findings to diverse resistance scenarios.

Kim et al. (2021) developed machine learning models to predict antibiotic resistance in urinary tract infections (UTIs) using urine culture and susceptibility data. They applied Random Forest, XGBoost, and logistic regression, reporting varying AUROC scores depending on the antibiotic. While effective, their study focused on a narrower clinical context and employed a limited set of ML techniques [7]. In contrast, our work systematically evaluates multiple advanced boosting algorithms and Random Forest on a larger and more diverse clinical dataset [1, 2], aiming to develop a generalized framework for AMR prediction across multiple infection types.

Pérez de la Lastra et al. (2024) surveyed a broad range of machine learning approaches (supervised, unsupervised, deep learning, and reinforcement learning) for AMR prediction. They emphasized the current effectiveness of supervised methods using genomic and clinical data, while also noting the growing promise of deep learning and reinforcement learning for treatment optimization [8]. Building on this foundation, our study focuses on supervised learning, specifically evaluating Random Forest and gradient-boosting models applied to a rich clinical EHR dataset [1, 2]. We incorporate rigorous preprocessing and targeted strategies for handling class imbalance, addressing methodological gaps often overlooked in prior work.

Tran Quoc et al. (2023) used machine learning models to predict AMR in ICU patients using EMR data from two hospitals in Vietnam. XGBoost outperformed other models in terms of AUROC and F1 scores, demonstrating the value of integrating clinical and microbiological data in resource-limited settings [?]. Building on this work, our study evaluates Random Forest and a broader range of gradient-boosting algorithms, incorporates advanced sampling techniques, and validates performance on a larger and more diverse clinical dataset [1, 2], with the goal of developing models with improved generalizability.

İlhanlı et al. (2024) developed and validated machine learning models to predict antibiotic resistance in UTI patients using electronic medical record (EMR) data from a South Korean hospital, incorporating 71 clinical features. Their models achieved moderate AUROC scores (0.63–0.72) and were integrated into a clinical decision support system, with an emphasis on interpretability through SHAP explanations [9]. Similarly, our study prioritizes model interpretability alongside predictive performance, and it extends this approach by using Random Forest and multiple gradient-boosting algorithms and by addressing class imbalance with advanced sampling techniques on a multi-antibiotic, multi-pathogen dataset [1, 2].

Pikalyova et al. (2024) investigated AMR prediction for Staphylococcus aureus using genomic data and Generative Topographic Mapping (GTM). While GTM demonstrated lower predictive performance compared to conventional machine learning models, it offered valuable interpretability through the visualization of genomic resistance landscapes [10]. In our study, we focus on supervised classifiers, specifically Random Forest and gradient-boosting models trained on clinical and microbiological EHR data, aiming to achieve a practical balance between predictive accuracy and clinical applicability [1, 2].

In conclusion, prior research has advanced antimicrobial resistance (AMR) prediction using diverse data sources, machine learning models, and clinical applications. However, many of these studies depend heavily on genomic data or focus narrowly on specific pathogens or antibiotics, which limits their generalizability and clinical utility. Our study distinguishes itself by using the Antibiotic Resistance Microbiology Dataset (ARMD), a comprehensive clinical electronic health record resource [1, 2], to systematically evaluate Random Forest and several advanced gradient-boosting algorithms. Furthermore, we employ rigorous data preprocessing, incorporate targeted strategies to address class imbalance, and perform thorough model validation. These combined efforts support the development of robust and interpretable predictive models with strong potential to enhance clinical decision-making and antimicrobial stewardship.

3 Data

3.1 Dataset and De-Identification

The Antibiotic Resistance Microbiology Dataset (ARMD) consists of fully de-identified electronic health record data from Stanford Healthcare [1, 2]. To ensure patient privacy and comply with data-sharing policies, the dataset employs the following de-identification measures: (i) unique randomly generated identifiers for patients and culture orders, (ii) temporal de-identification through jittering of timestamps while preserving temporal relationships, (iii) age censoring with predefined bins and grouping of patients aged 90 years or older, (iv) gender encoding as binary values without explicit male/female labels, and (v) exclusion of all direct patient identifiers. All demographic and clinical details are provided in a de-identified format.

3.2 Cohort Selection and Balancing

To ensure balanced representation for model development, a targeted sampling strategy was employed. From the full ARMD dataset [1, 2], a cohort of 1,000 patients was selected based on their culture results, comprising 500 with 'Susceptible' outcomes and 500 with 'Resistant' outcomes. These patients corresponded to a total of 997 unique clinical encounters. This predefined cohort served as the foundation for all subsequent data extraction and integration processes.
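The balanced draw described above can be sketched as follows. This is a minimal illustration on synthetic patient labels, not the authors' selection code: the function name `sample_balanced_cohort`, the seed, and the assumption that each patient carries a single outcome label are all hypothetical.

```python
import random
from collections import Counter

def sample_balanced_cohort(patient_labels, n_per_class, seed=42):
    """Draw n_per_class patient IDs from each label, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_label = {}
    for patient_id, label in patient_labels.items():
        by_label.setdefault(label, []).append(patient_id)
    cohort = {}
    for label, ids in sorted(by_label.items()):
        for patient_id in rng.sample(sorted(ids), n_per_class):
            cohort[patient_id] = label
    return cohort

# Synthetic stand-in for patient-level culture outcomes (not ARMD data).
patient_labels = {
    f"p{i:05d}": ("Resistant" if i % 3 == 0 else "Susceptible")
    for i in range(6000)
}

cohort = sample_balanced_cohort(patient_labels, n_per_class=500)
counts = Counter(cohort.values())
print(len(cohort), dict(counts))
```

Sampling at the patient level, rather than at the level of individual culture rows, keeps the 500/500 balance defined on patients even when patients contribute different numbers of records.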

3.3 Data Preparation

For each relevant dataset (microbiology cultures cohort, prior medications, demographics, laboratory results, vital signs, antibiotic class exposure, and comorbidity), records were filtered to include only those associated with the selected cohort. Each file was then preprocessed independently, involving handling of missing values, encoding of categorical variables, and generation of statistical and temporal features. After preprocessing, the individual datasets were merged into a unified file using consistent patient and culture identifiers through a series of left joins. This integration preserved all available information for the target cohort and resulted in a final dataset of shape (4,982,642 rows, 47 columns). The distribution of susceptibility categories is shown in Fig. 1.
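A left join keeps every row of the left table and repeats it once per matching row on the right, so chaining joins against tables with several records per identifier multiplies the row count. The sketch below illustrates that behaviour on toy records; the helper `left_join`, the single `patient_id` key, and the column names are assumptions for illustration, whereas the study joined on both patient and culture identifiers.

```python
def left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts on `key`, keeping unmatched left rows."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        matches = index.get(left[key])
        if not matches:
            joined.append(dict(left))  # no match: right-hand columns stay absent
            continue
        for right in matches:
            joined.append({**left, **right})  # one output row per match
    return joined

# Toy tables with hypothetical columns (not ARMD records).
cultures = [
    {"patient_id": "a", "organism": 1, "susceptibility": 1},
    {"patient_id": "b", "organism": 2, "susceptibility": 0},
]
medications = [
    {"patient_id": "a", "prior_med": 10},
    {"patient_id": "a", "prior_med": 11},
    {"patient_id": "a", "prior_med": 12},
]
labs = [
    {"patient_id": "a", "median_wbc": 9.1},
    {"patient_id": "a", "median_wbc": 7.4},
]

merged = left_join(left_join(cultures, medications, "patient_id"), labs, "patient_id")
print(len(merged))  # patient "a": 3 x 2 rows, patient "b": 1 row
```

In this toy case two culture rows become seven merged rows, which may help explain how a cohort of 1,000 patients can yield a merged table with millions of rows.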

The final dataset contains a total of approximately 5 million records, with 2,697,708 labeled as Susceptible and 2,284,934 as Resistant. This results in a mild class imbalance (approximately 54% vs 46%), which is generally not severe enough to negatively impact model performance. Most modern machine learning algorithms, especially tree-based models like XGBoost and Random Forest, are robust to such small imbalances and can handle them effectively without additional rebalancing techniques. The merged dataset was subsequently split into training and testing subsets using an 80/20 ratio, yielding a training set of shape (3,986,113 rows, 46 columns) and a testing set of shape (996,529 rows, 46 columns), which were used for model development and evaluation.
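The class shares and split sizes quoted above can be checked with a few lines of arithmetic. The sketch assumes the split was produced with a 20% test fraction rounded up and the remainder assigned to training, which is the convention scikit-learn's `train_test_split` follows when only `test_size` is given; the paper does not state which splitting routine was used.

```python
import math

n_susceptible = 2_697_708  # class 0, as reported
n_resistant = 2_284_934    # class 1, as reported
n_total = n_susceptible + n_resistant

share_susceptible = n_susceptible / n_total
share_resistant = n_resistant / n_total

# Assumed convention: test size rounded up, training set takes the remainder.
test_fraction = 0.2
n_test = math.ceil(test_fraction * n_total)
n_train = n_total - n_test

print(f"total={n_total:,} susceptible={share_susceptible:.1%} resistant={share_resistant:.1%}")
print(f"train={n_train:,} test={n_test:,}")
```

Under that assumption the computed sizes agree with the reported training and testing shapes.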

Fig. 1 Distribution of Susceptibility Categories after a series of left joins. The dataset includes 2,697,708 susceptible (class 0) and 2,284,934 resistant (class 1) samples.

3.4 Data Components and Feature Engineering

The dataset integrates information from multiple clinical domains. Key features were extracted and engineered as follows:

Culture and Susceptibility Information: Each record represents a microbiological culture linked to a patient encounter. This component includes identifiers for the patient and culture order, a timestamp for the culture collection, and integer-encoded features for the identified organism and tested antibiotic. The key target variable is susceptibility, representing the result of the antibiotic susceptibility test, classified as either Resistant or Susceptible.

Patient Demographics: This component includes essential demographic features: age, represented as an ordinal categorical variable grouped into age brackets (e.g., '18–24 years', '25–34 years', ..., 'above 90'), and gender, encoded as a categorical integer. These features capture population variability and are linked to each culture record via patient and procedure identifiers. The distribution of age groups by gender is illustrated in Fig. 2.

Fig. 2 Age Group Distribution by Gender

Laboratory Results: Laboratory data include core clinical biomarkers such as white blood cell count (WBC), neutrophils, lymphocytes, hemoglobin (HGB), platelets (PLT), sodium (Na), bicarbonate (HCO₃), blood urea nitrogen (BUN), creatinine (Cr), lactate, and procalcitonin. For each, we selected median values to capture central physiological tendencies. Additionally, delta values (defined as the difference between the last and first measurements) were calculated where both values were available. However, many delta features were found to be extremely sparse (with some having no non-null values), and those with less than 1% completeness, including those with none, were excluded. The resulting lab feature set primarily consists of robust median-based summaries, while retaining only a few delta features with usable coverage. Models capable of handling missing data, such as XGBoost, were used to mitigate the impact of remaining sparsity.

Vital Signs: Vital sign data (e.g., heart rate, respiratory rate, temperature, systolic and diastolic blood pressure) were provided as pre-aggregated values without raw time-stamped measurements. For each variable, we retained the median, first, and last values when available. We then computed delta features (e.g., delta_heartrate) representing the change between the first and last measurements to capture temporal trends. Due to significant sparsity in the first/last readings, most delta features were missing for a large subset of patients. To balance feature richness and sample size, we retained rows with at least one valid delta measurement while preserving the robust median summaries. This resulted in a time-aware, yet broadly applicable, representation of physiological states across patients.

Comorbidity Profile: Original comorbidity data included over 170 distinct conditions, many of which were rare, redundant, or overly specific. To create a more clinically meaningful and model-efficient representation, we grouped related conditions into broader categories based on clinical relevance and domain knowledge. The resulting feature set consists of 11 binary indicators representing major comorbidity domains, including diabetes, congestive heart failure (CHF), chronic kidney disease (CKD), cancer, transplant history, immunosuppression, chronic obstructive pulmonary disease

Fig. 3 Comorbidity Frequency in Dataset
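The median and delta summaries described for laboratory results and vital signs, together with the completeness filter for sparse delta features, can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: the helper names, the feature names, and the synthetic readings are hypothetical, and `None` stands in for a missing value.

```python
from statistics import median

def summarize(readings):
    """Return (median, delta) for time-ordered readings; None marks a missing value."""
    observed = [v for v in readings if v is not None]
    med = median(observed) if observed else None
    first = readings[0] if readings else None
    last = readings[-1] if readings else None
    # Delta (last minus first) is defined only when both endpoints are present.
    delta = last - first if first is not None and last is not None else None
    return med, delta

def completeness(column):
    """Fraction of non-missing entries in a feature column."""
    return sum(v is not None for v in column) / len(column)

def drop_sparse(features, min_completeness=0.01):
    """Keep only feature columns with at least 1% non-missing values."""
    return {name: col for name, col in features.items()
            if completeness(col) >= min_completeness}

# Synthetic WBC readings for two hypothetical patients.
med_a, delta_a = summarize([11.2, 9.8, 7.5])
med_b, delta_b = summarize([None, 6.1, 6.4])  # first reading missing, so no delta

# Synthetic feature columns over 200 rows with varying coverage.
features = {
    "median_wbc": [9.8] * 200,
    "delta_wbc": [-3.7] * 50 + [None] * 150,   # 25% complete: kept
    "delta_lactate": [0.4] + [None] * 199,     # 0.5% complete: dropped
    "delta_procalcitonin": [None] * 200,       # empty: dropped
}
kept = drop_sparse(features)
print(sorted(kept))
```

Medians remain available whenever at least one reading exists, which is why the median-based features end up far denser than the deltas.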

4 Methods

A number of machine learning models were developed and tested for AMR prediction. These models were trained on 47 features extracted from the dataset, including demographic, clinical, and microbiologic information, as well as laboratory results, antibiotic exposure history, and other relevant variables [1, 2]. The correlations among these features are visualized in Fig. 4.

Fig. 4 Correlation Heatmap

4.1 Model Selection

We selected a series of gradient-boosting models and Random Forest due to their effectiveness in handling imbalanced datasets, high-dimensional feature spaces, and missing values. These models are known for their ability to capture complex nonlinear relationships, making them well-suited for predicting antimicrobial resistance (AMR) based on diverse clinical, microbiological, and treatment-related variables [1, 2].

LightGBM (LGBMClassifier): Chosen for its speed and low memory usage. Although LightGBM supports native categorical handling, in our setup all features were preprocessed and converted into numeric format. The model remained effective and efficient for high-dimensional structured data.

Random Forest (RandomForestClassifier): A decision-tree ensemble selected for its robustness to noise and its ability to deal with high dimensionality. In our case, all features were numerical, and missing values were retained as-is (no imputation), relying on the native missing-value support of the tree learner. In scikit-learn's RandomForestClassifier (version 1.4 and later), each split sends missing values to whichever child node gives the better split, rather than using surrogate splits. This approach preserves the original data distribution while avoiding potential bias from imputation.

HistGradientBoostingClassifier (Scikit-learn): A native Scikit-learn model that offers a trade-off between runtime efficiency and predictive performance. It is suitable for structured data and offers competitive results with minimal overhead.
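Since missing values were kept rather than imputed, it may help to see how a tree learner can route them at a split. The sketch below evaluates one candidate threshold twice, once with missing values sent left and once sent right, and keeps the better direction. It is a simplified illustration: it scores splits with Gini impurity on toy data, whereas gradient-boosting libraries score splits with gradient statistics, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_missing_direction(values, labels, threshold):
    """Try sending missing values left, then right; return the better option."""
    left = [y for x, y in zip(values, labels) if x is not None and x <= threshold]
    right = [y for x, y in zip(values, labels) if x is not None and x > threshold]
    missing = [y for x, y in zip(values, labels) if x is None]
    go_left = split_impurity(left + missing, right)
    go_right = split_impurity(left, right + missing)
    return ("left", go_left) if go_left <= go_right else ("right", go_right)

# Toy feature where the rows with a missing value are all resistant (label 1).
values = [1.0, 2.0, 3.0, 8.0, 9.0, None, None]
labels = [0, 0, 0, 1, 1, 1, 1]
direction, impurity = best_missing_direction(values, labels, threshold=5.0)
print(direction, impurity)
```

Because the direction is learned from the labels, missingness itself can carry signal, which is relevant for EHR data where an unordered test may reflect a clinical judgment.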

4.2 Experimental Design

Each model was trained on the ARMD dataset and evaluated using an 80/20 train-test split [1, 2]. The models included XGBoost, LightGBM, Random Forest, and HistGradientBoostingClassifier. Performance was assessed by classifying resistant versus susceptible cases using precision, recall, F1-score, and ROC AUC as evaluation metrics.
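For reference, precision, recall, and F1-score follow directly from confusion-matrix counts. The sketch below uses hypothetical counts for the resistant class, chosen only so that the rounded values match the XGBoost row of Table 1; they are not the study's actual counts.

```python
def precision_recall_f1(tp, fp, fn):
    """Class-wise metrics from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts per 100 truly resistant isolates (illustrative only).
precision, recall, f1 = precision_recall_f1(tp=88, fp=36, fn=12)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The share of false alarms among predicted positives is one minus precision, which is the quantity behind statements such as "29% of predicted resistant cases were false positives".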

4.3 Statistical analysis

All analyses were performed using Python (3.11.5) with open-source libraries including scikit-learn, XGBoost, and LightGBM. To ensure reproducibility, experiments were run with fixed random seeds. Class imbalance was addressed in XGBoost using the scale_pos_weight parameter, while other models were trained with default settings. Large language models (ChatGPT, OpenAI) were used only for language editing and formatting; all outputs were reviewed and verified by the authors.
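XGBoost's `scale_pos_weight` is commonly set to the ratio of negative to positive examples. The paper does not report the value used, so the sketch below only illustrates that heuristic, treating resistant as the positive class and using the full-dataset class counts as a stand-in for the training-split counts.

```python
def scale_pos_weight(n_negative, n_positive):
    """Common heuristic: number of negative examples divided by number of positives."""
    return n_negative / n_positive

n_susceptible = 2_697_708  # class 0 (negative), full-dataset count
n_resistant = 2_284_934    # class 1 (positive), full-dataset count

weight = scale_pos_weight(n_susceptible, n_resistant)
print(f"scale_pos_weight ~ {weight:.3f}")
```

A value above 1 up-weights errors on the resistant class, which would be consistent with XGBoost showing higher resistant recall and lower resistant precision than the models trained with default settings.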

5 Results

We evaluated the performance of XGBoost, LightGBM, HistGradientBoosting and Random Forest on the ARMD dataset for antimicrobial resistance (AMR) prediction [1, 2]. The models were assessed using key classification metrics: precision, recall, F1-score, and ROC AUC, with a focus on their ability to distinguish between resistant (class 1) and susceptible (class 0) cases.

5.1 Classification Metrics

Each model was tested on a held-out dataset containing 996,529 records. Table 1 summarizes precision, recall, and F1-score for both classes. In addition, the ROC AUC for each model is reported in Table 2. The comparative ROC curves of the four models are shown in Fig. 5.
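ROC AUC can be read as the probability that a randomly chosen resistant case receives a higher predicted score than a randomly chosen susceptible case. The sketch below computes it by direct pairwise comparison on toy scores; it is quadratic in the number of samples and meant only to make the definition concrete, not to replace a library routine on a test set of this size.

```python
def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy labels (1 = resistant) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(y_true, scores)
print(round(auc, 4))
```

Unlike precision and recall, this quantity does not depend on a decision threshold, which is why two models can have close AUC values yet different precision/recall trade-offs.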

5.2 Summary of Findings

XGBoost achieved the highest recall (88%) for resistant cases, making it particularly valuable when the priority is to minimize missed resistance. It also delivered the best ROC AUC (0.9020), indicating strong discriminative power overall.

LightGBM and HistGradientBoosting demonstrated the highest precision (80%) for resistant cases, suggesting they are better at reducing false positives, which is important when avoiding unnecessary antibiotic use.

Random Forest provided balanced performance, achieving a resistant precision of 75% and recall of 66%. Although it showed slightly lower overall performance than the boosting models, it remained competitive, especially in situations that favor model interpretability and robustness over marginal performance gains.

Fig. 5 ROC Curve Comparison for the four models.

For susceptible cases, all models demonstrated strong performance, with recall reaching 85% in LightGBM and HistGradientBoosting. This high recall is beneficial for confidently identifying patients who are unlikely to need aggressive antimicrobial therapy.

5.3 Model Interpretability using SHAP

To enhance the interpretability of the model, SHAP (SHapley Additive exPlanations) analysis was applied to the XGBoost model. SHAP values explain each feature's contribution to the model's predictions by measuring its influence on the output probability. Figure 6 is a SHAP summary plot showing the most influential features for the classification of antibiotic resistance. The plot reveals that antibiotic type and bacterial organism were among the most influential features in the model's predictions, aligning with known microbiological principles. Clinical variables, such as median systolic blood pressure (median_sysbp), bicarbonate (median_hco3), and platelet count (median_plt), also exhibited significant influence. Transplant history (has_transplant) emerged as a key risk factor, highlighting the effect of immunosuppression on resistance development.

Table 1
Detailed class-wise performance metrics for AMR prediction.
Model	Class	Precision	Recall	F1-Score
XGBoost	Susceptible (0)	0.87	0.70	0.78
XGBoost	Resistant (1)	0.71	0.88	0.79
LightGBM	Susceptible (0)	0.77	0.85	0.81
LightGBM	Resistant (1)	0.80	0.70	0.74
Random Forest	Susceptible (0)	0.74	0.81	0.77
Random Forest	Resistant (1)	0.75	0.66	0.70
HistGradientBoosting	Susceptible (0)	0.77	0.85	0.81
HistGradientBoosting	Resistant (1)	0.80	0.70	0.74

Table 2
ROC AUC values for each model.
Model	ROC AUC
XGBoost	0.9020
LightGBM	0.8985
Random Forest	0.8764
HistGradientBoosting	0.8985

These insights support the model's clinical relevance and illustrate how SHAP enhances transparency, connecting the predictions with medically meaningful features. Such interpretability is important for building trust in AI-driven decision support tools, allowing clinicians not only to rely on predictions but also to understand the rationale behind them, especially in critical contexts like antimicrobial resistance.
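To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over all feature orderings. Everything here is hypothetical: `toy_risk` is not the fitted XGBoost model, the feature names are illustrative, and practical SHAP tooling for tree ensembles uses a much faster dedicated algorithm rather than enumerating orderings.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start from the reference input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its actual value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk score with an interaction term (not the study's model)."""
    score = 0.2 + 0.3 * x["antibiotic_flag"] + 0.1 * x["has_transplant"]
    score += 0.2 * x["antibiotic_flag"] * x["has_transplant"]
    return score

instance = {"antibiotic_flag": 1, "has_transplant": 1, "low_platelets": 0}
baseline = {"antibiotic_flag": 0, "has_transplant": 0, "low_platelets": 0}
phi = shapley_values(toy_risk, instance, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features involved.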

6 Discussion

The goal of this study was to compare the performance of three advanced gradient-boosting models (XGBoost, LightGBM, and HistGradientBoosting) along with Random Forest on the ARMD dataset [1, 2] for predicting antimicrobial resistance (AMR). These models were chosen for their robustness with imbalanced data, ability to handle high-dimensional inputs, and demonstrated success in binary classification problems. The findings demonstrate strong predictive performance across all models. XGBoost emerged as the best-performing model for detecting resistant infections, while LightGBM and HistGradientBoosting offered more balanced performance and proved computationally efficient and reliable. Although Random Forest had somewhat lower overall performance and ROC AUC than the boosting models, it still achieved good results, especially for the susceptible class, and it represents a strong baseline model.

Fig. 6 SHAP summary plot showing the impact of top features on model output for XGBoost

6.1 Interpretation of Results

The consistently high performance of all four models is encouraging for AMR prediction from electronic health records [1, 2]. XGBoost achieved the highest recall (88%) for resistant cases, indicating it correctly identified 88% of true resistant infections. However, its lower precision (71%) means that 29% of predicted resistant cases were false positives, a trade-off that may justify overtreatment in high-risk scenarios. LightGBM and HistGradientBoosting achieved the highest precision (80%) for resistant cases, meaning only 20% of predicted resistant cases were false positives. This reduces unnecessary antibiotic use, a key goal of stewardship programs aiming to curb resistance. Notably, both models handled null values natively and trained efficiently; HistGradientBoosting's lower memory overhead may be an advantage in resource-constrained settings. While Random Forest had lower AUC (0.8764) than the boosting models, its moderate precision (75%) and recall (66%), combined with inherent interpretability (e.g., feature importance), make it viable for clinical audits or regulatory contexts where explainability is mandated, despite marginally lower accuracy. LightGBM and HistGradientBoosting produced identical performance metrics (precision/recall/AUC), likely because scikit-learn's HistGradientBoostingClassifier is itself a histogram-based gradient-boosting implementation inspired by LightGBM. However, LightGBM's implementation (with training logs showing row-wise multi-threading) may offer faster training on datasets exceeding millions of records. All models natively handled missing values without imputation: XGBoost and LightGBM assign nulls to the better branch at each split, and scikit-learn's Random Forest likewise routes missing values to the child node that yields the better split rather than using surrogate splits. This preserves data integrity, which is critical for EHRs where missingness may reflect clinical decisions (e.g., omitted tests).

6.2 Clinical Implications

The choice of model for clinical deployment should be guided by institutional risk tolerance and treat- ment priorities:

XGBoost is optimal when minimizing missed resistant infections is critical (e.g., in ICU settings or immunocompromised patients), given its 88% recall. While this may lead to 29% false positives (precision: 71%), the clinical cost of undertreatment outweighs the risks of overtreatment in high-mortality scenarios.

LightGBM/HistGradientBoosting are preferable for antibiotic stewardship programs, where reducing unnecessary prescriptions is paramount. Their 80% precision means only 1 in 5 predicted resistant cases is a false alarm, minimizing collateral damage to the microbiome and resistance selection pressure.

Random Forest serves best in explanatory use cases (e.g., root-cause analysis or quality audits) due to its inherent interpretability, despite its lower recall (66%). It should be avoided when missing true resistance could prove catastrophic.
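The priority-driven choice described in this list can be expressed as a small lookup over the reported resistant-class metrics. The values are taken from Table 1 and Table 2; the helper names and the idea of selecting by a single metric are illustrative simplifications, not a deployment rule proposed by the study.

```python
# Resistant-class precision and recall from Table 1, ROC AUC from Table 2.
RESISTANT_METRICS = {
    "XGBoost": {"precision": 0.71, "recall": 0.88, "roc_auc": 0.9020},
    "LightGBM": {"precision": 0.80, "recall": 0.70, "roc_auc": 0.8985},
    "Random Forest": {"precision": 0.75, "recall": 0.66, "roc_auc": 0.8764},
    "HistGradientBoosting": {"precision": 0.80, "recall": 0.70, "roc_auc": 0.8985},
}

def best_models(metric):
    """Return every model that attains the top value of the chosen metric."""
    top = max(m[metric] for m in RESISTANT_METRICS.values())
    return sorted(name for name, m in RESISTANT_METRICS.items() if m[metric] == top)

def false_alarm_share(precision):
    """Share of predicted resistant cases that are actually susceptible."""
    return 1.0 - precision

print("minimize missed resistance:", best_models("recall"))
print("minimize false alarms:", best_models("precision"))
print("false-alarm share at 80% precision:", round(false_alarm_share(0.80), 2))
```

Selecting on recall returns XGBoost, while selecting on precision returns the LightGBM and HistGradientBoosting tie, mirroring the recommendations above.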

Implementation requires integration with clinical workflows:

Flagging high-risk cases (XGBoost predictions) for rapid diagnostic confirmation.

Using LightGBM/HGB outputs to de-escalate therapy when resistance is unlikely.

Embedding RF’s feature importance outputs to identify resistance drivers.

This data-driven approach could reduce mortality from untreated resistance while curbing unnecessary antibiotic use, addressing both arms of the AMR crisis.

7 Future Work

Future work should aim to further improve model performance, generalizability, and real-world utility. Key directions include:

Incorporating additional data sources: Including clinical notes (via natural language processing), genomic information, and real-time monitoring data could enhance the model’s ability to capture complex factors influencing resistance.

External validation across institutions: Evaluating the model on datasets from different hospitals, geographic regions, or populations is crucial to assess robustness and reduce potential biases.

Expanding interpretability for clinical decision-making: While SHAP analysis was applied in this study, future work could explore more interactive or user-facing explanations (e.g., dashboards or counterfactual explanations) to better support clinical adoption.

Prospective evaluation in real-time settings: Deploying the model in a live clinical environment and measuring its impact on prescribing practices and patient outcomes would provide strong evidence of its utility.

Model adaptation over time: Given the evolving nature of antimicrobial resistance, incorporating continual learning or periodic retraining strategies would help maintain predictive accuracy over time.

8 Conclusion

This study demonstrates the effectiveness of machine learning models, including advanced gradient-boosting techniques and Random Forest, for AMR prediction using structured EHR data [1, 2]. All models performed competitively, with XGBoost providing the best recall for resistant infections and the highest ROC AUC. LightGBM and HistGradientBoosting offered strong alternative solutions with balanced performance. Although Random Forest showed slightly lower overall performance, it provided solid results, especially for susceptible cases, and it serves as a reliable baseline model. With further refinement and integration into clinical systems, these models hold significant potential for improving antimicrobial decision-making, reducing unnecessary treatment, and combating the global threat of antibiotic resistance.

Declarations

Ethics approval

This study was conducted in accordance with the principles of the Declaration of Helsinki and all relevant institutional and national research ethics guidelines.

The ARMD dataset was generated under ethical oversight and approved by the Stanford University Institutional Review Board (IRB; eProtocol #70466).

Consent to participate

Patient consent was waived by the Stanford University IRB because the study used de-identified, retrospective electronic health record data with minimal risk to participants.

Consent for publication

Not applicable.

Data Availability

The electronic health record–derived ARMD dataset analyzed in this study is publicly available on the Dryad repository at https://doi.org/10.5061/dryad.jq2bvq8kp.

Competing interests

The authors declare no financial or non-financial competing interests.

Funding

No external funding was received for this study.

Author Contribution

Z.A. (Zohoor Almalki) designed the study, performed data preprocessing and analysis, implemented the machine learning models, and generated figures and tables. Z.A. and A.A. (Amjad Althagafi) drafted and revised the manuscript. S.A. (Sarah Al-Shareef) supervised the research, contributed to study design, and provided critical revisions. All authors read and approved the final manuscript.


Acknowledgement

The authors would like to thank Umm Al-Qura University, specifically the Department of Computer Science and Artificial Intelligence, for their support and guidance.

References

1. Nateghi Haredasht F et al. Antibiotic Resistance Microbiology Dataset (ARMD): A De-identified Resource for Studying Antimicrobial Resistance Using Electronic Health Records. Dryad Digital Repository. [Dataset] (2025). https://doi.org/10.5061/dryad.jq2bvq8kp

2. Nateghi Haredasht F et al. Antibiotic Resistance Microbiology Dataset (ARMD). https://arxiv.org/abs/2503.07664. Preprint, Stanford Healthcare (2025).

3. Talamantes-Becerra B, Kang AJ, Gilbert KA, Lindsay BG, Leroux H. Prediction of antibiotic susceptibility in E. coli isolates. In: Bichel-Findlay J, editor. Studies in Health Technology and Informatics. Volume 318. Amsterdam, The Netherlands: IOS; 2024. pp. 150–5. https://doi.org/10.3233/SHTI240907.

4. Tejeda MI, Fernández J, Valledor P, Almirall C, Barberán J, Romero-Brufau S et al. Retrospective validation study of a machine learning–based software for empirical and organism-targeted antibiotic therapy selection. Antimicrobial Agents and Chemotherapy 68(10), 0077724 (2024) https://doi.org/10.1128/aac.00777-24

5. Smak Gregoor AM, Heijden JP, Verhaegh ME, Witkamp L, Buis PA, Karssemeijer N. An artificial intelligence based app for skin cancer detection evaluated in a population based setting. npj Digit Med. 2023;6(90):1–8. https://doi.org/10.1038/s41746-023-00831-w.

6. Babirye SR, Nsubuga M, Mboowa G, Batte C, Galiwango R, Kateete DP. Machine learning-based prediction of antibiotic resistance in Mycobacterium tuberculosis clinical isolates from Uganda. BMC Infect Dis. 2024;24(1391). https://doi.org/10.1186/s12879-024-09225-9.

7. Kim C, Park RW, Rhie SJ. Development of a machine learning prediction model to select empirical antibiotics in patients with clinically suspected urinary tract infection using urine culture data. Open Forum Infectious Diseases 8(Suppl. 1), 196 (2021) https://doi.org/10.1093/ofid/ofab123

8. Lastra JM, Wardell SJT, Pal T, Fuente-Núñez C, Pletzer D. From data to decisions: Leveraging artificial intelligence and machine learning in combating antimicrobial resistance – a comprehensive review. Journal of Medical Systems 48(71) (2024) https://doi.org/10.1007/s10916-024-02153-6

9. İlhanlı N, et al. Prediction of antibiotic resistance in patients with a urinary tract infection: Algorithm development and validation. JMIR Med Inf. 2024;12:51326. https://doi.org/10.2196/51326.

10. Pikalyova K, Orlov A, Horvath D, Marcou G, Varnek A. Predicting Staphylococcus aureus antimicrobial resistance with interpretable genomic space maps. Molecular Informatics 43(5), 202300263 (2024) https://doi.org/10.1002/minf.202300263
