Background
Blood donor recruitment currently faces significant challenges, as declining donor willingness continues to place increasing pressure on recruitment efforts. Developing precise strategies to improve both efficiency and quality has therefore become essential. With the ongoing digitalization of blood services and the emergence of large-scale databases, machine learning (ML) solutions are increasingly being adopted in transfusion medicine1,2. ML techniques offer promising tools to address challenges across the transfusion supply chain—enhancing donor recruitment efficiency and optimizing blood resource management. Compared to traditional approaches, which randomly select a list of donors who meet the requirements for blood donation, ML models provide substantial advantages in processing large volumes of data and uncovering latent patterns. Notably, they can effectively capture complex nonlinear relationships without relying on linear assumptions, handle high-dimensional feature spaces, and reveal intricate interdependencies among variables. Furthermore, through incremental learning3, ML models can be continuously updated in real time, enabling rapid adaptation to new data. These capabilities confer greater efficiency, adaptability, and analytical power for identifying hidden trends in large-scale datasets. Although several ML-based approaches have been explored for blood and hematopoietic stem cell donor recruitment, most prior studies have focused primarily on evaluating model performance from a data modeling standpoint, without extending to real-world applications4,5.
Our previous work6 developed a machine learning–based blood donor recruitment model using data from Yangzhou, with accuracy as the primary evaluation metric. The results demonstrated that machine learning–based precise recruitment at a single center outperformed traditional methods. However, that study did not evaluate the generalizability of the recruitment framework. Moreover, because donor recruitment constitutes a highly imbalanced classification problem—where non-donors greatly outnumber successfully recruited donors—accuracy, which reflects the overall proportion of correctly predicted samples, may not be the most appropriate evaluation metric. In contrast, recall more effectively captures the proportion of true positive cases identified, making it a more suitable metric for this task. To better address this class imbalance, the present study adopts the principle of “recall as the primary metric, supported by other indicators” in model construction and performance evaluation. Specifically, models were initially trained on data from Nanjing and subsequently fine-tuned using 10% of the data from Yangzhou and Suzhou to assess cross-center transferability. Building on this foundation, optimized models were developed and applied to targeted SMS-based recruitment in all three cities—Nanjing, Suzhou, and Yangzhou—resulting in more effective and accurate donor recruitment outcomes.
Construction and Validation of Recruitment Models in Nanjing
Recruitment models, with recall as the primary evaluation metric, were initially trained and fine-tuned using data from Nanjing. Donation records from 2017 to 2021 were analyzed in conjunction with SMS recruitment data to capture historical donor behavior and characterize donation patterns—including donation count, frequency, total volume, intervals, and age—thus generating comprehensive donor profiles. SMS data from 2022 were subsequently used to assess donor willingness by linking response outcomes to these profiles. This enabled the identification of features distinguishing high- and low-willingness donors and the construction of a labeled dataset comprising effective donors (those who donated within seven days of receiving the SMS) and ineffective donors (those who did not).
The SMS recruitment data collected from Nanjing in 2022 were divided into a training set and a test set using a 70:30 split. Several machine learning models—including eXtreme Gradient Boosting (XGBoost)7, Support Vector Machine (SVM)8, K-Nearest Neighbors (KNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF)9, and Multi-Layer Perceptron (MLP)10—were trained on the training set. To address class imbalance, sampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE)11 and under-sampling (US) were employed, in combination with cost-sensitive learning approaches, including MFE, MSFE, and weighted mean squared error loss functions12,13. Each model (XGBoost, SVM, KNN, LR, RF, DT, MLP) was evaluated under different sampling strategies (raw data, US, SMOTE) using a grid search over hyperparameters, with four performance metrics recorded: accuracy, precision, recall, and F1-score. Model selection was based primarily on the recall value, with the remaining metrics used for reference. All models were implemented in Python using the XGBoost, Scikit-learn14, and PyTorch15 libraries.
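As an illustrative sketch (not the authors' code), the following pure-Python snippet expresses the "recall as the primary metric, other metrics for reference" selection rule over hypothetical (model, sampling) results; the accuracy floor of 0·5 mirrors the rule stated later in the paper, and all metric values below are placeholders, not the study's results.

```python
# Hypothetical grid of (model, sampling strategy) -> evaluation metrics.
results = {
    ("MLP", "SMOTE"):   {"accuracy": 0.71, "precision": 0.12, "recall": 0.66, "f1": 0.20},
    ("RF", "US"):       {"accuracy": 0.68, "precision": 0.10, "recall": 0.70, "f1": 0.18},
    ("XGBoost", "raw"): {"accuracy": 0.95, "precision": 0.50, "recall": 0.05, "f1": 0.09},
}

def select_by_recall(results, min_accuracy=0.5):
    """Discard candidates below the accuracy floor, then rank by recall;
    ties are broken by F1 as a secondary reference."""
    viable = {k: v for k, v in results.items() if v["accuracy"] >= min_accuracy}
    return max(viable, key=lambda k: (viable[k]["recall"], viable[k]["f1"]))

best = select_by_recall(results)
print(best)  # the RF + under-sampling combination, which has the highest recall
```

Note that the high-accuracy, low-recall XGBoost row is deliberately passed over: in an imbalanced setting it would miss almost all willing donors.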
Research on Precise Recruitment in Nanjing, Suzhou and Yangzhou
Assuming that blood donation records and SMS recruitment data can be used to rank and predict donor willingness, the selected MLP and RF models were jointly employed to conduct prospective blood donor recruitment in Nanjing, Suzhou, and Yangzhou, China.
The procedure was as follows: (1) Candidates were drawn from SMS recruitment lists provided by the local blood centers, and their key characteristics were input into the trained MLP and RF models to generate donation probability scores reflecting predicted willingness; (2) Donors with scores above 0·5 in both models were selected, prioritizing those appearing in the intersection of MLP and RF predictions, followed by those identified by either model individually; (3) To ensure comparability, the number of selected donors matched that of conventional recruitment efforts. These high-willingness donors were recruited for a controlled study, and donation outcomes within seven days of SMS delivery were recorded. Based on these trained models, an incremental learning paradigm 3 was further applied to fine-tune and iteratively update the MLP model using the most recent donation data. In total, 13 prospective recruitment studies were conducted across the three cities, leveraging the updated MLP and RF models and comparing their performance against traditional recruitment methods.
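A minimal sketch of selection steps (1)–(3) above, assuming donor scores are held in plain dictionaries (the donor IDs and scores are hypothetical, and this is not the deployed implementation):

```python
def select_candidates(mlp_scores, rf_scores, n_target, threshold=0.5):
    """Pick donors scoring above the threshold in both models first,
    then donors flagged by either model alone, up to n_target donors."""
    both = [d for d in mlp_scores
            if mlp_scores[d] > threshold and rf_scores.get(d, 0) > threshold]
    either = [d for d in mlp_scores
              if d not in both
              and (mlp_scores[d] > threshold or rf_scores.get(d, 0) > threshold)]
    # Rank each tier by the higher of the two scores, most willing first.
    key = lambda d: max(mlp_scores[d], rf_scores.get(d, 0))
    ranked = sorted(both, key=key, reverse=True) + sorted(either, key=key, reverse=True)
    return ranked[:n_target]

mlp = {"d1": 0.9, "d2": 0.6, "d3": 0.4, "d4": 0.7}
rf  = {"d1": 0.8, "d2": 0.3, "d3": 0.6, "d4": 0.55}
print(select_candidates(mlp, rf, n_target=3))  # ['d1', 'd4', 'd2']
```

Here d1 and d4 clear the threshold in both models and so are taken before d2, which only the MLP flags; capping the list at `n_target` keeps the comparison with conventional recruitment fair.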
Research on prediction thresholds and on repetition ratios
To further enhance model performance, we conducted a threshold adjustment experiment in which the prediction thresholds of the MLP and RF models were systematically varied to generate different levels of overlap between the model-recommended donor list and the conventional recruitment list. For each threshold setting, the corresponding improvement in recruitment performance was evaluated, defined as the increase in successfully recruited high-willingness donors. This experiment assessed whether reducing the redundancy between model-based recommendations and conventional recruitment could improve the overall effectiveness of machine learning–driven donor recruitment.
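The repetition ratio in this experiment can be sketched as the share of the model-recommended list that also appears on the conventional list; the snippet below uses hypothetical donor IDs and scores purely to illustrate how raising the threshold changes the overlap.

```python
def repetition_ratio(model_list, conventional_list):
    """Fraction of model-recommended donors who also appear
    on the conventional recruitment list."""
    model_set, conv_set = set(model_list), set(conventional_list)
    return len(model_set & conv_set) / len(model_set) if model_set else 0.0

scores = {"d1": 0.9, "d2": 0.7, "d3": 0.55, "d4": 0.3}
conventional = ["d1", "d2"]

for threshold in (0.5, 0.8):
    recommended = [d for d, s in scores.items() if s > threshold]
    print(threshold, repetition_ratio(recommended, conventional))
# At 0.5, two of three recommended donors overlap (ratio ~0.67);
# at 0.8, the single recommended donor overlaps fully (ratio 1.0).
```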
Statistical Analysis
For the analysis of blood donor feature means, the total sample was divided into two groups: successful recruitment and failed recruitment. A z-test was performed to evaluate the mean differences for each feature. In the modeling phase, features with p-values less than 10⁻⁵ were selected for Nanjing and Suzhou, while a threshold of 10⁻² was applied for Yangzhou. IBM SPSS Statistics 27·0 was used to analyze recruitment data during the practical implementation of the model. For categorical variables, the chi-square test or Fisher’s exact test was employed, with a p-value of less than 0·05 considered statistically significant.
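The two-sample z-test used for feature screening can be sketched in a few lines of standard-library Python; the group means, variances, and sizes below are illustrative placeholders, not the study's data.

```python
import math

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """z statistic and two-sided p-value for a difference in means,
    using the normal approximation appropriate for large samples."""
    z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p

# Hypothetical example: mean donation count of 2.9 among successful
# recruitments versus 1.8 among failed recruitments.
z, p = two_sample_z(2.9, 1.2, 500, 1.8, 1.1, 9500)
print(round(z, 2), p < 1e-5)
```

A feature like this one, with p well below the 10⁻⁵ threshold used for Nanjing and Suzhou, would be retained as a candidate variable.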
Results
Data Collected
In Nanjing, a total of 622,368 blood donation records from 2017 to 2022 were analyzed, comprising 63·61% male and 36·38% female donors. Repeat donors accounted for 50·37%, and first-time donors for 49·63%, with an SMS response rate of 3·15%. In 2022, 11,968 recruitment messages were sent, of which 11,616 (8,131 male and 3,485 female) were used for model training and testing. The average volume of the most recent donation was 368 mL; total donation volume averaged 684 mL; the mean number of donations was 1·84; the average interval since the last donation was 434 days; and the mean age of donors was 34 years. In Suzhou, 907,230 records from 2018 to 2023 were included, with 67·70% male and 32·30% female donors. Repeat donors comprised 63·36% of the sample. Between 2022 and 2023, 47,922 recruitment messages were sent, with 32,669 (19,567 male and 13,102 female) used for model development. The average last donation volume was 306 mL, total donation volume was 726·52 mL, the mean number of donations was 2·32, the average interval was 781·5 days, and the mean donor age was 36 years. In Yangzhou, 456,955 records spanning 1998 to 2024 were analyzed, with 57·75% male and 42·24% female donors. Between 2022 and 2023, 20,459 recruitment messages were sent, all of which were used for modeling (11,581 male and 8,878 female). The average last donation volume was 370 mL, total donation volume was 1,052 mL, the mean number of donations was 2·6, the average interval since the last donation was 560 days, and the mean donor age was 40 years. The recruitment and model evaluation procedures are illustrated in Fig. 1.
Model Development Results in Nanjing
Individuals who responded and did not respond to SMS recruitment in 2022 were treated as two distinct populations. A two-sample mean test (z-test) was performed to compare the distributions of donor features, and the resulting p-values were ranked from smallest to largest, as shown in Table
S1 of the Supplementary Material. Features identified through this analysis were considered candidate variables and selected based on recruitment objectives and model performance. For the final recruitment model in Nanjing, seven features were used: repeat donor status, total donation volume, number of donations, donation interval, donation frequency, age, and whether the donor had AB blood type.
The performance of various combinations of models and sampling methods on the test dataset is presented in Fig.
2, with detailed results provided in Table
S2 of the Supplementary Material. Based on these results, the integrated MLP and RF models were selected for practical deployment, following the principle of prioritizing recall as the primary evaluation metric, with other metrics serving as secondary references to determine the final recruitment outcomes.
Results of the Generalizability Study
The performance of the Nanjing model and its fine-tuned counterparts in Suzhou and Yangzhou—detailed in Table S3 of the Supplementary Material—provides empirical support for the strong generalizability of machine learning models in precision donor recruitment. Although both cities adopted Nanjing’s standardized 7-feature framework—differing from Suzhou’s original 8 features and Yangzhou’s 11 features—the models still achieved clinically acceptable recall rates (≥ 0·58), closely approaching those of their locally optimized baselines. Specifically, the RF model achieved a recall of 0·67 in Suzhou (compared to 0·77 with local optimization) and 0·63 in Yangzhou (versus 0·64 locally), while the MLP model attained recall scores of 0·63 and 0·58, respectively. Notably, the observed improvements in SMS recruitment efficiency following minimal fine-tuning with only 10% of local data further confirm that cross-center model transfer requires limited adaptation to achieve near-native performance.
Performance of the Models on Precise Recruitment in Nanjing, Suzhou and Yangzhou
The recruitment classification methods used for Nanjing, Suzhou, and Yangzhou are detailed in Tables S1, S4, and S5 of the Supplementary Material. Candidate features were initially selected based on recruitment objectives and model training performance. As previously described, the final recruitment model for Nanjing was constructed using seven features. For Suzhou, the final model incorporated eight features: last donation volume, repeat donor status, total donation volume, number of donations, donation frequency, age, donation interval, and whether the individual was classified as a worker. For Yangzhou, the final model included eleven features: number of donations, repeat donor status, last donation volume, donation interval, age, total donation volume, blood type A, blood type O, occupation (student or worker), and educational level (middle school and below).
The performance of various combinations of models and sampling methods across the three blood centers is illustrated in Fig.
2, with detailed results provided in Tables S2, S6, S7 of the Supplementary Material. Based on these results, we selected an integrated approach combining MLP and RF models for practical deployment, following the principle of prioritizing recall as the primary evaluation metric, with other metrics used as supplementary references to determine the final recruitment outcomes.
On average, the recruitment success rate for high-willingness donors was 408·28%, 25·19%, and 47·31% higher than that of conventional methods in Nanjing (χ² = 35·471, p < 0·001), Suzhou (χ² = 9·384, p = 0·009), and Yangzhou (χ² = 96·701, p < 0·001), respectively, as shown in Table 1. When SMS messages were sent exclusively to donors recommended by the model, recruitment improvements remained statistically significant in Suzhou (χ² = 34·499, p < 0·001) and Yangzhou (χ² = 136·110, p < 0·001), but not in Nanjing (χ² = 0·451, p = 0·536). Moreover, conventional methods required 18·71%, 121·52%, and 138·75% more SMS messages than the model-based approach in Nanjing, Suzhou, and Yangzhou, respectively, while recruitment efficiency per SMS increased by 9·54%, 53·61%, and 38·98% compared with conventional methods (Table 2).
Relation between the prediction thresholds and the repetition ratios
The experiments involved systematically adjusting the prediction thresholds of the MLP and RF models to produce varying repetition ratios, with corresponding changes in model performance evaluated, as shown in Fig. 3. The results reveal a clear inverse relationship between repetition ratio and performance improvement across Nanjing, Suzhou, and Yangzhou. Specifically, as the overlap between the model-recommended donor list and the conventional recruitment list decreases, recruitment performance gains become more pronounced. These findings highlight the potential of reducing redundancy in donor selection to enhance the effectiveness of machine learning–based recruitment strategies.
Discussion
Blood donor recruitment in China currently faces several critical challenges: (1) a nationwide decline in donations—voluntary blood donations (15·821 million) and total blood volume collected (26·924 million units) decreased by 6·9% compared to 2023; (2) few institutions seeing growth in blood collection—only 4 out of 395 prefecture-level or higher blood collection institutions reported positive growth in 2024; and (3) an aging donor population—with a noticeable decline in younger donors, as the proportion of donors under age 35 dropped from 65% in 2019 to 52% in 2024. These developments raise serious concerns about the sustainability and security of the national blood supply, underscoring the urgent need for targeted recruitment strategies, data-driven interventions, and improved engagement of younger populations to ensure a safe, stable, and clinically sufficient blood supply.
In recent years, data-driven statistical and artificial intelligence (AI) methods have become increasingly important for predicting donor willingness and forecasting future blood supply needs. Hanapi WHWH et al. employed logistic regression to identify key factors influencing donation intention, including donation interval, frequency, and total volume 16. Salazar-Concha C et al. applied decision trees, achieving 84·17% accuracy in predicting donor re-engagement 17. Marade C et al. utilized SVM, decision trees (DT), and KNN to forecast monthly blood donations, aiding inventory management 18. Ding X et al. proposed a hybrid SARIMAX-LSTM neural network for accurate donation estimation 19, while El-Rashidy N et al. and Selvaraj P et al. reported that random forest models yielded the best performance in predicting donation and re-donation behaviors 20,21. Other studies have leveraged SVM for donor classification and donation outcome prediction 22,23, and Zulfikar W.B. et al. found decision trees to outperform naive Bayes classifiers 24. Cloutier M. focused on predicting return rates among young donors 25. Several reviews—including those by Kiarie et al. 26, Li Y et al. 27, Buturovic L et al. 28, Sivasankaran A et al. 29, and Gupta V et al. 30—have highlighted the growing application of machine learning in donor retention, recruitment optimization, post-transplant outcomes, and hematopoietic stem cell transplantation (HSCT), while also acknowledging current limitations and challenges. Collectively, these studies underscore the value of AI and big data analytics in enhancing blood donor recruitment and ensuring a safe, sufficient, and sustainable clinical blood supply.
This study demonstrates the generalizability and practical effectiveness of our machine learning–based recruitment framework by adopting recall as the primary evaluation metric, in contrast with previous studies that largely evaluated performance from a data modeling perspective. In earlier work, we developed seven recruitment models using XGBoost and SVM on a dataset comprising 697,174 blood donation records (October 1998–2019) and 95,476 SMS recruitment records (April 2016–July 2019) from Yangzhou, incorporating 13 donor features. This laid the groundwork for the Nanjing recruitment model. In the present study, we extend this foundation into a prospective setting, employing multi-layer perceptron (MLP) and random forest (RF) models, with recall as the central metric. While MLP, as a deep learning algorithm, offers strong representational capacity, its performance is sensitive to data scale and resource demands. To address this, we complemented MLP with RF, a more lightweight and robust traditional machine learning method. Given the limited size of available blood donation data, we adopted an incremental learning paradigm: following each recruitment round, newly collected data were used to fine-tune the MLP model, improving its performance before deployment in subsequent rounds. As illustrated by Table S8 (Supplementary Material), we began with an initial dataset D0. After conducting recruitments at several universities in Nanjing, China in our previous work, we obtained up-to-date blood donation data, denoted as D1, yielding an augmented dataset D0∪D1 (denoted as D01). We randomly split D01 into training and test sets in a 7:3 ratio, and retained a new model trained on this training set only if it achieved a higher recall than the old model on the test set. This new model was applied in the 1st recruitment of this work.
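The accept-only-if-recall-improves update rule can be sketched as follows; the "models" here are stand-in callables rather than the actual MLP, the data are toy (feature, label) pairs, and for brevity the 7:3 split is deterministic rather than random.

```python
def recall_on(model, test_set):
    """Recall of a binary classifier on (feature, label) pairs."""
    tp = sum(1 for x, y in test_set if y == 1 and model(x) == 1)
    fn = sum(1 for x, y in test_set if y == 1 and model(x) == 0)
    return tp / (tp + fn) if tp + fn else 0.0

def incremental_update(deployed, candidate, augmented_data):
    """Replace the deployed model only if the candidate, trained on the
    augmented dataset, scores higher recall on the held-out 30% split."""
    split = int(0.7 * len(augmented_data))
    test = augmented_data[split:]
    return candidate if recall_on(candidate, test) > recall_on(deployed, test) else deployed

# Toy models: the old one flags donors with >1 prior donation, the new one >0.
old = lambda x: 1 if x > 1 else 0
new = lambda x: 1 if x > 0 else 0
data = [(0, 0), (1, 1), (2, 1), (3, 1), (0, 0), (1, 1), (2, 1), (0, 0), (1, 1), (2, 1)]
chosen = incremental_update(old, new, data)
print(chosen is new)  # True: the candidate has higher recall on the test split
```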
After the 1st recruitment, we obtained new data D2 and used D01∪D2 (denoted as D11) to fine-tune the MLP via the same procedure. As shown in Tables S9 and S10, the updated models consistently outperformed their predecessors in identifying high-willingness donors and improving recruitment efficiency (measured by success rate per SMS). Some exceptions were observed during simultaneously conducted recruitments (e.g., the 1st and 2nd, 4th and 5th, and 6th and 7th rounds), where a single model could not optimize performance across both settings. Nonetheless, the overall trend confirmed the effectiveness of the incremental learning paradigm. Detailed fine-tuning results are provided in Tables S11–S14. Furthermore, models initially trained on Nanjing data were successfully fine-tuned with data from Suzhou and Yangzhou, validating the cross-center generalizability of our approach. Across 13 recruitment rounds, the machine learning framework improved the recruitment success rate for high-willingness donors by an average of 151·57% compared with conventional methods, reduced SMS volume by 96·52%, and increased recruitment efficiency per SMS by 34·42%. These results highlight the framework’s ability to identify high-potential donors, reduce unnecessary messaging, and lower overall recruitment costs. Analysis of the Venn diagrams in Figures
S1–S8 revealed minimal overlap between donors identified by both MLP and RF and those recruited via conventional methods—likely due to high redundancy in traditional approaches. By adjusting prediction thresholds to vary repetition ratios, we observed that lower overlap corresponded to greater performance improvements, as shown in Fig.
3. Model hyperparameters were as follows: the MLP consisted of three layers with 6, 8, and 2 neurons for the three cities, ReLU activation after the second layer, and a Softmax layer for classification. The RF model used the Gini impurity criterion, a maximum depth of 9, a minimum of 5 samples per leaf, and balanced class weights (1:1). As shown in Tables S13 and S14, MLP consistently achieved higher overall recruitment success than RF—both for high-willing donors and the general donor population—aligning with its superior dataset-level performance.
Blood donor recruitment presents a highly imbalanced classification problem, with substantially more non-responders than responders. In such scenarios, simultaneously achieving high accuracy and high recall is challenging, and the F1 score may not always offer meaningful insight. Because the primary goal is to identify as many willing donors as possible, recall is prioritized as the key evaluation metric. This section explains the rationale for adopting the principle of “recall as the primary metric, with other metrics serving as auxiliary criteria” in model selection. We begin by reviewing standard evaluation metrics in binary classification. A correctly predicted positive sample is termed a True Positive (TP), while a negative sample incorrectly predicted as positive is a False Positive (FP). A positive sample incorrectly predicted as negative is a False Negative (FN), and a correctly predicted negative sample is a True Negative (TN). Based on these definitions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = 2 × (Precision × Recall) / (Precision + Recall)

Accuracy reflects the overall proportion of correct predictions, precision measures the correctness among predicted positives, and recall quantifies the proportion of actual positives that are successfully identified. The F1 score is the harmonic mean of precision and recall, balancing the two. However, these metrics may be misleading in imbalanced settings. Consider a population of 100 individuals, where 20 are true positives and 80 are negatives.
Suppose Model A predicts all 100 as positive, yielding a recall of 1·0 but an accuracy of only 0·20. In contrast, Model B correctly identifies 10 of the 20 positive cases and misclassifies only 5 of the 80 negatives as positives, resulting in a recall of 0·50 and an accuracy of 0·85. Although Model A achieves perfect recall, it produces many unnecessary messages; Model B, by contrast, strikes a more practical balance by recruiting half of the willing donors while minimizing redundant outreach. Therefore, while recall remains our primary metric, given the high stakes of missing willing donors, it must be interpreted alongside precision and accuracy to ensure practical and efficient recruitment.
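These figures can be checked numerically; the confusion-matrix counts below follow directly from the worked example (Model A labels everyone positive, Model B finds 10 of the 20 positives with 5 false positives).

```python
def metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

model_a = metrics(tp=20, fp=80, fn=0, tn=0)    # predicts all 100 as positive
model_b = metrics(tp=10, fp=5, fn=10, tn=75)   # 10 of 20 positives, 5 FPs
print(model_a["recall"], model_a["accuracy"])  # 1.0 0.2
print(model_b["recall"], model_b["accuracy"])  # 0.5 0.85
```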
In our previous work 6, model selection was based on accuracy. However, in the current datasets from Nanjing, Suzhou, and Yangzhou, the proportion of positive cases is extremely low—only 4·71%, 0·49%, and 3·67%, respectively. In such imbalanced settings, a model that predicts all samples as negative would still achieve high accuracy (approximately 95%, 99%, and 96% for the three cities), but would be of little practical value for identifying potential donors. Consequently, we adopt the principle of prioritizing recall as the primary evaluation metric, while treating accuracy, precision, and other metrics as auxiliary. In practical implementation, models with accuracy below 0·5 are discarded, and the optimal model is selected from the remaining candidates based primarily on recall. Additionally, recruitment contexts differ notably across cities. In Nanjing, recruitment efforts primarily target university students, resulting in a relatively small and concentrated donor pool. In contrast, recruitment in Suzhou and Yangzhou is community-based, encompassing a broader and more diverse population. These differences in target populations and recruitment volumes are likely to influence the extent of performance improvements achieved by machine learning models across regions.
In future work, the performance of machine learning models for donor recruitment can be further improved through several avenues. For example, model parameters can be optimized using additional evaluation metrics—such as accuracy and F1 score—to identify models that perform well across multiple criteria, beyond recall alone. Moreover, building on the insights from the threshold adjustment experiment, heuristic algorithms could be employed to dynamically determine the optimal prediction threshold for each recruitment round. This approach would aim to minimize overlap with conventional recommendation lists, thereby improving the overall effectiveness and efficiency of the recruitment strategy.
Limitations
This study has several limitations. First, the proposed models are designed to recruit individuals with existing donation records and are not capable of identifying potential first-time donors. Second, certain contextual factors—such as donation location and the involvement of specific collection staff—may influence donor willingness, but could not be incorporated due to limited data availability. Finally, while we propose a unified recruitment framework to standardize the overall process, separate models were trained for each city to account for regional differences, rather than developing a single model applicable across all settings.