Functional Subpopulations of Hematopoietic Stem Cells and Multipotent Progenitors Classification Using Transfer Learning

Vahid Khalkhali¹, Sayed Mehedi Azim¹, Jianzhong Han², Jian Huang², and Iman Dehzangi¹,³,⁴,*

¹Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA (vahid.khalkhali@rutgers.edu & sayedmehedi.azim@rutgers.edu)

²Coriell Institute for Medical Research, Camden, NJ, USA (jhan@coriell.org & jhuang@coriell.org)

³Department of Computer Science, Rutgers University, Camden, NJ, USA

⁴Rutgers Cancer Institute, Rutgers University, New Brunswick, NJ, USA

*Corresponding author (email: i.dehzangi@rutgers.edu)

Abstract

Background

The functional classification of hematopoietic stem cells (HSCs) and multipotent progenitors (MPPs) is central to understanding hematopoiesis and developing regenerative therapies. Traditional fluorescence-activated cell sorting (FACS) has been the gold standard for distinguishing these subpopulations. However, it remains labor-intensive, technically demanding, and limited in scalability. Automated, image-based approaches offer a promising alternative, yet their application to hematopoietic stem and progenitor cells has been constrained by the lack of large, annotated datasets and standardized analytic frameworks.

Methods

We present the largest publicly available microscopy dataset of hematopoietic stem and progenitor cells to date, encompassing three biologically distinct subpopulations: long-term HSCs (LT-HSCs), short-term HSCs (ST-HSCs), and MPPs. To analyze this resource, we developed a deep learning framework based on transfer learning using DenseNet architectures. A novel preprocessing strategy transformed multi-slice grayscale microscopy data into RGB composites, facilitating compatibility with pre-trained convolutional neural networks (CNNs). Two complementary pipelines were designed: (i) an image-level pipeline to classify entire microscopy fields and (ii) a cell-level pipeline incorporating Laplacian of Gaussian (LoG)–based blob detection for single-cell segmentation. Each model was trained and validated using stratified splits, with extensive data augmentation to enhance generalization.

Results

Among all architectures evaluated, DenseNet169 achieved the highest performance, attaining an area under the receiver operating characteristic curve (AUROC) of 99.5% and a balanced accuracy of 89.3% in the cell-level classification task. The model effectively distinguished LT-HSC, ST-HSC, and MPP populations, substantially outperforming previously reported single-channel or non-segmented approaches. Grad-CAM visualization confirmed that the model’s discriminative focus aligned with biologically relevant cellular regions, supporting interpretability and reproducibility. Comparative analyses demonstrated that integrating multi-channel image representation and optimized segmentation markedly enhanced accuracy and robustness.

Conclusion

This work introduces a reproducible and open-source deep learning framework for hematopoietic stem and progenitor cell classification. By integrating multi-channel imaging, transfer learning, and explainable AI, the proposed approach establishes a scalable, label-free alternative to conventional FACS, paving the way for high-throughput and automated phenotyping in hematopoietic and regenerative medicine research.

Keywords:

Hematopoietic Stem Cells (HSCs)

Multipotent Progenitors (MPPs)

Deep Learning

Transfer Learning

Cell Classification

Multi-Channel Microscopy

Grad-CAM Explainability

Background

Hematopoiesis is the lifelong process of creating blood cells and other bone marrow cells in the body. Hematopoietic stem cells (HSCs) and multipotent progenitors (MPPs) play a crucial role in maintaining this process, and they are uniquely defined by their capacity for self-renewal while contributing to the pool of differentiating cells. HSCs are a rare population in human and mouse bone marrow, with only about 1 in 100,000 cells being transplantable HSCs (1). As HSCs differentiate, they generate a series of progenitor cells that progressively commit to specific lineages, ultimately maturing into various blood cell types (2, 3).

Extensive research has characterized the phenotypic and functional diversity within the HSC/MPP populations, uncovering multiple subpopulations with distinct capacities for proliferation, self-renewal, and differentiation (4, 5). These subpopulations can be divided based on their self-renewal potential into long-term (LT)-HSCs, short-term (ST)-HSCs, and MPPs. In the experimental setup, the separation of these subpopulations is performed using fluorescence-activated cell sorting (FACS) (6), relying on combinations of surface markers. However, FACS is labor-intensive, technically demanding, and relatively slow, highlighting the critical need for automated, efficient, and scalable sorting approaches to streamline this task.

Recent advances in fluorescence microscopy and digital cytometry have improved the throughput and resolution of image-based phenotyping of FACS-sorted cells. High-content imaging platforms can now capture multiple fluorescence channels per cell, enabling rich morphological and molecular characterization (7, 8). These multi-channel microscopy datasets are particularly amenable to deep learning methods, which can integrate spatial and intensity-based features to distinguish subtle phenotypic differences across rare hematopoietic subpopulations such as LT-HSCs, ST-HSCs, and MPPs.

Deep learning (DL), a subset of machine learning (ML), has achieved tremendous success in various computer vision tasks (9–12), offering efficient execution with short processing times. Its efficacy in classification tasks, particularly for cells and cell types, stems from its ability to learn complex patterns and features from large datasets, resulting in enhanced accuracy in identifying diverse cell types and their characteristics (13, 14). Among these models, transfer learning has emerged as a powerful approach in image classification, particularly in cases where labeled data is limited (15). By leveraging pre-trained models, transfer learning enables the efficient adaptation of learned features from large-scale datasets to new, domain-specific tasks. This significantly accelerates the training process and enhances model accuracy, even with smaller datasets (16). The ability to transfer knowledge from one domain to another reduces computational costs and also improves generalization, making it a highly effective strategy for complex image classification tasks (17).

Building on the potential of DL, Wang et al. developed a transfer learning-based system to distinguish murine LT-HSCs, ST-HSCs, and MPPs based on their morphological features (18). However, their reliance solely on brightfield images limited overall classification performance. In a different study, Buggenthin et al. focused on time-lapse experiments and developed a DL system for predicting lineage commitment in primary murine HSCs and progenitor cells differentiating into either the granulocytic/monocytic (GM) or megakaryocytic/erythroid (MegE) lineage (19). Unlike Wang et al.’s static subpopulation classification, Buggenthin et al. aimed at dynamic prediction of cell fate decisions well before conventional molecular markers emerged. Although both studies demonstrate feasibility for DL-based next-generation cell sorting systems, further improvements in accuracy and efficiency are required for practical and scalable applications. In addition, the absence of publicly available datasets presented a significant barrier for subsequent research validation and comparative benchmarking.

Complementing these classification approaches, recent advances in explainable deep learning have provided valuable tools to enhance the biological interpretability of model outputs. Techniques such as Grad-CAM, Grad-CAM++, and Shap-CAM offer visual attribution maps that highlight image regions most influential in classification decisions, helping to ensure that models attend to biologically relevant features (20–22). In single-cell morphological profiling tasks, these attribution tools have been applied to evaluate model trustworthiness and guide hypothesis generation (23). In the context of HSC/MPP phenotyping, integrating such explainability into classification frameworks is crucial for validating model focus and potentially uncovering novel morpho-functional features relevant to stem cell biology.

To address these limitations, we systematically evaluate state-of-the-art transfer learning architectures to develop an optimized pipeline for classifying functional subpopulations of HSCs and MPPs. Specifically, we explore transfer learning approaches based on ResNet (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152) and DenseNet (DenseNet121, DenseNet161, DenseNet169, DenseNet201) architectures (24, 25). We also integrate the Grad-CAM explainability technique to elucidate the decision-making process of the trained classifiers. Furthermore, we provide an open-source implementation of our pipeline to facilitate benchmarking and adoption of these methodologies, available at: https://github.com/MLBC-lab/HSC_vs_MPP.

Materials and Methods

In this study, we focus on three biologically distinct classes of hematopoietic stem and progenitor cells, namely, long-term hematopoietic stem cells (LT-HSC), short-term hematopoietic stem cells (ST-HSC), and multipotent progenitors (MPP). For data collection, we followed the protocol described by Wang et al. (18). In the following, we discuss dataset preparation and the image-level and cell-level classification tasks.

Dataset Preparation

Cell samples from each class were imaged using advanced microscopy techniques that capture multiple two-dimensional (2D) grayscale images, referred to as “slices”, per sample. These images correspond to different optical configurations or focal planes, and each slice reveals specific morphological or structural features of the cells. Depending on the imaging setup, each sample contains between two and four such grayscale slices.

To better understand the content and quality of each slice, we conducted a manual inspection across a representative subset of the dataset. We observed that when four slices were available, the fourth slice often exhibited extremely low contrast, appearing almost entirely black, and contributed negligible visual or structural information. This was likely due to diminished signal capture at certain focal depths or illumination conditions. Therefore, we determined that retaining this fourth slice would not enhance downstream analysis and could instead introduce noise or redundancy.

Given the aim to use image-based machine learning models, which typically require fixed-size and fixed-format input (24, 25), we opted to consolidate the multiple grayscale slices into a single RGB image per sample. This transformation allows for compatibility with pre-trained computer vision models (16) and facilitates standardized input across the dataset.

For samples with two grayscale slices, we assigned the first and second slices to the red and green channels of the output RGB image, respectively. The blue channel was then zero-padded, effectively setting its values to zero across the entire image. For samples with three usable slices, each slice was directly mapped to one of the RGB channels in sequence. In the case of four-slice samples, we discarded the fourth slice (i.e., the one with minimal contrast) and used the first three slices for the RGB mapping.
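The slice-to-channel mapping described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the function name `slices_to_rgb` is hypothetical, and it assumes the slices have already been loaded as equally sized 2D arrays in acquisition order.

```python
import numpy as np


def slices_to_rgb(slices):
    """Stack two to four grayscale slices (each H x W) into one H x W x 3 array."""
    if not 2 <= len(slices) <= 4:
        raise ValueError("expected between two and four grayscale slices")
    # Keep at most the first three slices; a fourth (low-contrast) slice is dropped.
    used = [np.asarray(s) for s in slices[:3]]
    if any(s.ndim != 2 or s.shape != used[0].shape for s in used):
        raise ValueError("all slices must be 2D arrays of the same shape")
    if len(used) == 2:
        # Two-slice samples: red and green carry data, blue is filled with zeros.
        used.append(np.zeros_like(used[0]))
    # Channel order follows slice order: slice 1 -> R, slice 2 -> G, slice 3 -> B.
    return np.stack(used, axis=-1)
```

Keeping the slice order fixed across samples matters here, because a pre-trained network treats the three channels as distinct inputs.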

As a result of this preprocessing pipeline, each cell sample is ultimately represented as a single 2048×2048 RGB image, in which each channel encodes complementary visual information derived from the multi-slice grayscale input. This representation maintains the most informative aspects of the original multi-slice data while ensuring uniformity in input dimensions and format. An example of the resulting RGB image construction is provided in Fig. 1.

The majority of the acquired microscopy images were initially stored in the proprietary Olympus Image File Format (OIR) (26), which is commonly used by Olympus imaging systems for high-resolution, multi-channel data storage. While the OIR format retains detailed metadata and imaging parameters, it is not natively compatible with most image processing and deep learning frameworks. Therefore, prior to any downstream analysis, we performed a format conversion to facilitate accessibility and computational efficiency.

Each OIR file was parsed to extract the relevant grayscale image slices corresponding to individual optical channels or focal planes. Following the methodology described earlier, these grayscale slices were combined into a single RGB image representation for each sample.

The resulting RGB images were then exported and stored in the Portable Network Graphics (PNG) format, which preserves image quality with lossless compression while supporting efficient file handling across platforms. After the conversion process, we curated the dataset to ensure correct class labeling and balance across cell types. The final distribution of labeled samples is as follows:

ST-HSC: 1,457 samples

LT-HSC: 1,047 samples

MPP: 1,035 samples

Each sample is thus represented by a single 2048×2048 RGB image stored in PNG format, ready for use in model training and evaluation. The complete preprocessing pipeline, from OIR file parsing to PNG image generation, is illustrated in Fig. 2.

The dataset contains 3,539 images and 74,579 cells in total. To the best of our knowledge, this is the largest publicly available dataset of microscopic stem cell images (27). Both the image-level and cell-level datasets are accessible via https://rutgers.box.com/s/80mb7brmultdwup3k8bj499d2kpbs7zp.

In our current imaging and labeling setup, each image contains cells of a single type, and labels are assigned at the image level. Based on this, we explore two classification approaches: image-level and cell-level classification. In the image-level approach, the entire image is treated as a single input unit, and the model predicts the class based on global features. In contrast, the cell-level approach involves detecting individual cells within an image and classifying them separately, allowing finer-grained analysis.

Image-Level Classification

In the image-level classification task, the objective is to assign each image to one of the three cell types, under the assumption that each image contains cells of a single, specific type. Accordingly, we developed classification models to categorize images into their respective classes, as illustrated in Fig. 3.

To ensure robust model evaluation, the dataset for each cell type was stratified into three subsets: training, validation, and testing. The split was performed with proportions of 60%, 20%, and 20%, respectively, maintaining class balance across these subsets.
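A stratified 60/20/20 split of this kind can be sketched with the standard library alone. The function name `stratified_split`, the fixed seed, and the rounding rule are assumptions made for illustration; the released pipeline may split differently.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# Image-level class counts reported in the dataset description.
labels = ["ST-HSC"] * 1457 + ["LT-HSC"] * 1047 + ["MPP"] * 1035
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the three subsets at the same class ratio as the full dataset, which is what makes the per-class metrics comparable across subsets.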

Figure 2. RGB image preparation

To perform the classification tasks, we adopted two prominent convolutional neural network (CNN) architectures based on transfer learning: ResNet (24) and DenseNet (25). These architectures have shown state-of-the-art performance in numerous image classification benchmarks and are known for their efficiency in feature propagation and mitigation of the vanishing gradient problem, especially when implemented with increased depth.

Figure 3. Image-Level classification

ResNet addresses the degradation problem in deep neural networks by incorporating residual learning through the use of shortcut (skip) connections. These connections allow the model to learn residual functions with reference to the layer inputs, rather than directly learning unreferenced functions. Here, we explored different variants of ResNet, including ResNet18, ResNet34, ResNet50, and ResNet101. These variants differ in depth, with 11.7, 21.8, 25.6, and 44.5 million parameters, respectively. This allows us to evaluate the effect of network complexity on classification performance.

DenseNet introduces direct connections from each layer to every subsequent layer (dense connectivity). Each layer receives as input the feature maps of all preceding layers, enhancing feature reuse, improving parameter efficiency, and further alleviating the vanishing-gradient problem. Here, we also investigated different DenseNet variants, including DenseNet121, DenseNet161, DenseNet169, and DenseNet201. These models have 8.0, 28.7, 14.2, and 20.0 million parameters, respectively. These variants differ in the number of dense blocks and layers within each block. DenseNets are considered more parameter-efficient than many other deep architectures due to feature reuse through dense connections, providing a good balance between computational cost and accuracy.

Each model was trained on a single NVIDIA A100 GPU, supported by 10 CPU cores and 128 GB of RAM. Due to the relatively limited size of the dataset, we applied data augmentation techniques (28) to reduce overfitting and improve model generalization. Specifically, input images underwent random horizontal and vertical flips with a 50% probability. Additionally, random affine transformations were applied, including translations up to 10% in both vertical and horizontal directions, rotations within ± 90 degrees, and scaling adjustments up to ± 20%. These augmentations help simulate realistic variations in cell orientation and size.

Prior to input into the neural networks, all images were resized to 512 × 512 pixels to standardize the input dimensions. During training, the models optimized the categorical cross-entropy loss function using the Adam optimizer (29) with a learning rate set to 0.0001. Training was conducted until convergence, with validation loss monitored to prevent overfitting.

Cell-Level Classification

In this approach, individual cells are extracted from the original microscopic images and subsequently classified into one of three predefined categories. Because labeling is performed at the image level, each image is associated with a single cell type, implying that all cells contained within a given image share the same class label. This presents a unique challenge in that cell-level ground truth labels are implicitly inherited rather than explicitly annotated.

The first step in the pipeline involves the detection and segmentation of individual cells within each microscopy image. Following successful cell identification, each cell is cropped and assigned the label corresponding to its parent image. These extracted cell images are then saved in a standardized image format, creating a curated dataset tailored for training and evaluating machine learning models at the single-cell level.

This process allows the model to learn discriminative features specific to individual cells while leveraging the existing image-level annotations. The complete workflow, encompassing cell detection, labeling, extraction, and data preparation, is summarized in Fig. 4.

Figure 4. Cell-Level Classification

Due to the absence of annotated data specific to cell segmentation in our dataset, we employed classical image processing techniques to extract individual cells, treating them as “blobs” within the microscopic images. We evaluated three established blob detection methods: Laplacian of Gaussian (LoG), Difference of Gaussian (DoG), and Determinant of Hessian (DoH). Each of these methods relies on predefined parameters to identify blobs of varying sizes and intensities, most notably the minimum and maximum standard deviation of the Gaussian kernel, which bound the range of blob sizes that can be detected. Although these methods share similar input parameters, they interpret and use them differently, so parameter values optimized for one method are not necessarily effective for another.

To select the most suitable method for our application, we performed an empirical evaluation based on visual inspection. We randomly selected 100 representative images from the dataset and iteratively tuned the parameters of each blob detection algorithm. For each combination, we visually assessed the quality of cell segmentation, focusing on the accuracy of blob localization and the exclusion of artifacts. Examples of segmented images after parameter tuning are shown in Fig. 5.

Our qualitative assessment revealed that the LoG and DoG methods yielded comparable segmentation performance, both outperforming the DoH method, which showed inferior results under similar parameter ranges. However, further analysis indicated that DoG exhibited excessive sensitivity, resulting in numerous false positives and fragmented detections not consistent with cell morphology in this context. Consequently, we selected the LoG method as the most reliable and robust approach for cell segmentation in our subsequent analyses.
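LoG-based detection of this kind can be sketched with scikit-image's `blob_log`. The choice of library, the wrapper name `detect_cells_log`, and every parameter value below are assumptions for illustration; the tuned values used in the study are not given in the text. The sketch also assumes cells appear brighter than the background.

```python
import numpy as np
from skimage.feature import blob_log


def detect_cells_log(gray, min_sigma=2.0, max_sigma=8.0, num_sigma=7, threshold=0.05):
    """Return an (n, 3) array of (row, col, radius) for bright blobs in a 2D image."""
    gray = np.asarray(gray, dtype=float)
    peak = gray.max()
    if peak > 0:
        gray = gray / peak  # scale to [0, 1] so one threshold works across images
    blobs = blob_log(
        gray,
        min_sigma=min_sigma,   # smallest Gaussian scale searched
        max_sigma=max_sigma,   # largest Gaussian scale searched
        num_sigma=num_sigma,   # number of scales between the two bounds
        threshold=threshold,   # minimum scale-normalized LoG response
    ).astype(float)
    # blob_log returns sigma in the third column; for 2D blobs the radius
    # is approximately sqrt(2) * sigma.
    blobs[:, 2] *= np.sqrt(2)
    return blobs
```

The sigma bounds control which blob sizes can be found, so in practice they would be set from the expected cell diameter in pixels and then refined by visual inspection, as described above.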

Following cell localization using the LoG method, individual cell patches were extracted from the multi-channel microscopy images. Cell patches are extracted from images that are divided into three distinct subsets: training, validation, and testing. Each image contributes patches to only one of these subsets, ensuring there is no overlap between datasets. Each extracted patch was cropped into a square shape centered on the detected cell coordinates to ensure consistent input dimensions. These patches were then saved in the lossless PNG format to preserve image quality for subsequent analysis.
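The patch extraction step can be illustrated as follows. This is a sketch under stated assumptions: the function name `extract_patches`, the 64-pixel patch size, and zero-padding for cells near the image border are illustrative choices, not details taken from the released code.

```python
import numpy as np


def extract_patches(image, centers, size=64):
    """Crop one size x size patch per (row, col) center, zero-padding at borders."""
    half = size // 2
    # Pad once so that crops near the border keep the full patch size.
    pad = ((half, size - half), (half, size - half)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    patches = []
    for row, col in centers:
        r, c = int(round(row)), int(round(col))
        # In padded coordinates the patch starting at (r, c) is centered on the
        # original pixel (r, c), which lands at index (half, half) of the patch.
        patches.append(padded[r:r + size, c:c + size].copy())
    return patches
```

Because every patch inherits the label and the subset (training, validation, or test) of its parent image, cells from one field of view never appear on both sides of a split.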

As a result, we extracted 49,728 cell patches for ST-HSC, 12,759 for LT-HSC, and 12,092 for MPP, reflecting the relative abundance of cells captured in the dataset. To maintain consistency with the image-level classification protocol, the patches were stratified into training, validation, and test subsets with proportions of 60%, 20%, and 20%, respectively.

Results and Discussion

Before presenting the results of image-level and cell-level classifications, we first define and explain the performance metrics used in our evaluation.

Performance Measurement

To evaluate the classification performance, we employ two widely used metrics (30): balanced accuracy and the area under the receiver operating characteristic curve (AUROC).

Balanced accuracy is defined as the average of sensitivity and specificity:

$\text{balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP+FN}+\frac{TN}{TN+FP}\right)$

where $TP$, $TN$, $FN$, and $FP$ are the numbers of true positives, true negatives, false negatives, and false positives, respectively. In the multi-class setting considered here, this value is computed per class, and the overall balanced accuracy is the average of the per-class scores.
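The per-class computation and averaging can be written out directly from this definition. The function name `balanced_accuracy_per_class_mean` is hypothetical. Note that this follows the formula given here (mean of sensitivity and specificity, averaged over classes), which is not the same quantity as the macro-averaged recall that some libraries report under the name balanced accuracy.

```python
def balanced_accuracy_per_class_mean(y_true, y_pred, classes):
    """Average over classes of (sensitivity + specificity) / 2, one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        if tp + fn == 0 or tn + fp == 0:
            raise ValueError(f"class {c!r} needs both positive and negative samples")
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        scores.append(0.5 * (sensitivity + specificity))
    return sum(scores) / len(scores)


# Toy example with three classes and two errors.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = balanced_accuracy_per_class_mean(y_true, y_pred, classes=[0, 1, 2])
```

In the toy example the per-class scores are 0.625, 0.875, and 0.75, so the overall value is 0.75.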

AUROC is the area under the curve of the true positive rate plotted against the false positive rate as the decision threshold varies, and it is inherently defined for binary classification. To extend it to the current multi-class task, we use the One-vs-All (OvA) scheme (31): an AUROC is computed for each class against the remaining classes, and the per-class values are then averaged.
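The OvA averaging can be sketched with NumPy using the rank-sum identity for the binary AUROC. The function names are hypothetical, and an unweighted (macro) average over classes is assumed, since the text does not say whether classes are weighted.

```python
import numpy as np


def binary_auroc(is_positive, scores):
    """Binary AUROC from average ranks (ties count as one half)."""
    is_positive = np.asarray(is_positive, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(is_positive.sum())
    n_neg = int((~is_positive).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Average 1-based rank of each score, shared among tied values.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    # Mann-Whitney U statistic of the positives, normalized to [0, 1].
    u = ranks[is_positive].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def macro_ova_auroc(y_true, probs):
    """Mean of the per-class one-vs-all AUROCs; probs has shape (n, n_classes)."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs, dtype=float)
    per_class = [binary_auroc(y_true == k, probs[:, k]) for k in range(probs.shape[1])]
    return float(np.mean(per_class))
```

Here `y_true` is assumed to hold integer class indices that match the column order of `probs`.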

Image-Level Classification

The classification performance of the trained models, evaluated on the test set, is presented in Table 1. As shown in Table 1, the DenseNet (25) and ResNet (24) architectures achieve comparable and promising performance. Among all models, DenseNet169 achieves slightly better results than the other architectures in terms of both balanced accuracy and AUROC.

Cell-Level Classification

The classification models previously described were applied to this cell-level dataset with largely identical training parameters and data augmentation techniques. The only modification was the resizing of input patches to 64 × 64 pixels to accommodate the smaller scale of individual cells compared to whole images. Model performance on this task is summarized in Table 2.

Comparison and Discussion

As shown in Tables 1 and 2, across both the image-level and cell-level classification approaches, all evaluated models demonstrated strong predictive performance, indicating the effectiveness of our data preparation and model training strategies. Notably, the DenseNet169 architecture consistently outperformed other tested models in both tasks.

In image-level classification, DenseNet169 achieves a balanced accuracy of 86.9% and an AUROC of 97.7%, reflecting robust discrimination between the three cell types when classifying entire images. In the more granular cell-level classification task, DenseNet169 attains a higher balanced accuracy of 89.3% and an AUROC of 99.5%, underscoring its ability to distinguish individual cells across classes.

These results demonstrate that the DenseNet169 model provides a reliable and accurate framework for both image-based and single-cell classification in our dataset, making it a suitable candidate for further development and deployment in automated hematopoietic stem cell analysis.

The confusion matrix of DenseNet169, our top-performing model, is presented in Fig. 6. The results demonstrate that the model achieves excellent classification performance on the ST-HSC class, with near-perfect prediction accuracy. For the LT-HSC and MPP classes, the model also performs well, although with somewhat lower accuracy compared to ST-HSC. Notably, the MPP class appears to pose a greater challenge for the model, as indicated by a higher rate of misclassification relative to the other classes. This suggests that while DenseNet169 effectively captures discriminative features for the majority of cell types, additional refinement or more representative training data may be necessary to improve classification robustness for the MPP population.

Figure 6. DenseNet169 confusion matrix on the test dataset for cell patch classification

In a previous study, Wang et al. (18) applied a ResNet50 model for cell classification following a custom-developed cell extraction method. To contextualize our findings, we compare their reported results with those obtained using our cell-level classification models.

Figure 7 presents a direct comparison between Wang et al.’s model performance, our implementation of cell-level ResNet50, and our top-performing model, DenseNet169, applied at the cell level. This comparison highlights differences in classification AUROC, illustrating the improvements achieved through our cell extraction pipeline and model training strategies. To ensure a fair comparison with the approach described by Wang et al., we reproduced their ResNet-50 model using a single-channel input, combined with our improved preprocessing and cell extraction steps. Even under these optimized conditions, our implementation of the single-channel ResNet-50 underperformed compared to our proposed multi-channel models and more advanced architectures. This confirms that our improvements are consistent and significant.

Figure 7. Comparison of performance between the single-channel ResNet50 (18) and our multi-channel models.

As the results suggest, the integration of multiple imaging channels into a composite representation, coupled with the use of more advanced and tailored cell detection methods, substantially enhances the accuracy of cell identification.

Furthermore, leveraging well-trained transfer learning models, such as DenseNet169, significantly improves classification performance at both the image and cell levels. These combined strategies contribute to more reliable and precise characterization of hematopoietic stem cell populations, demonstrating the critical importance of optimized preprocessing, detection, and model selection in biomedical image analysis.

Explainability of our Proposed Model

To gain a clearer understanding of the specific regions that our model emphasizes when making predictions, we conducted a visual analysis of the Grad-CAM (Gradient-weighted Class Activation Mapping) outputs generated by our best-performing model, DenseNet169, as illustrated in Fig. 8. Grad-CAM highlights the areas within the input image that contribute most strongly to the model’s decision-making process.
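The core Grad-CAM computation can be sketched in PyTorch as below. This is an illustration under assumptions, not the code used to produce the figures: the function name `grad_cam` is hypothetical, and the model is assumed to consist of a convolutional `features` module followed by ReLU, global average pooling, and a linear `classifier`, which is how torchvision's DenseNet is organized.

```python
import torch
import torch.nn.functional as F
from torch import nn


def grad_cam(features, classifier, image, target_class):
    """Return an H x W Grad-CAM heat map in [0, 1] for one C x H x W image."""
    features.eval()
    classifier.eval()
    # Fresh leaf tensor with gradients enabled so the graph is always built.
    x = image.detach()[None].clone().requires_grad_(True)
    activations = F.relu(features(x))                      # (1, K, h, w)
    pooled = F.adaptive_avg_pool2d(activations, 1).flatten(1)
    score = classifier(pooled)[0, target_class]            # class logit
    # Gradient of the class score with respect to each feature map.
    grads = torch.autograd.grad(score, activations)[0]
    # Channel weights: spatial average of the gradients.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    # Weighted sum of feature maps, keeping only positive evidence.
    cam = F.relu((weights * activations).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam[0, 0].detach()
    return cam / cam.max().clamp_min(1e-8)
```

For a torchvision DenseNet, the call would be `grad_cam(model.features, model.classifier, image, class_index)`, and the returned map can be overlaid on the input patch to inspect which regions drive the prediction.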

Our observations reveal that the model predominantly focuses on the interior of the cell, including its surface, rather than the surrounding background or neighboring cells. This suggests that the model identifies distinctive textural features within the cell itself that allow it to differentiate one cell from another effectively. In other words, the unique texture patterns present inside the cell appear to be key discriminative factors that guide the model’s classification decisions.

We also generated and visualized Grad-CAM maps for the image-level classification task, as shown in Fig. 9. Through visual inspection, we found that DenseNet169 consistently concentrates its attention on particular cells within each category. These highlighted cells appear to serve as stronger indicators of their respective classes, exhibiting more pronounced differences compared to cells associated with other categories. This suggests that the model identifies and leverages cellular features that are characteristic of each class, basing its image-level decision on the most representative cells within each image.

Limitations and Future Direction

Despite these promising results, several limitations remain. First, the dataset size, particularly at the image level where class counts range from 1,035 to 1,457 samples, is relatively modest by deep learning standards. Although the application of transfer learning mitigates some of the challenges associated with limited data, the generalizability of the models to external datasets, imaging platforms, or biological conditions remains to be established. Second, the cell-level classification task exhibits inherent class imbalance at the patch level, with the ST-HSC class significantly overrepresented relative to LT-HSC and MPP. This imbalance may affect model robustness, particularly for the minority classes. While class weighting and sampling strategies were employed during training, future efforts may benefit from more balanced or augmented datasets. Third, the current analysis is based on static, endpoint imaging and does not incorporate temporal or spatial contextual information. Incorporating time-lapse microscopy or spatially resolved imaging modalities could support the development of dynamic, context-aware classification systems, potentially enabling real-time stem cell sorting and lineage tracking. In future iterations of this work, we aim to address these limitations by expanding the dataset, ensuring balanced representation of classes, and developing a scalable framework to support automated stem cell sorting.

Conclusion

In this work, we addressed the challenge of classifying hematopoietic stem cells (HSCs) and multipotent progenitors (MPPs) into functional subpopulations by combining new data resources with advanced deep learning methods. We first curated and released what is, to the best of our knowledge, the largest publicly available dataset of microscopic stem cell images, including both image-level and cell-level data. This dataset establishes a foundation for reproducible research and provides the community with a benchmark to advance automated cytometry and stem cell characterization.

Using this resource, we developed a comprehensive deep learning pipeline that integrates RGB-based preprocessing of multi-slice images, classical blob detection for cell segmentation, and transfer learning with ResNet and DenseNet architectures. Across multiple experiments, DenseNet169 consistently achieved the best performance, reaching an AUROC of 99.5% and a balanced accuracy of 89.3% in cell-level classification, thereby outperforming previously reported single-channel approaches. Grad-CAM explainability confirmed that the models focused on biologically meaningful features, strengthening confidence in the automated system. The open-source implementation of our pipeline is available at https://github.com/MLBC-lab/HSC_vs_MPP, and the dataset is publicly available at https://rutgers.box.com/s/80mb7brmultdwup3k8bj499d2kpbs7zp.
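As a rough illustration of the RGB-based preprocessing step, the sketch below stacks three grayscale slices into a single three-channel image of the kind an ImageNet-pretrained CNN expects. The slice-to-channel mapping, the per-slice min-max scaling, and the function name `slices_to_rgb` are assumptions made for this example; the released repository should be consulted for the actual procedure.

```python
import numpy as np


def slices_to_rgb(slices):
    """Stack three 2-D grayscale slices into one H x W x 3 uint8 image.

    Assumption: slice 0 -> R, slice 1 -> G, slice 2 -> B, each scaled
    independently to the 0..255 range.
    """
    if len(slices) != 3:
        raise ValueError("expected exactly three slices")
    channels = []
    for s in slices:
        s = np.asarray(s, dtype=np.float64)
        span = s.max() - s.min()
        # Min-max scale to [0, 1]; a constant slice maps to all zeros.
        scaled = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        channels.append(np.round(scaled * 255).astype(np.uint8))
    return np.stack(channels, axis=-1)


# Synthetic 12-bit slices standing in for real microscopy data.
rng = np.random.default_rng(0)
stack = [rng.integers(0, 4096, size=(64, 64)) for _ in range(3)]
rgb = slices_to_rgb(stack)
print(rgb.shape, rgb.dtype)
```

Scaling each slice independently keeps a dim slice from being washed out by a brighter one, at the cost of discarding relative intensity between slices; whether that trade-off suits a given dataset is a design choice to validate empirically.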

Taken together, these results demonstrate that large-scale datasets combined with optimized deep learning pipelines can substantially improve the scalability and accuracy of HSC/MPP classification. Looking ahead, we plan to extend this work by releasing stem cell video recordings and developing improved cell tracking algorithms to capture dynamic cellular behaviors. Such temporal extensions will further enhance the ability of imaging-based methods to complement or replace fluorescence-activated cell sorting, ultimately advancing precision hematology.

Data availability

The open-source implementation of our pipeline is available at: https://github.com/MLBC-lab/HSC_vs_MPP. We curated and released the largest open-source dataset of multi-channel microscopic stem cell images, which is publicly available at: https://rutgers.box.com/s/80mb7brmultdwup3k8bj499d2kpbs7zp.

Funding

This study is funded by NSF-NRT grant number 2152059.

Author Contribution

Jianzhong Han and Jian Huang developed the dataset and conducted the laboratory microscopic imaging. SMA cleaned and organized the data and contributed to writing the paper. VK developed and designed the model, ran and tested it on the data, and contributed to writing the paper. ID defined and supervised the project, analyzed the results, and contributed to writing and revising the paper. All authors wrote and reviewed the manuscript prior to submission.
