An integrated single-cell transcriptomic dataset for Mouse cortex
Authors & Affiliations
Author names:
Xuefeng Shi 1
Zhi-hui Qi 1
Hong Huang 4
Zhi-ming Ye 4
YuMin Wu 1
Kahei Chan 1
Maojin Yao 2
Zhongxing Wang 1
Zhong-xing Wang 1✉ Email
Mao-jin Yao 5✉ Email
1 Department of Anesthesia The First Affiliated Hospital of Sun Yat-sen University 510080 Guangzhou Guangdong China
2 Brain Research Center Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University 510120 Guangzhou China
3 Department of Neurology Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University 510120 Guangzhou China
4 Department of Thoracic Surgery and Oncology, State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease the First Affiliated Hospital of Guangzhou Medical University 510120 Guangzhou China
5 Affiliated Hospital of Sun Yat-sen University 510080 Guangzhou Guangdong China
Xuefeng Shi a, Zhi-hui Qi a, Hong Huang c, Zhi-ming Ye c, YuMin Wu a, Kahei Chan a, Maojin Yao b*, Zhongxing Wang a*
Xuefeng Shi, Zhi-hui Qi, Hong Huang, and Zhi-ming Ye contributed equally to this work.
Affiliations:
a. The First Affiliated Hospital of Sun Yat-sen University, Department of Anesthesia, Guangzhou, Guangdong, 510080, China.
b. Brain Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510120, China, Department of Neurology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510120, China.
c. Department of Thoracic Surgery and Oncology, the First Affiliated Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, Guangzhou 510120, China
Corresponding author:
Zhong-xing Wang, Mao-jin Yao
Corresponding author email
Zhong-xing Wang: wzhxing@mail.sysu.edu.cn
Mao-jin Yao: maojin.yao@aliyun.com
Present/permanent address:
Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, 510080, China.
Abstract
As the central functional hub of the central nervous system, the cerebral cortex has long been a major frontier in neuroscience research. With the increasing maturity and widespread application of single-cell RNA sequencing technologies, multiple studies leveraging this technology have been conducted to systematically decipher the complexity and diversity of cortical cellular composition. However, substantial variations in sequencing platforms, cohort sizes, and sequencing depth have impeded deeper investigation into the functions of cortical cells and their underlying molecular mechanisms. Here we present a comprehensive cortical transcriptome dataset integrating multimodal data from 9 mouse datasets, encompassing 10x and Drop-seq (single-cell/nucleus) profiling. Following rigorous quality control, we systematically analyzed 173,081 high-quality cells, providing a comprehensive characterization of cellular composition, intercellular communication networks, chromatin accessibility, and functional properties. To evaluate the cross-species relevance of our findings, we performed comparative analyses with single-cell datasets of mixed cortical tissues from humans, chimpanzees, bonobos, and macaques (n = 29,353). This integrated resource provides a foundational reference for cortical transcriptomes and a standardized framework for cross-platform integration.
Subject terms: Cortex, transcriptome sequencing
Background & Summary
The cerebral cortex represents the most intricate and integrative structure of the central nervous system, orchestrating higher-order functions such as sensory perception, consciousness, decision-making, and motor coordination1. Distinct cortical regions exhibit specialized functional architectures shaped by differences in cytoarchitecture and synaptic connectivity2. For instance, the primary motor cortex is enriched in projection neurons that initiate voluntary movements, whereas the primary somatosensory cortex decodes peripheral sensory inputs. The prefrontal cortex, through its highly interconnected microcircuits, integrates multimodal information to mediate executive control, cognitive processing, and memory regulation3. This hierarchical and functionally diversified organization underpins the neural basis of complex behaviors and higher cognition.
Over the past decades, extensive studies have elucidated the fundamental principles governing laminar organization and neural circuit formation4. However, the intricate cellular heterogeneity and dynamic state transitions within the cortex remain incompletely resolved5. The advent of single-cell and single-nucleus RNA sequencing (scRNA-seq and snRNA-seq) has enabled high-resolution reconstruction of the cortical transcriptomic landscape, uncovering profound molecular diversity and functional specialization across neuronal and glial lineages6–8. Nevertheless, discrepancies in sequencing platforms, sample processing, and data analysis strategies across studies have resulted in fragmented datasets and inconsistent outcomes, hindering the systematic integration of cortical cell fate determination, transcriptional regulatory networks, and other critical processes9.
To overcome the limitations imposed by data heterogeneity and technical discrepancies across studies, it is essential to establish a unified, high-resolution cortical transcriptome framework that lays the foundation for systematic research in cortical transcriptomics. In response to this need, we have developed a comprehensive cortical transcriptome resource integrating multimodal sequencing data. This dataset compiles single-cell and single-nucleus transcriptomic information from multiple mouse cortical samples, covering major sequencing platforms such as 10x Genomics and Drop-seq. Through rigorous data standardization, quality control, and multimodal integration, we have created a systematic reference atlas of the cortical transcriptome, providing a standardized benchmark for in-depth analysis of cortical cell composition and functional states10. Additionally, utilizing this resource, we systematically evaluated the core functions and intercellular communication networks of cortical cells11. To validate the consistency and complementarity of different sequencing technologies in cell type identification and functional state characterization, we performed integrated analyses with mouse cortical ATAC-seq data and multi-species cortical transcriptomes. The results show that the resource is well suited to deciphering chromatin accessibility and assessing evolutionary conservation.
Methods
RNA-seq Data preprocessing and integration
A total of 10 publicly available scRNA-seq and snRNA-seq datasets were collected, comprising 9 mouse cerebral cortex samples and a mixed sample containing data from humans, chimpanzees, bonobos, and macaques. Detailed information on the datasets included in the analysis is summarized in Table S1. The standard workflow for data preprocessing and cell clustering was followed using the Seurat package (v4.1.0) in R (v4.3.0). Each RNA-seq dataset was analyzed uniformly through a pipeline that included quality control (QC), normalization, feature selection, data scaling, principal component analysis (PCA) for dimension reduction, Harmony integration, clustering, and Uniform Manifold Approximation and Projection (UMAP) projection for visualization12. QC for each dataset involved filtering cells to retain only those with mitochondrial gene counts below 20% and feature counts between 200 and 3000. Log-normalization was applied to normalize gene expression in each dataset13. The Elbow plot generated for each dataset was used to determine the number of PCs to include in further analysis. Harmony was used to integrate data from different sequencing batches14. To evaluate the potential influence of the cell cycle, cell cycle scores were assigned to each cell based on known cell cycle–related genes. Cell clustering was performed using the FindNeighbors and FindClusters functions, and nonlinear dimensionality reduction was conducted using the RunUMAP function. Cluster-specific marker genes were identified using FindAllMarkers, and clusters were annotated based on these marker genes. For visualization, UMAP plots displaying the annotated clusters were generated.
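The cell-level QC rule stated above can be illustrated with a minimal Python sketch. It is not the authors' Seurat code: the dictionary keys (`n_features`, `percent_mt`), the toy cells, and the choice to treat the 200 and 3000 bounds as inclusive are assumptions made only for illustration.

```python
# Thresholds quoted from the Methods text; field names are illustrative assumptions.
MAX_PERCENT_MT = 20.0   # mitochondrial percentage must be below this value
MIN_FEATURES = 200      # lower bound on detected genes (assumed inclusive)
MAX_FEATURES = 3000     # upper bound on detected genes (assumed inclusive)


def passes_rna_qc(cell):
    """Return True if a cell satisfies the stated QC thresholds."""
    return (
        cell["percent_mt"] < MAX_PERCENT_MT
        and MIN_FEATURES <= cell["n_features"] <= MAX_FEATURES
    )


def filter_cells(cells):
    """Keep only the cells that pass QC, preserving input order."""
    return [c for c in cells if passes_rna_qc(c)]


# Toy cells (invented values, not taken from the datasets).
cells = [
    {"barcode": "cell_a", "n_features": 1500, "percent_mt": 3.2},  # passes
    {"barcode": "cell_b", "n_features": 150, "percent_mt": 1.0},   # too few genes
    {"barcode": "cell_c", "n_features": 4200, "percent_mt": 2.5},  # too many genes
    {"barcode": "cell_d", "n_features": 900, "percent_mt": 27.0},  # high mito fraction
]
kept_barcodes = [c["barcode"] for c in filter_cells(cells)]
print(kept_barcodes)
```

In a Seurat workflow the same rule is normally expressed as a subset on the per-cell metadata columns; the sketch only makes the logic of the thresholds explicit.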
scATAC-seq data processing
For scATAC-seq data, we retained high-quality nuclei with 1,000 ≤ nFrags ≤ 100,000 and a TSS enrichment score ≥ 5, and excluded chrM and chrY15. We performed iterative latent semantic indexing (LSI) dimensionality reduction using the addIterativeLSI function, followed by unsupervised clustering via the addClusters function with Seurat's Leiden algorithm16. Cell populations were visualized using UMAP.
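As a rough illustration of the nucleus-level thresholds and the chromosome exclusion described above, the following Python sketch applies them to toy values. The function names and example numbers are hypothetical; the actual filtering was done inside the scATAC-seq toolkit named in the text.

```python
# Thresholds quoted from the Methods text.
MIN_FRAGS = 1_000
MAX_FRAGS = 100_000
MIN_TSS_ENRICHMENT = 5.0
EXCLUDED_CHROMS = {"chrM", "chrY"}


def passes_atac_qc(n_frags, tss_enrichment):
    """Return True if a nucleus meets the fragment-count and TSS thresholds."""
    return MIN_FRAGS <= n_frags <= MAX_FRAGS and tss_enrichment >= MIN_TSS_ENRICHMENT


def keep_chromosome(chrom):
    """Return True for chromosomes that are retained in the analysis."""
    return chrom not in EXCLUDED_CHROMS


# Toy nuclei as (n_frags, tss_enrichment) pairs; values are invented.
nuclei = {"nuc_1": (8_500, 9.1), "nuc_2": (600, 12.0), "nuc_3": (20_000, 3.4)}
kept_nuclei = sorted(name for name, (n, tss) in nuclei.items() if passes_atac_qc(n, tss))
kept_chroms = [c for c in ["chr1", "chrX", "chrY", "chrM"] if keep_chromosome(c)]
print(kept_nuclei, kept_chroms)
```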
Cell Type Annotation and Cross-Dataset Cross-Validation
We annotated all identified clusters with cell types based on the expression or co-expression patterns of canonical marker genes. For the ATAC-seq datasets, cell type annotation was performed by integrating chromatin accessibility profiles at key genomic loci with label transfer from reference single-cell RNA-seq datasets. To validate the robustness of the annotations, we performed non-negative least squares (NNLS) regression analysis to assess the similarity between the identified clusters and reference cell types across multiple independently generated datasets17. All gene expression visualizations were generated using the ggplot2 package.
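To make the NNLS step more concrete, the sketch below expresses one cluster's mean expression profile as a non-negative combination of reference cell-type profiles. It uses a simple coordinate-descent solver written for illustration; the solver the authors actually used is not stated here, and the profiles, gene count, and function name are all hypothetical.

```python
def nnls_coordinate_descent(columns, target, n_sweeps=200):
    """Minimise ||sum_j x_j * columns[j] - target||^2 subject to x_j >= 0.

    columns: list of reference profiles (each a list of floats).
    target:  query profile of the same length.
    """
    x = [0.0] * len(columns)
    residual = list(target)  # equals target - A x, and x starts at zero
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue  # an all-zero profile cannot contribute
            # Best unconstrained update for x_j with the others fixed, clipped at zero.
            step = sum(c * r for c, r in zip(col, residual)) / norm_sq
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


# Toy reference profiles over four genes (invented values).
astro = [5.0, 0.0, 1.0, 0.0]
neuron = [0.0, 4.0, 0.0, 2.0]
oligo = [1.0, 0.0, 6.0, 0.0]

# A query cluster built as 0.8 * astro + 0.2 * oligo.
query = [0.8 * a + 0.2 * o for a, o in zip(astro, oligo)]
weights = nnls_coordinate_descent([astro, neuron, oligo], query)
print([round(w, 3) for w in weights])
```

The fitted coefficients act as similarity weights: a query cluster that loads almost entirely on one reference type is taken to match that type. A production analysis would more likely rely on an established NNLS routine than on this hand-written loop.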
Gene activity scores of the scATAC-seq datasets
The gene activity matrix was computed using the GeneActivity function in Signac, which quantifies accessibility in gene bodies and promoter regions. To mitigate technical noise inherent in sparse scATAC-seq data, the resulting gene activity scores were normalized via the NormalizeData function and scaled using the ScaleData function. We then performed dimensionality reduction using RunPCA on the gene activity matrix. Cell clusters were identified based on chromatin accessibility profiles, and the FindAllMarkers function was subsequently applied to identify marker genes for each cluster. Finally, cell types were annotated by integrating with a reference single-cell RNA-seq dataset via label transfer using the FindTransferAnchors and TransferData functions15.
Identification of cell-type-specific regulatory regions
Cell-type-specific peaks were identified using the FindAllMarkers function (test.use = "LR") on the peak assay, with a log-fold change threshold of 0.25 and a minimum fraction threshold of 0.25. Significant differential peaks (adjusted p-value ≤ 0.05) were annotated to their nearest genes using the ClosestFeature function. To visualize chromatin accessibility patterns, we generated coverage plots for key marker genes across all major cell types using the CoveragePlot function.
Chromatin Co-accessibility Analysis
To identify coordinately regulated cis-regulatory elements, we performed chromatin co-accessibility analysis using ArchR on the ATAC-seq datasets18. We constructed a genome-wide co-accessibility network based on correlations of chromatin accessibility across all cells19. Genomic loci demonstrating tight connectivity within this network were defined as chromatin co-accessibility modules, representing elements with potential synergistic regulatory functions. By linking these modules to annotated gene promoters, we inferred their potential roles in cell type-specific transcriptional regulation.
Cell–Cell Communication analysis
To systematically investigate intercellular signaling networks across distinct datasets, we performed a computational analysis of cell-cell communication using CellChat (v1.6.1) on annotated cell populations. The gene expression matrix was used to calculate potential ligand-receptor interaction probabilities between all pairwise combinations of cell types. This quantitative framework enabled the identification of significantly enriched signaling pathways and supported the systematic inference of intercellular communication probabilities11.
scWGCNA Analysis
We performed single-cell weighted gene co-expression network analysis (scWGCNA) using the hdWGCNA package (v0.2.5)20. The merged single-cell RNA-seq object was initialized with genes expressed in at least 5% of cells. Metacells were constructed by grouping cells within each annotated subcluster using k-nearest neighbors (k = 10) with a maximum shared-cell limit of 10. Following metacell normalization, we generated the expression matrix for network construction.
The soft power-threshold was determined through scale-free topology analysis (signed network type), selecting the lowest power that achieved a scale-free fit index > 0.8. The gene co-expression network was constructed using this optimized power value. Module eigengenes were computed with batch correction for sample origin. We assessed module connectivity by calculating kME values within each subcluster and assigned modules with systematic identifiers.
Hub genes were defined as the top 25 genes ranked by kME within each module. Module expression scores were computed using both Seurat and UCell algorithms, based on the top 25 genes per module21. Finally, we visualized module eigengene patterns using feature plotting and exported all results for downstream analysis.
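Two of the selection rules in this section are simple enough to sketch directly: choosing the lowest soft power whose scale-free fit index exceeds 0.8, and taking the top-ranked genes by kME as hubs. The Python below is an illustrative sketch with invented numbers, not the hdWGCNA implementation; the function names and toy inputs are assumptions.

```python
def pick_soft_power(fit_by_power, threshold=0.8):
    """Return the lowest power whose scale-free fit index exceeds the threshold."""
    passing = [power for power, fit in fit_by_power.items() if fit > threshold]
    return min(passing) if passing else None


def top_hub_genes(kme_by_gene, n_hubs=25):
    """Rank genes by kME (highest first) and return the first n_hubs names."""
    ranked = sorted(kme_by_gene.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ranked[:n_hubs]]


# Toy scale-free fit indices per candidate power (invented values).
fit_by_power = {1: 0.12, 2: 0.45, 4: 0.71, 6: 0.83, 8: 0.90}
chosen_power = pick_soft_power(fit_by_power)

# Toy kME values for one module (invented gene names and values).
kme_by_gene = {"gene_a": 0.91, "gene_b": 0.40, "gene_c": 0.77, "gene_d": 0.65}
hubs = top_hub_genes(kme_by_gene, n_hubs=2)
print(chosen_power, hubs)
```

With the study's setting the second call would use `n_hubs=25`; the smaller value here only keeps the toy example readable.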
Data Records
The single-cell RNA-seq and ATAC-seq data in this study were all sourced from the Gene Expression Omnibus (GEO) under the accession codes listed in Table S1. Additionally, all data were uploaded to Figshare (https://figshare.com/articles/dataset/An_integrated_single-cell_transcriptomic_dataset_for_Mouse_cortex/30672836).
Technical Validation
Based on data obtained from the GEO platform, we collected transcriptomic data from adult mouse cortical tissues and a mixed dataset from human, chimpanzee, bonobo, and macaque cortical tissues. Additionally, we analyzed publicly available scATAC-seq data from the cortex. The reliability of the transcriptomic data was supported by standard quality metrics including nFeature_RNA, nCount_RNA, mitochondrial gene percentage (percent.mt), and hemoglobin gene percentage (HB_percent), while the scATAC-seq data quality was confirmed by high fragment counts, transcription start site (TSS) enrichment, and the fraction of reads in peaks (FRiP) (Fig. 1A-C). According to the different sequencing platforms and experimental methods, a total of 11 libraries were analyzed and annotated22–31. Detailed information on the cell type marker genes used for all transcriptomic analyses in this study is provided in Fig. 1D-E.
Fig. 1
Quality control of the 12 datasets. (A) Boxplots showing the number of detected genes per cell (nFeature_RNA), total RNA counts per cell (nCount_RNA), mitochondrial gene percentage (percent.mt), and hemoglobin gene percentage (HB_percent) across 9 adult mouse cortical datasets. (B) Boxplots showing nFeature_RNA and nCount_RNA distributions across mixed samples from human, chimpanzee, bonobo, and macaque cortical tissues. (C) Boxplots showing the number of unique nuclear fragments, the distribution of the transcription start site (TSS) enrichment score, and the fraction of reads in peaks (FRiP) of mouse cortical ATAC datasets. (D) Heatmap of cell type-specific marker genes from a mixed cortical dataset of human, chimpanzee, bonobo, and macaque. (E) Heatmap of cell type-specific marker genes from 9 adult mouse cortical datasets.
A total of 173,081 high-quality RNA-seq cells were obtained from adult mouse brain tissue, and a total of 8 cell types were annotated. In addition, cell cycle scoring analysis indicated that cell clustering was not driven by cell cycle effects32, and cell density visualization further confirmed that no single cluster was disproportionately influenced by variations in cell number (Fig. 2A). We applied Harmony to remove batch effects arising from differences in sequencing depth. Through cross-dataset non-negative least squares regression analysis, we demonstrated that all annotated cell types exhibit highly consistent similarity across different technological platforms, confirming the robustness of our cell type annotations (Fig. 2B).
Fig. 2
Integration and similarity assessment of the mouse datasets. (A) UMAP embeddings of the 4 datasets, colored by cell type, data source, cell cycle phase, and cell density, respectively. (B) Heatmap of similarity across the four datasets based on non-negative least squares (NNLS) regression.
To further explore the function of these cells under different sequencing methods, we used scWGCNA to identify the functional hub genes of the 8 cell types. A total of 33 co-expression modules were identified: 7 in drop-sn, 8 in drop-sc, 6 in 10x-sc, and 12 in 10x-sn. We subsequently performed functional enrichment analysis on each gene co-expression module to systematically identify the dominant functional characteristics of different cell types (Fig. 3A). Modules that did not yield any meaningful functional enrichment results were excluded from subsequent analyses. For each module, we calculated the intramodular connectivity of every gene, selected the top 25 genes with the highest connectivity as hub genes, and subsequently determined the correlation between module gene expression levels and cell types (Fig. 3B). Specifically, the results revealed that both glial cells and neuronal populations play significant roles in synaptic-related functions and metabolic support pathways, while exhibiting distinct functional specializations: neuronal functions primarily focus on synaptic signaling and neuromodulation, whereas glial cells mainly govern the assembly, maintenance, and fine-tuned regulation of synaptic structures. Additionally, to investigate the core functional genes within co-expression modules, we extracted the top 40 genes from each module and further partitioned them using the Louvain community detection algorithm. The results indicated that core genes identified across the four datasets were predominantly involved in protein synthesis (Rpl23a, Rpl35, Rps23), energy metabolism (Aldoa, Slc25a4, Ftl1)33, synaptic signaling (Gria4, Dlg2)34, and glial support functions (Slc1a3, Apoe, Mbp) (Fig. 3C-F). Overall, these findings point to key regulatory programs of neurons and glial cells.
Fig. 3
WGCNA Analysis of 4 Datasets. (A) GO Enrichment Heatmap of Co-expression Modules (top 10 terms ordered by log_padjust). (B) Heatmap of correlations between cell types and co-expression modules (based on the top 10 genes by expression level) (C) Core Gene Interaction Network of Co-expression Modules from 10xsc Data. (D), (E), and (F) are the same as (C) but derived from 10xsn, dropsc, and dropsn data, respectively.
To gain deeper insights into the cellular interaction network within the cortex, we performed cell-cell communication analysis on the datasets using CellChat. The results revealed significant and strong interactions among excitatory neurons, astrocytes, and oligodendrocyte precursor cells (OPCs), except in the dataset in which OPCs were not detected (Fig. 4A). Further pattern recognition analysis identified that the core signaling pathways shared across the four datasets were primarily enriched in key biological processes, including cell adhesion, neuron-glia interactions, synaptogenesis, and axon guidance (Fig. 4B-C). Notably, we found that the communication among these three cell types is predominantly mediated by the NRXN and PTN signaling pathways. Neurexin (NRXN), primarily localized at the neuronal presynaptic membrane, functions by trans-synaptically organizing and stabilizing both excitatory and inhibitory synapses through interactions with postsynaptic ligands such as Neuroligin, serving as a central molecule in synapse formation and specificity regulation35. Pleiotrophin (PTN), a signaling molecule secreted by astrocytes, binds to its receptors (e.g., Syndecan-3, RPTPβ/Z) and concurrently promotes neuronal survival, dendritic spine formation, and synapse maturation, while also regulating the migration and differentiation of OPCs36. Therefore, investigating the NRXN and PTN signaling pathways is likely to provide valuable insights into cortical cellular interactions. Furthermore, we discovered that signaling pathways related to cell adhesion molecules—including NEGR, CADM, and CDH—serve as essential communication mediators between astrocytes and excitatory neurons37. These molecules form specific trans-synaptic adhesion complexes that directly facilitate the physical envelopment and spatial positioning of synaptic structures by astrocytes. 
This finding highlights the critical role of intercellular physical contact in neuron-glia interactions and provides a molecular basis for understanding nervous system development and synaptic plasticity (Fig. 4D).
Fig. 4
Intercellular Communication Analysis of 4 Datasets. (A) Cellular communication networks across 4 datasets. (B) Upset plot of signaling pathways across four datasets. (C) Ligand-receptor interaction network of conserved signaling pathways across four datasets. (D) Sankey diagram depicting conserved signaling pathway activity among three cell types.
To systematically evaluate the performance of transcriptomic datasets in scATAC-seq data integration, we collected 10,055 high-quality cortical scATAC-seq cells for cross-modal integration with RNA-seq data (Fig. 5A)38. Our integration consistency assessment revealed that 4,139 cells (41.2%) obtained consistent cell-type annotations across at least three datasets, indicating that a substantial subset of cells, though not the majority, receives stable annotations across different reference datasets. Further analysis demonstrated that excitatory and inhibitory neurons were prone to mutual misclassification during cross-dataset integration. Surprisingly, despite their substantial heterogeneity, astrocytes exhibited remarkable stability, suggesting that astrocytes may maintain more conserved regulatory patterns at the chromatin accessibility level (Fig. 5B)39.
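The consistency criterion (the same label from at least three reference datasets) can be sketched as a small voting rule. The Python below assumes four transferred labels per nucleus, one per RNA reference; that count, the label strings, and the function name are assumptions made for illustration rather than details taken from the authors' code.

```python
from collections import Counter


def consensus_label(labels, min_agree=3):
    """Return the label shared by at least min_agree references, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


# Toy transferred labels for three nuclei across four references (invented).
transferred = {
    "nuc_1": ["Astrocyte", "Astrocyte", "Astrocyte", "Astrocyte"],
    "nuc_2": ["Excitatory", "Inhibitory", "Excitatory", "Excitatory"],
    "nuc_3": ["Excitatory", "Inhibitory", "Inhibitory", "Excitatory"],
}
consensus = {nuc: consensus_label(labels) for nuc, labels in transferred.items()}
retained = [nuc for nuc, label in consensus.items() if label is not None]
print(consensus, len(retained) / len(transferred))
```

With four references and a threshold of three, at most one label can qualify per nucleus, so the rule never has to break ties; nuclei split two against two, like `nuc_3`, are simply left unlabeled.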
To delineate the regulatory architecture underlying cell type-specific chromatin landscapes in the cortex, we extracted the 4,139 cells with consistent cell type annotations in at least three datasets for subsequent analysis (Fig. 5C). We identified 6 major cell types, namely astrocytes, OPCs, excitatory neurons, inhibitory neurons, oligodendrocytes, and microglial cells, each exhibiting distinct chromatin accessibility patterns at established marker loci (Fig. 5D). Subsequently, we systematically constructed chromatin accessibility networks using Cicero co-accessibility analysis. Network topology analysis based on betweenness centrality (threshold: >80th percentile) identified 18 genes as topologically central hubs (Fig. 5E–F). Among them, Cacna1b encodes the pore-forming subunit of neuronal N-type voltage-gated calcium channels, directly mediating presynaptic calcium influx and triggering neurotransmitter release40. The axonal guidance molecule Ntng2 (Netrin-G2) orchestrates excitatory synaptogenesis via mechanisms involving specific cell adhesion41. Notably, metabolism-related genes (Rapgef1, Ass1) also occupied central hub positions, which suggests that the intrinsic metabolic state of a cell may serve as a key driver in reshaping the chromatin accessibility landscape42.
Fig. 5
Single-cell chromatin accessibility landscape of the cerebral cortex. (A) UMAP embedding of scATAC-seq data from cortical regions, with dashed lines indicating misprojected cell types. (B) Heatmap of cell type misclassification probabilities in scATAC-seq data. (C) UMAP embedding of filtered scATAC-seq data. (D) The marker peak features corresponding to the 6 cortical celltypes. (E) Chromatin accessibility co-accessibility network. (F) Peak accessibility features of topologically central hub genes.
To investigate the performance of the four datasets in cross-species analysis, we collected a publicly available single-nucleus transcriptomic dataset of mixed cortical tissues from humans, chimpanzees, bonobos, and macaques (n = 29,353). Due to variations in sequencing depth, only four major cortical cell types were reliably detected (astrocytes, oligodendrocytes, inhibitory neurons, and excitatory neurons) (Fig. 6A–B). We then performed a correlation analysis based on the average expression of conserved highly variable genes across species within each cluster. The results demonstrated highly consistent patterns among the four datasets across all cell types (similarity > 0.7). Furthermore, we analyzed the evolutionary rate of gene expression across cell types, which suggested that astrocytes exhibit lower interspecies divergence compared with other cell types, potentially reflecting their highly conserved functional architecture (Fig. 6C)43.
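The caption of Fig. 6 describes the cross-species comparison as a Spearman correlation of per-cluster mean expression. A minimal Python sketch of that statistic is given below, using average ranks for ties followed by a Pearson correlation on the ranks. The expression vectors are invented, and the helper names are hypothetical; the sketch does not reproduce the authors' gene selection or normalization.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy mean expression of five shared genes in one cell type (invented values).
mouse_astro = [1.0, 2.0, 3.0, 4.0, 5.0]
human_astro = [2.0, 4.0, 8.0, 16.0, 33.0]   # same ordering, different scale
rho_same = spearman(mouse_astro, human_astro)
rho_reversed = spearman(mouse_astro, human_astro[::-1])
print(rho_same, rho_reversed)
```

Because the statistic depends only on gene ranks, it is insensitive to platform-specific scaling of expression values, which is one reason rank correlations are a common choice for comparisons across species and technologies.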
To capture subtle species-specific differences in gene expression, we further analyzed significantly upregulated differentially expressed genes (DEGs) within each cluster across species. Surprisingly, astrocytes showed fewer overlapping DEGs (Fig. 6D). Further principal component analysis of shared gene expression patterns revealed that oligodendrocytes, inhibitory neurons, and excitatory neurons displayed similar cross-species expression profiles, whereas astrocytes exhibited highly species-specific expression patterns (Fig. 6E). Specifically, in addition to canonical astrocyte markers such as AQP4, SLC1A3, and GFAP, astrocytes were enriched for genes involved in core functional pathways44. These include genes regulating immune response and signaling (e.g., PON2, ID3)45, lipid metabolism and transport (e.g., ETNPPL, CLU, FADS2, APOE)46, as well as specialized molecular transport (e.g., SLC39A12, SLCO1C1)47. In contrast, although oligodendrocytes are also glial cells, their core myelin structural components (MOG, PLP1, MAG, MBP)48, signaling-related factors (LPAR1, EFNB3, S1PR5)49, and cytoskeletal regulatory proteins (ERMN, CRYAB, GSN) exhibit conserved expression patterns across species that are highly consistent with those of neurons. This finding suggests that the core functional modules of oligodendrocytes, as glial cells closely coupled with neuronal function, may have been subject to evolutionary constraint similar to that acting on neurons, thereby forming a coordinated cross-species expression pattern.
Fig. 6
Cross-species evolutionary analysis of cortical cell types (A) UMAP visualization of integrated cross-species data, colored by cell type. (B) Same UMAP as (A), colored by species (B: Bonobo; C: Chimpanzee; H: Human; M: Macaque). (C) Left: Spearman correlation of per-cluster mean normalized gene counts between 4 mouse datasets and different species. Center: Similarity heatmap of correlation results from the 4 datasets. Right: Evolutionary rate of gene expression across cell types in mouse (y-axis: variance of log-ratio of gene expression levels). (D) Volcano plots of DEGs between 4 datasets and different species, with connecting lines indicating co-upregulated orthologs across species in the same cell type. (E) PCA of cross-species expression patterns for DEGs identified in four datasets.
This study constructed multiple transcriptomic atlases of mouse cortical cells across different sequencing platforms through systematic technical validation. We rigorously evaluated data reliability and uncovered core gene modules, key signaling pathways, and chromatin regulatory networks that coordinate between neurons and glial cells to maintain cortical function. In addition, through cross-omics and cross-species analyses, we confirmed the robustness of data integration across technologies. In summary, we provide a high-quality data resource and analytical framework for advancing the exploration of cellular heterogeneity, molecular regulatory mechanisms, and evolutionary conservation in the cerebral cortex.
Usage Notes
All data processing steps, including cell filtering, clustering, and annotation, were run using R version 4.3.1. The Python/R code used for the related analyses is provided online (https://figshare.com/articles/dataset/An_integrated_single-cell_transcriptomic_dataset_for_Mouse_cortex/30672836).
Funding
Maojin Yao, National Natural Science Foundation of China, General Program, "Mechanism of Airway GFAP + Glial Cells in Promoting Epithelial Barrier Repair Following Viral Infection", Grant No. 32470888
Maojin Yao, Guangzhou National Laboratory - State Key Laboratory of Respiratory Disease (Guangzhou Medical University) Joint Funding Project 2024, "Study on the Mechanisms and Intervention Strategies for Small Cell Lung Cancer Development", Project No. GZNL2024B01004
Zhongxing Wang, National Natural Science Foundation of China, Grant No. 82272224
Zhongxing Wang, the Basic and Applied Basic Research Foundation of Guangdong Province, Grant No. 2021A1515220042
Zhongxing Wang, Natural Science Foundation of Guangdong Province, Grant No. 2022A1515012475
Data Availability
The input data reanalyzed in this study (twelve datasets) were all acquired from the NCBI GEO database under accession codes including GSE255405, GSE137665, GSE276683, GSE273765, GSE239477, GSE172382, GSE160519, GSE106678, GSE126074, and GSE127774. The set of samples used in this study is summarized in Table S1. Seurat objects for all datasets have been deposited in the Figshare repository (https://figshare.com/articles/dataset/An_integrated_single-cell_transcriptomic_dataset_for_Mouse_cortex/30672836).
Author Contribution
Conceptualization, Xuefeng Shi and Zhihui Qi; Methodology, YuMin Wu, Kahei Chan, and Zhihui Qi; Formal Analysis, Zhihui Qi, Hong Huang, and Zhiming Ye; Resources, Maojin Yao and Zhongxing Wang; Data Curation, Xuefeng Shi, Kahei Chan, and Hong Huang; Writing – Original Draft, Xuefeng Shi and Zhihui Qi; Writing – Review & Editing, Hong Huang, Zhiming Ye, YuMin Wu, Kahei Chan, Maojin Yao, and Zhongxing Wang; Visualization, Xuefeng Shi and Hong Huang; Supervision, Maojin Yao and Zhongxing Wang.