Running title: CYP1B1 Biomarker/Target in COPD
A
Lingfeng
Liu
1
Mingjun
Jiang
2
Muzi
Lei
3
Kun
Du
4
Dafei
Wei
5
Xiangyang
Ye
6
Xiaowu
Tan
1
Zhuang
Mi
1
Yao
Tang
7
Ruiting
Sun
8✉
EmailDr.srt@hotmail.com
Sha
Liu
1✉
Email2018020001@usc.edu.cn
R.
S.
1
1
The Second Affiliated Hospital, Department of Pulmonary and Critical Care Medicine
University of South China
421001
Hengyang
China
2
Vascular Disease Center, Shanghai Fourth People’s Hospital, School of Medicine
Tongji University
200092
Shanghai
China
3
The Second Affiliated Hospital, Department of Intensive Care Medicine
University of South China
421001
Hengyang
China
4
MOE Key Laboratory of Rare Pediatric Diseases, College of Basic Medical Sciences, Hengyang Medical School
University of South China
421009
Hengyang
China
5
The Second Affiliated Hospital, Department of Children’s Medical Center
University of South China
421001
Hengyang
China
6
Department of Infectious Diseases
Xiangya Hospital of Central South University
410083
Changsha
China
7
The Second Affiliated Hospital, Department of Cardiovascular Medicine
University of South China
421001
Hengyang
China
8
State Key Laboratory of Respiratory Diseases, National Clinical Research Center for Respiratory Diseases, Guangzhou Institute of Respiratory Health
the First Affiliated Hospital of Guangzhou Medical University
511495
Guangzhou
China
Lingfeng Liu1, †, Mingjun Jiang2, †, Muzi Lei3, Kun Du4, Dafei Wei5, Xiangyang Ye6, Xiaowu Tan1, Zhuang Mi1, Yao Tang7, Ruiting Sun8, *, Sha Liu1, *
1The Second Affiliated Hospital, Department of Pulmonary and Critical Care Medicine, University of South China, Hengyang 421001, China
2Vascular Disease Center, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai 200092, China
3The Second Affiliated Hospital, Department of Intensive Care Medicine, University of South China, Hengyang 421001, China
4MOE Key Laboratory of Rare Pediatric Diseases, College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421009, China
5The Second Affiliated Hospital, Department of Children's Medical Center, University of South China, Hengyang 421001, China
6Department of Infectious Diseases, Xiangya Hospital of Central South University, Changsha 410083, China.
7The Second Affiliated Hospital, Department of Cardiovascular Medicine, University of South China, Hengyang 421001, China
8State Key Laboratory of Respiratory Diseases, National Clinical Research Center for Respiratory Diseases, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou 511495, China
*Correspondence address: Email: 2018020001@usc.edu.cn (S. L.); Dr.srt@hotmail.com (R. S.).
Lingfeng Liu and Mingjun Jiang contributed equally to this work.
A
Highlights
1. Multi-omics integration identifies CYP1B1 as a potential COPD diagnostic biomarker, with external validation.
2. CYP1B1 is upregulated in airway secretory cells and is linked to the activation of ferroptosis.
3. ML and ANN demonstrate strong predictive performance (AUC up to 0.996 in training; 0.834 in external validation).
4. Mouse and cell models demonstrate increased CYP1B1 expression in the context of inflammation and fibrosis; proteomics corroborate these findings.
5. Docking highlights α/β-naphthoflavone, chrysin, and naringenin as CYP1B1 binders ( ≈ − 7.11 to − 5.45 kcal/mol).
Structured Abstract
Objective Primary: discover and validate biomarkers/pathways underlying COPD via bulk transcriptomics, single-cell analyses, machine learning (ML), and experimental validation. Secondary: evaluate CYP1B1 as a diagnostic biomarker and explore druggability through molecular docking.
Keywords
COPD
CYP1B1
ferroptosis
multi-omics
single-cell RNA sequencing
biomarker discovery
A
Introduction
Chronic obstructive pulmonary disease (COPD) remains a leading global cause of morbidity and mortality [1]. Clinically, COPD is defined by persistent respiratory symptoms and progressive airflow limitation, quantified by spirometry; however, current diagnostic criteria incompletely capture the disease heterogeneity, which spans complex structural remodeling of the lung, diverse inflammatory endotypes, and shifts in the airway microbiome [2, 3]. Against this backdrop, there is a sustained need for mechanism-anchored biomarkers and actionable targets. Increasing evidence implicates oxidative stress and lipid peroxidation as central features of chronic airway injury, with ferroptosis—an iron-dependent, lipid peroxidation-driven cell-death program—emerging as a plausible contributor to epithelial dysfunction and tissue remodeling in COPD.
Airflow obstruction in COPD typically progresses over several years and reflects a fixed pathobiology, including fibrotic narrowing of the small airways and loss of elastic recoil, leading to peripheral airway collapse and air trapping [4]. The small airway epithelium is the principal locus of cellular and histologic change [5], comprising a pseudostratified layer of secretory (club) cells, ciliated cells, and basal cells [6]. Inflammatory remodeling features increased neutrophils and macrophages in the airway lumina, as well as the accumulation of macrophages, T lymphocytes, and B lymphocytes in the airway wall and parenchyma [7, 8]. These observations place epithelial injury and immune–epithelial crosstalk at the center of COPD progression, motivating an interrogation of epithelial cell-intrinsic death pathways that may amplify chronic inflammation.
Genomic and transcriptomic studies suggest that interindividual susceptibility to COPD is related to gene expression programs within the airway epithelium [9], yet the upstream regulators that couple xenobiotic/redox metabolism to ferroptosis remain poorly understood. CYP1B1, a cytochrome P450 implicated in xenobiotic biotransformation and oxidative metabolism, is a biologically plausible node at the interface of redox stress and ferroptosis in airway secretory cells, where lipid peroxidation may promote persistent epithelial dysfunction. However, causal links between CYP1B1 activity, ferroptosis signaling, and COPD pathogenesis are not established, underscoring the need for systems-level discovery integrated with cellular and in vivo validation.
To address this gap, we employed an integrative multi-omics strategy combining bulk transcriptomics with weighted gene co-expression network analysis (WGCNA), differential expression, and pathway enrichment (GO/KEGG/GSEA), followed by a machine-learning framework (12 algorithms across 113 combinations) and an artificial neural network (ANN) to derive and test diagnostic models (ROC/AUC). Immune deconvolution (CIBERSORT) characterized the inflammatory context, while single-cell RNA sequencing mapped candidate gene expression to specific lung cell types. Experimental validation in cigarette–smoke–exposed models and orthogonal proteomics supported the biological plausibility, and molecular docking explored the therapeutic tractability. We hypothesized that CYP1B1 marks—and may contribute to—ferroptosis-linked epithelial pathology in COPD, would generalize across multi-omics layers, demonstrate diagnostic utility, and reveal druggable chemistry space. We aimed to identify robust COPD biomarkers/pathways, localize them to disease-relevant cell types, link them to ferroptosis signatures, validate them in models, and nominate candidate CYP1B1-targeting ligands for future experimental testing.
Materials and Methods
Data sources and preprocessing (bulk expression, single-cell, proteomics)
We integrated bulk gene-expression datasets, single-cell RNA sequencing (scRNA-seq) data, and discovery proteomics. Bulk datasets were retrieved from GEO and included GSE47460 (platforms GPL6480 and GPL14550; healthy controls n = 17 and COPD n = 75 for GPL6480; healthy controls n = 91 and COPD n = 145 for GPL14550) [10], with GSE76925 (COPD n = 111, controls n = 40) [11] and GSE37768 (COPD n = 18, controls n = 9) serving as independent validation cohorts. Probe-level preprocessing removed empty probes and those mapping to multiple genes; for genes with multiple probes, the probe with the highest mean expression was retained, and when multiple probes per gene were required, median aggregation was used. Datasets were merged after re-annotation to gene symbols using the official platform annotation files and were batch-corrected and normalized using ComBat (R package sva) [12]. Subsequent analyses used these normalized matrices. scRNA-seq data were obtained from GSE196341 and GSE135893 (12 COPD and 12 control donors in total) [13]. Discovery proteomics were deposited to ProteomeXchange (iProX partner) under PXD068247 [14, 15]. A summary of datasets is provided in Table 1.
Table 1
Datasets included in the study
|
Datasets ID
|
Year
|
Date type
|
Country
|
N (samples)
|
|
GSE47460
|
2013
|
Bulk RNA-Seq
|
American
|
Case = 220, Control = 108
|
|
GSE76925
|
2017
|
Bulk RNA-Seq
|
American
|
Case = 111, Control = 40
|
|
GSE37768
|
2016
|
Bulk RNA-Seq
|
Spain
|
Case = 18, Control = 9
|
|
GSE196341
|
2022
|
scRNA-Seq
|
American
|
Case = 12
|
|
GSE135893
|
2019
|
scRNA-Seq
|
American
|
Control = 12
|
|
PXD068247
|
2025
|
Proteomics
|
China
|
Case = 3, Control = 3
|
Differentially expressed genes (DEGs)
Differential expression between COPD and control groups was assessed using limma, with model terms including group and batch/platform, where applicable. Unless otherwise indicated, significance is used |log2FC| > 1.5 and p < 0.01 after multiple-testing correction. Results were visualized with volcano plots and clustered heatmaps (ggplot2, pheatmap) [16–18].
Weighted gene co-expression network analysis (WGCNA)
Co-expression networks were constructed using WGCNA. Signed networks were generated from pairwise Pearson correlations, selecting the soft-thresholding power β to approximate scale-free topology (reporting scale-free fit R2 and mean connectivity). Adjacency matrices were transformed to topological overlap matrices (TOM), and hierarchical clustering with dynamic tree cutting identified gene modules (report minimum module size and merge thresholds). Module eigengenes were correlated with COPD status, and the midnight blue module, showing the strongest association with COPD, was selected for downstream analysis [19].
Machine-learning feature selection and classification
Batch-corrected, z-score–standardized expression matrices were used to derive features and train classifiers. We implemented a multi-algorithm screen comprising 12 algorithms across 113 combinations (e.g., Logistic Regression, LASSO/Elastic Net, Ridge, SVM-Linear/RBF, Random Forest, ExtraTrees, GBM, XGBoost, k-NN, Naïve Bayes), with feature selection embedded within folds to prevent information leakage. Model selection employed stratified k-fold cross-validation on the training data, followed by external validation on independent cohorts (GSE76925, GSE37768). Performance metrics included ROC-AUC with 95% CIs, PR-AUC, accuracy, F1, and calibration (Brier score, calibration curves). Feature importance was summarized via permutation importance and SHAP, where applicable. The highest mean external AUC chose the optimal model.
Gene set enrichment analysis (GSEA) and functional annotation
To contextualize biological processes, we performed GSEA using MSigDB KEGG (c2.cp.kegg.Hs.symbols) and GO (c5.go.symbols) collections, and complementary over-representation analyses with clusterProfiler; visualizations employed enrichplot. Significance thresholds were based on field standards for GSEA and ORA, as specified in the Results [20].
Single-cell RNA-seq analysis
A
scRNA-seq data (GSE196341, GSE135893) were processed in Seurat. Cells were filtered by mitochondrial RNA > 15%, ribosomal RNA > 5%, erythrocyte RNA > 1%, detected genes < 300 or > 7,000. Data were normalized (method as specified in the Results), and batch effects were mitigated via Harmony integration. Highly variable genes (n = 2,000) were used for PCA and UMAP (dimensions = 22). Clustering utilized a shared nearest neighbor graph (resolution 0.2) and was manually annotated using the top 10 marker genes per cluster. We visualized donor-wise UMAPs and cell-type distributions, and mapped CYP1B1 expression to airway secretory cells. Pseudotime analysis was performed using Monocle, and intercellular communication was inferred using CellChat [
13].
Artificial neural network (ANN)
A feed-forward ANN was trained to classify Control vs COPD using binarized signatures derived from characteristic genes (upregulated genes coded 1 if ≥ median; downregulated genes reverse-coded). The network consisted of an input layer (size equal to the number of features), a single hidden layer (5 neurons, using the sigmoid activation function), and a binary output. Training was performed using a neural network with Rprop + optimization (seed = 12345678). Performance was evaluated via confusion matrices, accuracy per class, and ROC-AUC; feature contributions were estimated using the Garson algorithm.
Immune cell deconvolution
From normalized bulk expression, immune cell proportions were estimated using CIBERSORT (LM22), with default permutations (state number) and quantile normalization settings appropriate for the tissue type. Samples with CIBERSORT p < 0.05 were retained. Group differences were tested using the Wilcoxon rank-sum test with the Benjamini–Hochberg correction across 22 cell types. Spearman correlations between characteristic genes and immune subsets were computed with multiple-testing correction.
A
Cigarette smoke extract (CSE) preparation
Mainstream smoke from Hongta brand filter-tipped cigarettes (Yunnan, China) was drawn by a peristaltic pump and bubbled through 10 mL DMEM per cigarette. The pH was adjusted to 7.2, and the solution was sterilized using a 0.22-µm filter to yield 10% CSE (v/v); working concentrations were prepared freshly for each experiment, for particulate matter (PM2.5), stock suspensions were prepared in DMSO at 400 mg/mL, vortexed and sonicated for 2 h, centrifuged to remove insoluble particulates, and supernatants stored at − 20°C[21].
Cell culture and treatments
16HBE cells were cultured in DMEM + 10% FBS at 37°C, 5% CO₂. Cells were exposed to 2.5% CSE for 24 h. Unless otherwise stated, vehicle-treated cells served as controls[21].
Mouse COPD model
A
Specific pathogen-free C57BL/6J mice (15–20 g) were housed at 25°C, 12/12 h light–dark, with food and water ad libitum. The institutional ethics committee approved protocols. Mice were randomized into two groups: control (n = 8) and COPD model (n = 8). COPD was induced by whole-body exposure in a 60×57×100 cm chamber to 9 cigarettes twice daily, 6 days/week, for 6 months; each session lasted ~ 2 h with ≥ 4 h interval. Control mice were exposed to room air for the same duration. Before exposure, mice received sterile saline aerosol (30 minutes, twice daily). Tissues (lung), bronchoalveolar lavage fluid (BALF), and blood were collected at endpoint
[22].
Lung function testing
Lung mechanics were evaluated using a FinePointe system (Buxco) according to the manufacturer’s instructions, recording airway resistance/compliance indices[23].
Enzyme-linked immunosorbent assays (ELISAs)
Concentrations of IL-1β, IL-6, IL-8, and TNF-α in BALF and serum were quantified per the manufacturers’ protocols.
Histology and staining
The lungs were fixed in 4% paraformaldehyde, processed, embedded in paraffin, sectioned, and stained with H&E, PAS, and Picrosirius Red using standard protocols.
Immunohistochemistry (IHC)
IHC was performed on paraffin-embedded lung sections using standard antigen retrieval and detection procedures.
Molecular docking
To explore therapeutic tractability, we performed molecular docking against CYP1B1. Three-dimensional structures for ligands were retrieved (PubChem) and prepared in AutoDockTools 1.5.7; receptor structure (CYP1B1) was obtained from RCSB PDB (
www.rcsb.org) and prepared analogously. The grid center and box size were defined around the active site, and docking was performed using the Lamarckian Genetic Algorithm under default parameters. Rigid docking was performed (AutoDockTools 1.6.7), and the top-scoring pose was retained.
Statistical analysis
Analyses were performed in R 4.4.2. Unless otherwise stated, two-sided tests were used, with α = 0.05. For multiple comparisons, the Benjamini–Hochberg correction was applied. Classification performance was summarized by ROC-AUC (95% CI), PR-AUC, accuracy, F1 score, confusion matrices, and calibration curves. Data are reported as mean ± SD (or median [IQR]) with n indicating biological replicates.
Results
Multi-omics discovery and hub-gene prioritization
A study overview is provided in Fig. 1. Using WGCNA on the training cohort, we selected a soft-threshold power of β = 5 (Fig. 2A) and identified 20 gene modules through dynamic tree cutting (Fig. 2B). The midnightblue module showed the strongest association with COPD status (module–trait r = 0.893, P = 5.4×10− 32; Fig. 2C–D) and contained 89 genes. After normalization of the GSE47460 expression matrix, differential expression analysis identified 170 DEGs between COPD and control samples (volcano plot, Fig. 2E; top 50 heat map, Fig. 2F). Intersecting module genes with DEGs yielded 24 hub genes (Fig. 2G), which were advanced for biomarker screening.
Machine-learning screen and diagnostic performance
We evaluated 12 algorithms across 113 model combinations to prioritize diagnostic features (Fig. 3A). In the training set, the combined model achieved an AUC of 0.996 (95% CI, 0.991–0.999; Fig. 3B). External validation in GSE76925 yielded an AUC of 0.834 (95% CI, 0.755–0.906; Fig. 3C), indicating preserved discrimination in an independent cohort. Confusion matrices for the training and validation sets demonstrated high sensitivity and specificity (Figs. 3D and 3E). Across models, BHLHE22, DPP6, DHRS9, and CYP1B1 were most frequently selected and ranked highly in terms of importance. Correlation analyses revealed strong positive co-variation between BHLHE22 and CYP1B1 (r = 0.62) and an inverse correlation between DPP6 and CYP1B1 (r = -0.55; Fig. 3G). Differential expression visualizations confirmed that three genes were upregulated in COPD, and DPP6 was downregulated (Fig. 3F).
External validation and pathway context for top biomarkers
Per-gene ROC curves in the independent GSE37768 cohort showed that BHLHE22, DPP6, and CYP1B1 achieved AUCs of 0.764–0.795, whereas DHRS9 performed lower (Fig. 4A). A nomogram integrating these predictors emphasized CYP1B1 as the strongest contributor to diagnostic probability (Fig. 4B–C). To contextualize biology, GSEA revealed that BHLHE22-high samples were enriched for collagen fibril organization and extracellular matrix remodeling (GO) and for chemokine signaling, complement/coagulation, cytokine–receptor interaction, and NK cell–mediated cytotoxicity (KEGG) (Fig. 4D–E). Similarly, CYP1B1-high samples showed GO enrichment for collagen processes, response to interleukin-1, and metallopeptidase activity, alongside KEGG enrichment for chemokine signaling, complement/coagulation, intestinal immune network for IgA, and NK cell–mediated cytotoxicity (Fig. 4F–G).
ANN construction, calibration, and independent checks
We built an ANN using BHLHE22, DPP6, and CYP1B1 (architecture and weights, Fig. 5A). The model demonstrated close agreement between the predicted and observed risks in both the training and validation datasets (calibration plots, Fig. 5B). In GSE37768, all three biomarkers were significantly upregulated in COPD versus controls, consistent with discovery analyses (Fig. 5C–D). Collectively, the ANN achieved robust discrimination with good calibration, supporting the translational feasibility of a parsimonious gene-based classifier.
Single-cell resolution of CYP1B1 and ferroptosis signatures
Integration of human lung scRNA-seq from COPD and control donors resolved 16 major cell types (marker panels in Fig. 6A; UMAP in Fig. 6B), with proportion differences summarized in Fig. 6C. While BHLHE22, DPP6, and CYP1B1 were detected across multiple lineages, CYP1B1 showed significantly higher expression in airway secretory cells (ASCs) from COPD relative to controls (Fig. 6D; report log₂FC, adj. P). Stratifying ASCs by CYP1B1 median defined high- and low-expressing subpopulations; differential analysis indicated ferroptosis pathway enrichment in CYP1B1-high ASCs (KEGG) with increased AUCell ferroptosis scores (Fig. 6E–F).
Pseudotime and cell–cell communication implicate ASC programs in the pathogenesis of COPD.
Monocle-based trajectories suggested bifurcating epithelial lineages, with ASCs enriched at a terminal branch (Fig. 7A–B), consistent with a remodeled secretory program in COPD. Global CellChat networks (Figs. 7C–D) revealed increased outgoing and incoming signaling to/from ASCs, particularly with alveolar macrophages and other immune lineages. An ASC-centric view (Fig. 7E) highlighted ASCs as communication hubs, supporting a model in which secretory epithelial programs shape the COPD inflammatory microenvironment.
Immune infiltration correlates with hub genes.
Bulk deconvolution revealed significant differences in multiple immune subsets between COPD and control subjects, including plasma cells, M0 macrophages, activated mast cells, and follicular helper T cells (Fig. 8A), with inter-subset correlations summarized in Fig. 8B. Expression of BHLHE22 and CYP1B1 correlated positively with plasma cells and M0 macrophages, whereas DPP6 correlated negatively with these subsets and positively with M2 macrophages (Fig. 8C–E), suggesting coordinated epithelial–immune remodeling in COPD.
In vivo and in vitro evidence of CYP1B1 upregulation with smoke exposure
In a 6-month cigarette-smoke mouse model, systemic and airway inflammation increased over time, as indicated by ELISAs (Fig. 9A–H), accompanied by higher pulmonary CYP1B1 expression (Fig. 9I). In 16HBE cells exposed to CSE, proteomic profiling corroborated CYP1B1 upregulation (Fig. 9J). Histopathology in smoke-exposed mice revealed enlarged alveolar septa, increased glycogen deposition, and fibrosis (H&E, PAS, Picrosirius Red; Fig. 10A–C), accompanied by a significant impairment of lung function (Fig. 10D–K). Ferroptosis-related indicators also increased significantly (Fig. 10L-N). These findings support a link between cigarette smoke exposure, epithelial remodeling, and CYP1B1 induction.
Inhibition of CYP1B1 attenuates ferroptosis signals
In the CSE model constructed by cigarette smoke-induced 16-HBE cells, lentiviral transfection was used to knock down the expression of CYP1B1, and we found that ferroptosis signal-related indicators such as GPX4, MDA, and 4-HNE had a significant decline (Fig. 11A-C). This suggests that inhibition of CYP1B1 expression can inhibit ferroptosis in airway epithelial cells, which may be helpful in the treatment of COPD.
Docking nominates natural compounds with predicted CYP1B1 binding.
DGIdb-guided filtering of traditional Chinese medicine (TCM) compounds prioritized α-naphthoflavone, β-naphthoflavone, chrysin, and naringenin as putative CYP1B1 binders. Rigid docking (PyMOL/AutoDock workflow) predicted binding energies < − 5.0 kcal/mol (− 7.11, − 6.86, − 5.56, − 5.45 kcal/mol; Fig. 12A–D), consistent with favorable interactions in the CYP1B1 active site. These in silico data are hypothesis-generating and motivate biochemical validation of enzyme inhibition and selectivity.
Discussion
COPD remains a leading global contributor to morbidity and mortality [24]. While the disease is heterogeneous [25], its clinical definition centers on persistent airflow limitation and chronic respiratory symptoms. To enhance diagnostic precision and mechanistic understanding, we employed an integrative multi-omics pipeline to identify and validate biomarkers with potential clinical utility. Our analyses identify CYP1B1 as a promising biomarker and mechanistic node in COPD, with convergent evidence spanning bulk transcriptomics, machine learning, single-cell mapping, and experimental models.
Using publicly available COPD cohorts, we defined 170 DEGs, mapped them onto co-expression modules, and intersected signals to yield 24 hub genes. A 12-algorithm/113-combination screen prioritized BHLHE22, CYP1B1, and DPP6, with external validation supporting diagnostic discrimination (combined model AUC up to 0.996 in training and 0.834 in GSE76925; single-gene AUCs 0.764–0.795 in GSE37768). Single-cell analyses localized CYP1B1 upregulation to airway secretory cells (ASCs) in COPD, where high CYP1B1 expression coincided with enrichment of ferroptosis signatures. In smoke-exposed mouse lungs and CSE-treated bronchial epithelial cells, CYP1B1 expression increased alongside histologic remodeling and impaired lung function. Collectively, these observations support an association between CYP1B1 upregulation, epithelial stress programs consistent with ferroptosis, and the pathobiology of COPD, while acknowledging that causality requires targeted perturbation.
CYP1B1, a xenobiotic-metabolizing cytochrome P450, bioactivates environmental toxicants (including polycyclic aromatic hydrocarbons) and modulates steroid and eicosanoid pathways [26, 27]. In the lung, CYP1B1 activity has been implicated in oxidative stress and inflammatory signaling pertinent to chronic airway disease [27, 28]. Altered CYP1B1 expression has been linked to epithelial dysfunction—airway remodeling, impaired mucociliary clearance, and heightened susceptibility to inhaled insults—features that align with COPD pathogenesis [29]. Moreover, CYP1B1 polymorphisms have been associated with interindividual variation in lung function decline and COPD risk [30]. Integrating these observations, our data position CYP1B1 as a biologically plausible mediator of smoke-related redox imbalance and epithelial remodeling in COPD, and a tractable node for therapeutic exploration [31, 32].
Primary immunodeficiencies (PIDs) encompass defects in cellular and humoral immunity that predispose to recurrent airway infections and chronic inflammation. More than half of PID subtypes impair antibody production and contribute to recurrent sinusitis and pulmonary infections; persistent upper/lower respiratory infections can drive airway inflammation, obstruction, and, in some cases, structural remodeling compatible with COPD phenotypes [33]. Respiratory complications of PIDs are associated with downstream risks—including severe asthma, bronchiectasis, and COPD—and increased mortality [34]. In the context of our findings, these observations highlight how host-defense perturbations can exacerbate epithelial injury and immune–epithelial crosstalk in susceptible individuals.
Single-cell results indicated that differentially expressed genes were enriched in epithelial compartments, with ASCs showing the clearest CYP1B1 signal. Epithelial–macrophage crosstalk is central to airway remodeling, as alveolar and airway macrophages orchestrate pathogen clearance, surfactant turnover, tissue repair, and homeostasis. In COPD, they exhibit impaired phagocytosis/efferocytosis, dysregulated cytokine release, and heightened oxidative stress [35]. Beyond the lung, macrophage inflammatory phenotypes and lysosomal signaling influence disease processes across organs [36, 37]; by analogy, macrophage dysfunction in COPD may intersect with epithelial redox pathways, including lipid peroxidation and iron handling. Our data suggest that CYP1B1-high ASCs reside within an inflamed network where macrophage interactions are increased, consistent with a feed-forward loop of epithelial stress and immune activation.
Limitation
This study has limitations. First, although we leveraged multiple GEO cohorts, sample availability limits power for some subgroup analyses. Second, the external validation cohorts differ in clinical covariates and platforms, which may affect generalizability. Third, while single-cell analyses map CYP1B1 to airway secretory cells and associate high expression with ferroptosis signatures, we did not perform genetic or pharmacologic perturbation of CYP1B1 with standardized ferroptosis rescue controls (e.g., ferrostatin-1, liproxstatin-1, deferoxamine). Finally, docking results are in silico and require biochemical confirmation (enzyme inhibition, selectivity, and cellular target engagement).
Conclusion
In aggregate, our integrative analyses identify CYP1B1 as a potential biomarker for COPD and a putative mediator of epithelial stress, consistent with ferroptosis in airway secretory cells. Clinically, CYP1B1 expression and derived gene-based models may aid diagnostic stratification; mechanistically, CYP1B1 presents a druggable entry point for modulating redox-lipid peroxidation pathways. Prioritized natural compounds from docking provide testable leads, but their translation will require validating target engagement, ferroptosis-specific rescue, and safety/PK. Considerations. Prospective, multi-center studies integrating single-cell profiling, circulating biomarkers, and interventional perturbations should establish clinical utility and clarify therapeutic potential.
List of abbreviations
COPD
Chronic Obstructive Pulmonary Disease
DEGs
Differentially Expressed Genes
WGCNA
Weighted Gene Co-Expression Network Analysis
ROC
Receiver Operating Characteristic
GSEA
Gene Set Enrichment Analysis
KEGG
Kyoto Encyclopedia of Genes and Genomes
ANN
Artificial Neural Network
UMAP
Uniform Manifold Approximation and Projection
AT2s
Alveolar Type II epithelial cells
PMVECs
Pulmonary microvascular endothelial cells
LECs
Lymphatic endothelial cells
TCM
Traditional Chinese medicine
A
Data Availability
All data generated or analysed during this study are included in this published article.
A
Author Contribution
LL and MJ: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Visualization, Writing– original draft. ML and KD: Data curation, Investigation, Methodology, Software, Visualization. DW, XY and XT: Data curation, Methodology, Software, Visualization. ZM, YT and RS: Data curation, Software, Visualization. SL: Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing– review & editing.
Acknowledgments
We thank the participants and staff of the National Center for Biotechnology Information. We thank Wen Qianmei and Zuo yujie form the First Affiliated Hospital of Guangzhou Medical University for their assistance in the construction of the mouse model. At the same time, we would also like to acknowledge assistance from proofreaders and editors.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Electronic Supplementary Material
Below is the link to the electronic supplementary material
References
1.Christenson SA, Smith BM, Bafadhel M, Putcha N. Chronic obstructive pulmonary disease. Lancet. 2022;399(10342):2227–42.
2.Labaki WW, Rosenberg SR. Chronic Obstructive Pulmonary Disease. Ann Intern Med. 2020;173(3):Itc17–32.
3.Brightling C, Greening N. Airway inflammation in COPD: progress to precision medicine. Eur Respir J, 2019. 54(2).
4.Barnes PJ. Cellular and molecular mechanisms of asthma and COPD. Clin Sci (Lond). 2017;131(13):1541–58.
5.Wistuba II, Mao L, Gazdar AF. Smoking molecular damage in bronchial epithelium. Oncogene. 2002;21(48):7298–306.
6.Mercer RR, Russell ML, Roggli VL, Crapo JD. Cell number and distribution in human and rat airways. Am J Respir Cell Mol Biol. 1994;10(6):613–24.
7.Brusselle GG, Joos GF, Bracke KR. New insights into the immunology of chronic obstructive pulmonary disease. Lancet. 2011;378(9795):1015–26.
8.Hogg JC, Chu F, Utokaparch S, et al. The nature of small-airway obstruction in chronic obstructive pulmonary disease. N Engl J Med. 2004;350(26):2645–53.
9.Zhao J, Cheng W, He X, et al. Chronic Obstructive Pulmonary Disease Molecular Subtyping and Pathway Deviation-Based Candidate Gene Identification. Cell J. 2018;20(3):326–32.
10.Tan J, Tedrow JR, Dutta JA, et al. Expression of RXFP1 Is Decreased in Idiopathic Pulmonary Fibrosis. Implications for Relaxin-based Therapies. Am J Respir Crit Care Med. 2016;194(11):1392–402.
11.Morrow JD, Zhou X, Lao T, et al. Functional interactors of three genome-wide association study genes are differentially expressed in severe chronic obstructive pulmonary disease lung tissue. Sci Rep. 2017;7:44232.
12.Leek JT, Johnson WE, Parker HS, et al. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.
13.Habermann AC, Gutierrez AJ, Bui LT et al. Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Sci Adv, 2020. 6(28): p. eaba1972.
14.Ma J, Chen T, Wu S, et al. iProX: an integrated proteome resource. Nucleic Acids Res. 2019;47(D1):D1211–7.
15.Chen T, Ma J, Liu Y, et al. iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res. 2022;50(D1):D1522–7.
16.Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
17.Kolde R. pheatmap: Pretty Heatmaps. 2010.
18.Wickham H. Ggplot2: Elegant graphics for data analysis. Cham, Switzerland: Springer International Publishing; 2016.
19.Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
20.Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
21.Fan X, Dong T, Yan K et al. PM2.5 increases susceptibility to acute exacerbation of COPD via NOX4/Nrf2 redox imbalance-mediated mitophagy. Redox Biol, 2023. 59.
22.Guan R, Wang J, Cai Z et al. Hydrogen sulfide attenuates cigarette smoke-induced airway remodeling by upregulating SIRT1 signaling pathway. Redox Biol, 2020. 28.
23.He F, Liao B, Pu J et al. Exposure to Ambient Particulate Matter Induced COPD in a Rat Model and a Description of the Underlying Mechanism. Sci Rep, 2017. 7(1).
24.Raherison C, Girodet PO. Epidemiology of COPD. Eur Respir Rev. 2009;18(114):213–21.
25.Mannino DM. COPD: epidemiology, prevalence, morbidity and mortality, and disease heterogeneity. Chest. 2002;121(5 Suppl):s121–6.
26.Esteve-Codina A, Hofer TP, Burggraf D, et al. Gender specific airway gene expression in COPD sub-phenotypes supports a role of mitochondria and of different types of leukocytes. Sci Rep. 2021;11(1):12848.
27.Zhu Y, Dutta S, Han Y, et al. Oxidative stress promotes lipid-laden macrophage formation via CYP1B1. Redox Biol. 2025;79:103481.
28.Zhang S, Duan H, Yan J. Identifying biomarkers of endoplasmic reticulum stress and analyzing immune cell infiltration in chronic obstructive pulmonary disease using machine learning. Front Med (Lausanne). 2024;11:1462868.
29.Reddy KD, Lan A, Boudewijn IM, et al. Current Smoking Alters Gene Expression and DNA Methylation in the Nasal Epithelium of Patients with Asthma. Am J Respir Cell Mol Biol. 2021;65(4):366–77.
30.Obernolte H, Niehof M, Braubach P, et al. Cigarette smoke alters inflammatory genes and the extracellular matrix - investigations on viable sections of peripheral human lungs. Cell Tissue Res. 2022;387(2):249–60.
31.Misiukiewicz-Stępien P, Mierzejewski M, Zajusz-Zubek E et al. RNA-Seq Analysis of UPM-Exposed Epithelium Co-Cultivated with Macrophages and Dendritic Cells in Obstructive Lung Diseases. Int J Mol Sci, 2022. 23(16).
32.Fitzgerald LF, Lackey J, Moussa A, et al. Chronic aryl hydrocarbon receptor activity impairs muscle mitochondrial function with tobacco smoking. J Cachexia Sarcopenia Muscle. 2024;15(2):646–59.
33.Berger M, Geng B, Cameron DW, et al. Primary immune deficiency diseases as unrecognized causes of chronic respiratory disease. Respir Med. 2017;132:181–8.
34.Jang JH, Kim JH, Park HS. Current Issues in the Management of IgG Subclass Deficiencies in Adults With Chronic Respiratory Diseases. Allergy Asthma Immunol Res. 2023;15(5):562–79.
35.Hu YS, Shao XJ, Xing L et al. Single-Cell Sequencing of Lung Macrophages and Monocytes Reveals Novel Therapeutic Targets in COPD. Cells, 2023. 12(24).
36.Ogawa D, Stone JF, Takata Y, et al. Liver x receptor agonists inhibit cytokine-induced osteopontin expression in macrophages through interference with activator protein-1 signaling pathways. Circ Res. 2005;96(7):e59–67.
37.Weber K, Schilling JD. Distinct lysosome phenotypes influence inflammatory function in peritoneal and bone marrow-derived macrophages. Int J Inflam, 2014. 2014: p. 154936.
Figure Legends