A
CYP1B1-Mediated Ferroptosis Defines a Biomarker and Therapeutic Target in COPD Across Multi-omics and Single-Cell
Running title: CYP1B1 Biomarker/Target in COPD
A
Lingfeng Liu 1
Mingjun Jiang 2
Muzi Lei 3
Kun Du 4
Dafei Wei 5
Xiangyang Ye 6
Xiaowu Tan 1
Zhuang Mi 1
Yao Tang 7
Ruiting Sun 8✉ Email
Sha Liu 1✉ Email
R. S. 1
1 The Second Affiliated Hospital, Department of Pulmonary and Critical Care Medicine University of South China 421001 Hengyang China
2 Vascular Disease Center, Shanghai Fourth People’s Hospital, School of Medicine Tongji University 200092 Shanghai China
3 The Second Affiliated Hospital, Department of Intensive Care Medicine University of South China 421001 Hengyang China
4 MOE Key Laboratory of Rare Pediatric Diseases, College of Basic Medical Sciences, Hengyang Medical School University of South China 421009 Hengyang China
5 The Second Affiliated Hospital, Department of Children’s Medical Center University of South China 421001 Hengyang China
6 Department of Infectious Diseases Xiangya Hospital of Central South University 410083 Changsha China
7 The Second Affiliated Hospital, Department of Cardiovascular Medicine University of South China 421001 Hengyang China
8 State Key Laboratory of Respiratory Diseases, National Clinical Research Center for Respiratory Diseases, Guangzhou Institute of Respiratory Health the First Affiliated Hospital of Guangzhou Medical University 511495 Guangzhou China
Lingfeng Liu1, †, Mingjun Jiang2, †, Muzi Lei3, Kun Du4, Dafei Wei5, Xiangyang Ye6, Xiaowu Tan1, Zhuang Mi1, Yao Tang7, Ruiting Sun8, *, Sha Liu1, *
1The Second Affiliated Hospital, Department of Pulmonary and Critical Care Medicine, University of South China, Hengyang 421001, China
2Vascular Disease Center, Shanghai Fourth People’s Hospital, School of Medicine, Tongji University, Shanghai 200092, China
3The Second Affiliated Hospital, Department of Intensive Care Medicine, University of South China, Hengyang 421001, China
4MOE Key Laboratory of Rare Pediatric Diseases, College of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang 421009, China
5The Second Affiliated Hospital, Department of Children's Medical Center, University of South China, Hengyang 421001, China
6Department of Infectious Diseases, Xiangya Hospital of Central South University, Changsha 410083, China.
7The Second Affiliated Hospital, Department of Cardiovascular Medicine, University of South China, Hengyang 421001, China
8State Key Laboratory of Respiratory Diseases, National Clinical Research Center for Respiratory Diseases, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou 511495, China
*Correspondence address: Email: 2018020001@usc.edu.cn (S. L.); Dr.srt@hotmail.com (R. S.).
Lingfeng Liu and Mingjun Jiang contributed equally to this work.
A
Highlights
1. Multi-omics integration identifies CYP1B1 as a potential COPD diagnostic biomarker, with external validation.
2. CYP1B1 is upregulated in airway secretory cells and is linked to the activation of ferroptosis.
3. ML and ANN demonstrate strong predictive performance (AUC up to 0.996 in training; 0.834 in external validation).
4. Mouse and cell models demonstrate increased CYP1B1 expression in the context of inflammation and fibrosis; proteomics corroborate these findings.
5. Docking highlights α/β-naphthoflavone, chrysin, and naringenin as CYP1B1 binders ( ≈ − 7.11 to − 5.45 kcal/mol).
Structured Abstract
Background
Chronic obstructive pulmonary disease (COPD) causes progressive airflow limitation and remains without disease-modifying therapy. Ferroptosis—iron-dependent lipid peroxidation—has emerged as a potential mechanism driving epithelial dysfunction and chronic inflammation; however, its upstream regulators in COPD remain incompletely defined. We hypothesized that integrative multi-omics could identify robust biomarkers and pathways, with translational potential for diagnosis and targeted therapy.
Objective Primary: discover and validate biomarkers/pathways underlying COPD via bulk transcriptomics, single-cell analyses, machine learning (ML), and experimental validation. Secondary: evaluate CYP1B1 as a diagnostic biomarker and explore druggability through molecular docking.
Methods
A
Public cohorts (GSE47460, GSE76925, GSE37768; total discovery/validation samples reported) were integrated with ComBat batch correction. Single-cell RNA-seq datasets (GSE196341, GSE135893; 12 COPD vs 12 controls) profiled cell-type localization. Differential expression, WGCNA, and enrichment (GO/KEGG/GSEA) were performed. A 12-algorithm ML framework (113 combinations) plus ANN modeling assessed diagnostic performance (ROC/AUC, confusion matrices, calibration). CIBERSORT estimated immune infiltration. A cigarette-smoke mouse model and 16HBE cell assays provided experimental validation; small-n proteomics were deposited (PXD068247; 3 vs 3). Molecular docking screened candidate CYP1B1-binding compounds and recorded binding energies. Statistical reporting included n, effect sizes, 95% CIs, and multiple-testing control.
Results
Twenty-four hub genes were initially identified; four (BHLHE22, DPP6, DHRS9, CYP1B1) were prioritized by ML across 113 model combinations. In the training set, the combined model achieved an AUC of 0.996 (95% CI, 0.991–0.999), with an external validation AUC of 0.834 (95% CI, 0.755–0.906). Single-gene ROC in an external cohort yielded AUCs of 0.764–0.795, with CYP1B1 being the most consistent. Single-cell analysis localized CYP1B1 upregulation to airway secretory cells (ASCs) and linked high CYP1B1 expression to the activation of the ferroptosis pathway. In mouse and 16HBE models, cigarette smoke increased lung inflammation/fibrosis and upregulated CYP1B1; proteomics corroborated expression changes. Docking identified α-/β-naphthoflavone, chrysin, and naringenin as CYP1B1 binders (best energies ≈ − 7.11 to − 5.45 kcal/mol). Immune deconvolution associated CYP1B1 with macrophage and plasma-cell signals.
Conclusions
A
Integrative multi-omics implicates CYP1B1-mediated ferroptosis in ASCs as a central pathway in COPD, with diagnostic promise and a tractable chemistry space. Prospective validation in larger cohorts, causal perturbation of CYP1B1–ferroptosis in vitro/in vivo, and pharmacology against prioritized ligands are warranted to translate these findings.
Keywords
COPD
CYP1B1
ferroptosis
multi-omics
single-cell RNA sequencing
biomarker discovery
A
Introduction
Chronic obstructive pulmonary disease (COPD) remains a leading global cause of morbidity and mortality [1]. Clinically, COPD is defined by persistent respiratory symptoms and progressive airflow limitation, quantified by spirometry; however, current diagnostic criteria incompletely capture the disease heterogeneity, which spans complex structural remodeling of the lung, diverse inflammatory endotypes, and shifts in the airway microbiome [2, 3]. Against this backdrop, there is a sustained need for mechanism-anchored biomarkers and actionable targets. Increasing evidence implicates oxidative stress and lipid peroxidation as central features of chronic airway injury, with ferroptosis—an iron-dependent, lipid peroxidation-driven cell-death program—emerging as a plausible contributor to epithelial dysfunction and tissue remodeling in COPD.
Airflow obstruction in COPD typically progresses over several years and reflects a fixed pathobiology, including fibrotic narrowing of the small airways and loss of elastic recoil, leading to peripheral airway collapse and air trapping [4]. The small airway epithelium is the principal locus of cellular and histologic change [5], comprising a pseudostratified layer of secretory (club) cells, ciliated cells, and basal cells [6]. Inflammatory remodeling features increased neutrophils and macrophages in the airway lumina, as well as the accumulation of macrophages, T lymphocytes, and B lymphocytes in the airway wall and parenchyma [7, 8]. These observations place epithelial injury and immune–epithelial crosstalk at the center of COPD progression, motivating an interrogation of epithelial cell-intrinsic death pathways that may amplify chronic inflammation.
Genomic and transcriptomic studies suggest that interindividual susceptibility to COPD is related to gene expression programs within the airway epithelium [9], yet the upstream regulators that couple xenobiotic/redox metabolism to ferroptosis remain poorly understood. CYP1B1, a cytochrome P450 implicated in xenobiotic biotransformation and oxidative metabolism, is a biologically plausible node at the interface of redox stress and ferroptosis in airway secretory cells, where lipid peroxidation may promote persistent epithelial dysfunction. However, causal links between CYP1B1 activity, ferroptosis signaling, and COPD pathogenesis are not established, underscoring the need for systems-level discovery integrated with cellular and in vivo validation.
To address this gap, we employed an integrative multi-omics strategy combining bulk transcriptomics with weighted gene co-expression network analysis (WGCNA), differential expression, and pathway enrichment (GO/KEGG/GSEA), followed by a machine-learning framework (12 algorithms across 113 combinations) and an artificial neural network (ANN) to derive and test diagnostic models (ROC/AUC). Immune deconvolution (CIBERSORT) characterized the inflammatory context, while single-cell RNA sequencing mapped candidate gene expression to specific lung cell types. Experimental validation in cigarette–smoke–exposed models and orthogonal proteomics supported the biological plausibility, and molecular docking explored the therapeutic tractability. We hypothesized that CYP1B1 marks—and may contribute to—ferroptosis-linked epithelial pathology in COPD, would generalize across multi-omics layers, demonstrate diagnostic utility, and reveal druggable chemistry space. We aimed to identify robust COPD biomarkers/pathways, localize them to disease-relevant cell types, link them to ferroptosis signatures, validate them in models, and nominate candidate CYP1B1-targeting ligands for future experimental testing.
Materials and Methods
Data sources and preprocessing (bulk expression, single-cell, proteomics)
We integrated bulk gene-expression datasets, single-cell RNA sequencing (scRNA-seq) data, and discovery proteomics. Bulk datasets were retrieved from GEO and included GSE47460 (platforms GPL6480 and GPL14550; healthy controls n = 17 and COPD n = 75 for GPL6480; healthy controls n = 91 and COPD n = 145 for GPL14550) [10], with GSE76925 (COPD n = 111, controls n = 40) [11] and GSE37768 (COPD n = 18, controls n = 9) serving as independent validation cohorts. Probe-level preprocessing removed empty probes and those mapping to multiple genes; for genes with multiple probes, the probe with the highest mean expression was retained, and when multiple probes per gene were required, median aggregation was used. Datasets were merged after re-annotation to gene symbols using the official platform annotation files and were batch-corrected and normalized using ComBat (R package sva) [12]. Subsequent analyses used these normalized matrices. scRNA-seq data were obtained from GSE196341 and GSE135893 (12 COPD and 12 control donors in total) [13]. Discovery proteomics were deposited to ProteomeXchange (iProX partner) under PXD068247 [14, 15]. A summary of datasets is provided in Table 1.
Table 1
Datasets included in the study
Datasets ID
Year
Date type
Country
N (samples)
GSE47460
2013
Bulk RNA-Seq
American
Case = 220, Control = 108
GSE76925
2017
Bulk RNA-Seq
American
Case = 111, Control = 40
GSE37768
2016
Bulk RNA-Seq
Spain
Case = 18, Control = 9
GSE196341
2022
scRNA-Seq
American
Case = 12
GSE135893
2019
scRNA-Seq
American
Control = 12
PXD068247
2025
Proteomics
China
Case = 3, Control = 3
Differentially expressed genes (DEGs)
Differential expression between COPD and control groups was assessed using limma, with model terms including group and batch/platform, where applicable. Unless otherwise indicated, significance is used |log2FC| > 1.5 and p < 0.01 after multiple-testing correction. Results were visualized with volcano plots and clustered heatmaps (ggplot2, pheatmap) [1618].
Weighted gene co-expression network analysis (WGCNA)
Co-expression networks were constructed using WGCNA. Signed networks were generated from pairwise Pearson correlations, selecting the soft-thresholding power β to approximate scale-free topology (reporting scale-free fit R2 and mean connectivity). Adjacency matrices were transformed to topological overlap matrices (TOM), and hierarchical clustering with dynamic tree cutting identified gene modules (report minimum module size and merge thresholds). Module eigengenes were correlated with COPD status, and the midnight blue module, showing the strongest association with COPD, was selected for downstream analysis [19].
Machine-learning feature selection and classification
Batch-corrected, z-score–standardized expression matrices were used to derive features and train classifiers. We implemented a multi-algorithm screen comprising 12 algorithms across 113 combinations (e.g., Logistic Regression, LASSO/Elastic Net, Ridge, SVM-Linear/RBF, Random Forest, ExtraTrees, GBM, XGBoost, k-NN, Naïve Bayes), with feature selection embedded within folds to prevent information leakage. Model selection employed stratified k-fold cross-validation on the training data, followed by external validation on independent cohorts (GSE76925, GSE37768). Performance metrics included ROC-AUC with 95% CIs, PR-AUC, accuracy, F1, and calibration (Brier score, calibration curves). Feature importance was summarized via permutation importance and SHAP, where applicable. The highest mean external AUC chose the optimal model.
Gene set enrichment analysis (GSEA) and functional annotation
To contextualize biological processes, we performed GSEA using MSigDB KEGG (c2.cp.kegg.Hs.symbols) and GO (c5.go.symbols) collections, and complementary over-representation analyses with clusterProfiler; visualizations employed enrichplot. Significance thresholds were based on field standards for GSEA and ORA, as specified in the Results [20].
Single-cell RNA-seq analysis
A
scRNA-seq data (GSE196341, GSE135893) were processed in Seurat. Cells were filtered by mitochondrial RNA > 15%, ribosomal RNA > 5%, erythrocyte RNA > 1%, detected genes < 300 or > 7,000. Data were normalized (method as specified in the Results), and batch effects were mitigated via Harmony integration. Highly variable genes (n = 2,000) were used for PCA and UMAP (dimensions = 22). Clustering utilized a shared nearest neighbor graph (resolution 0.2) and was manually annotated using the top 10 marker genes per cluster. We visualized donor-wise UMAPs and cell-type distributions, and mapped CYP1B1 expression to airway secretory cells. Pseudotime analysis was performed using Monocle, and intercellular communication was inferred using CellChat [13].
Artificial neural network (ANN)
A feed-forward ANN was trained to classify Control vs COPD using binarized signatures derived from characteristic genes (upregulated genes coded 1 if ≥ median; downregulated genes reverse-coded). The network consisted of an input layer (size equal to the number of features), a single hidden layer (5 neurons, using the sigmoid activation function), and a binary output. Training was performed using a neural network with Rprop + optimization (seed = 12345678). Performance was evaluated via confusion matrices, accuracy per class, and ROC-AUC; feature contributions were estimated using the Garson algorithm.
Immune cell deconvolution
From normalized bulk expression, immune cell proportions were estimated using CIBERSORT (LM22), with default permutations (state number) and quantile normalization settings appropriate for the tissue type. Samples with CIBERSORT p < 0.05 were retained. Group differences were tested using the Wilcoxon rank-sum test with the Benjamini–Hochberg correction across 22 cell types. Spearman correlations between characteristic genes and immune subsets were computed with multiple-testing correction.
A
Cigarette smoke extract (CSE) preparation
Mainstream smoke from Hongta brand filter-tipped cigarettes (Yunnan, China) was drawn by a peristaltic pump and bubbled through 10 mL DMEM per cigarette. The pH was adjusted to 7.2, and the solution was sterilized using a 0.22-µm filter to yield 10% CSE (v/v); working concentrations were prepared freshly for each experiment, for particulate matter (PM2.5), stock suspensions were prepared in DMSO at 400 mg/mL, vortexed and sonicated for 2 h, centrifuged to remove insoluble particulates, and supernatants stored at − 20°C[21].
Cell culture and treatments
16HBE cells were cultured in DMEM + 10% FBS at 37°C, 5% CO₂. Cells were exposed to 2.5% CSE for 24 h. Unless otherwise stated, vehicle-treated cells served as controls[21].
Mouse COPD model
A
Specific pathogen-free C57BL/6J mice (15–20 g) were housed at 25°C, 12/12 h light–dark, with food and water ad libitum. The institutional ethics committee approved protocols. Mice were randomized into two groups: control (n = 8) and COPD model (n = 8). COPD was induced by whole-body exposure in a 60×57×100 cm chamber to 9 cigarettes twice daily, 6 days/week, for 6 months; each session lasted ~ 2 h with ≥ 4 h interval. Control mice were exposed to room air for the same duration. Before exposure, mice received sterile saline aerosol (30 minutes, twice daily). Tissues (lung), bronchoalveolar lavage fluid (BALF), and blood were collected at endpoint[22].
Lung function testing
Lung mechanics were evaluated using a FinePointe system (Buxco) according to the manufacturer’s instructions, recording airway resistance/compliance indices[23].
Enzyme-linked immunosorbent assays (ELISAs)
Concentrations of IL-1β, IL-6, IL-8, and TNF-α in BALF and serum were quantified per the manufacturers’ protocols.
Histology and staining
The lungs were fixed in 4% paraformaldehyde, processed, embedded in paraffin, sectioned, and stained with H&E, PAS, and Picrosirius Red using standard protocols.
Immunohistochemistry (IHC)
IHC was performed on paraffin-embedded lung sections using standard antigen retrieval and detection procedures.
Molecular docking
To explore therapeutic tractability, we performed molecular docking against CYP1B1. Three-dimensional structures for ligands were retrieved (PubChem) and prepared in AutoDockTools 1.5.7; receptor structure (CYP1B1) was obtained from RCSB PDB (www.rcsb.org) and prepared analogously. The grid center and box size were defined around the active site, and docking was performed using the Lamarckian Genetic Algorithm under default parameters. Rigid docking was performed (AutoDockTools 1.6.7), and the top-scoring pose was retained.
Statistical analysis
Analyses were performed in R 4.4.2. Unless otherwise stated, two-sided tests were used, with α = 0.05. For multiple comparisons, the Benjamini–Hochberg correction was applied. Classification performance was summarized by ROC-AUC (95% CI), PR-AUC, accuracy, F1 score, confusion matrices, and calibration curves. Data are reported as mean ± SD (or median [IQR]) with n indicating biological replicates.
Results
Multi-omics discovery and hub-gene prioritization
A study overview is provided in Fig. 1. Using WGCNA on the training cohort, we selected a soft-threshold power of β = 5 (Fig. 2A) and identified 20 gene modules through dynamic tree cutting (Fig. 2B). The midnightblue module showed the strongest association with COPD status (module–trait r = 0.893, P = 5.4×10− 32; Fig. 2C–D) and contained 89 genes. After normalization of the GSE47460 expression matrix, differential expression analysis identified 170 DEGs between COPD and control samples (volcano plot, Fig. 2E; top 50 heat map, Fig. 2F). Intersecting module genes with DEGs yielded 24 hub genes (Fig. 2G), which were advanced for biomarker screening.
Fig. 1
Click here to Correct
Fig. 2
Click here to Correct
Machine-learning screen and diagnostic performance
We evaluated 12 algorithms across 113 model combinations to prioritize diagnostic features (Fig. 3A). In the training set, the combined model achieved an AUC of 0.996 (95% CI, 0.991–0.999; Fig. 3B). External validation in GSE76925 yielded an AUC of 0.834 (95% CI, 0.755–0.906; Fig. 3C), indicating preserved discrimination in an independent cohort. Confusion matrices for the training and validation sets demonstrated high sensitivity and specificity (Figs. 3D and 3E). Across models, BHLHE22, DPP6, DHRS9, and CYP1B1 were most frequently selected and ranked highly in terms of importance. Correlation analyses revealed strong positive co-variation between BHLHE22 and CYP1B1 (r = 0.62) and an inverse correlation between DPP6 and CYP1B1 (r = -0.55; Fig. 3G). Differential expression visualizations confirmed that three genes were upregulated in COPD, and DPP6 was downregulated (Fig. 3F).
Fig. 3
Click here to Correct
External validation and pathway context for top biomarkers
Per-gene ROC curves in the independent GSE37768 cohort showed that BHLHE22, DPP6, and CYP1B1 achieved AUCs of 0.764–0.795, whereas DHRS9 performed lower (Fig. 4A). A nomogram integrating these predictors emphasized CYP1B1 as the strongest contributor to diagnostic probability (Fig. 4B–C). To contextualize biology, GSEA revealed that BHLHE22-high samples were enriched for collagen fibril organization and extracellular matrix remodeling (GO) and for chemokine signaling, complement/coagulation, cytokine–receptor interaction, and NK cell–mediated cytotoxicity (KEGG) (Fig. 4D–E). Similarly, CYP1B1-high samples showed GO enrichment for collagen processes, response to interleukin-1, and metallopeptidase activity, alongside KEGG enrichment for chemokine signaling, complement/coagulation, intestinal immune network for IgA, and NK cell–mediated cytotoxicity (Fig. 4F–G).
Fig. 4
Click here to Correct
ANN construction, calibration, and independent checks
We built an ANN using BHLHE22, DPP6, and CYP1B1 (architecture and weights, Fig. 5A). The model demonstrated close agreement between the predicted and observed risks in both the training and validation datasets (calibration plots, Fig. 5B). In GSE37768, all three biomarkers were significantly upregulated in COPD versus controls, consistent with discovery analyses (Fig. 5C–D). Collectively, the ANN achieved robust discrimination with good calibration, supporting the translational feasibility of a parsimonious gene-based classifier.
Fig. 5
Click here to Correct
Single-cell resolution of CYP1B1 and ferroptosis signatures
Integration of human lung scRNA-seq from COPD and control donors resolved 16 major cell types (marker panels in Fig. 6A; UMAP in Fig. 6B), with proportion differences summarized in Fig. 6C. While BHLHE22, DPP6, and CYP1B1 were detected across multiple lineages, CYP1B1 showed significantly higher expression in airway secretory cells (ASCs) from COPD relative to controls (Fig. 6D; report log₂FC, adj. P). Stratifying ASCs by CYP1B1 median defined high- and low-expressing subpopulations; differential analysis indicated ferroptosis pathway enrichment in CYP1B1-high ASCs (KEGG) with increased AUCell ferroptosis scores (Fig. 6E–F).
Fig. 6
Click here to Correct
Pseudotime and cell–cell communication implicate ASC programs in the pathogenesis of COPD.
Monocle-based trajectories suggested bifurcating epithelial lineages, with ASCs enriched at a terminal branch (Fig. 7A–B), consistent with a remodeled secretory program in COPD. Global CellChat networks (Figs. 7C–D) revealed increased outgoing and incoming signaling to/from ASCs, particularly with alveolar macrophages and other immune lineages. An ASC-centric view (Fig. 7E) highlighted ASCs as communication hubs, supporting a model in which secretory epithelial programs shape the COPD inflammatory microenvironment.
Fig. 7
Click here to Correct
Immune infiltration correlates with hub genes.
Bulk deconvolution revealed significant differences in multiple immune subsets between COPD and control subjects, including plasma cells, M0 macrophages, activated mast cells, and follicular helper T cells (Fig. 8A), with inter-subset correlations summarized in Fig. 8B. Expression of BHLHE22 and CYP1B1 correlated positively with plasma cells and M0 macrophages, whereas DPP6 correlated negatively with these subsets and positively with M2 macrophages (Fig. 8C–E), suggesting coordinated epithelial–immune remodeling in COPD.
Fig. 8
Click here to Correct
In vivo and in vitro evidence of CYP1B1 upregulation with smoke exposure
In a 6-month cigarette-smoke mouse model, systemic and airway inflammation increased over time, as indicated by ELISAs (Fig. 9A–H), accompanied by higher pulmonary CYP1B1 expression (Fig. 9I). In 16HBE cells exposed to CSE, proteomic profiling corroborated CYP1B1 upregulation (Fig. 9J). Histopathology in smoke-exposed mice revealed enlarged alveolar septa, increased glycogen deposition, and fibrosis (H&E, PAS, Picrosirius Red; Fig. 10A–C), accompanied by a significant impairment of lung function (Fig. 10D–K). Ferroptosis-related indicators also increased significantly (Fig. 10L-N). These findings support a link between cigarette smoke exposure, epithelial remodeling, and CYP1B1 induction.
Fig. 9
Click here to Correct
Fig. 10
Click here to Correct
Inhibition of CYP1B1 attenuates ferroptosis signals
In the CSE model constructed by cigarette smoke-induced 16-HBE cells, lentiviral transfection was used to knock down the expression of CYP1B1, and we found that ferroptosis signal-related indicators such as GPX4, MDA, and 4-HNE had a significant decline (Fig. 11A-C). This suggests that inhibition of CYP1B1 expression can inhibit ferroptosis in airway epithelial cells, which may be helpful in the treatment of COPD.
Fig. 11
Click here to Correct
Docking nominates natural compounds with predicted CYP1B1 binding.
DGIdb-guided filtering of traditional Chinese medicine (TCM) compounds prioritized α-naphthoflavone, β-naphthoflavone, chrysin, and naringenin as putative CYP1B1 binders. Rigid docking (PyMOL/AutoDock workflow) predicted binding energies < − 5.0 kcal/mol (− 7.11, − 6.86, − 5.56, − 5.45 kcal/mol; Fig. 12A–D), consistent with favorable interactions in the CYP1B1 active site. These in silico data are hypothesis-generating and motivate biochemical validation of enzyme inhibition and selectivity.
Fig. 12
Click here to Correct
Discussion
COPD remains a leading global contributor to morbidity and mortality [24]. While the disease is heterogeneous [25], its clinical definition centers on persistent airflow limitation and chronic respiratory symptoms. To enhance diagnostic precision and mechanistic understanding, we employed an integrative multi-omics pipeline to identify and validate biomarkers with potential clinical utility. Our analyses identify CYP1B1 as a promising biomarker and mechanistic node in COPD, with convergent evidence spanning bulk transcriptomics, machine learning, single-cell mapping, and experimental models.
Using publicly available COPD cohorts, we defined 170 DEGs, mapped them onto co-expression modules, and intersected signals to yield 24 hub genes. A 12-algorithm/113-combination screen prioritized BHLHE22, CYP1B1, and DPP6, with external validation supporting diagnostic discrimination (combined model AUC up to 0.996 in training and 0.834 in GSE76925; single-gene AUCs 0.764–0.795 in GSE37768). Single-cell analyses localized CYP1B1 upregulation to airway secretory cells (ASCs) in COPD, where high CYP1B1 expression coincided with enrichment of ferroptosis signatures. In smoke-exposed mouse lungs and CSE-treated bronchial epithelial cells, CYP1B1 expression increased alongside histologic remodeling and impaired lung function. Collectively, these observations support an association between CYP1B1 upregulation, epithelial stress programs consistent with ferroptosis, and the pathobiology of COPD, while acknowledging that causality requires targeted perturbation.
CYP1B1, a xenobiotic-metabolizing cytochrome P450, bioactivates environmental toxicants (including polycyclic aromatic hydrocarbons) and modulates steroid and eicosanoid pathways [26, 27]. In the lung, CYP1B1 activity has been implicated in oxidative stress and inflammatory signaling pertinent to chronic airway disease [27, 28]. Altered CYP1B1 expression has been linked to epithelial dysfunction—airway remodeling, impaired mucociliary clearance, and heightened susceptibility to inhaled insults—features that align with COPD pathogenesis [29]. Moreover, CYP1B1 polymorphisms have been associated with interindividual variation in lung function decline and COPD risk [30]. Integrating these observations, our data position CYP1B1 as a biologically plausible mediator of smoke-related redox imbalance and epithelial remodeling in COPD, and a tractable node for therapeutic exploration [31, 32].
Primary immunodeficiencies (PIDs) encompass defects in cellular and humoral immunity that predispose to recurrent airway infections and chronic inflammation. More than half of PID subtypes impair antibody production and contribute to recurrent sinusitis and pulmonary infections; persistent upper/lower respiratory infections can drive airway inflammation, obstruction, and, in some cases, structural remodeling compatible with COPD phenotypes [33]. Respiratory complications of PIDs are associated with downstream risks—including severe asthma, bronchiectasis, and COPD—and increased mortality [34]. In the context of our findings, these observations highlight how host-defense perturbations can exacerbate epithelial injury and immune–epithelial crosstalk in susceptible individuals.
Single-cell results indicated that differentially expressed genes were enriched in epithelial compartments, with ASCs showing the clearest CYP1B1 signal. Epithelial–macrophage crosstalk is central to airway remodeling, as alveolar and airway macrophages orchestrate pathogen clearance, surfactant turnover, tissue repair, and homeostasis. In COPD, they exhibit impaired phagocytosis/efferocytosis, dysregulated cytokine release, and heightened oxidative stress [35]. Beyond the lung, macrophage inflammatory phenotypes and lysosomal signaling influence disease processes across organs [36, 37]; by analogy, macrophage dysfunction in COPD may intersect with epithelial redox pathways, including lipid peroxidation and iron handling. Our data suggest that CYP1B1-high ASCs reside within an inflamed network where macrophage interactions are increased, consistent with a feed-forward loop of epithelial stress and immune activation.
Limitation
This study has limitations. First, although we leveraged multiple GEO cohorts, sample availability limits power for some subgroup analyses. Second, the external validation cohorts differ in clinical covariates and platforms, which may affect generalizability. Third, while single-cell analyses map CYP1B1 to airway secretory cells and associate high expression with ferroptosis signatures, we did not perform genetic or pharmacologic perturbation of CYP1B1 with standardized ferroptosis rescue controls (e.g., ferrostatin-1, liproxstatin-1, deferoxamine). Finally, docking results are in silico and require biochemical confirmation (enzyme inhibition, selectivity, and cellular target engagement).
Conclusion
In aggregate, our integrative analyses identify CYP1B1 as a potential biomarker for COPD and a putative mediator of epithelial stress, consistent with ferroptosis in airway secretory cells. Clinically, CYP1B1 expression and derived gene-based models may aid diagnostic stratification; mechanistically, CYP1B1 presents a druggable entry point for modulating redox-lipid peroxidation pathways. Prioritized natural compounds from docking provide testable leads, but their translation will require validating target engagement, ferroptosis-specific rescue, and safety/PK. Considerations. Prospective, multi-center studies integrating single-cell profiling, circulating biomarkers, and interventional perturbations should establish clinical utility and clarify therapeutic potential.
List of abbreviations
Abbreviation
Definition
COPD
Chronic Obstructive Pulmonary Disease
scRNA
Single-cell RNA
DEGs
Differentially Expressed Genes
WGCNA
Weighted Gene Co-Expression Network Analysis
ROC
Receiver Operating Characteristic
AUC
Area Under the Curve
GSEA
Gene Set Enrichment Analysis
GO
Gene Ontology
KEGG
Kyoto Encyclopedia of Genes and Genomes
ANN
Artificial Neural Network
UMAP
Uniform Manifold Approximation and Projection
CSEs
Cigarette Smoke Extracts
Ams
Alveolar Macrophages
AT2s
Alveolar Type II epithelial cells
CD8Ts
CD8 + T cells
ECs
Endothelial cells
Cils
Ciliated cells
Monos
Monocytes
ASCs
Airway secretory cells
BCs
B cells
AT1s
Alveolar Type I cells
PMVECs
Pulmonary microvascular endothelial cells
Fbs
Fibroblasts
Sers
Airway serous cells
MCs
Mast cells
LECs
Lymphatic endothelial cells
PCs
Proliferating cells
Mac_Monos
Macrophages/Monocytes
TCM
Traditional Chinese medicine
Declarations
Ethics approval and consent to participate
The study involving animals was reviewed and approved by Ethics Committee of Guangzhou Medical University (GY2025-036).
A
Data Availability
All data generated or analysed during this study are included in this published article.
Competing interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
A
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This work was sponsored by National Natural Science Foundation of China (81900044 and 82170500), Shanghai Municipal Education Commission (2021 Technology and Innovation-03-163), Natural Science Foundation of Hunan Province (2021JJ40484 and 2024JJ5350), Major Project of Guangzhou National Laboratory (GZNL2024A02005) and the grant of State Key Laboratory of Respiratory Disease (SKLRD-Z-202315).
A
Author Contribution
LL and MJ: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Visualization, Writing– original draft. ML and KD: Data curation, Investigation, Methodology, Software, Visualization. DW, XY and XT: Data curation, Methodology, Software, Visualization. ZM, YT and RS: Data curation, Software, Visualization. SL: Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing– review & editing.
Acknowledgments
We thank the participants and staff of the National Center for Biotechnology Information. We thank Wen Qianmei and Zuo yujie form the First Affiliated Hospital of Guangzhou Medical University for their assistance in the construction of the mouse model. At the same time, we would also like to acknowledge assistance from proofreaders and editors.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Electronic Supplementary Material
Below is the link to the electronic supplementary material
References
1.
Christenson SA, Smith BM, Bafadhel M, Putcha N. Chronic obstructive pulmonary disease. Lancet. 2022;399(10342):2227–42.
2.
Labaki WW, Rosenberg SR. Chronic Obstructive Pulmonary Disease. Ann Intern Med. 2020;173(3):Itc17–32.
3.
Brightling C, Greening N. Airway inflammation in COPD: progress to precision medicine. Eur Respir J, 2019. 54(2).
4.
Barnes PJ. Cellular and molecular mechanisms of asthma and COPD. Clin Sci (Lond). 2017;131(13):1541–58.
5.
Wistuba II, Mao L, Gazdar AF. Smoking molecular damage in bronchial epithelium. Oncogene. 2002;21(48):7298–306.
6.
Mercer RR, Russell ML, Roggli VL, Crapo JD. Cell number and distribution in human and rat airways. Am J Respir Cell Mol Biol. 1994;10(6):613–24.
7.
Brusselle GG, Joos GF, Bracke KR. New insights into the immunology of chronic obstructive pulmonary disease. Lancet. 2011;378(9795):1015–26.
8.
Hogg JC, Chu F, Utokaparch S, et al. The nature of small-airway obstruction in chronic obstructive pulmonary disease. N Engl J Med. 2004;350(26):2645–53.
9.
Zhao J, Cheng W, He X, et al. Chronic Obstructive Pulmonary Disease Molecular Subtyping and Pathway Deviation-Based Candidate Gene Identification. Cell J. 2018;20(3):326–32.
10.
Tan J, Tedrow JR, Dutta JA, et al. Expression of RXFP1 Is Decreased in Idiopathic Pulmonary Fibrosis. Implications for Relaxin-based Therapies. Am J Respir Crit Care Med. 2016;194(11):1392–402.
11.
Morrow JD, Zhou X, Lao T, et al. Functional interactors of three genome-wide association study genes are differentially expressed in severe chronic obstructive pulmonary disease lung tissue. Sci Rep. 2017;7:44232.
12.
Leek JT, Johnson WE, Parker HS, et al. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.
13.
Habermann AC, Gutierrez AJ, Bui LT et al. Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Sci Adv, 2020. 6(28): p. eaba1972.
14.
Ma J, Chen T, Wu S, et al. iProX: an integrated proteome resource. Nucleic Acids Res. 2019;47(D1):D1211–7.
15.
Chen T, Ma J, Liu Y, et al. iProX in 2021: connecting proteomics data sharing with big data. Nucleic Acids Res. 2022;50(D1):D1522–7.
16.
Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
17.
Kolde R. pheatmap: Pretty Heatmaps. 2010.
18.
Wickham H. Ggplot2: Elegant graphics for data analysis. Cham, Switzerland: Springer International Publishing; 2016.
19.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
20.
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
21.
Fan X, Dong T, Yan K et al. PM2.5 increases susceptibility to acute exacerbation of COPD via NOX4/Nrf2 redox imbalance-mediated mitophagy. Redox Biol, 2023. 59.
22.
Guan R, Wang J, Cai Z et al. Hydrogen sulfide attenuates cigarette smoke-induced airway remodeling by upregulating SIRT1 signaling pathway. Redox Biol, 2020. 28.
23.
He F, Liao B, Pu J et al. Exposure to Ambient Particulate Matter Induced COPD in a Rat Model and a Description of the Underlying Mechanism. Sci Rep, 2017. 7(1).
24.
Raherison C, Girodet PO. Epidemiology of COPD. Eur Respir Rev. 2009;18(114):213–21.
25.
Mannino DM. COPD: epidemiology, prevalence, morbidity and mortality, and disease heterogeneity. Chest. 2002;121(5 Suppl):s121–6.
26.
Esteve-Codina A, Hofer TP, Burggraf D, et al. Gender specific airway gene expression in COPD sub-phenotypes supports a role of mitochondria and of different types of leukocytes. Sci Rep. 2021;11(1):12848.
27.
Zhu Y, Dutta S, Han Y, et al. Oxidative stress promotes lipid-laden macrophage formation via CYP1B1. Redox Biol. 2025;79:103481.
28.
Zhang S, Duan H, Yan J. Identifying biomarkers of endoplasmic reticulum stress and analyzing immune cell infiltration in chronic obstructive pulmonary disease using machine learning. Front Med (Lausanne). 2024;11:1462868.
29.
Reddy KD, Lan A, Boudewijn IM, et al. Current Smoking Alters Gene Expression and DNA Methylation in the Nasal Epithelium of Patients with Asthma. Am J Respir Cell Mol Biol. 2021;65(4):366–77.
30.
Obernolte H, Niehof M, Braubach P, et al. Cigarette smoke alters inflammatory genes and the extracellular matrix - investigations on viable sections of peripheral human lungs. Cell Tissue Res. 2022;387(2):249–60.
31.
Misiukiewicz-Stępien P, Mierzejewski M, Zajusz-Zubek E et al. RNA-Seq Analysis of UPM-Exposed Epithelium Co-Cultivated with Macrophages and Dendritic Cells in Obstructive Lung Diseases. Int J Mol Sci, 2022. 23(16).
32.
Fitzgerald LF, Lackey J, Moussa A, et al. Chronic aryl hydrocarbon receptor activity impairs muscle mitochondrial function with tobacco smoking. J Cachexia Sarcopenia Muscle. 2024;15(2):646–59.
33.
Berger M, Geng B, Cameron DW, et al. Primary immune deficiency diseases as unrecognized causes of chronic respiratory disease. Respir Med. 2017;132:181–8.
34.
Jang JH, Kim JH, Park HS. Current Issues in the Management of IgG Subclass Deficiencies in Adults With Chronic Respiratory Diseases. Allergy Asthma Immunol Res. 2023;15(5):562–79.
35.
Hu YS, Shao XJ, Xing L et al. Single-Cell Sequencing of Lung Macrophages and Monocytes Reveals Novel Therapeutic Targets in COPD. Cells, 2023. 12(24).
36.
Ogawa D, Stone JF, Takata Y, et al. Liver x receptor agonists inhibit cytokine-induced osteopontin expression in macrophages through interference with activator protein-1 signaling pathways. Circ Res. 2005;96(7):e59–67.
37.
Weber K, Schilling JD. Distinct lysosome phenotypes influence inflammatory function in peritoneal and bone marrow-derived macrophages. Int J Inflam, 2014. 2014: p. 154936.
Figure Legends
Figure 1. Study workflow for integrative discovery and validation.
Overview of the multi‑omics and validation pipeline. Steps include: dataset assembly (bulk transcriptomics, scRNA‑seq, proteomics), preprocessing and batch correction, DEG and WGCNA discovery, machine‑learning screening across 12 algorithms/113 combinations, single‑cell localization, immune deconvolution, experimental validation in CSE‑treated cells and cigarette smoke–exposed mice, proteomics confirmation, and molecular docking against CYP1B1. Arrows indicate analysis flow; boxes list key outputs and Figures referenced in the Results.
Figure 2. WGCNA module associated with COPD and differential‑expression filtering.
(A) Scale‑free topology analysis used to select soft threshold β = 5. (B) Dendrogram with dynamic tree cutting identifying 20 modules. (C) Module–trait heat map showing correlations with COPD. The midnightblue module shows the strongest association. (D) Gene significance versus module membership in the midnight blue module. (E) Volcano plot of DEGs (thresholds: |log2FC|>0.585). (F) Heat map of top 50 DEGs (z‑score normalized). (G) Venn/UpSet plot intersecting DEGs with midnightblue module genes to yield 24 hub genes.
Figure 3. Machine‑learning feature prioritization and diagnostic performance.
(A) Heat map of feature selection frequency/importance across 12 algorithms and 113 model combinations. (B) ROC curve in training (AUC with 95% CI). (C) ROC in external validation cohort GSE76925 (AUC with 95% CI). (D–E) Confusion matrices for training and validation sets. (F) Volcano plots highlighting BHLHE22, CYP1B1, DHRS9, and DPP6 expression differences. (G) Inter‑gene correlation matrix.
Figure 4. External Validation and Functional Enrichment for Prioritized Biomarkers.
(A) Per‑gene ROC curves in GSE37768 with AUC (95% CI) for BHLHE22, DPP6, DHRS9, and CYP1B1; include DeLong P values vs 0.5. (B) Nomogram integrating top predictors; points, linear predictor, and predicted probability scales are displayed. (C) Calibration curve for nomogram. (D–E) GSEA results for BHLHE22‑high group of GO/KEGG. (F–G) GSEA results for the CYP1B1‑high group.
Figure 5. ANN architecture, calibration, and external discrimination.
(A) ANN topology (input: 3 genes; hidden layer with 5 neurons; sigmoid activation); edge weights visualized. (B) Calibration curves (Brier score) for training and external validation. (C) ROC and PR curves with 95% CIs. (D) Decision‑curve analysis demonstrating net benefit across clinically relevant thresholds.
Figure 6. Single‑cell mapping of biomarker expression and ferroptosis signatures.
(A) Marker‑based annotation of 16 cell types; top markers per cluster listed. (B) UMAP of integrated donors showing major epithelial and immune populations. (C) Bar plots of cell‑type proportions in control vs COPD donors. (D) Feature plots/violin plots showing higher CYP1B1 in airway secretory cells (ASCs) in COPD. (E) KEGG enrichment for CYP1B1‑high vs low ASCs showing ferroptosis pathway. (F) AUCell/GSVA ferroptosis scores (adjusted P < 0.05).
Figure 7. Pseudotime trajectories and cell–cell communication networks.
(A) Monocle pseudotime ordering of epithelial lineages with bifurcation; ASCs are enriched in a terminal branch. (B) Same trajectory colored by annotated cell types. (C–D) CellChat global signaling networks comparing control vs COPD; edge width reflects communication strength. (E) ASC‑centric subnetworks highlighting increased outgoing/incoming interactions with alveolar macrophages and immune cells.
Figure 8. Immune infiltration landscape and correlations with hub genes.
(A) CIBERSORT LM22 deconvolution across cohorts showing differences in immune subset proportions between COPD and controls. (B) Correlation heat map among immune subsets. (C–E) Associations of BHLHE22, CYP1B1, and DPP6 expression with selected immune subsets.
Figure 9. Smoke exposure increases CYP1B1 expression.
(A–H) BALF/serum cytokines (IL‑1β, IL‑6, IL‑8, TNF‑α) across exposure time points (mean ± SD; n per group; statistical tests with exact P and multiple‑testing correction). (I) Lung CYP1B1 expression (IHC/Western/qPCR as applicable; effect size and CI). (J) Proteomics confirmation of CYP1B1 upregulation in CSE‑treated 16HBE cells (FDR > 0.5).
Figure 10. Smoke exposure causes airway dysfunction in the lungs.
(A-C) Representative histology (H&E, PAS, Picrosirius Red) with quantification. (D-K) Lung function parameters. (L-N) Altered levels of ferroptosis-related indicators (GSH/GSSG, MDA, GPX4 and SOD1).
Figure 11. Knockdown of CYP1B1 expression inhibits ferroptosis signaling.
(A-C) Expression of CYP1B1 and altered levels of ferroptosis-related indicators (GSH/GSSG, MDA, GPX4 and 4-HNE).
Figure 12. Molecular docking identifies natural compounds that target the CYP1B1 enzyme.
(A–D) Predicted binding poses and scores (kcal/mol) for α‑naphthoflavone, β‑naphthoflavone, chrysin, and naringenin within the CYP1B1 active site. Annotate key interactions (H‑bonds, π–π, hydrophobic contacts).
CYP1B1-Mediated Ferroptosis Defines a Biomarker and Therapeutic Target in COPD Across Multi-omics and Single-Cell Running title: CYP1B1 Biomarker/Target in COPD
Abstract
Structured Abstract Background: Chronic obstructive pulmonary disease (COPD) causes progressive airflow limitation and remains without disease-modifying therapy. Ferroptosis—iron-dependent lipid peroxidation—has emerged as a potential mechanism driving epithelial dysfunction and chronic inflammation; however, its upstream regulators in COPD remain incompletely defined. We hypothesized that integrative multi-omics could identify robust biomarkers and pathways, with translational potential for diagnosis and targeted therapy. Objective Primary: discover and validate biomarkers/pathways underlying COPD via bulk transcriptomics, single-cell analyses, machine learning (ML), and experimental validation. Secondary: evaluate CYP1B1 as a diagnostic biomarker and explore druggability through molecular docking. Methods: Public cohorts (GSE47460, GSE76925, GSE37768; total discovery/validation samples reported) were integrated with ComBat batch correction. Single-cell RNA-seq datasets (GSE196341, GSE135893; 12 COPD vs 12 controls) profiled cell-type localization. Differential expression, WGCNA, and enrichment (GO/KEGG/GSEA) were performed. A 12-algorithm ML framework (113 combinations) plus ANN modeling assessed diagnostic performance (ROC/AUC, confusion matrices, calibration). CIBERSORT estimated immune infiltration. A cigarette-smoke mouse model and 16HBE cell assays provided experimental validation; small-n proteomics were deposited (PXD068247; 3 vs 3). Molecular docking screened candidate CYP1B1-binding compounds and recorded binding energies. Statistical reporting included n, effect sizes, 95% CIs, and multiple-testing control. Results: Twenty-four hub genes were initially identified; four (BHLHE22, DPP6, DHRS9, CYP1B1) were prioritized by ML across 113 model combinations. In the training set, the combined model achieved an AUC of 0.996 (95% CI, 0.991–0.999), with an external validation AUC of 0.834 (95% CI, 0.755–0.906). Single-gene ROC in an external cohort yielded AUCs of 0.764–0.795, with CYP1B1 being the most consistent. Single-cell analysis localized CYP1B1 upregulation to airway secretory cells (ASCs) and linked high CYP1B1 expression to the activation of the ferroptosis pathway. In mouse and 16HBE models, cigarette smoke increased lung inflammation/fibrosis and upregulated CYP1B1; proteomics corroborated expression changes. Docking identified α-/β-naphthoflavone, chrysin, and naringenin as CYP1B1 binders (best energies ≈ −7.11 to −5.45 kcal/mol). Immune deconvolution associated CYP1B1 with macrophage and plasma-cell signals. Conclusions: Integrative multi-omics implicates CYP1B1-mediated ferroptosis in ASCs as a central pathway in COPD, with diagnostic promise and a tractable chemistry space. Prospective validation in larger cohorts, causal perturbation of CYP1B1–ferroptosis in vitro/in vivo, and pharmacology against prioritized ligands are warranted to translate these findings.
Total words in MS: 4921
Total words in Title: 14
Total words in Abstract: 83
Total Keyword count: 6
Total Images in MS: 12
Total Tables in MS: 1
Total Reference count: 37