Unveiling Druggable Segments in Klebsiella pneumoniae KPHS_11890: An Integrated DRKG-MD Study
Authors information:
Zhenghua Jiang a,#,*; Mengqi Huang b,#,*; Yemei Bu a,#; Siqi Wu c,#; Sijun Meng b,d,#; Zhaochun Wu a; Hesong Qiu b; Lingling Wang d; Nijun Wei b,d; Wen Zhang b; Xunxing Wang b; Jiali Zhou c; Dongli Lu a; Zhichao Hong a; Gaohong Zhao b; Cong Ma d,*
a Nanping First Hospital Affiliated to Fujian Medical University, No. 317, Zhongshan Road, Yanping District, Nanping, Fujian Province, P. R. China 353099
b JIYING TECHNOLOGY CO., LIMITED. Rm 27, Space 109B-113, ITC, 1/F, 5W, Science 7 Technology Ave, Shatin, N.T., Hong Kong Science and Technology Parks Corporation, the New Territories, Hong Kong SAR, P. R. China 999077
c School of Medicine, Xiamen University, No. 4221 − 122, Xiang'an South Road, Xiamen, Fujian Province, P. R. China 361102
d PolyU Marshall Research Centre for Medical Microbial Biotechnology, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China 999077
*Correspondence:
Mengqi Huang: mrhuang_mq@163.com
Zhenghua Jiang: 798563491@qq.com
Cong Ma: cong.ma@polyu.edu.hk
Email:
Zhenghua Jiang 798563491@qq.com
Yemei Bu 1477004835@qq.com
Mengqi Huang mrhuang_mq@163.com
Siqi Wu 642172790@qq.com
Sijun Meng mengsj08@gmail.com
Hesong Qiu qiu980053@163.com
Lingling Wang lingling.wang@polyu.edu.hk
Nijun Wei weinijun@hotmail.com
Wen Zhang zhang.wen81@gmail.com
Xunxing Wang wangxxxing@gmail.com
Jiali Zhou 506658103@qq.com
Zhaochun Wu 53018640@qq.com
Dongli Lu 286012525@qq.com
Zhichao Hong ZhichaoCPU2017@outlook.com
Gaohong Zhao zhaogaohong@genoimage.com
ORCID:
Mengqi Huang 0009-0009-6552-4319
# These authors contributed equally to this work, see details in authors’ contributions.
Abstract
Klebsiella pneumoniae (K. pneumoniae), a multidrug-resistant Gram-negative bacillus, represents a significant global health threat due to its role in hospital-acquired infections and the emergence of carbapenem-resistant hypervirulent strains. This study integrates the Drug Repurposing Knowledge Graph (DRKG) with molecular dynamics (MD) simulations to identify and validate stable structural segments of the protein encoded by the KPHS_11890 gene, a membrane fusion protein of the AcrAB-TolC efflux pump that is critical for antibiotic resistance in K. pneumoniae. Using the PyKEEN framework, a knowledge graph embedding model was trained on a comprehensive dataset combining DrugBank, K. pneumoniae strain FASTA sequences, and NCBI databases, achieving a Hits@10 score of 0.1602 and an adjusted arithmetic mean rank of 0.0238. The model predicted KPHS_11890 as the top candidate, which was then examined by 100-ns molecular dynamics simulations using Amber 24. The simulations revealed stable segments in the α-helical domain, the extended strand domain, and the random coil domain. These segments, identified through low root mean square fluctuation (RMSF) values and conserved secondary structures, are critical for AcrA interactions with AcrB and TolC, offering potential drug-binding sites for efflux pump inhibitors. This integrated DRKG-MD approach efficiently pinpoints high-potential targets and elucidates their structural basis, thereby accelerating the development of novel anti-resistance therapeutics.
Keywords:
Klebsiella pneumoniae
knowledge graph
molecular dynamics simulation
antibiotic resistance
drug target identification
1. Introduction
Klebsiella pneumoniae (K. pneumoniae) is a common Gram-negative bacillus, prevalent in the environment and a major pathogen in hospital-acquired infections. It can cause a wide range of severe diseases, including pneumonia, urinary tract infections, sepsis, and intra-abdominal infections. Over recent decades, strains derived from classical K. pneumoniae have acquired resistance to multiple antibiotics, and most are now resistant to the drugs used against them [1]. This led the WHO to classify it as a moderate global health risk in 2024, posing a serious public health challenge worldwide [2]. Even more concerning, the convergence of carbapenem-resistant (CRKP) and hypervirulent (hvKP) strains has given rise to CR-hvKP, which combines high virulence with multidrug resistance. Because its mechanisms are poorly understood, CR-hvKP constitutes an urgent public health crisis [3].
To address this crisis, artificial intelligence (AI)-driven approaches, such as machine learning, have gained prominence in studying and predicting K. pneumoniae behavior, offering a more efficient alternative to the time-consuming and costly traditional experimental methods. For example, Li et al. developed an mNGS-based machine learning model to predict antibiotic susceptibility, enabling earlier adjustments to treatment plans and personalization of antibacterial therapies while reducing both time and costs [4]. Novais et al. applied ATR FT-IR spectroscopy coupled with machine learning models, demonstrating the ability to rapidly, automatically, and reproducibly type K. pneumoniae [5].
In the context of drug development and target discovery, the Drug Repurposing Knowledge Graph (DRKG), a machine learning-based approach, has proven to be highly efficient and cost-effective in other research domains [6]. For instance, in response to the urgent demand for COVID-19 therapeutics, Al-Saleem et al. used DRKG to predict histone deacetylase inhibitors (HDIs), ouabain, and other compounds, several of which were later confirmed to target viral infection processes or progressed to clinical trials for COVID-19 [7]. However, because DRKG relies on existing knowledge graphs, it can identify potential targets but cannot elucidate the structural basis of their druggability. For example, the resistance mechanism of the AcrAB-TolC efflux pump involves complex dynamic conformational changes, making it difficult to validate drug-binding sites using correlation analysis alone [8].
Furthermore, a primary cause of treatment failure is multidrug efflux: as a Gram-negative bacterium, K. pneumoniae possesses efflux pumps capable of expelling nearly all classes of antibiotics [9]. The RND efflux pumps, primarily AcrAB and OqxAB, are subject to complex regulation by proteins such as OqxR, RamR, RamA, RarA, SoxS, and MarA [10–12]. Within these pumps, the membrane fusion protein (MFP) family plays a functional role in substrate efflux [13]. The multidrug-resistant K. pneumoniae strain HS11286, first isolated from human sputum in Shanghai, China, in 2011 by Liu et al., carries the MFP family gene KPHS_11890, and in-depth study of this gene can help reveal its role in the adaptive evolution of K. pneumoniae and the dissemination of antibiotic resistance [14].
Molecular dynamics (MD) simulations are a widely used approach for exploring the structural and dynamic properties of such proteins. In the biomedical field, the integration of machine learning with molecular dynamics simulations is increasingly vital for advancing drug discovery efforts [15]. Zheng et al. successfully combined molecular simulations with other approaches to screen and validate adriamycin and tosufloxacin as promising candidates for anti-tuberculosis drugs [16]. However, MD simulations are computationally intensive and depend on the outcome of initial target screening, which limits their ability to perform large-scale target discovery independently. Their time and economic costs remain the main challenges [17].
The innovation of this study lies in a synergistic workflow that leverages the strengths of both DRKG and MD to overcome their respective limitations. This integrated approach establishes a computationally efficient paradigm: rapidly identifying high-priority biological targets from a vast knowledge base via AI, followed by rigorous biophysical validation and structural elucidation through molecular simulations. This strategy not only accelerates the discovery of potential anti-resistance drug targets but also significantly reduces the computational and economic resources required for subsequent experimental validation.
2. Methods
2.1 Data Preparation
The present study employs three primary datasets:
2.1.1. DrugBank database [18]: Created and maintained by the University of Alberta, Canada, this database encompasses comprehensive bioinformatics and cheminformatics resources for drugs and drug targets. This study uses the complete database files from the 2023 version to extract low-risk drug interactions without adverse effects, thereby identifying potential drug candidates with key features including targets, enzymes, SMILES strings, and pathways.
2.1.2. KP strain FASTA dataset [19]: Comprising 831 bacterial strains isolated from patients in Chinese hospitals, this dataset facilitates target protein extraction for similarity analysis between candidate drugs and KP strains.
2.1.3. National Center for Biotechnology Information (NCBI) database [20]: Including the GenBank DNA sequence database and the PubMed literature database, these resources jointly support the construction and feature analysis of drug repositioning knowledge graphs.
Specifically, the DrugBank database was first filtered to remove irrelevant drug types, including vaccines, antibodies, toxins, biologics, biopolymers, oligonucleotides, and oligosaccharides, while retaining the required biotech and small-molecule drugs. Subsequently, 2,744,803 drug-drug interaction (DDI) pairs were extracted, encompassing the risks and effects of drug interactions. Three KP-irrelevant drugs were also eliminated at this stage: methaneseleninic acid, propionic acid, and Florbetaben (18F). This left 3,475 unique drugs.
The DRKG integrates data from sources including DrugBank, GNBR, Hetionet, String, Intact, and DGIDB, comprising 5,874,261 triplets. Additional KP-specific data from public sources added 207 triplets.
Integrating these sources yielded a comprehensive knowledge graph containing 5,874,712 entries. Preprocessing extracted triplets of the form (head entity, relation, tail entity), covering a wide range of biomedical entities and their relationships, and entities and relations were mapped to unique identifiers for efficient model processing.
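As an illustration of the identifier mapping described above, the following minimal sketch assigns integer IDs to entities and relations in first-seen order. It is not the preprocessing code used in this study: the function name `index_triples` and the second example triple (its compound and relation labels) are hypothetical, while the first triple reuses the labels quoted in Section 2.3.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def index_triples(triples: List[Triple]):
    """Map entity and relation labels to integer IDs in first-seen order."""
    entity_to_id: Dict[str, int] = {}
    relation_to_id: Dict[str, int] = {}
    encoded: List[Tuple[int, int, int]] = []
    for head, relation, tail in triples:
        # setdefault returns the existing ID or stores the next free one.
        h = entity_to_id.setdefault(head, len(entity_to_id))
        r = relation_to_id.setdefault(relation, len(relation_to_id))
        t = entity_to_id.setdefault(tail, len(entity_to_id))
        encoded.append((h, r, t))
    return entity_to_id, relation_to_id, encoded


triples = [
    # Labels taken from Section 2.3 of this paper.
    ("Disease::MesH:D007711",
     "bioarx::Corkp ass kp_gene::Disease:Gene",
     "Gene::11846187"),
    # Hypothetical triple, for illustration only.
    ("Compound::EXAMPLE", "example::targets::Compound:Gene", "Gene::11846187"),
]
entity_to_id, relation_to_id, encoded = index_triples(triples)
print(encoded)
```

Integer-encoded triples of this kind are the usual input to embedding models, since each ID can index directly into an embedding table.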
2.2. Model Training and Evaluation
In this study, the model was built with the PyKEEN framework, a knowledge graph embedding toolkit that supports standardized negative sampling and loss function optimization. Entity-relation interactions were modeled with the multilayer perceptron architecture of the Entity-Relation Multi-Layer Perceptron (ERMLP) model to generate high-quality embedding vectors. After training, performance was evaluated on both validation and test sets using standard metrics for knowledge graph completion [21]. The first is the adjusted arithmetic mean rank (AMR), which normalizes the mean rank of the true entities by the rank expected under random ordering and thus quantifies the model's global ability to place correct entities near the top of the candidate list; according to biomedical KG research, AMR values below 0.1 indicate excellent relationship reasoning ability. The second is Hits@k (k = 1, 3, 10), the proportion of test cases in which the correct entity is ranked within the top k predictions.
In similar tasks, Hits@10 > 0.2 is a common threshold for effective relationship capture. During training, AMR (lower values indicating better performance), Hits@1 (reflecting precise matching), and Hits@10 (indicating potential relationship capture efficiency) were monitored, collectively validating the model’s generalization on unseen data.
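The two rank-based metrics can be made concrete with a short sketch. It assumes the common definition of the adjusted arithmetic mean rank as the observed mean rank divided by the mean rank expected under random ordering, (n + 1) / 2 for n candidates; the function names and the toy ranks are illustrative and are not taken from the study's evaluation.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def adjusted_mean_rank(ranks: Sequence[int],
                       num_candidates: Sequence[int]) -> float:
    """Mean rank divided by its expectation under random ordering.

    Assumed definition: for a triple with n candidates, a random ordering
    places the true entity at rank (n + 1) / 2 on average.
    """
    mean_rank = sum(ranks) / len(ranks)
    expected = sum((n + 1) / 2 for n in num_candidates) / len(num_candidates)
    return mean_rank / expected


# Toy example: four test triples, each ranked among 100 candidates.
ranks = [1, 3, 12, 40]
num_candidates = [100, 100, 100, 100]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 10))
print(adjusted_mean_rank(ranks, num_candidates))
```

Under this definition a value of 1 corresponds to random ranking, and values near 0 mean that true entities sit close to the top of the candidate list.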
2.3. Prediction of K. pneumoniae-Related Genes
Using the trained knowledge graph embedding model, this study predicted K. pneumoniae-related genes. The head entity “Disease::MesH:D007711” represented the target pathogen, with the relationship path “bioarx::Corkp ass kp_gene::Disease:Gene” capturing potential gene-disease associations. The ERMLP model’s multilayer perceptron scoring mechanism, invoked via PyKEEN’s “predict_tail_entity()” function, generated the top 100 tail entity predictions. These were manually reviewed by genomics experts to identify potential therapeutic targets with biological pathway associations.
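Tail prediction of this kind amounts to scoring every candidate tail for a fixed (head, relation) pair and keeping the highest scores. The sketch below illustrates the idea with an ERMLP-style scorer written in plain Python: the head, relation, and tail embeddings are concatenated, passed through one hidden layer, and reduced to a scalar. The ReLU activation, the omission of bias terms, the hand-set weights, and the entity names `Gene::A`, `Gene::B`, and `Gene::C` are all assumptions for illustration; this is not the trained model or the PyKEEN API.

```python
from typing import Dict, List, Sequence, Tuple


def ermlp_score(head: Sequence[float], relation: Sequence[float],
                tail: Sequence[float],
                hidden_weights: Sequence[Sequence[float]],
                output_weights: Sequence[float]) -> float:
    """Score a triple: output_weights . relu(hidden_weights @ [h; r; t])."""
    x = list(head) + list(relation) + list(tail)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)))
              for row in hidden_weights]
    return sum(w * a for w, a in zip(output_weights, hidden))


def top_k_tails(head: Sequence[float], relation: Sequence[float],
                tails: Dict[str, Sequence[float]],
                hidden_weights: Sequence[Sequence[float]],
                output_weights: Sequence[float],
                k: int) -> List[Tuple[str, float]]:
    """Rank all candidate tails by score and return the k best."""
    scored = [(name, ermlp_score(head, relation, emb,
                                 hidden_weights, output_weights))
              for name, emb in tails.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# One-dimensional toy embeddings and hand-set weights (hypothetical).
head, relation = [1.0], [0.5]
tails = {"Gene::A": [2.0], "Gene::B": [-1.0], "Gene::C": [0.5]}
hidden_weights = [[1.0, 1.0, 1.0], [0.0, 0.0, -1.0]]
output_weights = [1.0, 0.5]
best = top_k_tails(head, relation, tails, hidden_weights, output_weights, k=2)
print(best)
```

In the study the same ranking is performed over all gene entities in the graph with k = 100, followed by expert review of the resulting list.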
2.4. Molecular Simulation
To validate the druggability and resistance mechanisms of predicted targets, Amber 24 was used for all-atom molecular dynamics simulations. Systems were built with the ff19SB force field, solvated in a TIP3P water model, and counterions added to maintain charge balance.
The system was first subjected to energy minimization in two stages to remove steric clashes: 5000 steps of steepest descent followed by 5000 steps of conjugate gradient. It was then gradually heated to 310.15 K over 500 ps with weak positional restraints on the protein backbone and pre-equilibrated in the NVT ensemble. The subsequent 100-ns production simulation was performed in the NPT ensemble (constant particle number, pressure, and temperature), with temperature maintained by a Langevin thermostat and pressure controlled by a Berendsen barostat. Periodic boundary conditions were maintained throughout, and the SHAKE algorithm constrained all covalent bonds involving hydrogen.
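For readers who wish to reproduce a comparable production run, the sketch below generates an Amber `&cntrl` input block matching the stated settings: NPT ensemble, Langevin thermostat at 310.15 K, Berendsen barostat, SHAKE on bonds involving hydrogen, and 100 ns in total. The 2-fs timestep, the Langevin collision frequency, the nonbonded cutoff, and the output intervals are not reported in this paper and are assumptions, as is the helper name `production_mdin`; the actual input files used here may differ.

```python
def production_mdin(total_ns: float = 100.0,
                    timestep_ps: float = 0.002,
                    temperature_k: float = 310.15) -> str:
    """Build an Amber &cntrl block for an NPT production run (sketch)."""
    # Number of MD steps needed to cover total_ns at the chosen timestep.
    nstlim = round(total_ns * 1000.0 / timestep_ps)
    lines = [
        "NPT production run (illustrative)",
        "&cntrl",
        # Continue from an equilibrated restart file with velocities.
        "  imin=0, irest=1, ntx=5,",
        f"  nstlim={nstlim}, dt={timestep_ps},",
        # SHAKE on bonds involving hydrogen.
        "  ntc=2, ntf=2,",
        # Langevin thermostat; gamma_ln=2.0 is an assumed value.
        f"  ntt=3, gamma_ln=2.0, temp0={temperature_k}, ig=-1,",
        # Constant pressure with periodic boundaries, Berendsen barostat.
        "  ntb=2, ntp=1, barostat=1,",
        # Cutoff and output intervals are assumed values.
        "  cut=10.0, ntpr=5000, ntwx=5000,",
        "/",
    ]
    return "\n".join(lines) + "\n"


mdin = production_mdin()
print(mdin)
```

With the assumed 2-fs timestep, 100 ns corresponds to 50,000,000 steps, which is the value written to `nstlim`.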
Root mean square deviation (RMSD) was used to quantify the overall deviation of the protein backbone atoms from the initial conformation, root mean square fluctuation (RMSF) was used to analyze the local dynamic characteristics of residue side chains, and the free energy landscape (FEL) was derived from the energy distribution of the conformational ensemble projected onto reaction coordinates. Together, these three indicators were used to evaluate and identify the molecule’s core functional domains.
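The RMSD and RMSF definitions can be illustrated with a minimal sketch in plain Python. It assumes that all frames have already been superimposed on the reference structure, which trajectory analysis tools normally do before computing these quantities; the function names and the two-atom toy trajectory are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]


def rmsd(frame: Frame, reference: Frame) -> float:
    """Root mean square deviation of one frame from a reference frame."""
    squared = sum((a - b) ** 2
                  for p, q in zip(frame, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


def rmsf(frames: Sequence[Frame]) -> List[float]:
    """Per-atom root mean square fluctuation about the time-averaged position."""
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        msf = sum(sum((f[i][d] - mean[d]) ** 2 for d in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy trajectory: atom 0 is fixed, atom 1 moves along x between two frames.
frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(rmsd(frame1, frame0))
print(rmsf([frame0, frame1]))
```

RMSD yields one value per frame and tracks global drift, whereas RMSF yields one value per atom or residue and highlights which regions move.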
2.5. Multi-scale Screening of Core Stable Segments
To identify core stable segments in protein structures, this study employed a multi-scale conformational stability analysis method.
First, residue dynamics throughout the chain were quantified using the root mean square fluctuation (RMSF) from the molecular dynamics simulations, and the residues with fluctuation amplitudes in the lowest quartile (i.e., the most stable 25%) were extracted. Subsequently, the secondary structure type of each residue was annotated frame by frame along the trajectory using the DSSP algorithm as implemented in Python’s MDTraj library. The frequency of the dominant secondary structure type was analyzed across the entire trajectory to identify highly conserved residues (those maintaining the same secondary structure for ≥ 95% of simulation time) and ultra-conserved segments whose secondary structure type remained stable throughout the simulation. The two criteria were then intersected: low-RMSF regions were combined with stable secondary structure regions to identify core segments that are highly stable in both tertiary structure dynamics and secondary structure conservation. Furthermore, the bioinformatics tool ConSurf, which uses an empirical Bayesian method, was employed to assess the evolutionary conservation of amino acid positions within the protein, enabling analysis of biomedical functional associations [22]. The secondary structure was predicted via GOR IV [23].
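The intersection procedure described above can be sketched as follows. The code takes per-residue RMSF values and per-frame secondary structure strings (one character per residue, as a DSSP annotation step would produce), keeps the lowest-RMSF quartile, keeps residues whose dominant secondary structure type occurs in at least 95% of frames, and merges the intersection into contiguous segments. The function names, the rank-based quartile rule, and the toy data are assumptions for illustration and do not reproduce the study's scripts.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def low_rmsf_residues(rmsf: Sequence[float], fraction: float = 0.25) -> Set[int]:
    """Indices of the `fraction` of residues with the lowest RMSF."""
    k = max(1, int(len(rmsf) * fraction))
    order = sorted(range(len(rmsf)), key=lambda i: rmsf[i])
    return set(order[:k])


def conserved_residues(dssp_by_frame: Sequence[str],
                       threshold: float = 0.95) -> Set[int]:
    """Indices whose dominant secondary structure covers >= threshold of frames."""
    n_frames = len(dssp_by_frame)
    conserved = set()
    for i in range(len(dssp_by_frame[0])):
        counts = Counter(frame[i] for frame in dssp_by_frame)
        if counts.most_common(1)[0][1] / n_frames >= threshold:
            conserved.add(i)
    return conserved


def contiguous_segments(indices: Set[int]) -> List[Tuple[int, int]]:
    """Merge residue indices into inclusive (start, end) runs."""
    segments: List[Tuple[int, int]] = []
    for i in sorted(indices):
        if segments and i == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], i)
        else:
            segments.append((i, i))
    return segments


# Toy data: 12 residues, 20 frames; residue 4 flips between E and C.
rmsf = [5.0, 1.0, 1.2, 6.0, 0.9, 7.0, 8.0, 9.0, 6.5, 7.5, 8.5, 9.5]
dssp_by_frame = ["CHHCECCCCCCC"] * 18 + ["CHHCCCCCCCCC"] * 2
core = low_rmsf_residues(rmsf) & conserved_residues(dssp_by_frame)
segments = contiguous_segments(core)
print(segments)
```

In the toy data, residue 4 has the lowest RMSF but keeps its secondary structure in only 90% of frames, so it is excluded; requiring both criteria is what distinguishes a core stable segment from a residue that is merely immobile.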
3. Results
3.1. DRKG Prediction: KPHS_11890
After 50 training epochs of the ERMLP model, evaluation in realistic mode gave a Hits@10 of 0.1602, meaning that the correct entity was ranked within the top 10 predictions in 16.02% of test cases, and an adjusted arithmetic mean rank (AMR) of 0.0238, meaning that the mean rank of correct entities was about 2.38% of the rank expected under random ordering.
Based on the top 100 candidate target predictions generated by the ERMLP model (see Table 1), the KPHS_11890 gene (Acridine efflux pump [Klebsiella pneumoniae subsp. HS11286]) was identified as the top candidate target. It scored 4.628 in the ERMLP model predictions, the highest score among all genes, indicating a strong association with the pathogen entity Disease::MesH:D007711 via the relation path bioarx::Corkp ass kp_gene::Disease:Gene in the knowledge graph. The expert review also supported this candidate.
Table 1
Prediction Results (Excerpted)
Target ID*          Score
Gene::11846187      4.628
Gene::554221        4.585
Gene::33231         4.582
Gene::94223         4.571
Gene::102129291     4.55
Gene::11849851      4.419
Gene::69734803      4.413
Gene::395865        4.388
Gene::64339292      4.35
Gene::69757109      4.346
Gene::77226714      4.336
Gene::66560267      4.328
Gene::77226452      4.317
Gene::442975        4.281
Gene::852311        4.263
Gene::11848561      4.253
Gene::69796608      4.228
Gene::102588786     4.217
Gene::69792425      4.211
Gene::66561717      4.186
Gene::11845919      4.162
Gene::77227608      4.147
Gene::69796713      4.127
Gene::64294154      4.112
*The ID is from the Medical Subject Headings (MeSH) terms and concepts.
As illustrated in Fig. 1, the KPHS_11890 gene encodes a membrane fusion protein (MFP) family (TC 8.A.1) efflux pump. This critical efflux transporter mediates the active extrusion of acridine antibiotics and similar hydrophobic compounds, thereby conferring antibiotic resistance on the host bacterium. The mechanism is widespread in clinical pathogens and strongly influences the development of multidrug-resistant phenotypes, consistent with the resistance problems observed in K. pneumoniae [24]. Studies indicate that efflux pump genes (e.g., acrAB, tolC) have a detection rate exceeding 90% in K. pneumoniae and often coexist with ESBL genes (e.g., blaCTX-M), significantly enhancing resistance through synergistic effects [25]. As an efflux pump, KPHS_11890 may be functionally coupled with other resistance genes (e.g., metallo-β-lactamase genes) through similar mechanisms, promoting carbapenem resistance. Additionally, the MFS efflux pump family is highly conserved in high-risk Klebsiella pneumoniae clones (e.g., ST258). Its functional loss can significantly restore sensitivity to β-lactam antibiotics (e.g., reducing imipenem MIC values by 4–8 fold through efflux pump inhibition) and is negatively correlated with virulence factors, suggesting that targeting this gene may simultaneously reduce resistance and virulence [26–28].
Fig. 1
3D Structure of KPHS_11890
Thus, its critical role in antimicrobial resistance highlights the potential of studying its structure and function to provide molecular targets for developing novel efflux pump inhibitors, advancing the development of new antibiotic adjuvants, and warranting further investigation of its molecular dynamics characteristics.
3.2. Molecular Simulation
The 100 ns molecular dynamics simulation of the KPHS_11890 protein revealed its structural dynamic properties and potential functional correlations.
In Fig. 2, Root Mean Square Deviation (RMSD) analysis indicated that the protein backbone atoms fluctuated between 2.5 Å and 15 Å during the simulation. This significant conformational change suggests that the protein underwent substantial structural rearrangement within the simulation timescale, reflecting its inherent structural flexibility. This structural flexibility may be associated with its conformational adaptability as an efflux pump, as dynamic conformational changes are required during substrate transport to fulfill its function.
Fig. 2
Root Mean Square Deviation
In Fig. 3, RMSF analysis revealed the dynamic properties of individual protein regions. Residues 50–100 showed low fluctuation (2–6 Å), indicating structural stability; this region likely forms the core and anchors the protein to the cell membrane, supporting proper positioning and function of the efflux pump. Conversely, residues 0–50, 100–150, and 350–400 displayed high RMSF (6–12 Å), suggesting greater flexibility, possibly linked to the opening and closing of the substrate-binding pocket, which is crucial for efficient antibiotic efflux.
Fig. 3
Root Mean Square Fluctuation
In Fig. 4, Radius of Gyration (Rg) analysis evaluated the overall structural compactness of the protein. During the simulation, Rg values fluctuated between 39 Å and 46 Å, indicating that KPHS_11890 maintained a relatively compact globular conformation overall. This compact globular conformation facilitates the efflux pump’s stable presence and functionality in the cell membrane while providing a structural basis for dynamic conformational changes during substrate transport.
Fig. 4
Radius of Gyration
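For reference, the radius of gyration reported here is the mass-weighted root mean square distance of the atoms from their center of mass. The sketch below computes it for a single frame; the function name and the four-point toy structure are illustrative, and uniform masses are assumed when none are given.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord],
                       masses: Optional[Sequence[float]] = None) -> float:
    """Mass-weighted RMS distance of atoms from their center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses
    total = sum(masses)
    center = [sum(m * c[d] for m, c in zip(masses, coords)) / total
              for d in range(3)]
    squared = sum(m * sum((c[d] - center[d]) ** 2 for d in range(3))
                  for m, c in zip(masses, coords))
    return math.sqrt(squared / total)


# Four points at unit distance from the origin.
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
rg = radius_of_gyration(coords)
print(rg)
```

Evaluating this quantity frame by frame gives the Rg time series plotted in Fig. 4.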
Simultaneously, in Fig. 5, a 3D Free Energy Landscape (3D FEL) built using principal reaction coordinates identified multiple energy wells within the protein. This feature indicates that KPHS_11890 visited multiple distinct metastable conformational states during the simulation, suggesting potential functional mechanisms related to conformational diversity. The presence of multiple energy wells implies that the protein exists in multiple active conformations, providing opportunities for designing inhibitors with different mechanisms of action. By designing inhibitors targeting different conformational states, multifaceted regulation of the KPHS_11890 efflux pump’s function can be achieved, thereby enhancing the therapeutic efficacy of antibiotics.
Fig. 5
3D Free Energy Landscape
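A free energy landscape of this kind is commonly obtained by histogramming the trajectory along two collective coordinates and converting bin populations to free energies with G = -kT ln(P / Pmax). The sketch below illustrates that conversion; it assumes the two coordinates are projections onto the first two principal components, uses a fixed bin width, and works in kcal/mol at the simulation temperature of 310.15 K. The function name and toy projections are hypothetical, and the exact coordinates and binning used for Fig. 5 are not specified in this paper.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_landscape(pc1: Sequence[float], pc2: Sequence[float],
                          bin_width: float,
                          temperature_k: float = 310.15
                          ) -> Dict[Tuple[int, int], float]:
    """Free energy per occupied 2D bin, relative to the most populated bin."""
    counts = Counter((math.floor(x / bin_width), math.floor(y / bin_width))
                     for x, y in zip(pc1, pc2))
    max_count = max(counts.values())
    kt = K_B * temperature_k
    return {bin_id: -kt * math.log(count / max_count)
            for bin_id, count in counts.items()}


# Toy projections: four frames in one basin, two in a neighboring bin.
pc1 = [0.1, 0.2, 0.3, 0.4, 1.5, 1.6]
pc2 = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
fel = free_energy_landscape(pc1, pc2, bin_width=1.0)
print(fel)
```

The most populated bin defines the zero of free energy, and each separate low-energy region corresponds to one of the energy wells discussed in the text.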
Collectively, the structural dynamics analysis, showing significant conformational changes, high flexibility in specific residue regions, maintenance of overall compactness, and the presence of multiple energy wells, provides a structural basis for understanding the functional mechanisms of the protein as an efflux pump (e.g., substrate transport) and potential regulatory strategies.
The global dynamic properties, as characterized by RMSD, Rg, and FEL analysis, collectively indicate that KPHS_11890 is a flexible protein that explores multiple conformational states while maintaining an overall compact structure. To identify the specific segments responsible for this structural integrity and pinpoint potential sites for therapeutic intervention, we performed a detailed analysis of per-residue fluctuations and secondary structure conservation throughout the simulation trajectory.
3.3. Core Stable Segments
The AcrA protein serves as a critical component of the multidrug efflux pump system, actively expelling a range of antibacterial agents and toxic compounds, such as acridine derivatives, from bacterial cells. To elucidate the structure-function relationship, the specific secondary structure and 20 stable segments (Table 2, 3; Fig. 6) were extracted and their roles in protein functionality, protein-drug interactions, and drug design were systematically analyzed. These segments are distributed across AcrA’s main domains—the α-helical domain, the extended strand domain, and the random coil domain. These domains collectively maintain AcrA’s structural integrity and support the assembly and function of the tripartite efflux pump through interactions with AcrB and TolC [29].
Table 2
Molecular Secondary Structure
Secondary Structure    Length    Ratio
α-helix                136       34.26%
Extended strand        69        17.38%
Random coil            192       48.36%
Table 3
Stable Segments and Secondary Structure
Segment    Residues    Amino acid sequence    Length    Secondary structure (number of residues)
1          42–45       VVTL                   4         Extended strand (3) and Random coil (1)
2          60          T                      1         Random coil (1)
3          66–67       AE                     2         α-helix (1) and Random coil (1)
4          69–81       RPQVSGIILKRNF          13        Extended strand (4) and Random coil (9)
5          86          D                      1         Random coil (1)
6          88–91       EAGV                   4         Extended strand (1) and Random coil (3)
7          94–112      YQIDPATYQATYDSAKGDL    19        α-helix (12), Extended strand (1) and Random coil (6)
8          160–172     AKAAVETARINLA          13        α-helix (12) and Random coil (1)
9          174–176     TKV                    3         Random coil (3)
10         178–181     SPIS                   4         Random coil (4)
11         183–184     RI                     2         Random coil (2)
12         195–197     VQN                    3         α-helix (2) and Random coil (1)
13         202–203     AL                     2         α-helix (2)
14         206         V                      1         α-helix (1)
15         212–214     IYV                    3         Extended strand (2) and Random coil (1)
16         308–309     VP                     2         Extended strand (1) and Random coil (1)
17         313         V                      1         Random coil (1)
18         323–325     VLV                    3         Extended strand (3)
19         353         L                      1         Random coil (1)
20         357–359     DRV                    3         Extended strand (2) and Random coil (1)
Fig. 6
3D Structure of Stable Segments
3.3.1. α-Helical Domain
The α-helical domain encompasses segments with predominant α-helical features, which are crucial for structural rigidity and intermolecular interactions. This domain includes Segments 3, 7, 8, 12, 13, and 14. These segments are enriched in hydrophobic, polar, and charged residues (e.g., alanine, valine, glutamine, and aspartic acid), which may form helical bundles to stabilize interactions with partner proteins in the efflux complex. For instance, Segment 8’s alanine-rich sequence could facilitate hydrophobic packing, while Segment 7’s tyrosine and aspartic acid residues might contribute to hydrogen bonding networks, enhancing domain stability. Their positioning in the mid-region of the protein suggests involvement in bridging the efflux pump’s core assembly.
3.3.2. Extended Strand Domain
The extended strand domain features segments with β-strand-like extended conformations, which often participate in sheet formation and provide structural scaffolds for binding interfaces. This domain comprises Segments 1, 4, 6, 7, 15, 16, 18, and 20. These segments contain hydrophobic residues that could mediate β-strand interactions or contribute to hydrophobic cores. Segments 4 and 18, located in the N- and C-terminal regions respectively, may form part of binding interfaces, with their extended strands potentially stabilizing protein-drug or protein-protein contacts.
3.3.3. Random Coil Domain
The random coil domain includes intrinsically disordered regions that confer flexibility, enabling dynamic interactions and adaptability in the efflux pump. This domain comprises Segments 2, 5, 9, 10, 11, 17, and 19. Many segments in the other domains also contain random coil elements, but the segments listed here consist entirely of random coil. The presence of polar and charged residues (e.g., threonine, aspartic acid, serine) in these segments may facilitate transient interactions or conformational changes upon binding to substrates or inhibitors. Their distribution throughout the protein underscores their role in providing flexibility to the overall structure.
Fig. 7
Conservation and Variability of the Molecule
3.3.4. Targeting AcrA-TolC Interactions
Segments in the α-helical domain are likely critical for binding to TolC. Small molecules or peptides that mimic these sequences, particularly the hydrogen-bonding residues in Segment 7, could competitively inhibit AcrA-TolC interactions, thereby disrupting efflux channel assembly. For instance, peptides designed to replicate Segment 7 may block the α-helical hairpin’s interaction with TolC, effectively preventing pump formation.
3.3.5. Exploiting Flexible Regions for Inhibitor Binding
Random coil segments (e.g., Segments 9 and 10) offer high druggability due to their structural adaptability, potentially forming binding pockets for efflux pump inhibitors. Hydrophobic residues in extended strand segments (e.g., Segments 15 and 16) near the C-terminus could stabilize inhibitor binding, analogous to known inhibitors that target flexible domains in related efflux pumps.
3.3.6. Disrupting Secondary Structure Stability
Segment 7, with its mix of α-helix, extended strand, and random coil elements, and charged residues (e.g., aspartic acid and lysine), may be vulnerable to small molecules that destabilize helical formations. Such inhibitors could prevent the protein from adopting conformations necessary for efflux pump assembly, leveraging the segment’s tyrosine residues for potential hydrogen bond disruption.
3.3.7. Inhibiting Hydrophobic Core Formation
Extended strand segments enriched in hydrophobic amino acids, such as Segments 1 and 20, positioned near the protein’s termini, could be targeted by lipophilic compounds. These compounds might interfere with β-strand packing or core stabilization, thereby hindering the pump’s ability to interact with membrane components and expel antibiotics.
3.3.8. Modulating Dynamic Conformational Changes
Segments with predominant random coil features that contain charged residues, such as Segments 5 and 11, may facilitate transient binding events. Designing allosteric inhibitors that bind these sites could lock the protein in non-functional states, reducing its flexibility and impairing substrate recognition or transport in the efflux pathway.
The stability of these segments suggests they are less prone to mutations, making them reliable targets for drug design. Inhibitors targeting these regions could enhance the efficacy of existing antibiotics by preventing their expulsion from bacterial cells.
4. Discussion
This study integrates the DRKG with MD to investigate the stable segments of the Klebsiella pneumoniae resistance-related protein KPHS_11890, providing new targets for overcoming its efflux pump resistance mechanism. The approach overcomes the limitations of single-technique methods: DRKG screens targets efficiently but lacks structural resolution, while MD validates binding sites but is computationally expensive. Combining the two can significantly accelerate the development of anti-resistance drugs.
4.1 Methodological Innovation and Validation
In terms of methodology, the DRKG model constructed using the PyKEEN framework (AMR = 0.0238, Hits@10 = 0.1602; Section 3.1) demonstrates useful biomedical relationship inference capability. Its AMR is well below the 0.1 level described in Section 2.2, although its Hits@10 remains below the values reported for traditional KG models in similar tasks (e.g., BioKEEN’s Hits@10 benchmark of 0.25) [30].
Notably, KPHS_11890 received the highest score among the top 100 predicted targets, and its stable structural segments were characterized through MD, supporting the reliability of DRKG screening. This strategy aligns with recent trends in AI-driven drug discovery: Al-Saleem et al. successfully predicted potential COVID-19 drugs (e.g., HDIs) using DRKG, while this study adapts it for bacterial resistance target identification [7]. Crucially, this work presents more than a single target prediction; it establishes a generalizable computational workflow. This DRKG-MD paradigm can be adapted to other pathogens or disease models, offering a template for future computer-aided drug discovery campaigns that require a transition from large-scale data mining to structure-based validation.
4.2 Structural and Functional Significance of KPHS_11890
KPHS_11890 belongs to the membrane fusion protein (MFP) family, with its homolog in E. coli, AcrA, being the most extensively studied exemplar. Our identification of stable segments within the α-helical domain is particularly significant, as this region in AcrA is known to form a long coiled-coil hairpin essential for bridging the inner membrane transporter AcrB with the outer membrane channel TolC. The structural stability of these segments, therefore, is likely critical for the assembly and function of the entire tripartite efflux pump, making them a prime target for inhibitors designed to disrupt this vital interaction.
The MFP family drives multidrug efflux by mediating the synergy between inner membrane transporters and outer membrane channels. Previous studies have confirmed that inhibiting MFPs can block efflux pump assembly, restoring antibiotic sensitivity. The stable segments of KPHS_11890 may therefore serve as high-potential drug-binding sites, offering a route to overcoming the resistance mechanisms of RND efflux pumps such as AcrAB-TolC [31, 32].
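As an illustration of how stable segments of this kind can be extracted from MD output, the sketch below scans a per-residue RMSF profile for contiguous runs that stay below a cutoff. It covers only the low-RMSF criterion; the secondary structure conservation criterion is not modelled. The cutoff, the minimum segment length, the function name, and the toy RMSF values are all assumptions made for illustration, not the parameters used in this study.

```python
from typing import List, Sequence, Tuple


def stable_segments(
    rmsf: Sequence[float], threshold: float, min_length: int = 5
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges with RMSF below threshold.

    Runs shorter than min_length residues are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # first residue of the run currently being tracked
    for i, value in enumerate(rmsf, start=1):
        if value < threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended at residue i - 1 and has length i - start.
            if start is not None and i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(rmsf) - start + 1 >= min_length:
        segments.append((start, len(rmsf)))
    return segments


# Toy per-residue RMSF values in nm (hypothetical, 12 residues).
toy_rmsf = [0.30, 0.08, 0.07, 0.09, 0.06, 0.25, 0.05, 0.04, 0.31, 0.07, 0.08, 0.06]
toy_segments = stable_segments(toy_rmsf, threshold=0.10, min_length=3)
print(toy_segments)
```

In this toy profile, residues 7 and 8 are below the cutoff but form a run of only two residues, so they are not reported as a segment.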
4.3 Druggability Potential and Mechanisms
The stable segments of KPHS_11890 highlight its distinctive druggability potential: unlike the conformationally dynamic AcrB protein, which is prone to mutation-driven resistance escape, these stable regions are more suitable for designing high-affinity inhibitors. Furthermore, the HS11286 strain, a dominant carbapenem-resistant Klebsiella pneumoniae (CRKP) in China [33], likely exhibits multidrug resistance partly due to the overexpression of efflux pump components such as KPHS_11890. Studies have demonstrated that efflux pumps, including those in the Major Facilitator Superfamily (MFS) such as KpnGH, play a significant role in conferring resistance to carbapenems and other antibiotics in K. pneumoniae. Specifically, insertional inactivation of the MFS efflux pump gene kpnGH resulted in increased susceptibility to carbapenems such as ertapenem and imipenem [34]. Although KPHS_11890 is a membrane fusion protein of an RND-type pump rather than an MFS transporter, its overexpression may similarly contribute to the resistance profile observed in the HS11286 strain. Additionally, CRKP strains often show upregulated expression of various efflux pump genes, which can render β-lactam and fluoroquinolone antibiotics ineffective [35]. This highlights a key strategic advantage: targeting the structurally conserved regions of an adaptor protein like KPHS_11890 may offer a more robust therapeutic strategy than targeting the highly dynamic and mutation-prone substrate-binding pocket of the central transporter AcrB. Inhibitors binding to these stable segments could be less susceptible to resistance development, a critical consideration in the design of next-generation antibiotics.
Although the DRKG approach facilitates efficient predictions, its dependence on prior knowledge restricts its ability to identify targets absent from the database, presenting a notable limitation. Additionally, the therapeutic potential of KPHS_11890 requires further validation through in vitro experiments or clinical studies, a challenge to be addressed in future research.
5. Conclusion
This study aimed to identify and characterize stable structural segments of the KPHS_11890 protein from Klebsiella pneumoniae as a potential therapeutic target. Employing an integrated approach, a Drug Repurposing Knowledge Graph (DRKG) model was first used to predict high-potential gene targets from comprehensive biomedical data. The KPHS_11890 gene, which encodes a membrane fusion protein of the AcrAB-TolC efflux pump, was identified as the top-ranked candidate. Subsequent 100-ns all-atom molecular dynamics simulations were performed to analyze the protein's structural dynamics and stability. From this analysis, 20 core stable segments were pinpointed across the protein’s α-helical, extended strand, and random coil domains, based on low root mean square fluctuation (RMSF) values and high secondary structure conservation. The findings provide a detailed structural map of these stable regions, presenting them as potential binding sites for the design of inhibitors intended to disrupt the function of the AcrAB-TolC efflux pump.
Author Contribution
YB, CM, and ZJ conceptualized the study. YB and ZH curated the data. YB, ZH, and ZW conducted the investigation. MH developed the algorithms and contributed to project validation and technical writing. SW provided technical support for data organization and molecular simulations. SM refined the medical data and artificial intelligence algorithms. HQ provided pharmacological and medical expertise and contributed to project research management. LW integrated testing and clinical data. NW offered constructive suggestions for algorithm optimization. XW contributed to code development and basic data organization. JZ collected and organized basic data. WZ provided constructive guidance on bioinformatics, including gene and protein function predictions. GZ performed data visualization. DL managed the project and provided resources. CM and ZJ supervised the study, secured funding, performed formal analysis, drafted the original manuscript, and reviewed and edited the manuscript. YB contributed to drafting the original manuscript. All authors contributed to the analysis and interpretation of data, revised the manuscript, and approved the final version.
Acknowledgements
We sincerely thank the physicians who contributed to this research. We also express our appreciation to the individuals who assisted and supported us.
Funding
We gratefully acknowledge the financial support from the Innovation and Technology Commission of Hong Kong SAR government of China (ITF PRP/062/22FX), and Hong Kong Polytechnic University (PolyU Marshall Research Centre for Medical Microbial Biotechnology).
Declaration of competing interest
The authors declare no competing interest.
Ethical approval
Not applicable.
Data Availability
Data is provided within the manuscript and supplementary information files.
References
1. Santajit S, Indrawattana N (2016) Mechanisms of antimicrobial resistance in ESKAPE pathogens. BioMed Res Int 2016:2475067. https://doi.org/10.1155/2016/2475067
2. World Health Organization (2024) Disease Outbreak News; Antimicrobial Resistance, Hypervirulent Klebsiella pneumoniae - Global situation. World Health Organization, Geneva. https://www.who.int/emergencies/disease-outbreak-news/item/2024-DON527
3. Wei DW, Song Y, Li Y, Zhang G, Chen Q, Wu L et al (2025) Insertion sequences accelerate genomic convergence of multidrug resistance and hypervirulence in Klebsiella pneumoniae via capsular phase variation. Genome Med 17:74. https://doi.org/10.1186/s13073-025-01474-0
4. Li Y, Liu S, Han P, Lei J, Wang H, Zhu W et al (2025) Performance and hypothetical clinical impact of an mNGS-based machine learning model for antimicrobial susceptibility prediction of five ESKAPEE bacteria. Microbiol Spectr 13:e02592-24. https://doi.org/10.1128/spectrum.02592-24
5. Novais Â, Gonçalves AB, Ribeiro TG, Freitas AR, Méndez G, Mancera L et al (2024) Development and validation of a quick, automated, and reproducible ATR FT-IR spectroscopy machine-learning model for Klebsiella pneumoniae typing. J Clin Microbiol 62:e01211-23. https://doi.org/10.1128/jcm.01211-23
6. Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H (2021) Drug repurposing for COVID-19 via knowledge graph completion. J Biomed Inf 115:103696. https://doi.org/10.1016/j.jbi.2021.103696
7. Al-Saleem J, Granet R, Ramakrishnan S, Ciancetta NA, Saveson C, Gessner C et al (2021) Knowledge graph-based approaches to drug repurposing for COVID-19. J Chem Inf Model 61:4058–4067. https://doi.org/10.1021/acs.jcim.1c00645
8. Li RR, Cui XP (2016) Research advance of drug resistance mechanism of Klebsiella pneumonia. Chin J Clin Lab Mgt (Electronic Edition) 4:86–90. https://doi.org/10.3877/cma.j.issn.2095-5820.2016.02.006
9. Gao HJ, Cheng GY, Wang YL, Ning JN, Chen T, Li J, Hao HH, Yuan ZH (2017) Research progress of the mainly bacterial efflux pumps and related regulator. Acta Vet Zootech Sin 48:2023–2033. https://doi.org/10.11843/j.issn.0366-6964.2017.11.002
10. Bialek-Davenet S, Lavigne JP, Guyot K, Mayer N, Tournebize R, Brisse S et al (2015) Differential contribution of AcrAB and OqxAB efflux pumps to multidrug resistance and virulence in Klebsiella pneumoniae. J Antimicrob Chemother 70:81–88. https://doi.org/10.1093/jac/dku352
11. Kallman O, Motakefi A, Wretlind B, Kalin M, Olsson-Liljequist B, Giske CG (2008) Cefuroxime non-susceptibility in multidrug-resistant Klebsiella pneumoniae overexpressing ramA and acrA and expressing ompK35 at reduced levels. J Antimicrob Chemother 62:986–990. https://doi.org/10.1093/jac/dkn323
12. Veleba M, Higgins PG, Gonzalez G, Seifert H, Schneiders T (2012) Characterization of RarA, a novel AraC family multidrug resistance regulator in Klebsiella pneumoniae. Antimicrob Agents Chemother 56:4450–4458. https://doi.org/10.1128/AAC.00518-12
13. Routh MD, Zalucki Y, Su CC, Zhang Q, Shafer WM, Yu EW (2011) Efflux pumps of the resistance–nodulation–division family: a perspective of their structure, function, and regulation in gram-negative bacteria. Advances in enzymology and related areas of molecular biology. Wiley, Hoboken, NJ, pp 109–146. https://doi.org/10.1002/9780470920541.ch3
14. Liu P, Li P, Jiang X, Bi D, Xie Y, Tai C et al (2012) Complete genome sequence of Klebsiella pneumoniae subsp. pneumoniae HS11286, a multidrug-resistant strain isolated from human sputum. J Bacteriol 194:1841–1842. https://doi.org/10.1128/JB.00045-12
15. Wong CF (2023) 15 Years of molecular simulation of drug-binding kinetics. Expert Opin Drug Discov 18:1333–1348. https://doi.org/10.1080/17460441.2023.2272827
16. Zheng S, Gu Y, Gu Y, Zhao Y, Li L, Wang M et al (2024) Machine learning–enabled virtual screening indicates the anti-tuberculosis activity of aldoxorubicin and quarfloxin with verification by molecular docking, molecular dynamics simulations, and biological evaluations. Brief Bioinform 26:bbae696. https://doi.org/10.1093/bib/bbae696
17. Gema AP, Grabarczyk D, Wulf WD, Borole P, Alfaro JA, Minervini P et al (2023) Knowledge graph embeddings in the biomedical domain: are they useful? a look at link prediction, rule learning, and downstream polypharmacy tasks. arXiv. http://arxiv.org/abs/2305.19979
18. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR et al (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46:D1074–D1082. https://doi.org/10.1093/nar/gkx1037
19. Liu MC, Jian Z, Liu W, Li J, Pei N (2022) One Health analysis of mcr-carrying plasmids and emergence of mcr-10.1 in three species of Klebsiella recovered from humans in China. Microbiol Spectr 10:e02306-22. https://doi.org/10.1128/spectrum.02306-22
20. National Center for Biotechnology Information (1988) National Center for Biotechnology Information. National Library of Medicine (US), Bethesda (MD). https://www.ncbi.nlm.nih.gov/. Accessed 23 Jul 2025
21. Swedan SF, Aldakhily DB (2024) Antimicrobial resistance, biofilm formation, and molecular detection of efflux pump and biofilm genes among Klebsiella pneumoniae clinical isolates from Northern Jordan. Heliyon 10:e34370. https://doi.org/10.1016/j.heliyon.2024.e34370
22. Berezin C, Glaser F, Rosenberg J, Paz I, Pupko T, Fariselli P et al (2004) ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20:1322–1324. https://doi.org/10.1093/bioinformatics/btg103
23. Kaur H, Garg A, Raghava GPS (2017) The GOR method of protein secondary structure prediction and its application as a protein aggregation prediction tool. In: Uversky VN, Dunker AK (eds) Methods in molecular biology. Springer, New York, NY, pp 7–24. https://doi.org/10.1007/978-1-4939-6406-2_2
24. Lam MMC, Wick RR, Watts SC, Cerdeira LT, Wyres KL, Holt KE (2021) A genomic surveillance framework and genotyping tool for Klebsiella pneumoniae and its related species complex. Nat Commun 12:4178. https://doi.org/10.1038/s41467-021-24448-3
25. Lee M, Pinto NA, Kim CY, Yang S, D’Souza R, Yong D et al (2019) Network integrative genomic and transcriptomic analysis of carbapenem-resistant Klebsiella pneumoniae strains identifies genes for antibiotic resistance and virulence. mSystems 4:e00202-19. https://doi.org/10.1128/mSystems.00202-19
26. Wang G, Zhao G, Chao X, Xie L, Wang H (2020) The characteristic of virulence, biofilm and antibiotic resistance of Klebsiella pneumoniae. Int J Environ Res Public Health 17:6278. https://doi.org/10.3390/ijerph17176278
27. Mikolosko J, Bobyk K, Zgurskaya HI, Ghosh P (2006) Conformational flexibility in the multidrug efflux system protein AcrA. Structure 14:577–587. https://doi.org/10.1016/j.str.2005.11.023
28. Darzynkiewicz ZM, Green AT, Abdali N, Hazel A, Fulton RL, Kimball J et al (2019) Identification of binding sites for efflux pump inhibitors of the AcrAB-TolC component AcrA. Biophys J 116:648–658. https://doi.org/10.1016/j.bpj.2019.01.003
29. Ali M, Hoyt CT, Domingo-Fernández D, Lehmann J, Jabeen H (2019) BioKEEN: a library for learning and evaluating biological knowledge graph embeddings. Bioinformatics 35:3538–3540. https://doi.org/10.1093/bioinformatics/btz170
30. Sosa DN, Neculae G, Fauqueur J, Altman RB (2024) Elucidating the semantics-topology trade-off for knowledge inference-based pharmacological discovery. J Biomed Semant 15:10. https://doi.org/10.1186/s13326-024-00308-z
31. Li XZ, Nikaido H, Poole K (1995) Role of mexA-mexB-oprM in antibiotic efflux in Pseudomonas aeruginosa. Antimicrob Agents Chemother 39:1948–1953. https://doi.org/10.1128/aac.39.9.1948
32. Zhu J, Chen T, Ju Y, Dai J, Zhuge X (2024) Transmission dynamics and novel treatments of high risk carbapenem-resistant Klebsiella pneumoniae: the lens of One Health. Pharmaceuticals 17:1206. https://doi.org/10.3390/ph17091206
33. Bi D, Jiang X, Sheng ZK, Ngmenterebo D, Tai C, Wang M et al (2015) Mapping the resistance-associated mobilome of a carbapenem-resistant Klebsiella pneumoniae strain reveals insights into factors shaping these regions and facilitates generation of a ‘resistance-disarmed’ model organism. J Antimicrob Chemother 70:2770–2774. https://doi.org/10.1093/jac/dkv179
34. Srinivasan VB, Singh BB, Priyadarshi N, Chauhan NK, Rajamohan G (2014) Role of novel multidrug efflux pump involved in drug resistance in Klebsiella pneumoniae. PLoS ONE 9:e96288. https://doi.org/10.1371/journal.pone.0096288
35. Filgona J, Banerjee T, Anupurba S (2015) Role of efflux pumps inhibitor in decreasing antibiotic resistance of Klebsiella pneumoniae in a tertiary hospital in North India. J Infect Dev Ctries 9:815–820. https://doi.org/10.3855/jidc.6186