Molecular genetic characterization of a novel HIV-1 second-generation circulating recombinant form (CRF188_0107) among men who have sex with men in Henan, China
Lifan Duan1, Yuan Yuan1, Tingting Ren1, Jiaqi Liu1, Ning Li1, Zhen Li2, Lin Ding2, Yuchao Zhang2, Guolong Zhang1, Dongyang Zhao1, Chunhua Liu1*
1Henan Center for Disease Prevention and Control, Zhengzhou, China,
2Nanyang Center for Disease Prevention and Control, Nanyang, China.
* Correspondence:
Chunhua Liu,
email: chunhua5167@126.com
Keywords: HIV-1, CRF188_0107, CRF01_AE, CRF07_BC, NFLG, molecular epidemiology
Abstract
Introduction
HIV-1 genetic diversity in China is largely driven by recombination between predominant strains such as CRF01_AE and CRF07_BC. This study reports the identification and characterization of a novel second-generation CRF, designated CRF188_0107, identified among men who have sex with men (MSM) in Henan Province, China.
Methods
Near-full-length genomes (NFLGs) were amplified from plasma samples of five epidemiologically unlinked individuals using a near-endpoint dilution nested PCR approach. Phylogenetic analysis was performed using IQ-TREE under the GTR model, recombination breakpoints were identified with SimPlot v3.5.1 and Bootscan, and Bayesian evolutionary analysis was conducted in BEAST v1.10.5 to estimate the time of the most recent common ancestor (tMRCA). Drug resistance mutations were analyzed using the Stanford HIVdb program.
Results
All five NFLGs formed a distinct monophyletic clade separate from known CRFs. Recombination analysis revealed a mosaic structure consisting of nine segments with eight breakpoints, characterized by four inserted CRF07_BC fragments within a CRF01_AE backbone. The tMRCA was estimated between 2019 and 2023. No major drug resistance mutations were detected against PIs, NRTIs, NNRTIs, or INSTIs.
Discussion
This study identifies CRF188_0107, a novel HIV-1 recombinant derived from CRF01_AE and CRF07_BC, among MSM in Henan, China. The strain shows a complex mosaic genome and emerged around 2019–2023. Some segments suggest a previously unrecognized CRF01_AE sublineage in the region. Our findings underscore the critical need for sustained molecular surveillance among key populations like MSM to monitor the rapid evolution and potential public health impact of novel HIV-1 recombinants in China.
Introduction
Genetic recombination is a key driver of the high genetic diversity observed in the human immunodeficiency virus type 1 (HIV-1) genome. In China, CRF01_AE and CRF07_BC have become the most epidemiologically significant subtypes and have given rise to multiple second- and third-generation circulating recombinant forms (CRFs) and unique recombinant forms (URFs) over the past decade[1–4], particularly among men who have sex with men (MSM). To date, more than twenty distinct CRF01_AE/CRF07_BC recombinants (CRFs_0107) have been documented nationally. However, their prevalence and evolutionary dynamics within the key MSM
The evolution of HIV-1 diversity in China shows distinct regional patterns. Henan Province experienced its initial HIV epidemic in the mid-1990s, primarily through unsafe blood collection practices[5], and has since undergone a marked change in viral subtype distribution. Recent surveillance data from antiretroviral treatment (ART)-naive patients show that CRFs have now become predominant, replacing the once-prevalent subtype B[6].
In this study, we characterize a newly identified recombinant form circulating in Henan, China, designated CRF188_0107. Our analyses define a novel CRF01_AE/CRF07_BC mosaic structure and elucidate its genomic architecture and evolutionary origins through comprehensive recombination analysis and phylogenetic reconstruction.
Materials and methods
Study population
This investigation originated from our HIV-1 molecular network analysis of newly diagnosed, ART-naive individuals in Henan Province (2022–2024). Partial pol gene sequencing identified a distinct cluster phylogenetically segregated from known CRFs.
For complete genome characterization, we selected five geographically dispersed cases (demographics in Table 1) representing this novel cluster. All study procedures were approved by the Institutional Review Board of the Henan Provincial Center for Disease Control and Prevention.
Viral genome amplification, sequencing and sequence assembly
Viral RNA was extracted from plasma samples using an automated spin column-based Viral RNA Extraction Kit (Qiagen, Germany) and reverse transcribed to cDNA using the PrimeScript™ II 1st Strand cDNA Synthesis Kit (TaKaRa, China). Two overlapping halves of the HIV-1 genome were then amplified by nested polymerase chain reaction (PCR) with LA Taq (TaKaRa, China), using a near-endpoint dilution method as described previously[7]. Positive products were purified and sequenced directly by a commercial company (Dehongchangyuan, Beijing). The NFLGs were assembled and cleaned with the DNA sequence analysis software Sequencher v5.4.5 (Gene Codes, United States).
Phylogenetic analysis
The five NFLGs obtained were deposited in GenBank (accession numbers PV363678–PV363682) and aligned, using BioEdit v7.2.5, with reference sequences from the Los Alamos HIV Database, including group M subtypes (A-D, F-H, and J-L), CRFs (CRF01_AE, CRF07_BC, and CRF55_01B) and known CRF01_AE/CRF07_BC recombinants. Maximum-likelihood (ML) phylogenetic analysis was performed in IQ-TREE v2.3.6 under the general time reversible (GTR) substitution model, with 1000 ultrafast bootstrap replicates to evaluate the reliability of internal branches.
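The ML step described above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual command line: the alignment file name is hypothetical, and it assumes the IQ-TREE 2 executable is installed under the name `iqtree2`. The options used are `-s` (alignment), `-m` (substitution model) and `-B` (ultrafast bootstrap replicates).

```python
import shlex


def build_iqtree_command(alignment_path, model="GTR",
                         ufboot_replicates=1000, executable="iqtree2"):
    """Return the argument list for an IQ-TREE 2 maximum-likelihood run."""
    return [
        executable,
        "-s", alignment_path,          # input multiple sequence alignment
        "-m", model,                   # substitution model (GTR in the text)
        "-B", str(ufboot_replicates),  # ultrafast bootstrap replicates
    ]


# Hypothetical file name for the NFLGs aligned with the reference set.
cmd = build_iqtree_command("nflg_with_references.fasta")
print(shlex.join(cmd))
```

To launch the run, the list can be passed to `subprocess.run(cmd, check=True)`; building the command separately keeps the parameters easy to inspect and log.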
Recombination breakpoint analysis
Recombination patterns were analyzed using SimPlot v3.5.1. Similarity plotting was conducted using CRF01_AE and CRF07_BC as parental references, with a subtype H sequence as the outgroup. Bootscan analysis was then performed to confirm recombination events, and the FindSites function was used to determine precise recombination breakpoints. The recombinant genome structure was mapped to HXB2 coordinates and visualized using the Recombinant HIV-1 Drawing Tool. The identified breakpoints were validated phylogenetically through maximum-likelihood analysis of the individual genome segments, using parental subtype sequences as references.
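The idea behind similarity plotting can be illustrated with a short sliding-window calculation. This is a simplified sketch and not SimPlot's implementation: it uses plain percent identity rather than a distance model, the function name is hypothetical, and the default window and step sizes (250 and 20 positions) are borrowed from the bootscan settings given in the legend of Fig. 1.

```python
def window_identity(query, reference, window=250, step=20):
    """Percent identity between two aligned sequences in sliding windows.

    Returns a list of (window_midpoint, percent_identity) tuples. Alignment
    columns in which either sequence has a gap are ignored.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    points = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (q, r)
            for q, r in zip(query[start:start + window],
                            reference[start:start + window])
            if q != "-" and r != "-"
        ]
        if not pairs:
            continue  # window contains only gapped columns
        matches = sum(q == r for q, r in pairs)
        points.append((start + window // 2, 100.0 * matches / len(pairs)))
    return points


# Toy alignment: the query is identical to parent_a and differs from
# parent_b at every fourth position.
query = "ACGT" * 10
parent_a = "ACGT" * 10
parent_b = "ACGA" * 10
curve_a = window_identity(query, parent_a, window=8, step=4)
curve_b = window_identity(query, parent_b, window=8, step=4)
```

Plotting such curves for each candidate parent along the genome shows where the query switches from being closer to one parent to being closer to the other, which is where breakpoints are then sought.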
Parental strain origins analysis
To determine the potential sources of the parental strains of CRF188_0107, Bayesian coalescent Markov chain Monte Carlo (MCMC) analysis was performed using BEAST v1.10.5. The analysis used concatenated genome segments with a chain length of 200 million generations. The best-fit model combined a general time reversible (GTR) substitution model with gamma-distributed rate heterogeneity among sites, an uncorrelated relaxed lognormal molecular clock, and a Bayesian SkyGrid model. Parameter convergence was assessed using Tracer v1.5. The maximum clade credibility (MCC) tree was generated with TreeAnnotator v1.10.5 after discarding the first 10% of states of each run as burn-in, and visualized using FigTree v1.4.4.
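The two post-processing steps mentioned above, discarding burn-in and summarizing a posterior sample by a 95% highest posterior density interval, can be sketched in a few lines. This is an illustration of the general technique only, not the code used by BEAST, Tracer or TreeAnnotator; the function names and toy values are hypothetical.

```python
import math


def discard_burn_in(samples, burn_in_percent=10):
    """Drop the first burn_in_percent of an ordered list of MCMC samples."""
    return samples[len(samples) * burn_in_percent // 100:]


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains the given fraction of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    best = min(range(n - k + 1),
               key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# With a chain of 200 million states, a 10% burn-in removes 20 million states.
burn_in_states = 200_000_000 * 10 // 100

# Toy posterior sample: twenty tightly grouped values and one outlier.
toy_samples = [float(i) for i in range(20)] + [100.0]
toy_hpd = hpd_interval(toy_samples)
```

In the toy sample the shortest interval covering 95% of the values excludes the outlier, which is the behavior that distinguishes an HPD interval from a simple range.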
Drug resistance analysis
Drug resistance profiles were analyzed using the HIVdb program from the Stanford University HIV Drug Resistance Database. The identified drug resistance mutations (DRMs) were categorized according to four drug classes: protease inhibitors (PIs), nucleoside reverse-transcriptase inhibitors (NRTIs), non-nucleoside reverse-transcriptase inhibitors (NNRTIs), and integrase strand transfer inhibitors (INSTIs). Resistance levels were classified into five categories (susceptible, potential low-level, low-level, intermediate-level, and high-level drug resistance).
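The five resistance categories correspond to ranges of the total mutation penalty score that HIVdb assigns per drug. The sketch below assumes the cut-offs published in the Stanford HIVdb release notes (below 10, 10 to 14, 15 to 29, 30 to 59, and 60 or above); these thresholds are not stated in this manuscript, and the function name is hypothetical.

```python
def hivdb_resistance_level(total_score):
    """Map an HIVdb total penalty score for one drug to a resistance level.

    Thresholds are an assumption taken from the Stanford HIVdb documentation.
    """
    if total_score < 10:
        return "susceptible"
    if total_score < 15:
        return "potential low-level resistance"
    if total_score < 30:
        return "low-level resistance"
    if total_score < 60:
        return "intermediate-level resistance"
    return "high-level resistance"


# Illustrative scores only; they are not results from this study.
example_levels = {score: hivdb_resistance_level(score)
                  for score in (0, 10, 15, 30, 60)}
```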
Results
Phylogenetic analysis of the NFLG sequences
The five NFLG sequences, obtained from five epidemiologically unlinked individuals, were 9094, 8841, 9071, 9026 and 9110 nt in length for strains NY24225, NY23025, NY23052, ZZS24046 and XYPQ24008, respectively, spanning from the 5' long terminal repeat (LTR) to part of the 3' LTR (corresponding to nucleotides 686–9565 of the HXB2 strain). These sequences have been deposited in GenBank under accession numbers PV363678 to PV363682. The ML phylogenetic tree showed that all five sequences formed a tight, distinct monophyletic clade, clearly separated from all previously identified HIV-1 CRF01_AE/CRF07_BC recombinants (Fig. 1A).
Recombinant breakpoint analysis
The mosaic structures were further analyzed by bootscanning. The analyses confirmed that these sequences shared a consistent recombination pattern, characterized by four inserted CRF07_BC fragments within a CRF01_AE backbone (Fig. 1B). The genome was divided into nine distinct regions separated by eight breakpoints. A schematic representation of the CRF188_0107 mosaic structure was generated using the Recombinant HIV-1 Drawing Tool from the Los Alamos HIV Sequence Database (Fig. 1C). The NFLG structure comprises the following subregions (HXB2 numbering): I, CRF01_AE (686–2585 nt); II, CRF07_BC (2586–2825 nt); III, CRF01_AE (2826–3045 nt); IV, CRF07_BC (3046–4905 nt); V, CRF01_AE (4906–5625 nt); VI, CRF07_BC (5626–6025 nt); VII, CRF01_AE (6026–6465 nt); VIII, CRF07_BC (6466–8785 nt); IX, CRF01_AE (8786–9565 nt).
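The subregion coordinates listed above can be checked for internal consistency with a short script. The sketch below uses only the HXB2 coordinates reported in the text; the variable and function names are hypothetical, and each breakpoint is represented, by assumption, as the first nucleotide of the segment that follows it.

```python
# (segment, parental strain, HXB2 start, HXB2 end) as reported in the text.
SEGMENTS = [
    ("I", "CRF01_AE", 686, 2585),
    ("II", "CRF07_BC", 2586, 2825),
    ("III", "CRF01_AE", 2826, 3045),
    ("IV", "CRF07_BC", 3046, 4905),
    ("V", "CRF01_AE", 4906, 5625),
    ("VI", "CRF07_BC", 5626, 6025),
    ("VII", "CRF01_AE", 6026, 6465),
    ("VIII", "CRF07_BC", 6466, 8785),
    ("IX", "CRF01_AE", 8786, 9565),
]


def is_contiguous(segments):
    """True if every segment starts right after the previous one ends."""
    return all(nxt[2] == cur[3] + 1 for cur, nxt in zip(segments, segments[1:]))


def breakpoints(segments):
    """First nucleotide of each segment after the first one."""
    return [start for _, _, start, _ in segments[1:]]


def parental_totals(segments):
    """Total number of nucleotides assigned to each parental strain."""
    totals = {}
    for _, parent, start, end in segments:
        totals[parent] = totals.get(parent, 0) + (end - start + 1)
    return totals


print(is_contiguous(SEGMENTS))    # True
print(len(breakpoints(SEGMENTS)))  # 8
print(parental_totals(SEGMENTS))   # {'CRF01_AE': 4060, 'CRF07_BC': 4820}
```

The nine segments tile positions 686 to 9565 without gaps or overlaps (8880 nt in total), which matches the eight breakpoints and the alternating parental pattern described.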
Subregion phylogenetic tree analysis
Recent studies have established standardized criteria for classifying and naming HIV-1 sub-subtypes of CRF01_AE and CRF07_BC in China, highlighting their complex recombination patterns and evolutionary trajectories[8]. Against this background, we conducted a detailed subregion phylogenetic analysis to elucidate the parental origins of the novel recombinant strain.
Subregion phylogenetic analyses of the nine genomic segments showed that each segment clustered with reference strains of either CRF01_AE or CRF07_BC, supported by high bootstrap values. Segments I and V both clustered with CRF01_AE sublineage C4, while segments III, VII, and IX grouped with non-subtyped CRF01_AE strains. In contrast, segments II, IV, VI, and VIII showed close affinity to CRF07_BC references (Fig. 2). Based on established classification criteria for CRFs, these epidemiologically unlinked HIV-1 sequences fulfill the requirements for designation as a novel CRF, which was named CRF188_0107. This strain displays a mosaic genome structure in which the pol, env, tat, vpr and rev genes originate from both CRF01_AE and CRF07_BC, while the gag, vif, vpu and nef genes are derived exclusively from CRF01_AE.
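The gene-level origins stated above follow from overlaying HXB2 gene coordinates on the nine segments. The sketch below assumes the commonly used HXB2 landmark coordinates for each gene (these are not given in this manuscript and should be treated as approximate), and the names `GENES` and `parents_of_gene` are hypothetical.

```python
# Segment boundaries as reported in the text (HXB2 numbering).
SEGMENTS = [
    ("I", "CRF01_AE", 686, 2585), ("II", "CRF07_BC", 2586, 2825),
    ("III", "CRF01_AE", 2826, 3045), ("IV", "CRF07_BC", 3046, 4905),
    ("V", "CRF01_AE", 4906, 5625), ("VI", "CRF07_BC", 5626, 6025),
    ("VII", "CRF01_AE", 6026, 6465), ("VIII", "CRF07_BC", 6466, 8785),
    ("IX", "CRF01_AE", 8786, 9565),
]

# Assumed HXB2 gene coordinates (start, end); tat and rev have two exons.
GENES = {
    "gag": [(790, 2292)],
    "pol": [(2085, 5096)],
    "vif": [(5041, 5619)],
    "vpr": [(5559, 5850)],
    "tat": [(5831, 6045), (8379, 8469)],
    "rev": [(5970, 6045), (8379, 8653)],
    "vpu": [(6062, 6310)],
    "env": [(6225, 8795)],
    "nef": [(8797, 9417)],
}


def parents_of_gene(exons, segments=SEGMENTS):
    """Set of parental strains whose segments overlap any exon of a gene."""
    parents = set()
    for exon_start, exon_end in exons:
        for _, parent, seg_start, seg_end in segments:
            if exon_start <= seg_end and seg_start <= exon_end:
                parents.add(parent)
    return parents


gene_parents = {gene: parents_of_gene(exons) for gene, exons in GENES.items()}
mosaic_genes = sorted(g for g, p in gene_parents.items() if len(p) == 2)
ae_only_genes = sorted(g for g, p in gene_parents.items() if p == {"CRF01_AE"})
print(mosaic_genes)   # ['env', 'pol', 'rev', 'tat', 'vpr']
print(ae_only_genes)  # ['gag', 'nef', 'vif', 'vpu']
```

Under these assumed coordinates, the overlap test reproduces the assignment given in the text: five genes span segments from both parents and four lie entirely within CRF01_AE segments.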
Bayesian evolutionary analysis based on concatenated segments estimated the tMRCA of the CRF01_AE-derived segments as 2021.31 [95% highest posterior density (HPD): 2019.59-2026.63] and that of the CRF07_BC-derived segments as 2021.30 [95% HPD: 2019.61-2025.58] (Fig. 3), suggesting that HIV-1 CRF188_0107 emerged between approximately 2019 and 2023. The evolutionary analysis further indicated that the CRF01_AE segments clustered within sublineage C4, while the CRF07_BC segments grouped with MSM-associated variants in the CRF07_BC lineage.
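The tMRCA values are reported as decimal years. A small helper, given here as a hypothetical convenience function rather than part of the analysis pipeline, converts them to approximate calendar dates; it truncates to whole days, so the result should be read as accurate to the month at best.

```python
from datetime import date, timedelta


def decimal_year_to_date(decimal_year):
    """Convert a decimal year such as 2021.31 to an approximate date."""
    year = int(decimal_year)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    offset_days = int((decimal_year - year) * days_in_year)
    return start + timedelta(days=offset_days)


# Point estimates and lower HPD bounds reported in the text.
mean_ae = decimal_year_to_date(2021.31)
mean_bc = decimal_year_to_date(2021.30)
lower_ae = decimal_year_to_date(2019.59)
lower_bc = decimal_year_to_date(2019.61)
```

Read this way, both point estimates fall in April 2021 and both lower bounds in August 2019.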
Drug resistance analysis
Genotypic resistance analysis using the Stanford HIV Drug Resistance Database revealed no major DRMs against PIs, NRTIs, NNRTIs, or INSTIs. These results indicate predicted susceptibility to all four classes of antiretroviral drugs.
Discussion
The identification and characterization of the novel CRF188_0107 underscores the persistent evolution and remarkable diversity of HIV-1 within the MSM population in Henan Province. Continuous molecular surveillance in this key population is crucial not only for tracking the spread of known strains but, more importantly, for the early detection of emerging variants with potentially enhanced transmissibility or other adaptive advantages, which could have direct implications for public health interventions and vaccine design efforts.
Over the past decade, CRF01_AE and CRF07_BC have solidified their status as the predominant HIV-1 strains circulating in China, particularly among MSM[9]. Their frequent co-circulation and recombination have fueled the continuous emergence of novel second-generation CRFs collectively designated as CRFs_0107. According to the LANL HIV Sequence Database, a total of 20 such CRF01_AE/CRF07_BC recombinants have been reported across China since the initial identification of CRF79_0107 in Shanxi in 2017.
Enhanced molecular surveillance efforts have contributed significantly to the discovery of these recombinant forms. For instance, in 2018, a one-year molecular network surveillance project in Anyang, Henan, identified the province's first third-generation recombinant strain, CRF114_0155[10]. More recently, since 2023, systematic HIV pol gene sequencing of all newly diagnosed cases in Henan has enabled more robust detection of previously unrecognized CRFs, as exemplified by the discovery of CRF188_0107. The recent emergence of this strain indicates an actively evolving local epidemic and underscores the need for surveillance to remain agile.
Beyond its recombinant nature, our analysis of the CRF01_AE-derived segments of CRF188_0107 points to a broader genetic complexity in the region. According to previous reports, CRF01_AE strains in China have been classified into 11 distinct clusters[8]. In this study, although segments III, VII, and IX of CRF188_0107 clustered phylogenetically with CRF01_AE references, they did not fall within any of the 11 established clusters. This finding suggests the potential circulation of a previously unrecognized CRF01_AE sublineage in Henan. Such genetic diversity provides a richer pool of parental strains for recombination events and may have contributed directly to the genesis of CRF188_0107. Further surveillance and expanded sampling are warranted to clarify the genetic diversity and epidemiological significance of this potential new cluster.
In conclusion, our study identifies and characterizes CRF188_0107 as a novel second-generation recombinant form of HIV-1 derived from CRF01_AE and CRF07_BC, circulating among MSM in Henan Province, China. Its complex mosaic genome and recent origin highlight the dynamic and ongoing evolution of HIV-1 in this key population, and the findings expand the known diversity of HIV-1 recombinant forms in China.
Figures
Fig. 1
Phylogenetic and recombination analyses of CRF188_0107. (A) Maximum-likelihood (ML) phylogenetic analysis was performed in IQ-TREE v2.3.6 under the general time reversible (GTR) substitution model with 1000 ultrafast bootstrap replicates to evaluate the reliability of internal branches. Reference sequences were obtained from the Los Alamos National Laboratory (LANL) HIV database (https://www.hiv.lanl.gov/) and included HIV-1 subtypes A-D, F-H, J, K, L, CRF01_AE, CRF07_BC, CRF55_01B and previously identified CRF01_AE/CRF07_BC recombinants. CRF188_0107 sequences are highlighted with red dots. Bootstrap values ≥ 70% were considered to indicate reliable clusters and are labeled at branch nodes. (B) Bootscan analysis was conducted with a window size of 250 bp and a step size of 20 bp, using reference strains of CRF01_AE and CRF07_BC and a representative HIV-1 subtype H sequence. (C) The mosaic structure of the CRF188_0107 near-full-length genome (NFLG) was generated using the LANL Recombinant HIV-1 Drawing Tool (https://www.hiv.lanl.gov/content/sequence/DRAW_CRF/recom_mapper.html).
Fig. 2
Subregion maximum-likelihood (ML) phylogenetic analysis and maximum clade credibility (MCC) trees of CRF188_0107. All subregion ML trees were reconstructed using the same reference sequences, including twelve HIV-1 subtypes (A1, A2, B-D, F1, F2, G-H, and J-L) along with additional Chinese CRF01_AE and CRF07_BC strains. The subregion ML trees were constructed in IQ-TREE v2.3.6 under the general time reversible (GTR) substitution model with 1000 ultrafast bootstrap replicates to assess branch support. Nodes with bootstrap values ≥ 70% were considered reliable clusters and are labeled at the nodes. Subregion phylogenetic analyses showed that CRF188_0107 segments I, III, V, VII, and IX clustered with CRF01_AE, while segments II, IV, VI, and VIII grouped within the CRF07_BC clade.
Fig. 3
Bayesian phylogenetic inference was conducted using BEAST v1.10.4. Maximum clade credibility (MCC) trees for the CRF01_AE and CRF07_BC regions of CRF188_0107 were visualized and edited in FigTree v1.4.4. The trees display the mean time to the most recent common ancestor (tMRCA) and 95% highest posterior density (HPD) intervals for key nodes.
Table 1
Sociodemographic characteristics of the five CRF188_0107-infected individuals.
Sequence name | Age (years) | Confirmation date | Transmission route | Sex | Sampling city | Sequence length (nt) | HXB2 coordinate range | Accession number
NY24225 | 22 | 2024/4/17 | MSM | Male | Nanyang | 9094 | 540–9735 | PV363678
NY23025 | 21 | 2023/6/28 | MSM | Male | Nanyang | 8841 | 798–9736 | PV363679
NY23052 | 37 | 2023/6/6 | MSM | Male | Nanyang | 9071 | 540–9732 | PV363680
ZZS24046 | 41 | 2024/3/12 | MSM | Male | Zhengzhou | 9026 | 540–9665 | PV363681
XYPQ24008 | 21 | 2024/4/11 | MSM | Male | Xinyang | 9110 | 540–9746 | PV363682
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Author Contribution
LD (Lifan Duan): Data curation, Formal analysis, Funding acquisition, Visualization, Writing - original draft, Writing - review & editing. YY: Conceptualization, Data curation, Methodology, Supervision, Writing - review & editing. TR: Validation, Writing - review & editing. JL: Methodology, Resources, Writing - review & editing. NL: Supervision, Writing - review & editing. ZL: Investigation, Writing - review & editing. LD (Lin Ding): Investigation, Writing - review & editing. YZ: Resources, Writing - review & editing. GZ: Resources, Writing - review & editing. DZ: Resources, Writing - review & editing. CL: Project administration, Resources, Supervision, Writing - review & editing.
Funding
The authors declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the Youth Program of the Natural Science Foundation of Henan Province (252300420592), the Joint Construction Project of the Henan Provincial Health Commission (LHGJ20240633), and the Science and Technology Project of Henan (252102310128).
Acknowledgement
We are deeply grateful to Prof. Xin Ruolei from the Beijing Center for Disease Control and Prevention for his expert guidance and assistance throughout this study. We also extend our sincere gratitude to the dedicated HIV/AIDS prevention and control workers from the municipal Centers for Disease Control and Prevention across Henan Province for their collaboration and efforts in participant recruitment and data collection.
References
1. Lin Y, et al. Molecular genetic characterization analysis of a novel HIV-1 circulating recombinant form (CRF156_0755) in Guangdong, China. Front Microbiol. 2024;15.
2. Chen M, et al. Identification of a long-standing HIV-1 circulating recombinant form (CRF178_BC) in different high-risk populations along the China-Myanmar border region. J Infect. 2025;90(4):106477.
3. Chen M, et al. Identification of a newly emerging second-generation HIV-1 circulating recombinant form (CRF145_0755) among men who have sex with men in China. J Infect. 2024;88(3):106126.
4. Xiao M, et al. A novel HIV-1 circulating recombinant form (CRF168_0107) identified from men who have sex with men in Beijing, China. J Infect. 2025;90(1):106368.
5. Wang Z, et al. HIV/AIDS in Henan Province. In: Wu Z, et al., editors. HIV/AIDS in China: Epidemiology, Prevention and Treatment. Singapore: Springer Singapore; 2020. pp. 567–85.
6. Yang Z, et al. Characterization of HIV-1 subtypes and drug resistance mutations in Henan Province, China (2017–2019). Arch Virol. 2020;165(6):1453–61.
7. Yao Y, et al. Characteristics of Four Novel Recombinant Strains from the Backbone of CRF55_01B and CRF65_cpx in Beijing by Near Full-Length Genome. AIDS Res Hum Retroviruses. 2021;37(12):936–45.
8. D W, et al. Criteria for classification, nomenclature, and reference sequence selection for HIV sub-subtypes of CRF01_AE and CRF07_BC strains in China. AIDS. 2024 Mar 1;38:427–30.
9. Liu X, et al. Changes in HIV-1 Subtypes/Sub-Subtypes, and Transmitted Drug Resistance Among ART-Naïve HIV-Infected Individuals — China, 2004–2022. China CDC Weekly. 2023;5(30):664–71.
10. Li Y, et al. The first third-generation HIV-1 circulating recombinant form (CRF114_0155) identified in central China. Arch Virol. 2021;166(12):3409–16.
Data Availability
The datasets presented in this study can be found in an online repository. The repository name and accession numbers are: GenBank (https://www.ncbi.nlm.nih.gov/genbank/), PV363678, PV363679, PV363680, PV363681 and PV363682.