Telomere-to-telomere sequencing genome assembly identifies variants and genes associated with lactation and dwarf traits in Bactrian camel
Liu Jiajia 1,8,#, Zhou Hao 1,9,#, Zhang Caiyun 1, Yang Wenhao 1, Wu Rentaodi 2, Wang Xiaoshan 2, Zhang Hui 2, Chen Zeming 2, Zhu Jianshen 1, Qin Chao 1, Wang Liyuan 1, Wang Yunfei 3, Chen Qiuju 3, Wang Wenyi 3, Wang Fei 4, Li Jianjun 4, Wuni Menghe 5, Zhang Jian 6, Dong Zhimin 6, Zhang Sunping 7, Li Yanhui 7, Zhu Wenqi 1, Huang Jinlong 1, Ai Dong 2, Qiu Shengyu 2, Zhang Wenbin 2, Ma Peipei 1,*, Dao Lema 2,*, Meng He 1,*
1. School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China 200240;
2. Animal Husbandry Research Institute of Alxa League, Inner Mongolia, China 750306;
3. Bayannur Institute of Agriculture and Animal Husbandry Science, Inner Mongolia, China 015000;
4. Inner Mongolia Yinggesu Biotechnology Co., Ltd., Inner Mongolia, China 015599;
5. Livestock Breed Improvement Station of Alxa League, Inner Mongolia, China 750306;
6. Department of Agriculture, Hetao College, Bayannur, Inner Mongolia, China 015000;
7. Inner Mongolia Research Institute of Shanghai Jiao Tong University, Shanghai, China 200240;
8. Key Laboratory of Animal Reproduction and Biotechnology in Universities of Shandong, College of Animal Science and Technology, Qingdao Agricultural University, Qingdao 266109, China;
9. State Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Shandong, China 266071.
*Corresponding author(s): He Meng (menghe@sjtu.edu.cn); Ma Peipei; Dao Lema
Liu Jiajia and Zhou Hao contributed equally to this work.
Abstract
Camels play a crucial role in supporting livelihoods and productivity in desert regions. Despite their importance, our understanding of their genetic composition remains limited. To address this gap, we constructed a telomere-to-telomere (T2T) reference genome for Bactrian camels and carried out a comprehensive genetic analysis concentrating on lactation traits and a dwarf defect. The newly assembled genome spans 2.46 Gb with a contig N50 of 71.57 Mb and was successfully organized into chromosomes, including 22 T2T chromosomes, with only 7 gaps remaining. Using this genome as the reference, we detected a total of 33,076,298 genetic variants across the Bactrian camel genome, increasing the effective variant calling rate by 28.14%. A genome-wide association study of 148 lactating Bactrian camels identified 467 significant variants (P < 5 × 10⁻⁶) associated with lactation traits, implicating 42 important candidate genes, including ACSL3, COL23A1, ITGA9, PDGFD, CXXC4, FARP1, and CACNA2D1. Additionally, using a dwarf Bactrian camel as a disease case, we identified the FBN2 gene as a critical regulator of growth characteristics intrinsic to Bactrian camels. A mutation at c.1411C > T, resulting in an amino acid substitution from valine to isoleucine (Val > Ile), was found to underlie the dwarf phenotype. In conclusion, this study provides the most complete and detailed genome assembly for Bactrian camels to date and highlights significant genes and mutation loci affecting milk production and the dwarf defect. These findings offer valuable insights for advancing genetic research on camel biology and enhancing economically important traits through selective breeding.
Keywords:
T2T Genome
Lactation trait
Dwarf trait
Bactrian camel
Camels play crucial socio-economic roles in arid and semi-arid desert regions. The Camel genus (Camelus, Camelini) encompasses three extant species: the Bactrian camel (Camelus bactrianus), the wild Bactrian camel (Camelus ferus), and the dromedary camel (Camelus dromedarius). The global camel population is estimated at 39.29 million individuals (FAOSTAT, 2022). Among them, 34.26 million camels, primarily dromedaries, are found in Africa, while 5.02 million, predominantly Bactrian camels, inhabit Asia (FAOSTAT, 2022). The wild Bactrian camel is estimated to number only about 2000 individuals in Mongolia's Gobi Desert, classifying it as critically endangered on the IUCN Red List of Threatened Species.
The camel (2n = 74) genome comprises 36 pairs of autosomes and 1 pair of sex chromosomes, with an XX/XY sex determination system. In 2012, the world's first whole-genome sequencing and mapping of the wild Bactrian camel was completed, a project in which our team played a significant role 1. A scaffold-level genome assembly of the wild Bactrian camel was obtained, estimating the entire camel genome size to be 2.38 Gb, containing 20,821 protein-coding genes. Although several camelid genomes have been released and improved over the past decade 2,3, their integrity and accuracy still require considerable improvement. Recent advances in long-read sequencing technology and assembly algorithms have made it possible to produce high-quality genomes rapidly. Pacific Biosciences' high-fidelity (HiFi) reads can achieve accuracies of over 99.9% with read lengths of 18–25 kb 4, and Oxford Nanopore Technologies (ONT) reads routinely reach median lengths of 50–150 kb with accuracies around 95% 5. These long reads have proven capable of resolving complex structural variations and gaps, as demonstrated by the assembly of T2T, gapless, and complete genomes in humans 6,7, sheep 8, chickens 9, and Lepidopteran pests 10.
Due to the absence of a precise reference genome and the free-grazing practices of camels, the genetic basis of their economically important traits remains poorly understood. In modern times, camels are highly valued for their milk, meat, wool, and hides. Camel milk, in particular, holds significant nutritional and economic importance, serving as a dietary staple for many communities. It is a nutritious source, rich in vitamin C, beneficial lactic acid bacteria, and key proteins like beta-caseins, whey proteins, and milk fat globule proteins 11. In recent years, camel milk products have gained popularity in the domestic consumer market due to the rise of live-streaming marketing. However, a Bactrian camel typically produces only 0.25–1.5 kg of milk daily, in addition to the amount consumed by the calf 12, which is less than one-tenth of the milk produced by a cow. This low productivity underscores the urgent need to elucidate the genetic architecture underlying milk performance and to identify molecular markers associated with milk production. Integrating traditional breeding methods with modern genomic techniques is essential to address the issue of low milk yield in Bactrian camels comprehensively.
Additionally, restricted growth, commonly referred to as dwarfism, is a condition characterized by unusually short stature. While dwarfism is well documented in humans and other animals 13,14, natural dwarfism in camels is extremely rare, and no relevant studies have been reported to date. The molecular mechanisms and genetic causes of dwarfism in camels remain entirely unknown. Therefore, molecular-level breeding approaches offer a promising pathway to achieve significant genetic improvements in both milk production and growth traits in Bactrian camels.
In this study, we generated a high-quality telomere-to-telomere (T2T) reference genome for the Bactrian camel using advanced multi-platform sequencing, achieving a highly contiguous assembly. Leveraging this resource, we conducted comprehensive genetic analyses focused on economically important traits, particularly lactation and growth. A genome-wide association study identified key genetic variations and candidate genes linked to milk production, while analysis of a dwarf Bactrian camel revealed a causal mutation in FBN2 associated with growth defects. This study provides the most complete camel genome to date, offering valuable insights into genetic adaptation and laying a foundation for improved breeding strategies.
Results
A Telomere-to-Telomere Genome Assembly for Domestic Bactrian Camel
An adult male domestic Bactrian camel from Inner Mongolia was selected to establish a comprehensive reference genome (Fig. 1a). Using multiple sequencing platforms, including PacBio HiFi, ONT, Hi-C, and Illumina, we obtained 117.44 Gb of HiFi reads at 60× coverage, 231.42 Gb of ONT ultra-long reads at 115× coverage, 111.39 Gb of Hi-C reads at 55× coverage, and 204.19 Gb of Illumina paired-end reads at 100× coverage (Supplementary Table 1–2). K-mer analysis of next-generation sequencing (NGS) short-read data estimated the genome size to be 2.03 Gb and revealed a heterozygosity rate of 0.67% (Supplementary Fig. 1). By combining the HiFi, ONT and Hi-C datasets, our initial assembly produced 216 contigs, with a contig N50 of 71.57 Mb and an N90 of 25.47 Mb. After removing duplicates, the contig count was reduced to 69; the contig N50 remained unchanged at 71.57 Mb, and the N90 improved to 27.68 Mb. We then used Hi-C data to refine the assembly further, anchoring these contigs onto 60 scaffolds with a scaffold N50 of 78.00 Mb and a scaffold N90 of 30.29 Mb (Supplementary Table 3).
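For readers less familiar with the contiguity statistics quoted above, the following is a minimal sketch of how N50 and N90 are conventionally computed from a list of sequence lengths. The function name `nxx` and the toy contig lengths are illustrative assumptions and are not taken from the assembly itself.

```python
def nxx(lengths, fraction=0.5):
    """Return the Nxx statistic: the length L such that sequences of
    length >= L together cover at least `fraction` of the total length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]


# Toy contig lengths in Mb (hypothetical values, not the camel assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50 = nxx(toy_contigs, 0.5)
n90 = nxx(toy_contigs, 0.9)
print(f"N50 = {n50} Mb, N90 = {n90} Mb")
```

Because N90 requires covering 90% of the assembly, it is always less than or equal to N50 for the same set of sequences, which matches the pattern of the values reported here.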
Linear alignment with the previously published Camelus ferus genome (BCGSAC_Cfer_1.0) revealed that the 37 longest scaffolds correspond one-to-one with the chromosomes of the wild Bactrian camel, excluding the Y chromosome (Supplementary Fig. 2). Subsequently, we used the human Y chromosome (GRCh38.p14) as a reference template to merge and assemble the remaining 23 scaffolds, thereby constructing the Y chromosome of the domestic Bactrian camel. Hi-C reads were used to generate chromosomal interaction maps, which confirmed the order and orientation of all chromosomes (Fig. 1b). After gap-closing and refinement, only seven gaps remained across the entire camel genome: three within autosomes, on chromosome 4 (gaps 1 and 2) and chromosome 9 (gap 3), and four within the Y chromosome (gaps 4 to 7) (Fig. 1e). We identified 44 telomeres and successfully constructed 22 T2T chromosomes. The remaining 16 chromosomes showed telomere structures at one end (Fig. 1d). We identified 36 candidate centromeric tandem repeats, distributed across all chromosomes except chromosomes 8 and 15 (Fig. 1d). The final assembled genome, named Camelus.1.0, is a complete T2T genome of a domestic Bactrian camel.
Gene prediction provided insights into gene distribution and structure, annotating 29,932 protein-coding genes with an average size of 38,583 base pairs and typically 26 exons per gene (Supplementary Table 4). We also annotated 54,372 messenger RNAs (mRNAs) and a wide array of non-coding elements, including 884 transfer RNAs (tRNAs) involved in protein synthesis and other functionally significant RNA species such as 10,639 long non-coding RNAs (lncRNAs), 566 small nucleolar RNAs (snoRNAs), and 791 small nuclear RNAs (snRNAs) (Fig. 1e).
Repetitive sequences in the domestic camel genome spanned a total length of 701.98 Mb, constituting 28.54% of the entire genome. These repetitive sequences comprised two fundamental types: interspersed repeats and tandem repeats. Notably, long interspersed nuclear elements (LINEs) were the most abundant, representing 16.63% of the repetitive content. This category included LINE1 (12.46%), LINE2 (3.63%), L3/CR1 (0.39%), and RTE (0.14%). In addition, long terminal repeats (LTRs) constituted 4.34%, while short interspersed elements (SINEs) accounted for 3.01% of the repetitive sequences (Fig. 1c).
Fig. 1
A Telomere-to-Telomere Genome Assembly for Domestic Bactrian Camel. a, The male Alashan Bactrian camel sampled for genome assembly. b, Hi-C chromatin interaction map of the camel assembly. c, Composition of repetitive sequences in the camel genome. d, Chromosomal map with telomeres, centromeres, and gene distribution. e, Circos plot of genomic features at 500-kb intervals across the 38 chromosomes. From outer to inner ring: GC content, genes, TEs, snRNA, snoRNA, and lncRNA in the genome.
Camelus.1.0 Assembly Assessment
We evaluated the quality of the domestic Bactrian camel genome assembly from multiple dimensions, including genome continuity, accuracy, completeness, and reliability. Genome continuity is typically assessed using the Contig N50 size, and in the Camelus.1.0 genome assembly, the Contig N50 reached 71,572 Kb, significantly surpassing other versions (ranging from 139 Kb to 5,365 Kb). Additionally, the Contig/Chromosome ratio (CC ratio) of our assembly is only 1.8, which is much lower than that of other versions (Table 1). The CC ratio is an intuitive indicator of genome continuity and is unaffected by contig length or the intrinsic length of chromosomes 15.
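As a worked check of the CC ratio, the sketch below divides the contig counts listed in Table 1 by a chromosome count of 38. The denominator of 38 (36 autosomes plus X and Y) is an assumption on our part; it is used here because it reproduces the CC ratios reported in Table 1. The function name `cc_ratio` is illustrative.

```python
def cc_ratio(contig_count, chromosome_count):
    """Contig/Chromosome ratio: contigs per chromosome (lower is more contiguous)."""
    if chromosome_count <= 0:
        raise ValueError("chromosome_count must be positive")
    return contig_count / chromosome_count


# Contig counts as listed in Table 1.
contig_counts = {
    "Camelus.1.0": 69,
    "Ca_bactrianus_MBC_1.0": 67434,
    "BCGSAC_Cfer_1.0": 4402,
    "BCGSAC_CB_1": 68871,
    "CamDro3": 53084,
}

# Assumption: 36 autosomes + X + Y = 38 chromosomes for every assembly.
CHROMOSOMES = 38

ratios = {name: round(cc_ratio(n, CHROMOSOMES), 1) for name, n in contig_counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: CC ratio = {ratio}")
```

A perfectly gapless assembly with one contig per chromosome would have a CC ratio of exactly 1.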
The genome accuracy was evaluated using the quality value (QV). The average QV of the assembled Camelus.1.0 reached 56.21, with chromosome 28 having the highest QV of 61.98, and chromosome 7 the lowest at 48.48. The genome completeness was assessed at 97.07%, indicating a high level of completeness in the present assembly.
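To put the QV figures in more familiar terms, the sketch below converts a QV into an estimated per-base error rate. It assumes the QV is Phred-scaled (QV = −10 · log10 of the error probability), which is the usual convention for consensus quality values; the source does not state which tool produced the QV, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability into a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# QV values quoted in the text: genome average, best and worst chromosome.
for label, qv in [("average", 56.21), ("chromosome 28", 61.98), ("chromosome 7", 48.48)]:
    err = qv_to_error_rate(qv)
    print(f"{label}: QV {qv} -> about 1 error per {1 / err:,.0f} bases")
```

Under this assumption, the average QV of 56.21 corresponds to roughly 2.4 errors per million bases.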
The genome integrity of the Camelus.1.0 assembly was evaluated using BUSCO (Benchmarking Universal Single-Copy Orthologs) and the number of genome gaps. When comparing the Bactrian camel genome to the mammalian single copy ortholog gene set, 96.1% of the conserved core genes were identified as complete. Among these, 94.9% were single-copy genes with complete structures, and 1.2% were multi-copy genes with complete structures. Our Bactrian camel genome assembly has only seven gaps, whereas the best-assembled camel genome to date, BCGSAC_Cfer_1.0 3, still contains 2,345 gaps scattered across the genome (Table 1).
To confirm the reliability of the Camelus.1.0 assembly while limiting potential effects of environmental factors, natural variation, and sequencing errors, we aligned NGS short reads from camels to the assembled Bactrian camel genome and calculated the alignment rate. The dataset included 162 domestic Bactrian camels sequenced in-house and 163 camels from various species across Asia downloaded from NCBI (Supplementary Table 5). Based on the results from 350 camel samples (Supplementary Table 5), the average alignment rate reached 99.77%. Specifically, the average alignment rate was 99.79% for the different domestic Bactrian camel breeds, 99.32% for wild Bactrian camels, and 99.77% for dromedaries. These results demonstrate that this genome version exhibits high compatibility and reliability.
Table 1
Comparison of assembly statistics among different camel genome assemblies.
| Assembly | Camelus.1.0 | Ca_bactrianus_MBC_1.0 (2014)[7] | BCGSAC_Cfer_1.0 (2020)[8] | BCGSAC_CB_1 (2012)[6] | CamDro3 (2019)[15] |
| --- | --- | --- | --- | --- | --- |
| Taxon | Camelus bactrianus | Camelus bactrianus | Camelus ferus | Camelus ferus | Camelus dromedarius |
| Total length (Gb) | 2.46/2.37 | 1.99 | 2.09 | 2.01 | 2.2 |
| Ungapped length (Gb) | 2.37 | 1.98 | 2.09 | 1.99 | 2.2 |
| Gap number | 7 | 31,980 | 2,345 | 55,538 | 45,932 |
| Contig number | 69 | 67,434 | 4,402 | 68,871 | 53,084 |
| Contig N50 (Kb) | 71,572 | 139 | 5,365 | 90 | 236 |
| Contig L50 | 13 | 3,963 | 112 | 5,814 | 2,637 |
| Scaffold number | 60 | 35,454 | 2,057 | 13,333 | 21,069 |
| Scaffold N50 (Mb) | 78 | 8.8 | 76 | 2 | 70.4 |
| Scaffold L50 | 12 | 68 | 11 | 274 | 11 |
| GC% | 41.9 | 41.4 | 41.66 | 41.3 | 41.5 |
| CC ratio | 1.8 | 1774.6 | 115.8 | 1812.4 | 1396.9 |
| Chromosome number | 38 | 0 | 37 | 0 | 37 |
Genetic Structures of Local Bactrian Camels in China
Variants across the camel genome were detected in a cohort of 350 camels using both the preceding BCGSAC_Cfer_1.0 (old assembly) and Camelus.1.0 (new assembly) genomes as references. With the new assembly, we identified 33,076,298 variant sites, encompassing 30,019,998 SNPs, 1,153,351 insertions, 1,612,350 deletions, and 290,599 complex variants. This marked a 28.14% increase in variant detection over the old assembly, demonstrating the enhanced resolution of the Camelus.1.0 genome for genetic analyses (Supplementary Fig. 3).
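The variant counts above can be cross-checked with simple arithmetic. The sketch below sums the four variant classes, derives a mean variant density using the 2.46 Gb assembly size, and defines a generic percent-increase helper. The variant count for the old assembly is not given in this section, so the helper is shown only with a hypothetical input; all function and variable names are illustrative.

```python
import math

# Variant class counts as reported in the text.
variant_counts = {
    "SNPs": 30_019_998,
    "insertions": 1_153_351,
    "deletions": 1_612_350,
    "complex": 290_599,
}
total_variants = sum(variant_counts.values())

# Mean density, using the 2.46 Gb total assembly length reported for Camelus.1.0.
GENOME_BP = 2.46e9
density_per_kb = total_variants / (GENOME_BP / 1000)


def percent_increase(new_count, old_count):
    """Relative increase of new_count over old_count, in percent."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return (new_count - old_count) / old_count * 100


print(f"total variants: {total_variants:,}")
print(f"mean density: {density_per_kb:.1f} variants/kb")
# Hypothetical input: the old-assembly count is not stated in this section.
print(f"example increase: {percent_increase(128.14, 100.0):.2f}%")
```

The four classes sum exactly to the reported total of 33,076,298 variant sites.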
To understand the genetic structure of local Chinese camel populations, we performed principal component analysis (PCA), phylogenetic tree construction, and population structure analysis using the variant sites detected against the Camelus.1.0 assembly. The PCA separated wild and domestic Bactrian camels along the first principal component, while dromedaries formed distinct clusters separate from Bactrian camels (Fig. 2a). We then constructed a phylogenetic tree of Bactrian camels (Fig. 2b). The tree showed that the Alashan Bactrian camels (125 individuals) were divided into two subgroups, the Sonid Bactrian camels (40 individuals) were also divided into two subgroups, and the Gobi Red camels (50 individuals) formed a separate branch that was clearly distinct from the Alashan and Sonid Bactrian camels. The Qinghai and Junggar Bactrian camels formed relatively dispersed clusters, which may indicate higher genetic diversity within these populations; the tree also shows that they are more closely related to the Alashan Bactrian camels. In contrast, the Tarim Bactrian camels formed a smaller and more distinct branch, which may suggest lower genetic diversity or a certain degree of genetic isolation. At present, the Gobi Red camel is not among the five breeds in China; it is found within the Inner Mongolia region and is considered a branch of the Alashan camel. We evaluated the three camel populations from Inner Mongolia to investigate whether they are genetically differentiated despite sharing the same geographical area. A few individuals in the three populations nevertheless showed overlapping patterns, which might be related to unclear breed identification and could affect the accuracy of data recording.
To further investigate the genetic structure of the three populations in Inner Mongolia, namely Alashan Bactrian camels (145), Sonid Bactrian camels (40), and Gobi Red camels (30), we extracted all variant loci from the 215 individuals of these three populations. In the PCA, Alashan Bactrian camels, Sonid Bactrian camels, and Gobi Red camels clustered separately when plotted on PC1 and PC2 (Fig. 2c). Furthermore, both Alashan and Sonid Bactrian camels had significantly higher average pairwise nucleotide diversity (π) than Gobi Red camels (P = 1.8e-09 and P = 1.3e-05, respectively), indicating a genetic bottleneck in the Gobi Red population during domestication (Fig. 2d). In contrast, the difference between Alashan and Sonid camels was not statistically significant (P = 0.1). These observations are consistent with the weaker domestication of Gobi Red camels compared to livestock species that underwent intensive selection, such as cattle or sheep.
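For orientation, pairwise nucleotide diversity (π) can be sketched as follows for biallelic sites. This is the textbook unbiased per-site estimator averaged over a window, not necessarily the exact procedure or software used in this study; the window size, allele counts, and function names below are hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Unbiased per-site nucleotide diversity for a biallelic site:
    the probability that two distinct sampled chromosomes differ."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    if not 0 <= alt_count <= n_chrom:
        raise ValueError("alt_count must lie between 0 and n_chrom")
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def window_pi(alt_counts, n_chrom, window_bp):
    """Average pi over a window; invariant sites contribute zero."""
    if window_bp <= 0:
        raise ValueError("window_bp must be positive")
    return sum(site_pi(k, n_chrom) for k in alt_counts) / window_bp


# Hypothetical example: 2 diploid animals (4 chromosomes), a 10-bp window
# with two variable sites carrying 1 and 2 copies of the alternate allele.
print(window_pi([1, 2], n_chrom=4, window_bp=10))
```

A population that has passed through a bottleneck carries fewer segregating sites at intermediate frequency, which lowers π under this estimator.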
Fig. 2
Genetic Structures of Camels in China. a, Principal component analysis (PCA) of Camel genus whole-genome sequences (C. bactrianus, C. ferus, C. dromedarius) mapped to the Camelus.1.0 reference genome, showing clear interspecies divergence along PC1 (110.36% variance) and PC2 (28.3996% variance). Colors denote species: red (domestic Bactrian), blue (wild Bactrian), green (dromedary). b, Phylogenetic tree of the six Bactrian camel breeds. The Alashan (blue) and Sonid (yellow) breeds were each divided into two subgroups, while the Gobi Red breed (red) was separated from the other breeds. The results indicate that Gobi Red should be considered a Bactrian camel breed in China. c, Principal component analysis of the three populations in Inner Mongolia, showing a close genetic relationship between the Alashan (red) and Sonid (blue) breeds, while Gobi Red (green) was separated. d, Boxplot of pairwise nucleotide diversity (π) among the three populations in Inner Mongolia, consistent with the PCA.
Milk Traits-associated Genetic Markers in Bactrian Camel
To identify the genetic factors influencing camel milk production traits, we performed a genome-wide association analysis using the Camelus.1.0 genome as the reference. A total of 148 Bactrian camels were resequenced, yielding 3845.09 Gb of quality-controlled data. On average, each individual yielded 86,601,269 quality-controlled sequences, with an approximate sequencing depth of 10×. In parallel, we recorded and statistically analyzed key milk traits within this camel population, encompassing morning milk yield (Milk/g), milk lactose content (Lactose/%), milk protein content (%), milk fat content (Fat/%), non-fat solids content (Non_fat/%), ash content (Ash/%), and milk density (Density). The Bactrian camels had an average morning milk yield of 1392.13 g, lactose content of 4.86%, milk protein content of 3.79%, milk fat content of 4.99%, non-fat solids content of 9.36%, ash content of 0.72%, and milk density of 30.83 (Supplementary Table 6).
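The resequencing figures quoted above can be checked against each other with a short calculation. The sketch assumes the 2.46 Gb Camelus.1.0 assembly length as the genome size; variable names are illustrative.

```python
TOTAL_GB = 3845.09                # quality-controlled data across all animals
N_ANIMALS = 148
GENOME_GB = 2.46                  # assumption: Camelus.1.0 total assembly length
SEQS_PER_ANIMAL = 86_601_269      # average quality-controlled sequences per animal

gb_per_animal = TOTAL_GB / N_ANIMALS
mean_depth = gb_per_animal / GENOME_GB
bases_per_sequence = gb_per_animal * 1e9 / SEQS_PER_ANIMAL

print(f"data per animal: {gb_per_animal:.2f} Gb")
print(f"mean depth: {mean_depth:.1f}x")
print(f"bases per counted sequence: {bases_per_sequence:.0f}")
```

The depth comes out close to the stated 10×. The roughly 300 bases per counted sequence would be consistent with read pairs of 2 × 150 bp, although the read length is not stated in this section and this is only an inference.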
We further measured the correlations among these seven camel milk performance traits, which ranged from −0.30 to 0.98 (Fig. 3a). This information is important for selecting animals for breeding programs aimed at improving specific milk production characteristics. Subsequently, we analyzed environmental factors affecting Bactrian camel milk traits, including population, year, parity, age, feeding method, and sampling date. Environmental factors with a significant influence on milk traits were incorporated into the genome-wide association analysis model.
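As an illustration of the trait correlations reported in Fig. 3a, the sketch below computes a Pearson correlation coefficient in plain Python. The trait values are hypothetical placeholders, not measurements from this study, and the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trait values for five animals (placeholders only).
protein_pct = [3.5, 3.7, 3.8, 4.0, 4.1]
non_fat_pct = [9.0, 9.2, 9.4, 9.6, 9.7]
print(f"r = {pearson_r(protein_pct, non_fat_pct):.2f}")
```

Applying such a function to every pair of the seven traits would give a symmetric 7 × 7 matrix of the kind shown in Fig. 3a.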
Using the fixed and random model circulating probability unification (FarmCPU) method, we identified 467 significant variants (P < 5 × 10⁻⁶) associated with Bactrian camel milk traits, implicating 42 candidate genes. For milk yield, there were 71 associated variant sites, with candidate genes including ACSL3, ITGA9, CXXC4, and FARP1 (Fig. 3c, Supplementary Table 7, Supplementary Fig. 4). For lactose content, there were 59 variant sites, with candidate genes including CACNA2D1, SLC35F1, TPH2, and PIANP (Supplementary Table 8). For milk protein content, there were 51 variant sites, with candidate genes including INPP5D, CACNA2D1, SLC35F1, and PDGFD (Supplementary Table 8). For milk fat content, there were 7 variant sites, with candidate genes including HS6ST1, STXBP5, PRR5, and MPP7 (Supplementary Table 8). For non-fat solids content, there were 67 variant sites, with candidate genes including INPP5D, CACNA2D1, SLC35F1, and TPH2 (Supplementary Table 8). For ash content, there were 58 variant sites, with candidate genes including RBMS1, CACNA2D1, and SLC35F1 (Supplementary Table 8). For milk density, there were 95 variant sites, with candidate genes including INPP5D, SLC35F1, COL23A1, and CLK4 (Supplementary Table 8). Genes such as ACSL3, COL23A1, ITGA9, PDGFD, CXXC4, FARP1, and CACNA2D1 are associated with multiple milk production traits in Bactrian camels.
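The significance threshold used above can be expressed on the −log10 scale of a Manhattan plot, and applied as a simple filter, as in the sketch below. The association records are hypothetical and do not represent FarmCPU output from this study; the names are illustrative.

```python
import math

P_THRESHOLD = 5e-6  # significance threshold stated in the text


def neg_log10(p_value):
    """Transform a P-value to the -log10 scale used on Manhattan plots."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


def significant(records, threshold=P_THRESHOLD):
    """Keep (variant_id, p_value) records with P strictly below the threshold."""
    return [(vid, p) for vid, p in records if p < threshold]


# Hypothetical association results (variant identifiers are placeholders).
toy_results = [("var_a", 3.2e-8), ("var_b", 4.9e-6), ("var_c", 5.0e-6), ("var_d", 0.02)]

print(f"threshold on plot: -log10(P) = {neg_log10(P_THRESHOLD):.2f}")
for vid, p in significant(toy_results):
    print(vid, f"{neg_log10(p):.2f}")
```

On this scale the threshold of 5 × 10⁻⁶ sits at about 5.3, so any point plotted above that height counts as significant.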
Fig. 3
Genomic architecture of camel milk production traits. a, Pearson correlation matrix of seven milk traits (fat, milk yield, non-fat, lactose, density, protein, ash) measured in 148 lactating Bactrian camels. Color gradient indicates r values (scale bar). b, Chromosomal distribution of 33.1 million variants (mean density: 13.4 variants/kb). Chromosome sizes scaled to Camelus.1.0 assembly. c, Manhattan plot showing genome-wide association (GWA) signals for milk yield (-log₁₀P values; horizontal line: significance threshold, P = 5 × 10⁻⁶). d, Quantile-quantile plot of observed vs. expected GWA P-values. Shaded region: 95% confidence interval.
A Mutation in the FBN2 Gene Affects the Dwarf Trait in Camel
Natural dwarfism in camels is exceedingly rare, with few documented cases and no established understanding of the underlying molecular mechanisms. In this study, we investigated the molecular basis of dwarfism in camels using high-depth genome sequencing. The subject of analysis was a single dwarf Bactrian camel (Fig. 4a). To characterize the phenotype, we compared body traits between the dwarf camel and a normal camel of the same age, including height, foreleg length, hindleg length, head length, tail length, weight, and growth hormone levels. The most prominent contributor to the reduced stature of the dwarf camel was its markedly shorter legs: 26 cm and 30 cm for the foreleg and hindleg, respectively, compared with 44 cm and 43 cm in the normal camel (Fig. 4b).
We used the T2T camel genome as the reference for investigating rare and low-frequency mutations within disease-associated genes. Using whole-genome sequencing (WGS), we searched for genes linked to dwarfism in camels. After a rigorous screening process, we identified a novel missense mutation, a C-to-T transition at position 1411 (c.1411C > T) within exon 10 of the FBN2 gene. This alteration resulted in a valine (Val) to isoleucine (Ile) substitution at position 471 (p.Val471Ile).
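The correspondence between the coding position and the affected residue can be verified with simple arithmetic. The sketch below assumes standard coding-DNA numbering, in which position 1 is the first base of the start codon; the function name is illustrative.

```python
def codon_of(cdna_pos):
    """Map a 1-based coding DNA position to (codon number, position within codon).

    Assumes position 1 is the first base of the start codon."""
    if cdna_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cdna_pos - 1) // 3 + 1
    position_in_codon = (cdna_pos - 1) % 3 + 1
    return codon_number, position_in_codon


codon, offset = codon_of(1411)
print(f"c.1411 lies in codon {codon}, base {offset} of the codon")
```

Under this numbering, c.1411 falls on the first base of codon 471, in agreement with the reported change at residue 471.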
Several variants within the FBN2 gene have previously been implicated in short stature, hand abnormalities, and scoliosis in humans and mice 16,17. These clinical manifestations align with the symptoms observed in the dwarf camel. Importantly, the dwarf camel was heterozygous for the identified mutation (C/T), while its mother and 163 other camels were homozygous for the wild-type (WT) allele (C/C). This mutation (c.1411C > T; p.Val471Ile) was not found in a previously reported population of 128 camels, further underscoring its rarity and specificity to the dwarf camel. Sanger sequencing confirmed the presence of the mutation in the dwarf camel and its absence in the WT camel population.
In camels, the FBN2 gene encodes a protein of approximately 2950 amino acids and is highly conserved across vertebrates. A multi-species sequence alignment showed that residue Val471 of the FBN2 protein is highly conserved among vertebrates (Fig. 4f), suggesting a critical role in maintaining the protein's function.
Fig. 4
Phenotypic and molecular characterization of dwarfism in Bactrian camels. a, Dwarf camel (right) alongside its mother (left), showing proportional size reduction. b, Age-matched comparison between dwarf (left) and normal camel (right; non-sibling, same birth date). c, Dorsal view highlighting skeletal proportions. d-f, Growth parameters at 3 months: d, Basic morphometrics; e, Body weight (***P < 0.001, t-test); f, Serum growth hormone levels (ng/mL). g, Sanger chromatograms confirming the FBN2 missense mutation (c.1411C > T, p.Val471Ile; reference transcript XM_032476642.1). h, Cross-species alignment demonstrating evolutionary conservation of Val471 (highlighted) in FBN2 proteins.
Discussion
In 2012, our research team completed the first sequencing and decoding of a camel genome. Since then, several versions of the camel genome have been published, encompassing the domesticated Bactrian camel 1, wild Bactrian camel 18, and dromedary 19. However, previous versions relied on second-generation sequencing data for assembly, which made highly repetitive and structurally complex regions difficult to resolve and led to significant gaps and assembly errors in the genome. Here we adopted third-generation single-molecule long-read sequencing technology, combined with Hi-C and NGS platforms. This approach allowed us to assemble the first T2T-level Bactrian camel genome. Specifically, PacBio HiFi sequencing provided high-precision sequences with an average read length of 18 Kb, while ultra-long ONT sequencing offered an average read length of 60 Kb. These technologies demonstrated clear advantages in assembling repetitive sequences, telomeres, and centromeres within the camel genome, as previously validated in genome assembly projects for other species 6,20,21.
The Camelus.1.0 genome assembly substantially surpasses its predecessors in terms of assembly quality. It stands out as the only one that has successfully assembled a complete set of 38 chromosomes, including 36 autosomes along with the X and Y chromosomes in the camel genome. In addition, genome continuity, often assessed through contig N50 size, is significantly higher in our assembly. The contig N50 for our camel genome assembly stands at 71,572 kb, far exceeding other previous versions. Furthermore, our genome exhibits only seven remaining gaps, whereas the best-assembled camel genome assembly to date, BCGSAC_Cfer_1.0, still harbors 2,345 gaps scattered throughout the genome. Additionally, the contig/chromosome ratio (CC ratio), defined as the ratio of contig counts to the chromosome pair number, is substantially lower in our assembly compared to other versions (Table 1). This CC ratio serves as an intuitive indicator of contiguity, regardless of contig length or the inherent length of chromosome DNA 15.
China is one of the primary regions for the distribution of Bactrian camels. In 2021, the national camel population was approximately 462,000, and by 2022, the national camel population increased to 532,000, reflecting a year-on-year growth rate of 15.33% (Supplementary Fig. 5).
China's Bactrian camel population consists of five domesticated local breeds, namely Alashan, Sonid, Qinghai, Tarim, and Junggar Bactrian camels, according to the Monograph on the Genetic Resources of Livestock and Poultry in China, 2011. Population structure analysis revealed clear interspecies differentiation among Camelus species, yet failed to resolve distinct genetic clusters among Chinese Bactrian camel breeds. This limited substructure could reflect either historical misclassification of phenotypically distinct populations as discrete breeds despite shared genetic ancestry, or recent admixture eroding breed-specific signatures through uncontrolled crossbreeding. However, the Gobi Red camel was separated from the Alashan breed, which indicates that the Gobi Red camel could be classified as a distinct breed in China.
The milk yield of Bactrian camels is notably lower than that of other dairy livestock, and it is significantly less than that of their single-humped counterparts. This disparity can be attributed to several factors. The majority of Bactrian camels in China are still raised in pastoral systems, which are characterized by low levels of intensification, complex population structures, and a lack of systematic breeding programs. As a result, there is a pressing need to apply modern molecular breeding techniques to expedite the breeding process and enhance productivity. Analysis of the genetic characteristics of economic traits at the whole-genome level, together with marker-assisted breeding and genomic selection, is already widely used in livestock breeding and production for species such as cattle 22,23, pigs 24, and chickens 25. Efficient and precise genetic markers, offering rich and stable genetic variation, hold considerable promise in livestock breeding research.
Our study successfully identified multiple SNPs associated with milk production traits in Bactrian camels, including milk yield, milk fat percentage, lactose percentage, milk protein percentage, non-fat solids percentage, ash content, and milk density. These SNPs were found to be significantly correlated with 42 candidate genes, including ACSL3, COL23A1, ITGA9, PDGFD, CXXC4, FARP1, CACNA2D1. Notably, some candidate genes have been proven to be associated with milk production traits in cattle and sheep. For example, ACSL3 has been extensively studied in cattle, where it has been linked to milk fat composition and fatty acid metabolism. GWAS analyses in dairy cattle have highlighted its significant association with multiple milk fatty acid traits, suggesting its involvement in lipid biosynthesis and adipocyte differentiation26. Functional studies further support its role in lipid accumulation in bovine adipocytes27. Given these findings, ACSL3 may also play a crucial role in regulating milk composition in Bactrian camels, providing a reference for future studies on camel lactation genetics. Similarly, ITGA9 can be highlighted for its dual role in milk composition regulation and immune function in dairy cattle. The observed associations between ITGA9 polymorphisms and milk quality traits suggest that it may contribute to variation in milk protein and fat content28. Moreover, its involvement in immune response raises interesting questions about potential trade-offs between metabolic functions and immunity in dairy cows29. The CACNA2D1 gene has been identified in the analysis of multiple traits. Studies have shown that polymorphisms in the CACNA2D1 gene are associated with mastitis resistance and milk production traits. The GG genotype is linked to lower somatic cell counts, reduced susceptibility to mastitis, and higher milk yield, making it a potential marker for dairy cattle breeding30,31. 
Although these genes have been shown to influence milk production in cattle and sheep, their role in Bactrian camels remains to be further investigated. Given the distinct composition of camel milk, particularly in fat and protein structure, these genes may regulate lactation traits through different mechanisms in camels.
Milk production traits are complex quantitative traits regulated by multiple genes. Currently, only a limited number of related genes have been discovered, including milk protein genes, prolactin genes, growth hormone and growth hormone receptor gene 32, DGAT33, and pituitary-specific transcription factors 34. Reports on genes associated with camel milk production traits are even scarcer and primarily revolve around polymorphic sites in milk protein genes affecting milk fat and milk protein percentages35. These studies have identified multiple polymorphic sites on these genes that are closely associated with milk production traits at different stages of lactation36. Due to the relatively small population of domestic Bactrian camels, mostly raised in free-ranging systems, obtaining large-scale samples has proven challenging. This limitation has affected the selection of candidate genes and loci in our genome-wide association analysis of milk production traits. Future research will aim to validate these findings in larger populations. Nevertheless, the discovery of genes and molecular genetic markers related to milk production traits in Bactrian camels not only enhances our understanding of the genetic basis of camel milk production but also lays the foundation for genetic improvement efforts.
In the field of agricultural animal breeding, growth traits are economically significant, and related research has always been a hot topic. Numerous key genes affecting growth traits have been identified in species such as pigs, cattle, sheep, and chickens37. These genes include GH, GHR, IGF1, POU1F1, MRFs, MSTN38, IGF2BP139, and Leptin, among others. Notably, genes like FGFR3, SHOX, HMGA2, ADAMTS17, and ACAN have been associated with increased height. FGFR3 plays a critical role in individual growth and development40, negatively regulating the proliferation and differentiation of chondrocytes to influence bone growth. Variations in this gene are linked to height-related conditions such as dwarfism41. SHOX gene defects are associated with conditions like Turner syndrome and Leri-Weill syndrome. ADAMTS17, through its regulation of the BMP-Smad1/5/8 pathway, is involved in bone formation, affecting skeletal proportions and leading to growth retardation42.
In our study, we also focused on a rare dwarf Bactrian camel as a research model to explore key genetic factors influencing camel height development. Comparative genomic analysis revealed significant genetic differences in the FBN2 gene between dwarf and normal camels. In humans and mice 43, mutations in the FBN2 gene are associated with symptoms such as short stature, hand abnormalities, and scoliosis, aligning with the observed symptoms in our dwarf camels. However, the mismatched variations we discovered are heterozygous, suggesting that the FBN2 gene may be a key factor influencing camel growth traits. This finding provides an important genetic marker for further research on camel growth traits and holds promise for future applications in genetic improvement efforts.
Materials and Methods
Dataset Collection
We collected genomic sequencing data from a total of 350 camels: 1) one healthy adult male domestic Bactrian camel, sampled for construction of the T2T genome; 2) 162 lactating Bactrian camels from the Inner Mongolia region, sampled for the genome-wide association study of milk production traits; 3) a 3-month-old female calf with congenital dwarfism, her mother, and age-matched normal control calves, sampled to identify the gene causing dwarfism in Bactrian camels; 4) whole-genome sequencing data of 163 camels from different regions of Asia, downloaded from the NCBI database for population genetic analysis. Additionally, the genomes of 20 Gobi Red camels and one dromedary camel were sequenced. Detailed information on all sequencing samples, including sample names, camel breeds, species, gender, sample types, and sequencing types, is provided in Supplementary Table 4.
Sample Preparation
For the comprehensive assembly of the Bactrian camel T2T genome, we systematically selected a nine-year-old male camel from Inner Mongolia, China. Blood was drawn from the jugular vein following disinfection, collected into ethylenediaminetetraacetic acid (EDTA)-coated tubes, and stored at -80°C, in line with best practices for DNA preservation 44.
To detect genes and genetic markers associated with milk production, we collected ear tissue samples from 162 lactating Bactrian camels between 2020 and 2022. An ear tissue sample (0.5 cm) was obtained from each camel and stored in 75% alcohol. Local anesthesia was administered with 5% procaine hydrochloride, followed by aseptic treatment with iodophor and sulfonamide to prevent infection. After rinsing with phosphate-buffered saline, samples were cryopreserved at -80°C. We also recorded seven major camel milk production traits (morning milk yield, milk lactose content, milk protein content, milk fat content, non-fat solids content, ash content, and milk density), as well as factors that may influence milk performance, such as species, herd, age, lactation period, milking time, and calving time.
The dwarf camel was a female domestic Bactrian camel, born on March 13, 2021, in Bayan Nur City, Inner Mongolia. Her parents were normal and had not previously produced dwarf offspring. Basic physical indicators, including height, foreleg length, hindleg length, head length, tail length, and weight, were measured at three months of age for the dwarf camel and for a normal non-sibling camel (control) born on the same date. Blood was also collected from both animals to measure growth hormone concentrations by enzyme immunoassay at the Beijing Sino-UK Institute of Biological Technology. We collected ear samples from the dwarf camel, her mother, and the control camel for NGS library construction and sequencing.
To conduct the population genetic analyses, we further obtained genome sequencing data for 146 camels, comprising domestic Bactrian camels, wild Bactrian camels, and dromedaries, from previous research 45. Blood samples from twenty Gobi Red camels and an ear tissue sample from one dromedary camel were collected and stored for genome resequencing.
Genomic Sequencing
Genomic DNA was extracted from blood and ear tissue using the Omega Blood DNA kit and Tissue DNA kit (Omega Bio-Tek, Norcross, GA, USA). DNA quality and integrity were checked by the OD260/280 ratio and agarose gel electrophoresis. For PacBio long-read sequencing, standard 20 kb SMRTbell libraries were constructed according to the PacBio protocol and sequenced on a PacBio Sequel II system (Pacific Biosciences, CA, USA). For ultra-long ONT sequencing, standard 60 kb libraries were prepared according to the manufacturer's recommendations and sequenced on the Nanopore PromethION platform (Oxford Nanopore Technologies, Oxford, UK). For Hi-C sequencing, the sample was cross-linked with formaldehyde for Hi-C library preparation and subsequently sequenced on the Illumina HiSeq platform. For all next-generation sequencing in the present study, 150 bp paired-end libraries were constructed and run on the Illumina HiSeq platform.
Genome Assembly and Quality Assessment
Preliminary genome characteristics of the Bactrian camel were estimated using the k-mer method based on short-read data 46. The k-mer distribution was computed with Jellyfish 47, and GenomeScope 48 was then used to estimate genome size, heterozygosity, and repeat content.
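As a rough illustration of the k-mer method, genome size can be approximated as the total number of k-mers divided by the depth of the main peak of the k-mer histogram. The sketch below is a deliberately simplified stand-in: it does not reproduce the model fitting performed by GenomeScope, and the `min_depth` cutoff used to discard low-depth error k-mers is an assumed, hypothetical parameter.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Count every k-mer in the reads, then return a histogram mapping
    depth -> number of distinct k-mers observed at that depth."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Approximate genome size as (total k-mers) / (peak depth).

    Depths below `min_depth` are ignored as likely sequencing errors
    (an assumption of this sketch, not a setting from the paper).
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("histogram has no depths at or above min_depth")
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Hypothetical histogram: an error spike at depth 1 and a main peak at depth 20.
toy_histogram = {1: 500, 19: 10, 20: 100, 21: 10}
print(estimate_genome_size(toy_histogram))
```

In practice the histogram would come from a dedicated k-mer counter rather than the pure-Python loop shown here, which is only suitable for toy inputs.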
Pre-assembly of the Bactrian camel genome employed a strategy combining Hifiasm 49 and NextDenovo 50. Initially, draft contig assemblies were generated from three data combinations: PacBio HiFi reads alone, HiFi reads + Hi-C reads, and HiFi reads + ONT ultra-long reads + Hi-C reads. We selected the assembly produced from HiFi + ONT ultra-long + Hi-C data as the optimal outcome according to contig N50 and N90. Afterwards, the run_purge_dups.py 51 program was used to eliminate duplicate contigs. To correct errors introduced by ONT ultra-long sequencing, the assembly was corrected using HiFi data. This involved counting k-mer occurrences with meryl 52, aligning the HiFi data to the assembled genome with winnowmap 53, and then applying secondary filtering and removing chimeric alignments with falconc 4. Three rounds of correction were conducted with racon 54 to obtain the final contigs. With the assistance of the Chromap 55 and yahs 56 suites, we anchored the contigs into scaffolds using Hi-C data.
The assembled scaffolds were aligned to the reference genome of Camelus ferus (BCGSAC_Cfer_1.0) to determine the corresponding chromosome for each scaffold. The longest 37 scaffolds matched 37 chromosomes of the wild Bactrian camel one-to-one; the Y chromosome was the exception. Hence, using the human Y chromosome (GRCh38.p14) as a template, the remaining 23 scaffolds were merged and joined to assemble the Bactrian camel Y chromosome with the ragtag.py program 57. The GapFiller module of quarTeT 58 was used for gap filling based on the scaffold assembly. Additionally, TGS-GapCloser 59 was employed with ONT ultra-long reads.
We used metaeuk 60 and BUSCO 61 to assess the completeness of the camel genome assembly. Genome continuity was evaluated by estimating contig N50 and N90 lengths with Quast 47 and by calculating the contig/chromosome ratio. The k-mer-based consensus quality value (QV) was estimated with Merqury 52 to evaluate the correctness of this assembly version. Illumina short reads of camels were mapped to the assembled genome, and mapping rates and coverage were calculated to evaluate the accuracy of the genome 62. Genomic interactions were calculated with Juicer 63 and plotted as a heatmap with Juicebox 64.
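The k-mer-based consensus QV can be sketched as follows, using the relationship described for Merqury: the per-base probability of being correct is estimated from the fraction of assembly k-mers that are also found in the reads, and the QV is the Phred-scaled error rate. The counts in the example are hypothetical, and the function name is introduced here for illustration only.

```python
import math


def kmer_qv(asm_only_kmers, total_asm_kmers, k):
    """Phred-scaled consensus quality from k-mer counts.

    asm_only_kmers:  k-mers present in the assembly but absent from the reads
    total_asm_kmers: all k-mers in the assembly
    k:               k-mer length
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers observed
    # Probability that a base is correct, given that a k-mer is correct
    # only if all k of its bases are correct.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)


# Hypothetical counts: 21 unsupported 21-mers out of one million.
print(round(kmer_qv(21, 1_000_000, 21), 2))
```

A QV of 60 corresponds to roughly one error per million bases, which gives a sense of scale for the reported values.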
For telomere identification, the animal telomeric repeat (TTAGGG) was searched across the genome using TeloExplorer 58. Tandem repeats were first identified with the EDTA software, and CentroMiner 58 was then used for centromere identification. Approximate locations of centromeric regions were estimated from the frequency of all candidate centromeric tandem repeats.
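The idea behind the telomere search can be sketched as scanning both ends of each chromosome for tandem runs of TTAGGG or its reverse complement CCCTAA. This is a simplified illustration and not the TeloExplorer algorithm; the window size and minimum copy number are hypothetical parameters chosen for the example.

```python
import re

TELOMERE_UNIT = "TTAGGG"
REVCOMP_UNIT = "CCCTAA"  # reverse complement of TTAGGG


def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit`."""
    runs = re.findall(f"(?:{unit})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def has_telomere(chromosome, window=10_000, min_copies=10):
    """Report whether each chromosome end carries a telomeric repeat run.

    Returns a (start_found, end_found) tuple. `window` and `min_copies`
    are assumed thresholds for this sketch.
    """
    def found(segment):
        best = max(longest_tandem_run(segment, TELOMERE_UNIT),
                   longest_tandem_run(segment, REVCOMP_UNIT))
        return best >= min_copies

    return found(chromosome[:window]), found(chromosome[-window:])


# Toy chromosome: 12 repeat copies at the start, only 3 at the end.
toy = "CCCTAA" * 12 + "ACGT" * 50 + "TTAGGG" * 3
print(has_telomere(toy, window=100, min_copies=10))  # (True, False)
```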
Genome Annotation
RepeatMasker 65 was applied to annotate repetitive sequence elements within the Bactrian camel genome. After masking the repeat sequences in the genome, Liftoff 66 was engaged to predict protein-coding genes and non-protein-coding sequences, utilizing the BCGSAC_Cfer_1.0 genome assembly and its associated annotation data as references. The annotation results were integrated to obtain a complete genomic structure of Bactrian camel.
Variant Detection
Illumina clean data from all camel samples were aligned to both the BCGSAC_Cfer_1.0 assembly and our assembled camel genome using the Burrows-Wheeler Aligner (BWA) 67. The GATK program (www.broadinstitute.org/gatk/) was used to call high-quality single-nucleotide polymorphisms (SNPs) and insertions or deletions (InDels) against each reference. Variants were then filtered using the GATK variant filter module with hard-filter settings 68. After that, we used all high-quality camel SNPs to construct a variant database. The variant databases derived from the previously published genome and from the genome constructed in this study were compared to validate the reliability and completeness of the current genome.
Population Genetics Analysis
Variant quality control was performed using vcftools 69 with the following filtering criteria: 1) variants with a minor allele count of less than 3; 2) variants with a minor allele frequency of less than 0.01; 3) variants with an average depth of coverage of less than 5; 4) variants with a quality score below 30; 5) variants with a proportion of missing data of more than 95%. After that, we generated a set of SNPs for the following analyses. First, an unrooted neighbor-joining (NJ) tree was constructed for all samples based on the p-distance matrix using VCF2Dis (https://github.com/BGI-shenzhen/VCF2Dis). The NJ tree was visualized using FastMe 2.0. Principal component analysis (PCA) of the camel population was performed with PLINK2 70.
The kinship relationship between individuals was estimated according to the kinship coefficient calculated by GCTA, and the kinship matrix heat map was drawn using R software.
Windowed nucleotide diversity (π) and Tajima's D statistics were calculated within 100 kb sliding windows using vcftools 69 and visualized as boxplots with the R package ggplot2 71.
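For intuition, windowed π at biallelic sites can be sketched as the sum of per-site pairwise differences divided by the window length. The sketch below uses non-overlapping windows and assumes every position in a window is callable; both are simplifying assumptions, and the input tuples are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def site_pi(alt_count, n_chromosomes):
    """Per-site nucleotide diversity for a biallelic site: the probability
    that two chromosomes drawn without replacement carry different alleles."""
    n, j = n_chromosomes, alt_count
    if n < 2:
        return 0.0
    return 2 * j * (n - j) / (n * (n - 1))


def windowed_pi(sites, window_size=100_000):
    """Average pi per base pair in non-overlapping windows.

    sites: iterable of (position, alt_allele_count, n_chromosomes),
           with 1-based positions on a single chromosome.
    Returns {window_start: pi}, with window_start 1-based.
    """
    totals = defaultdict(float)
    for position, alt_count, n_chromosomes in sites:
        window_start = (position - 1) // window_size * window_size + 1
        totals[window_start] += site_pi(alt_count, n_chromosomes)
    return {start: total / window_size
            for start, total in sorted(totals.items())}


# Hypothetical sites on four sampled chromosomes, with 100 bp toy windows.
toy_sites = [(10, 1, 4), (20, 1, 4), (150, 2, 4)]
print(windowed_pi(toy_sites, window_size=100))
```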
Genome-wide Association Analysis
Quality control of the variant dataset was performed using PLINK1.9 72, retaining individuals with a call rate ≥ 97%. SNPs were retained if they met the following criteria: genotyping call rate ≥ 95%, minor allele frequency ≥ 0.05, and P-value of the chi-squared test of Hardy-Weinberg equilibrium ≥ 5 × 10⁻⁶. Missing genotypes were imputed with Beagle 73.
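The SNP-level thresholds above can be expressed as a small predicate. The sketch below applies the stated cutoffs to genotype counts for a single SNP and uses a one-degree-of-freedom chi-squared test for Hardy-Weinberg equilibrium, as named in the text; the exact computation performed by PLINK may differ, and the function names and example counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of the chi-squared goodness-of-fit test (1 df) for
    Hardy-Weinberg equilibrium, given observed genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.05, min_hwe_p=5e-6):
    """Apply the call-rate, MAF, and HWE thresholds to one SNP."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)


print(passes_snp_qc(25, 50, 25, 0))   # genotypes in perfect HWE
print(passes_snp_qc(50, 0, 50, 0))    # no heterozygotes at all
```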
The phenotype dataset was quality-controlled using the criterion of mean ± 2 SD (standard deviation). Factors that may influence camel milk performance were grouped as follows: 1) three Bactrian camel breeds, namely Alxa Bactrian camel, Gobi Red camel, and Sunit Bactrian camel; 2) three herds, divided by feeding method into grazing, stall feeding, and both; 3) six lactation periods, defined by the month after parturition, including 3, 6, 7, 12, and 15; 4) the year and month when the milk sample was collected and tested; 5) three parity levels, including 1 ~ 3, 4 ~ 6, and 7 ~ 10; 6) calving time, 2020.2-3 and 2019.2-3; 7) age of the camel in four levels, including 4 ~ 7, 8 ~ 10, 11 ~ 15, and > 15. These seven factors were tested with a fixed linear model (least-squares means), and only significant (P < 0.05) factors were included as fixed effects in the GWAS model.
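The mean ± 2 SD criterion can be sketched as a single-pass filter. This assumes that records outside the interval are excluded and that the sample standard deviation is used; neither detail is stated explicitly in the text, and the trait values below are hypothetical.

```python
from statistics import mean, stdev


def within_two_sd(values, n_sd=2.0):
    """Keep values lying within mean ± n_sd * SD (sample SD, one pass)."""
    mu = mean(values)
    sd = stdev(values)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd
    return [v for v in values if lower <= v <= upper]


# Hypothetical trait records with one extreme value.
toy_yields = [10, 11, 9, 10, 10, 11, 9, 10, 30]
print(within_two_sd(toy_yields))
```

Because the outlier itself inflates the mean and SD, a single pass is conservative; whether the filter was applied once or iteratively is not specified in the text.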
We performed genome-wide association tests using the fixed and random model circulating probability unification (FarmCPU) method in the rMVP R package 74. The FarmCPU algorithm provides greater power to detect true-positive signals, and the threshold for significant association was set at P < 5 × 10⁻⁶. Furthermore, the SNP with the lowest P-value in each detected genomic region was selected. At each locus, the frequency of each genotype in the camel population was calculated. The least-squares means (LSM) of the affected lactation trait were compared among the three genotypes, with the significance threshold set at a Bonferroni-corrected P-value < 0.05. Plots were generated using the ggplot2 package in R 71. These loci were annotated using SnpEff 75, followed by GO and KEGG analyses via Metascape 76.
Dwarfism Gene Detection
The NGS dataset of the dwarf camel was processed following the method described above. After obtaining the genetic variants, we compared the variants of the dwarf camel with those of the normal camels in the variant database to identify genotypes unique to the dwarf camel. These unique genotypes were annotated using SnpEff 75, and high- and moderate-effect mutations were considered candidate variants. Causal variants associated with dwarfism were identified through comparative genomic analysis among different camels, and PCR was employed for validation, following standard protocols for mutation confirmation.
The primers used for validation of the candidate mutation were designed using NCBI Primer-BLAST (NCBI website). Genomic DNA from the dwarf camel and from one normal domestic Bactrian camel was used for polymerase chain reaction (PCR). The PCR product size was 440 bp. The PCR program was 94 °C for 5 min (1 cycle), followed by 35 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s. The primers designed for the mutation in the FBN2 gene were F: 5'-ATGATCAGGGCAGCTGCAAA-3'; R: 5'-TCCTGGAGGCAATGGCTTTT-3'. After gel recovery and purification, the products were sequenced in both directions and aligned with the reference sequences.
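As a quick sanity check on the primer pair, GC content and a rule-of-thumb melting temperature can be computed directly from the sequences. The sketch uses the Wallace rule (2 °C per A/T and 4 °C per G/C), which is only a rough approximation and is not claimed to be the method used by Primer-BLAST or by the authors to choose the annealing temperature.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


forward = "ATGATCAGGGCAGCTGCAAA"  # FBN2 forward primer from the text
reverse = "TCCTGGAGGCAATGGCTTTT"  # FBN2 reverse primer from the text

for name, primer in (("F", forward), ("R", reverse)):
    print(name, len(primer), gc_fraction(primer), wallace_tm(primer))
```

Both primers are 20 nt long with 50% GC content, so the two estimates agree with each other under this rule.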
FBN2 protein sequences from camel, human, mouse, rat, cattle, and pig were obtained from the NCBI website. ClustalW in MEGA 77 was used to align the full-length FBN2 protein sequences from these species. Sequence logos were generated with the online tool WebLogo 78.
Data availability
The camel genome assembly, as reported in this paper, has been deposited in GenBank under the Bioproject PRJNA991267 with the accession number JAUIZL000000000.
Acknowledgement
This project was financially supported by the Specific Project of Shanghai Jiao Tong University for "Invigorating Inner Mongolia through Science and Technology" (Grant No. KJXM2023-02-01), and by the Science and Technology Project of Inner Mongolia for research on "Application of Molecular Genetic Markers for Milk Production Traits in Bactrian Camels" (Grant No. 2019GG363). The authors acknowledge the contributions of these funding sources to the advancement of this research.
Institutional Review Board Statement
Experimental protocols, including animal care and experimental activities, were approved by the Animal Ethics Committee at Shanghai Jiao Tong University, China (approval No. 202104002, 12 April 2021).
Author Contribution
J.J.L., H.M., P.P.M. and L.M.D. contributed to the study's conception and design. J.J.L., H.Z., and C.Y.Z. performed the genome assembly and data analysis. W.H.Y., R.T.W., X.S.W., H.Z., Z.M.C., Y.F.W., Q.J.C., W.Y.W., D.A., S.Y.Q. and W.B.Z. collected the samples. J.S.Z., C.Q., L.Y.W., F.W., J.J.L., M.H.W., J.Z., Z.M.D., S.P.Z., Y.H.L., W.Q.Z., and J.L.H., were involved in material preparation, data collection, and DNA extraction. H.M., P.P.M. and L.M.D. evaluated the study's quality. J.J.L., C.Y.Z., W.H.Y. wrote and edited the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare no competing interests.
Supplementary files
Supplementary Fig. 1 | The distribution of the K-mer of the domestic Bactrian camel.
Supplementary Fig. 2 | Genomic collinearity comparison between the domestic Bactrian camel and the wild Bactrian camel.
Supplementary Fig. 3 | Comparison of different variant types in Bactrian camel genome.
Supplementary Fig. 4 | The Manhattan plots and QQ plots of GWAS for Ash, Density, Fat, Lactose, Morning milk, Non fat, and Protein, respectively.
Supplementary Fig. 5 | Statistics of Bactrian camel population in China of 2011–2021.
Supplementary Table 1 | Statistics of the PacBio HiFi and ONT ultra-long sequencing data.
Supplementary Table 2 | Statistics of the Hi-C sequencing and NGS sequencing clean data.
Supplementary Table 3 | Genome pre-assembly parameter statistics of Bactrian camel.
Supplementary Table 4 | Genomic structure annotation of Bactrian Camel.
Supplementary Table 5 | Details of all sequencing samples in this study.
Supplementary Table 6 | Descriptive statistics of lactation traits in Bactrian camel.
Supplementary Table 7 | Significant SNPs identified with camel milk production traits.
Supplementary Table 8 | Significant genes with camel milk production traits.
References
1.
The Bactrian Camels Genome Sequencing and Analysis Consortium. Genome sequences of wild and domestic bactrian camels. Nat Commun. 2012;3:1202.
2.
Wu H, et al. Camelid genomes reveal evolution and adaptation to desert environments. Nat Commun. 2014;5:5188. 10.1038/ncomms6188.
3.
Ming L, et al. Chromosome-level assembly of wild Bactrian camel genome reveals organization of immune gene loci. Mol Ecol Resour. 2020;20. 10.1111/1755-0998.13141.
4.
Wenger A, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37:1155–62. 10.1038/s41587-019-0217-9.
5.
Shafin K, et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat Biotechnol. 2020;38:1044–53. 10.1038/s41587-020-0503-6.
6.
Nurk S, et al. The complete sequence of a human genome. Science. 2022;376:44–53. 10.1126/science.abj6987.
7.
Aganezov S, et al. A complete reference genome improves analysis of human genetic variation. Science. 2022;376:eabl3533. 10.1126/science.abl3533.
8.
Wu H, et al. Telomere-to-telomere genome assembly of a male goat reveals variants associated with cashmere traits. Nat Commun. 2024;15:10041. 10.1038/s41467-024-54188-z.
9.
Huang Z, et al. Evolutionary analysis of a complete chicken genome. Proc Natl Acad Sci USA. 2023;120:e2216641120. 10.1073/pnas.2216641120.
10.
Zhang T, et al. Comparison of Long-Read Methods for Sequencing and Assembly of Lepidopteran Pest Genomes. Int J Mol Sci. 2022;24. 10.3390/ijms24010649.
11.
Khan MZ, et al. Research Development on Anti-Microbial and Antioxidant Properties of Camel Milk and Its Role as an Anti-Cancer and Anti-Hepatitis Agent. Antioxid (Basel). 2021;10. 10.3390/antiox10050788.
12.
Zhang H, et al. Changes in chemical composition of Alxa bactrian camel milk during lactation. J Dairy Sci. 2005;88:3402–10. 10.3168/jds.S0022-0302(05)73024-1.
13.
Bianchine JW, Risemberg HM, Kanderian SS, Harrison HE. Camptomelic dwarfism. Lancet. 1971;1:1017–8. 10.1016/s0140-6736(71)91413-9.
14.
Schwarzenbacher H, et al. A frameshift mutation in GON4L is associated with proportionate dwarfism in Fleckvieh cattle. Genet Sel Evol. 2016;48. 10.1186/s12711-016-0207-z.
15.
Wang P, Wang F. A proposed metric set for evaluation of genome assembly quality. Trends Genet. 2023;39:175–86. 10.1016/j.tig.2022.10.005.
16.
Du Q, et al. The Molecular Genetics of Marfan Syndrome. Int J Med Sci. 2021;18:2752–66. 10.7150/ijms.60685.
17.
Huang J, Huang J, Li N, Wang L, Xiao Q. FBN2 promotes the proliferation, mineralization, and differentiation of osteoblasts to accelerate fracture healing. Sci Rep. 2025;15:4843. 10.1038/s41598-025-89215-6.
18.
Ming L, et al. Chromosome-level assembly of wild Bactrian camel genome reveals organization of immune gene loci. Mol Ecol Resour. 2020;20. 10.1111/1755-0998.13141.
19.
Elbers JP, et al. Improving Illumina assemblies with Hi-C and long reads: An example with the North African dromedary. Mol Ecol Resour. 2019;19:1015–26. 10.1111/1755-0998.13020.
20.
Song J-M, et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol Plant. 2021;14:1757–67.
21.
Belser C, et al. Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing. Commun biology. 2021;4:1047.
22.
Weller J, Ezra E, Ron M. Invited review: A perspective on the future of genomic selection in dairy cattle. J Dairy Sci. 2017;100:8633–44.
23.
Wiggans GR, Cole JB, Hubbard SM, Sonstegard TS. Genomic selection in dairy cattle: the USDA experience. Annu Rev Anim Biosci. 2017;5:309–27.
24.
Yang A-Q, Chen B, Ran M-L, Yang G-M, Zeng C. The application of genomic selection in pig cross breeding. Yi Chuan = Hereditas. 2020;42:145–52.
25.
Misztal I, Lourenco D, Legarra A. Current status of genomic evaluation. J Anim Sci. 2020;98:skaa101.
26.
Li C, et al. Genome wide association study identifies 20 novel promising genes associated with milk fatty acid traits in Chinese Holstein. PLoS ONE. 2014;9:e96186. 10.1371/journal.pone.0096186.
27.
Lv Y, et al. Effect of ACSL3 Expression Levels on Preadipocyte Differentiation in Chinese Red Steppe Cattle. DNA Cell Biol. 2019;38:945–54. 10.1089/dna.2018.4443.
28.
Zhang M, et al. Polymorphisms of ITGA9 Gene and Their Correlation with Milk Quality Traits in Yak (Bos grunniens). Foods. 2024;13. 10.3390/foods13111613.
29.
Zhang B, et al. Store-operated Ca(2+) entry-sensitive glycolysis regulates neutrophil adhesion and phagocytosis in dairy cows with subclinical hypocalcemia. J Dairy Sci. 2023;106:7131–46. 10.3168/jds.2022-22709.
30.
Yuan ZR, et al. Single nucleotide polymorphism of CACNA2D1 gene and its association with milk somatic cell score in cattle. Mol Biol Rep. 2011;38:5179–83. 10.1007/s11033-010-0667-0.
31.
Magotra A, et al. Candidate SNP of CACNA2D1 Gene Associated with Clinical Mastitis and Production Traits in Sahiwal (Bos taurus indicus) and Karan Fries (Bos taurus taurus × Bos taurus indicus). Anim Biotechnol. 2019;30:75–81. 10.1080/10495398.2018.1437046.
32.
Bordonaro S, et al. Effect of GH p. L127V polymorphism and feeding systems on milk production traits and fatty acid composition in Modicana cows. Animals. 2020;10:1651.
33.
Thaller G, et al. Effects of DGAT1 variants on milk production traits in German cattle breeds. J Anim Sci. 2003;81:1911–8.
34.
Heidari M, Azari M, Hasani S, Khanahmadi A, Zerehdaran S. Effect of polymorphic variants of GH, Pit-1, and β-LG genes on milk production of Holstein cows. Russian J Genet. 2012;48:417–21.
35.
Nowier AM, Ramadan SI. Association of β-casein gene polymorphism with milk composition traits of Egyptian Maghrebi camels (Camelus dromedarius). Archives Anim Breed. 2020;63:493–500.
36.
Amandykova M, Dossybayev K, Mussayeva A, Bekmanov B, Saitou N. Comparative analysis of the polymorphism of the casein genes in camels bred in Kazakhstan. Diversity. 2022;14:285.
37.
Mohammadabadi M, Bordbar F, Jensen J, Du M, Guo W. Key genes regulating skeletal muscle development and growth in farm animals. Animals. 2021;11:835.
38.
Hickford J, et al. Polymorphisms in the ovine myostatin gene (MSTN) and their association with growth and carcass traits in New Zealand Romney sheep. Anim Genet. 2010;41:64–72.
39.
Wang K, et al. The Chicken Pan-Genome Reveals Gene Content Variation and a Promoter Region Deletion in IGF2BP1 Affecting Body Size. Mol Biol Evol. 2021;38:5066–81. 10.1093/molbev/msab231.
40.
Matsushita M, et al. Meclozine promotes longitudinal skeletal growth in transgenic mice with achondroplasia carrying a gain-of-function mutation in the FGFR3 gene. Endocrinology. 2015;156:548–54.
41.
Bonaventure J, et al. Common mutations in the fibroblast growth factor receptor 3 (FGFR3) gene account for achondroplasia, hypochondroplasia, and thanatophoric dwarfism. Am J Med Genet. 1996;63:148–54.
42.
Oichi T, et al. Adamts17 is involved in skeletogenesis through modulation of BMP-Smad1/5/8 pathway. Cell Mol Life Sci. 2019;76:4795–809.
43.
Peeters S, De Kinderen P, Meester JAN, Verstraeten A, Loeys BL. The fibrillinopathies: New insights with focus on the paradigm of opposing phenotypes for both FBN1 and FBN2. Hum Mutat. 2022;43:815–31. https://doi.org/10.1002/humu.24383.
44.
Wong PB, et al. Tissue sampling methods and standards for vertebrate genomics. Gigascience. 2012;1:8.
45.
Ming L, et al. Whole-genome sequencing of 128 camels across Asia reveals origin and migration of domestic Bactrian camels. Commun biology. 2020;3:1. 10.1038/s42003-019-0734-6.
46.
Salzberg SL, et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 2012;22:557–67. 10.1101/gr.131383.111.
47.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5. 10.1093/bioinformatics/btt086.
48.
Vurture GW, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–4. 10.1093/bioinformatics/btx153.
49.
Cheng H, Concepcion G, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18:170–5. 10.1038/s41592-020-01056-5.
50.
Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020;36:2253–5. 10.1093/bioinformatics/btz891.
51.
Roach MJ, Schmidt SA, Borneman AR. Purge Haplotigs: Allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19 (2018).
52.
Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:245. 10.1186/s13059-020-02134-9.
53.
Jain C, Rhie A, Hansen NF, Koren S, Phillippy AM. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat Methods. 2022;19:705–10. 10.1038/s41592-022-01457-8.
54.
Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27:737–46. 10.1101/gr.214270.116.
55.
Zhang H et al. Fast alignment and preprocessing of chromatin profiles with Chromap. Cold Spring Harbor Lab (2021).
56.
Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023;39. 10.1093/bioinformatics/btac808.
57.
Alonge M, Soyk S, Ramakrishnan S, Wang X, Schatz MC. RaGOO: Fast and accurate reference-guided scaffolding of draft genomes. Genome Biol 20 (2019).
58.
Lin Y, et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic Res. 2023;10. 10.1093/hr/uhad127.
59.
Xu M, Guo L, Gu S, Wang O, Liu X. TGS-GapCloser: fast and accurately passing through the Bermuda in large genome using error-prone third-generation long reads. 2019.
60.
Levy Karin E, Mirdita M, Söding J. MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome. 2020;8:48.
61.
Seppey M, Manni M, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness. Methods Mol Biol. 2019;1962:227–45. 10.1007/978-1-4939-9173-0_14.
62.
Deng Y, et al. A telomere-to-telomere gap-free reference genome of watermelon and its mutation library provide important resources for gene discovery and breeding. Mol Plant. 2022;15:1268–84. 10.1016/j.molp.2022.06.010.
63.
Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–8.
64.
Durand NC, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99–101.
65.
Tempel S. Using and understanding RepeatMasker. Methods Mol Biol. 2012;859:29–51. 10.1007/978-1-61779-603-6_2.
66.
Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021;37:1639–43. 10.1093/bioinformatics/btaa1016.
67.
Li H, Durbin R. Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform. Bioinformatics. 2009;25:1754–60.
68.
DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–8. 10.1038/ng.806.
69.
Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8. 10.1093/bioinformatics/btr330.
70.
Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4. 10.1186/s13742-015-0047-8.
71.
Galensa K. ggplot2: elegant graphics for data analysis (2nd ed.). Computing Reviews. 2017.
72.
Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. 10.1186/s13742-015-0047-8.
73.
Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 2007;81:1084–97. 10.1086/521987.
74.
Yin L, et al. rMVP: a memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. Genom Proteom Bioinform. 2021;19:619–28. 10.1016/j.gpb.2020.10.007.
75.
Cingolani P, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92.
76.
Zhou Y, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10:1523. 10.1038/s41467-019-09234-6.
77.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018;35:1547–9. 10.1093/molbev/msy096.
78.
Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90. 10.1101/gr.849004.
Abstract
Camels play a crucial role in supporting livelihoods and productivity in desert regions. Despite their importance, our understanding of their genetic composition remains limited. To address this gap, we constructed a telomere-to-telomere (T2T) reference genome for the Bactrian camel and carried out a comprehensive genetic analysis focused on lactation traits and a dwarf defect. The newly assembled genome spans 2.46 Gb with a contig N50 of 71.57 Mb and was successfully organized into 22 T2T chromosomes with only 7 gaps remaining. Using this genome, we detected a total of 33,076,298 genetic variants across the Bactrian camel genome, increasing the effective variant calling rate by 28.14%. A genome-wide association study of 148 lactating Bactrian camels identified 467 significant variants (P < 5 × 10⁻⁶) associated with lactation traits, implicating 42 important candidate genes, including ACSL3, COL23A1, ITGA9, PDGFD, CXXC4, FARP1, and CACNA2D1. Additionally, using a dwarf Bactrian camel as a disease case, we identified the FBN2 gene as a critical regulator of growth characteristics in Bactrian camels. A mutation at site c.1411C>T, resulting in an amino acid substitution from valine to isoleucine (Val>Ile), was found to underlie the dwarf phenotype. In conclusion, this study provides the most complete and detailed genome assembly for Bactrian camels to date and highlights significant genes and mutation loci affecting milk production and the dwarf defect. These findings offer valuable insights for advancing genetic research on camel biology and for enhancing economically important traits through selective breeding.
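For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is conventionally defined: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions only; they are not taken from the camel assembly or from the tools used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (arbitrary units), not real assembly data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

With these toy values the total length is 300, and the two longest contigs (80 + 70) reach exactly half of it, so the N50 is 70. A higher contig N50, such as the 71.57 Mb reported here, indicates that most of the assembly sits in long, contiguous sequences.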