Reference
1.ECA & WFP (2023) Africa - Regional Overview of Food Security and Nutrition 2023: Statistics and Trends. FAO, Accra
2.Herniter IA, Muñoz-Amatriaín M, Close TJ (2020) Genetic, textual, and archeological evidence of the historical global spread of cowpea (Vigna unguiculata (L.) Walp). Legume Sci 2:e57
3.Carvalho M et al (2017) Cowpea: a legume crop for a challenging environment. J Sci Food Agric 97:4273–4284
4.Abebe BK, Alemayehu MT (2022) A review of the nutritional use of cowpea (Vigna unguiculata L. Walp) for human and animal diets. J Agric Food Res 10:100383
5.Fiscus CJ et al (2024) The pattern of genetic variability in a core collection of 2,021 cowpea accessions. G3-GENES GENOM. GENET. 14:jkae071
6.Wu X et al (2024) Differential selection of yield and quality traits has shaped genomic signatures of cowpea domestication and improvement. Nat Genet 56:992–1005
7.Mahalakshmi V et al (2007) Cowpea [Vigna unguiculata (L.) Walp.] core collection defined by geographical, agronomical and botanical descriptors. Plant Genet Resour 5:113–119
8.Swarup S et al (2021) Genetic diversity is indispensable for plant breeding to improve crops. Crop Sci 61:839–852
9.Muñoz-Amatriaín M et al (2017) Genome resources for climate-resilient cowpea, an essential crop for food security. Plant J 89:1042–1054
10.Pan L et al (2023) Comprehensive genomic analyses of Vigna unguiculata provide insights into population differentiation and the genetic basis of key agricultural traits. Plant Biotechnol J 21:1426–1439
11.Liu Y et al (2020) Pan-Genome of Wild and Cultivated Soybeans. Cell 182:162–176
12.Zhou Y et al (2022) Graph pangenome captures missing heritability and empowers tomato breeding. Nature 606:527–534
13.Tao Y et al (2021) Extensive variation within the pan-genome of cultivated and wild sorghum. Nat Plants 7:766–773
14.Yuan Y et al (2021) Current status of structural variation studies in plants. Plant Biotechnol J 19:2153–2163
15.Li W et al (2022) Plant pan-genomics: recent advances, new challenges, and roads ahead. J Genet Genomics 49:833–846
16.Liang Q et al (2024) A view of the pan-genome of domesticated Cowpea (Vigna unguiculata [L.] Walp). Plant Genome 17:e20319
17.Lonardi S et al (2019) The genome of cowpea (Vigna unguiculata [L.] Walp). Plant J 98:767–782
18.Liang L et al (2022) Genome and pan-genome assembly of asparagus bean (Vigna unguiculata ssp. sesquipedialis) reveal the genetic basis of cold adaptation. Front Plant Sci 13:1059804
19.Yang Y et al (2023) A near-complete assembly of asparagus bean provides insights into anthocyanin accumulation in pods. Plant Biotechnol J 21:2473–2489
20.Simão FA et al (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212
21.Rhie A et al (2020) Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21:245
22.Li K et al (2023) Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat Commun 14:6556
23.Wei C et al (2025) Telomere-to-telomere cowpea genomes reveal insights into centromere evolution in Phaseoleae. Preprint at https://www.biorxiv.org/content/10.1101/2025.09.04.674167v1
24.Chen S et al (2023) Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis. Nat Plants 9:1986–1999
25.Mistry J et al (2021) Pfam: The protein families database in 2021. Nucleic Acids Res 49:D412–D419
26.UniProt Consortium (2023) UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 51:D523–D531
27.Cantalapiedra CP et al (2021) eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 38:5825–5829
28.Torkamaneh D, Lemay MA, Belzile F (2021) The pan-genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content. Plant Biotechnol J 19:1852–1862
29.Liu C et al (2022) High-quality genome assembly and pan-genome studies facilitate genetic discovery in mung bean and its improvement. Plant Commun 3:100352
30.Tao Y (2019) Exploring and exploiting pan-genomics for crop improvement. Mol Plant 12:156–169
31.Barker CL et al (2005) Genetic and physical mapping of the grapevine powdery mildew resistance gene, Run1, using a bacterial artificial chromosome library. Theor Appl Genet 111:370–377
32.Barthole G et al (2014) MYB118 represses endosperm maturation in seeds of Arabidopsis. Plant Cell 26:3519–3537
33.Xu B et al (2008) Arabidopsis genes AS1, AS2, and JAG negatively regulate boundary-specifying genes to promote sepal and petal development. Plant Physiol. 146, 566–575
34.Porco S et al (2016) Dioxygenase-encoding AtDAO1 gene controls IAA oxidation and homeostasis in Arabidopsis. Proc. Natl. Acad. Sci. U. S. A. 113, 11016–11021
35.Wang F et al (2019) BES1-regulated BEE1 controls photoperiodic flowering downstream of blue light signaling pathway in Arabidopsis. New Phytol 223:1407–1419
36.Dardick C et al (2013) PpeTAC1 promotes the horizontal growth of branches in peach trees and is a member of a functionally conserved gene family found in diverse plants species. Plant J 75:618–630
37.Hamberger B, Hahlbrock K (2004) The 4-coumarate:CoA ligase gene family in Arabidopsis thaliana comprises one rare, sinapate-activating and three commonly occurring isoenzymes. Proc. Natl. Acad. Sci. U. S. A. 101, 2209–2214
38.Doyle MR et al (2002) The ELF4 gene controls circadian rhythms and flowering time in Arabidopsis thaliana. Nature 419:74–77
39.Liu Z et al (2024) Dual roles of pear EARLY FLOWERING 4 -like genes in regulating flowering and leaf senescence. BMC Plant Biol 24:1117
40.Kohorn BD et al (2006) An Arabidopsis cell wall-associated kinase required for invertase activity and cell growth. Plant J 46:307–316
41.Rieu I et al (2008) The gibberellin biosynthetic genes AtGA20ox1 and AtGA20ox2 act, partially redundantly, to promote growth and development throughout the Arabidopsis life cycle. Plant J. 53, 488–504
42.Gao X et al (2025) Allelic variations in GA20ox3 regulate fruit length and seed germination timing for high-altitude adaptation in Arabidopsis thaliana. Nat Commun 16:5053
43.Bianchi JS et al (2020) Changes in leaflet shape and seeds per pod modify crop growth parameters, canopy light environment, and yield components in soybean. Crop J 8:351–364
44.Meinke DW (2020) Genome-wide identification of EMBRYO-DEFECTIVE (EMB) genes required for growth and development in Arabidopsis. New Phytol 226:306–325
45.Hufford MB et al (2021) De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 373:655–662
46.Feng C et al (2025) Genomic and genetic insights into Mendel's pea genes. Nature 642:980–989
47.Guo L et al (2025) Super pangenome of Vitis empowers identification of downy mildew resistance genes for grapevine improvement. Nat Genet 57:741–753
48.Yuan Y et al (2021) Current status of structural variation studies in plants. Plant Biotechnol J 19:2153–2163
49.Akakpo R et al (2020) The impact of transposable elements on the structure, evolution and function of the rice genome. New Phytol 226:44–49
50.Muller HJ (1964) The relation of recombination to mutational advance. Mutat Res 106:2–9
51.Scott AJ, Chiang C, Hall IM (2021) Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res 31:2249–2257
52.Igolkina AA et al (2025) A comparison of 27 Arabidopsis thaliana genomes and the path toward an unbiased characterization of genetic polymorphism. Nat. Genet. Online
53.Han L et al (2025) Identification of novel genomic regions associated with yield-related traits in cowpea (Vigna unguiculata [L.] Walp) landraces. Mol Breed 45:65
Sample selection and sequencing
The 20 cowpea accessions were selected from a large diversity analysis of 9,609 accessions from multiple genebanks and collections globally (unpublished data), including the United States Department of Agriculture (USDA), International Institute of Tropical Agriculture (IITA), Australian Grains Genebank (AGG), the Japanese National Agriculture and Food Research Organization (NARO), the Mozambique Genebank and other previously established collections including the UCR mini-core54
Based on phylogenetic relationships, geographic origin and subgroup clustering information, 20 representative cowpea accessions were selected for genome sequencing. These accessions originated from 11 countries, encompassing Venezuela, Mexico, Thailand, India, Philippines, Afghanistan, Japan, South Africa, Nigeria, Madagascar, and Morocco, representing a broad geographical distribution across the Americas, Asia, and Africa. Genomic DNA isolated from young leaves of the 20 cowpea accessions were used to construct PacBio SMRT libraries (> 10 kb) for each sample, following the recommended standard protocols. The libraries were sequenced on the PacBio Sequel II platform using circular consensus sequencing (CCS) mode, generating 15.01–24.45 Gb of HiFi reads per sample
Genome assembly, annotation and quality assessment
We assembled the genomes of 20 cowpea accessions sequenced with HiFi reads using Hifiasm (v0.20.0-r639) with default parameters55. The resulting contigs were anchored and oriented onto the chromosomes of the FC6 reference genome using the reference-guided scaffolding tool RagTag56 (v2.1.0) with default settings. The quality of genome assemblies was evaluated using BUSCO20 (v5.7.1) with the eudicots_odb10 lineage dataset, Clipping Reveals Assembly Quality22 (CRAQ, v1.0.9-alpha), and Merqury21 (v1.4.1)
TEs were annotated using the Extensive de novo TE Annotator (EDTA v2.2.0), which integrates multiple TE prediction tools57. Briefly, long terminal repeat retrotransposons (LTR-RTs) were initially identified using LTR_FINDER_parallel (v1.0.7) and LTR_HARVEST (v1.1), followed by quality filtering and refinement with LTR_retriever (v2.9.4)58–60. Terminal inverted repeat (TIR) elements were detected using TIR-Learner (v3.0), while Helitron elements were annotated with HelitronScanner61 (v1.1). The remaining TE components were identified using RepeatModeler (v2.0.5) and further annotated through homology-based searches with RepeatMasker (v4.1.5), incorporating the TE library generated by EDTA62,63. The combined results from all tools were integrated to generate the final comprehensive TE annotation. Gene structures were predicted for each genome assembly using the BRAKER3 pipeline64 (v3.0.8), which integrates ab initio gene predictions, transcript evidence, and homologous protein evidence. To improve gene model prediction accuracy, GeneMark-ETP (v1) was first trained using both transcript and protein evidence, followed by training of AUGUSTUS (v3.5.0) based on the GeneMark-ETP predictions65. Protein homology evidence was derived from the UniProt90 database26. The RNA-seq datasets from cowpea (BioProject accession: PRJNA970477), provided by Wu 6, were downloaded from the NCBI and aligned to the soft-masked genomes using HISAT266 (v2.2.1)
The quality of gene annotation was assessed using BUSCO20 (v5.7.1) with the eudicots_odb10 dataset. Functional annotation of predicted protein-coding sequences was performed using BLASTP searches against the Swiss-Prot database. Conserved protein domains were identified using InterProScan67 (v5.45-80.0). Gene Ontology (GO) terms (http://geneontology.org/) and KEGG pathway annotations (https://www.genome.jp/kegg/) were assigned using eggNOG-mapper68
Comparative genomic analysis
To investigate the phylogenetic relationships among the 26 cowpea accessions Vigna mungo was used as an outgroup69. A maximum likelihood phylogenetic tree was constructed based on the concatenated sequences of 14,174 single-copy orthologous genes using the RAxML70 software package (v8.2.12). Pairwise genome synteny analysis was performed using the JCVI71 package (v1.4.22). The syntenic relationship index (SRI) was calculated to quantify the level of synteny between two genomes (https://github.com/Yujiaxin419/SRI-Pipeline)
Gene-based pan-genome analysis
To construct the gene-based pan-genome protein sequences from 26 cowpea accessions were analyzed using OrthoFinder72 (v2.5.5), which applies the Markov Cluster Algorithm (MCL) to group protein-coding genes into orthogroups. The resulting orthogroups were categorized as core (present in all 26 genomes), dispensable (present in 2–25 genomes), or private (unique to a single genome). The nonsynonymous to synonymous substitution ratio (Ka/Ks) for each orthogroup category was estimated using KaKs_Calculator73 (v2.0) with the Yang–Nielsen (YN) method. Nucleotide diversity (π) was calculated with vcftools74 (v0.1.16). Wilcoxon rank-sum tests were performed to assess statistically significant differences among the gene categories
To identify SVs across the 26 assembled cowpea genomes, a whole-genome alignment-based approach was employed. Specifically, 25 genome assemblies were aligned to the FC6 reference genome using Minimap275 (v2.28) with default parameters. The resulting whole-genome alignments were analyzed using SyRI76 (v1.7.0) to detect SVs larger than 49 bp. To merge SVs across all genomes, we applied SURVIVOR77 (v1.0.7) with the following parameters: 1000 1 1 0 0 50, which allows a maximum distance of 1,000 bp between breakpoints for merging SVs across samples
For genotyping SVs in resequenced samples6,10, we constructed a graph-based pan-genome using Minigraph-Cactus78 (v2.9.1), incorporating all 26 assembled cowpea genomes. Resequencing reads were mapped to the graph genome using vg giraffe79 (v1.63.1). Mapping coverage was analyzed using vg pack80 (v1.63.1), and SV genotyping was performed with vg call81 (v1.63.1). Genotype data for each sample were subsequently indexed using tabix (v1.21. https://github.com/tabixio/tabix). Finally, all individual genotype VCF files were merged using bcftools82 (v1.21) for downstream analysis
To minimize potential misclassification caused by subspecies assignment, we retained only individuals with an ancestral component greater than 70% under the ADMIXTURE population structure analysis83,84 at K = 2. As a result, 72 accessions were retained for the Vu group and 283 accessions for the Vs group. To eliminate bias due to unequal sample sizes, we randomly selected 72 accessions from the Vs group to match the Vu group for downstream analyses
Selective sweeps potentially associated with artificial selection were identified by integrating three approaches XP-CLR (v1.1.2), nucleotide diversity ratio (πVu/πVs), and FST. Due to the limited number of SVs, XP-CLR analysis was conducted using only SNP data. XP-CLR85 was run with a 50-kb window size, a 10-kb step size, and a maximum of 200 SNPs per window. Nucleotide diversity (π) and fixation index (FST) were calculated using vcftools74 (v0.1.16) with a sliding window of 50 kb and a step size of 10 kb. For each method, the top 5% of scores was used as the threshold to define candidate selective sweep regions. Regions identified by at least two of the three methods were considered as putative selective sweeps. Overlapping regions were determined using bedtools86 (v2.31.1)
Genome-wide association studies
For the SV-based genome-wide association study (SV-GWAS) only biallelic structural variants were retained for analysis. To account for population structure and relatedness among samples, a kinship matrix was included as a covariate. GWAS was performed using the R package rMVP87. Phenotypic data for pod length and grain number per pod were obtained from a previously published study6. The best linear unbiased predictions (BLUPs) across multiple years were calculated using the lme488 package and used as phenotypic inputs for the SV-GWAS. Both general linear models (GLM), mixed linear models (MLM), and the Fixed and random model Circulating Probability Unification (FarmCPU) model were evaluated for association testing. The FarmCPU model was ultimately selected as the optimal model due to its better control of false positives and higher statistical power. The genome-wide significance threshold was determined using the Bonferroni correction based on the effective number of independent SVs, as calculated by the genetic type I error calculator. A uniform threshold of 0.05/N was applied, resulting in a final significance cutoff of 2.14 × 10– 6. PopLDdecay89 (v3.43) was used to generate linkage disequilibrium heatmaps
Gene expression profiling
The RNA-seq raw data were retrieved from the study by, Wu et al 6. including seeds, leaves, flower buds, and pods at 0, 4, 8, and 12 days after anthesis of cowpea.To quantify the expression of genes in this study, RNA-seq reads from different tissues were trimmed using the fastp90 (v0.24.0) program. The clean reads were then mapped against the reference gene models using HISAT266 (v.2.2.1). The featureCounts91 (v2.0.6) package was used for estimating FPKM values
Protein structure and binding site prediction
The three-dimensional structure of the protein was predicted using AlphaFold392. To identify potential ligand-binding sites, DeepSite (available at https://playmolecule.com/deepsite/) was employed with default parameters