Bayesian fine-mapping pinpoints candidate genes and pleiotropic loci of production traits from a chicken backcrossing scheme

Chi Mei Sun 1✉ Emailchimei.sun@fli.de

Johannes Geibel 1,2

Henner Simianer 2,6

Björn Andersson 3

David Cavero 4

Rudolf Preisinger 5

Steffen Weigend 1,2

Christian Reimer 1,2

1 Friedrich-Loeffler-Institut – Institute of Farm Animal Genetics Höltystraße 10 31535 Neustadt Germany

2 Center for Integrated Breeding Research University of Göttingen Albrecht-Thaer- Weg 3 37075 Göttingen Germany

3 Lohmann Breeders GmbH Am Seedeich 9-11 27472 Cuxhaven Germany

4 H&N International GmbH Am Seedeich 9-11 27472 Cuxhaven Germany

5 EW GROUP GmbH Hogenbögen 1 49429 Visbek Germany

6 Animal Breeding and Genetics Group University of Göttingen Albrecht-Thaer-Weg 3 37075 Göttingen Germany

Chi Mei Sun^1*, Johannes Geibel^1,2, Henner Simianer^2,6, Björn Andersson³, David Cavero⁴, Rudolf Preisinger⁵, Steffen Weigend^1,2, Christian Reimer^1,2

¹Friedrich-Loeffler-Institut – Institute of Farm Animal Genetics, Höltystraße 10, 31535 Neustadt, Germany

²University of Göttingen, Center for Integrated Breeding Research, Albrecht-Thaer-Weg 3, 37075 Göttingen, Germany

³Lohmann Breeders GmbH, Am Seedeich 9–11, 27472 Cuxhaven, Germany

⁴H&N International GmbH, Am Seedeich 9–11, 27472 Cuxhaven, Germany

⁵EW GROUP GmbH, Hogenbögen 1, 49429 Visbek, Germany

⁶University of Göttingen, Animal Breeding and Genetics Group, Albrecht-Thaer-Weg 3, 37075 Göttingen, Germany

^*Corresponding author: chimei.sun@fli.de

Abstract

Background

Understanding the genetic architecture of economically important traits in poultry is critical for improving breeding strategies. In this study, we investigated a backcrossing scheme between a White Layer line and Araucana chickens. Genome-wide association studies (GWAS) were performed using array-genotyped and imputed data. We also explored the use of correlated traits as covariates, that helps to distinguish shared genetic effects from those specific to individual traits. We applied a Bayesian-based fine-mapping approach to refine GWAS-identified QTLs and to pinpoint candidate variants and genes associated with egg number (EN, from 20 to 71 weeks), egg weight (EW, from 30 to 70 weeks), and body weight (BW, at 32 weeks).

Results

GWAS identified multiple significant loci associated with BW and EW, with higher heritability estimated. EN across periods showed a more polygenic architecture with lower heritability. Including correlated traits as covariates in GWAS revealed pleiotropic loci particularly on chromosomes 1 and 4 that influenced both BW and EW, as well as loci specific to individual traits. Bayesian fine-mapping successfully pinpointed candidate genes such as NCAPG, LCORL, and IGF2BP1, well known for their roles in growth and body size across species. Several novel candidate genes were also highlighted for EN. Notably, some fine-mapped signals suggested indirect genetic effects on traits rather than direct causal relationships.

Conclusions

This study demonstrates the power of combining GWAS with imputation and Bayesian fine-mapping in chickens to uncover the genetic basis of economically important traits. Furthermore, incorporating correlated traits as covariates in GWAS provided valuable insights, enabling the distinction between pleiotropic and trait-specific loci. Together, these approaches refine GWAS signals and deepened our understanding of the genetic architecture underlying complex traits.

Keywords

GWAS

Bayesian fine-mapping

egg production

egg weight

body weight

chicken

backcross

Background

The introduction of single nucleotide polymorphism (SNP) arrays have significantly advanced the genetic analysis of economically important traits in chickens, offering unprecedented insights into their complex genetic architecture. In genome-wide association studies (GWAS), SNP arrays serve as powerful tools for identifying novel loci associated with phenotypic variation across different chicken lines, including egg production and quality [1, 2], growth [3, 4], meat quality [5], and disease resistance [6]. These advancements have identified quantitative trait loci (QTLs) and candidate genes linked to key traits of laying chickens, such as egg production and quality, and body weight, that can drive genetic improvements in poultry breeding [7–13].

Despite the achievements, challenges remain. Although SNP arrays are a cost-effective and widely used genotyping tool, their medium density limits the resolution for fine-mapping causal variants. One way to overcome this limitation is through imputation, where lower density SNP arrays are imputed to whole-genome sequence (WGS) level using reference panels of related sequenced animals. In chickens, several studies have incorporated imputation in GWAS to enhance detection power [14–16]. This approach provides denser genomic information, and has been shown to provide a substantial boost in GWAS power from a simulation study, enabling the performance of commercial SNP arrays close to an ideal “complete” chip [17]. Other researchers suggested that WGS-based GWAS enables the discovery of novel variants that were missed by array-based GWAS, but the gains in detection power are modest since they often reinforce signals from known QTLs [18–20]. Nevertheless, there is broad consensus that increasing sample size remains one of the most effective strategies for improving detection power in GWAS [17, 20–22].

Another fundamental limitation for GWAS resolution is that a detected QTL contains numerous genes rather than pinpointing candidate genes with precision, due to the extent of linkage disequilibrium (LD) and generally long haplotypes in structured populations. However, studies have demonstrated that repeated intercrossing of outbred lines can enhance QTL mapping resolution for growth traits such as body weight [23–26], which suggests that introducing recombination events help reduce QTL intervals. GWAS methodologies can also be adapted to improve resolution and interpretation. Including covariates in GWAS is a common approach to control for confounding factors such as population structure and environmental influences that could otherwise bias the identification of the true genetic associations. Covariates such as principal components, age, sex, batch effect, leading SNPs, and correlated traits were generally considered to avoid stratification [16, 26–29]. Beyond just addressing confounding, multi-trait GWAS approaches have been increasingly explored, allowing to detect pleiotropic loci that may not reach the significance thresholds in single-trait GWAS analyses [20, 30, 31].

While traditional GWAS is effective for detecting loci with strong effects, it faces limitations in detection of polygenic traits where many SNPs contribute small individual effects, or when LD complicates the identification of causal variants. Bayesian methods offer a solution by incorporating prior probabilities and modelling sparse SNP effects, thereby refining the large QTLs identified through GWAS. A sequential approach using GWAS to identify QTLs, followed by a Bayes-based approach for fine-mapping and integration of functional enrichment, has successfully revealed candidate genes for important traits in cattle [32–34].

The objective of this study was to dissect the genetic architecture underlying production traits through integration of different GWAS methodologies and Bayesian fine-mapping tool, focused on a backcrossing and intercrossing scheme between Araucana chickens and high-performing White Layer (WL) hens.

Fig. 1

The back-crossing scheme between Araucana chickens and commercial White Layers (WL). The black lines represent the pedigree relationships between individuals across generations, information next to the black lines indicates the parent’s characteristics. BB, Bw and ww refer to the genotypes of the blue eggshell insertion where B refers to the color allele. Integers indicating the number of animals selected as parents (highlighted) or genotyped (not highlighted) in the corresponding generation. Phenotypes were measured on females from five generations (in green circles).

Material and Methods

Animals and Phenotypes

The animals used in this study are owned by the breeding company Lohmann Breeders GmbH. This study focused on a backcrossing scheme between distinct breeds, developed as part of the EU Horizon 2020 project Innovative Management of Animal Genetic Resources (IMAGE; https://www.imageh2020.eu/). This backcrossing scheme was designed to introgress blue eggshell colour from Araucana chickens into a high-performing White Layer line. The blue eggshell colour is determined by the autosomal inheritance of the SLCO1B3 gene (1:65167287–65185448) [35], “BB”, “Bw” and “ww” refer to the genotypes of the blue eggshell insertion where “B” refers to the color allele. Six Araucana males were crossed with ten high-performing White Layer (WL) females from a White Leghorn line to produce F1, followed by two back-crossing generations with WL hens to produce BC1 and BC2, and intercrossing within BC2 to produce the IC generation (Fig. 1). Only males of F1 and BC1 were selected to mate with white layer hens for producing BC1 and BC2. Parents from F1, BC1 and BC2 were selected as the carrier of the color allele (Bw).

Phenotypic traits were recorded in hens from BC1, BC2, IC and the WL founders of the BC cohorts (FounderBC1, FounderBC2), data is available in Supplementary Data 1. Only eggs with intact shell were selected for measuring egg quality traits. Egg number (EN) was recorded in thirteen 28-day periods, starting from 20 weeks (w) of age as the first period (EN1; 20w to 24w), measured as the total number of eggs during the corresponding period. There was no record for EN9 where the selection of individuals took place. Egg weight (EW) was measured at 30, 40, 50 and 70 weeks of age as the average weight of a 3-day clutch, although “FounderBC1” and “BC2” did not have data for EW50. Body weight (BW) was a onetime measurement at 32 weeks of age (Table 1).

Genotyping and Array-based GWAS

Araucana founders (6 males) were sequenced and WL founders (10 females) were genotyped using Affymetrix Axiom 600k array [36]. Remaining animals were genotyped for 52k SNPs using a custom Affymetrix array (ThermoFisher, Santa Clara, CA, USA). Overlapping SNPs were combined and genotypes were mapped to the chicken reference genome GRCg6a. Genotype data were filtered using PLINK 1.9 [37], by removing samples and SNPs with missing call rate greater than 15 %. The iltered genotype data contained 46,605 SNPs and 1,639 individuals with 1,024 females and 615 males.

Genome-wide association studies (GWAS) for EN, EW and BW measured throughout different periods were performed based on the filtered genotype data and different sample groups, including all the animals with non-missing phenotypic records (WL founders for BC1 and BC2, BC1, BC2, IC) denoted as group “all”, and group “offspring” included BC1, BC2 and IC, and by each generation. BW32 was included as a quantitative covariate in the analyses of EW traits, and the corresponding EW measurement was included as a covariate in the analyses of BW32.

Array-based GWAS was conducted for each trait including the generation as a fixed effect, using single-marker regression with polygenic correction, and the genetic relationship matrix (GRM) for each GWAS was computed between all the animals (1,639) using array genotypes, both by GCTA (version v1.94.1) [38–40]. Additional minor allele frequency (MAF) filtration was applied on each GWAS to remove SNPs with MAF lower than 0.01. The filtered genotyped data was fitted to the mixed linear model:

$\:y=X\beta\:+g+e$

where

$\:y$

is a vector of phenotypes of animals;

$\:\beta\:$

is a vector of fixed effects;

$\:X$

is the design matrix for fixed effects;

$\:g$

is a vector of random genetic effects that captures the total genetic variation;

$\:e$

is a vector of residuals. We applied a strict genome-wide Bonferroni-corrected threshold, calculated as

$\:P<\:0.05\:$

divided by the total number of tested SNPs for each analysis [41].

Imputation and accuracy

The filtered genotype data were phased and imputed using Beagle 5.4 [42, 43] with a reference panel of nine Araucana (of which six were the founders for the IMAGE backcross) and 161 White Leghorn samples and commercial White Layer lines, retrieved from a global reference set of around 5k chickens, prepared by the Chicken Genomic Diversity Consortium [44]. Since the Genotype data was mapped to GRCg6a, the reference sequence was lifted from GRCg7b to GRCg6a by bcftools-1.20 [45] using a custom chain file prepared according to [46]. The snakemake pipeline for the liftover is available on https://doi.org/10.5281/zenodo.15343279.

The imputation produced nearly 11 million biallelic variants. Imputation accuracy of the filtered data was estimated in 5-fold cross validations with two repetitions, by masking variants partially from the array-genotyped data in each fold and compared with the imputed SNPs correspondingly. The mean concordance and Pearson’s correlation were 0.95 and 0.92, respectively. The Imputation Quality Score (IQS), which adjusts for the minor allele frequency [47], was 0.88 on average. For the following analyses, we considered 4,868,808 (44.4 %) ariants having good predicted imputation quality (dosage r-square; DR2 > = 0.8).

Imputed-based GWAS

GWAS based on the 4,868,808 imputed variants was conducted for each trait including the generation as a fixed effect, using a fast mixed linear model based tool in GCTA [48]. First, a sparse GRM was generated between all the animals (1639) using the imputed genotypes with a cutoff value of 0.05, then the GWAS was performed using the sparse GRM previously generated for each trait and different sample groups with non-missing phenotypic records. Additional minor allele frequency (MAF) filtration was applied on both analyses to remove SNPs with MAF lower than 0.01. The genome-wide Bonferroni-corrected threshold became stricter (0.05 divided by the total number of tested SNPs for each analysis) as the number of SNPs significantly increased. Additionally, BW32 was included as a quantitative covariate in the analyses of EW traits, and vice versa.

Multi-trait metaGWAS

A multi-trait metaGWAS was conducted on all EN measurements (EN1 – EN13, excluding EN9 due to missing data) using the summary statistics from imputation-based GWAS. The formula

$\:Multi\:trait\:{X}^{2}={t}_{i}^{{\prime\:}}{V}^{-1}{t}_{i}$

suggested by Bolormaa et al. [30], where each SNP was assigned a t-value calculated as the allele effect divided by the standard error obtained from the imputed-based GWAS

$\:\:(t=\frac{b}{se})$

, while

$\:{t}_{i}$

is an

$\:n*1$

vector of t-values of

$\:{SNP}_{i}$

for the

$\:n$

traits,

$\:{t}_{i}^{{\prime\:}}$

is a transpose of the

$\:{t}_{i}\:\left(1*n\right)$

$\:{V}^{-1}$

is an inverse of the

$\:n*n$

correlation matrix between all the traits calculated based on the signed t-values of the SNPs. The genome-wide Bonferroni-corrected threshold was applied to identify significant variants.

Phenotypic correlations and genetic parameters

Phenotypic correlations

$\:\left({r}_{p}\right)$

between traits after adjusted by generation means, were calculated as Pearson’s correlations by R (version 4.3.2) [49] using function cor with pairwise.complete.obs, the corresponding p-values for the phenotypic correlations were calculated using cor.mtest from R package “corrplot” (version 0.92) [50].

Genetic parameters and correlations

$\:\left({r}_{g}\right)$

between traits were estimated from variance captured by the imputed variants including generation as a fixed effect, with the same model as described in the Array GWAS section, using restricted maximum likelihood analysis. We used GREML and Bivariate GREML analysis implemented in GCTA [39, 51, 52] that utilised the GRM computed between all animals (1639) using the imputed genotypes with MAF filtration of 0.01 [38]. GCTA only considered individuals with non-missing phenotypes, therefore the number of samples included in the analysis varied depending on the corresponding phenotypic records.

Fine-mapping analysis

We defined the QTL for fine-mapping based on the distance between the first and the last significant SNP that passed the Bonferroni threshold identified by imputed-based GWAS in group “all”, with a 1 Mb extension both upstream and downstream. All QTL regions were limited to a maximum size of 11 Mb. QTLs of 17 traits for fine-mapping can be found in the Supplementary Data 2.

We used two Bayesian-based fine-mapping methods, the BFMAP (Version 0.65) [32] and FINEMAP (Version 1.4.2) [53]. In our analysis, we observed that FINEMAP exhibited sensitivity to minor alterations in input data that has led to inconsistency in the results, therefore FINEMAP results are not discussed further here. Meanwhile, BFMAP demonstrated a high degree of stability in credible set (CS) identification, providing robust and interpretative results for further discussion.

The phenotypes and imputed genotypes of the individuals for which both records exist, including generation as a covariate, were used as input for the following BFMAP analyses. The QTL heritability was estimated from the proportion of variance explained by SNPs in the QTL region by

$\:{\sigma\:}_{g}^{2}/{(\sigma\:}_{g}^{2}\:+{\sigma\:}_{e}^{2})$

, based on the regional GRM computed by the QTL SNPs between all animals according to Yang’s formula [38]. The overall GRM was computed by BFMAP based on the imputed genotypes with MAF filtration of 0.01 between all animals (1639), and were included during fine-mapping together with the estimated QTL heritability. The Bayesian model developed for fine-mapping is:

$\:y=Xb+Za+g+e$

where

$\:y$

is a vector of a phenotype from n animals;

$\:\:b$

is a vector of covariate effects (generation) and

$\:X$

is the corresponding design matrix;

$\:a$

is a vector of variant effects and

$\:\:Z$

is the corresponding genotype coding matrix;

$\:g$

is a vector of polygenic effect considering the population structure;

$\:e$

is the residual. Probability of the null model P(D|M₀) and the probability of model with variant inclusion given the data P(D|M) can be computed to obtain the scaled Bayes factor and the corresponding p-value for a SNP. BFMAP approximates flat priors for SNP effects [32].

For fine-mapping, BFMAP first performs an iterative forward selection process to identify independent signals, by sequentially adding lead SNPs that were evaluated conditional on all previously selected lead SNPs to the model. The process continues until no variants meet the software’s predefined threshold (2logBF + 1 < 2logm_eff), with each selected lead SNP represents the proxy of an independent signal. Next, each signal is expanded to include variants in high LD (default r² > 0.3) with the signal proxy. The final lead SNP for each signal is then re-evaluated and determined through repositioning within each signal set. Finally, a posterior probability of causality

$\:\left({PPC}_{i}\right)$

is assigned to each variant within a signal conditioning on the lead SNPs from all other signals while excluding the lead SNP of that signal. PPC values are then sorted in descending order, starting from top, a list of variants with the cumulative sum of their PPC over 95% consists a credible variant set (CS) for each signal [32]. The summed PPC for a gene can also be generated based on the

$\:{PPC}_{i}$

, variants with 1 kb extension both upstream and downstream of the gene were also included [54]. All genes with summed PPC over 0.01 were documented in Supplementary Data 3. Afterwards we combined all signal-specific CS into a final CS for a QTL.

Functional enrichment of BFMAP fine-mapping results

Functional enrichment of the fine-mapping results from BFMAP was conducted using the R package “gemrich” (version 0.1.3) [54], which is designed to integrate functional annotations with the forward selection approach of BFMAP. Variant functional annotations were obtained with the Ensembl Variant Effect Predictor (VEP, version 112) [55], applied to variants included in the CS determined by BFMAP. The categories analyzed included Consequences, Impact, and Biotype. The number of variants analyzed in each category could vary slightly dependent on the availability of annotation.

Signal filtration was applied using a p-value threshold that targeted the lead variant of the signal, thereby a signal was either retained or filtered out as a whole. We applied a threshold of

$\:P<5\times\:{10}^{-5}$

, as recommended by the author. All variants within retained signals were included in the following analyses.

The estimated probability of causal variants being in a category C

$\:\left(\widehat{{P}_{C}}\right)$

was generated using maximum likelihood based on the produced

$\:{PPC}_{i}$

. The enrichment factor for each category

$\:\left({f}_{C}\right)$

quantified how enriched the candidate causal variants in that category compared to its background frequency

$\:\left({q}_{C}\right)$

$\:{f}_{C}=\:\frac{\widehat{{P}_{C}}}{{q}_{C}}$

. To ensure robust enrichment estimates, we excluded runs where fewer than 10 variants remained after signal filtration, or where fewer than two category types were present; these cases were flagged as “Failed” in the “mle_status” column, with the reason noted in the “reason” column. In addition, categories representing less than 1% of variants were grouped into a single “remaining” category. Parameters for enrichment estimation were documented in Supplementary Data 4.

For a total of

$\:n$

variants remaining, we then renormalized

$\:{PPC}_{i}$

for each variant

$\:i$

to account for the corresponding category enrichment

$\:\left({f}_{C}\right)$

, weighted by the total PPC of

$\:n$

variants:

$\:{renormedPPC}_{i}=\:\frac{{PPC}_{i}\times\:{f}_{C}}{\sum\:_{i}^{n}({PPC}_{i}\times\:{f}_{C})}\times\:\sum\:_{i}^{n}{PPC}_{i}\:$

Variants within filtered signals, along with their

$\:{PPC}_{i}$

and

$\:{renormedPPC}_{i}$

, are provided in Supplementary Data 5. When computing the summed PPC for a gene based on the filtered variants’

$\:{renormedPPC}_{i}$

, we included variants within the gene plus 1 kb upstream and downstream. Genes with summed PPC greater than 0.01 were reported in Supplementary Data 6.

Workflow

The pipeline of this work was implemented with Snakemake 7.32.4 [56]. The scripts, snakefiles, working conda environment definitions in yaml format, and the dependency analytics graph (DAG) and the rulegraph of the pipeline, are available on https://doi.org/10.5281/zenodo.17473280.

Results

Phenotypical characteristics

Descriptive statistics for phenotypes in all generations are shown in Table 1. A smaller mean number of eggs laid (11.9) and a larger coefficient of variance (CV % at 76.1%) were observed during early period of EN measurement (EN1; 20 w – 23 w), while during later periods (EN2 – EN13) the mean number of eggs laid ranged from 21.7 to 25.7 and CV % ranged from 17.2% to 35%. During all EN measurements there were always some individuals (from 0.8% to 19.6% of the population) with none egg laid. From 1.8% to 39.7% of the population laid 28 eggs across the periods. The mean weight and CV % for BW at 32 weeks were 1786.2 grams and 11.6%, respectively. For EW measured at different weeks, their CV % were similar average at 7.7%, the mean egg weight was increasing from 57.7 to 64.0 grams as the hens aged.

Visualization of the phenotypic correlations (

$\:{r}_{p}$

, lower triangle) and genetic correlations (

$\:{r}_{g}$

, based on imputed SNPs, upper triangle) among all traits, after adjusting for generation effects, together with their estimated heritabilities, is presented in Fig. 2.

Overall,

$\:{r}_{p}$

showed moderate to strong positive correlations among EN measurements, although these correlations weakened as the interval between measurements increased. In contrast, many

$\:{r}_{g}$

across EN failed to converge after adjusting for generation effects, specifically from EN4 to EN6. This period coincided with peak laying performance, during which reduced variance limited reliable estimation. Interestingly, EN1 exhibited strong positive

$\:{r}_{g}$

with the start and end of the laying period, and moderate positive

$\:{r}_{g}$

with the mid-to-late laying phase. A similar though weaker pattern was also observed for EN2.

Generally, EN showed negative correlations with both EW and BW in our analysis, with

$\:{r}_{g}$

showing stronger correlations than

$\:{r}_{p}$

. For EW70, correlations with EN gradually strengthened over the laying period, while for BW32, EW30 and EW40, both

$\:{r}_{g}$

and

$\:{r}_{p}$

were strongest at the beginning and end of the laying period, and weakest during the mid-to-late phase.

EW measurements at different weeks of age were consistently and positively correlated, with

$\:{r}_{p}$

ranging from 0.64 to 0.78 and

$\:{r}_{g}$

from 0.88 to 0.96. However,

$\:{r}_{g}$

estimates involving EW50 did not converge. BW32 also showed positive correlations with EW traits, with the strongest

$\:{r}_{p}$

observed with EW40 (0.43) and the strongest

$\:{r}_{g}$

with EW 30 (0.57).

Estimated genetic heritability was exceptionally high for EN1 that reached 0.4, and dropped sharply for EN2, EN10 and EN11, ranging from 0.12 to 0.15, and was close to zero for other EN periods. In contrast, BW32 had the highest heritability at 0.8, while EW30 to EW70 showed moderately high values, ranging from 0.57 to 0.69.

Fig. 2

Visualization of genetic (upper triangle, based on imputed genotypes) and phenotypic (lower triangle) correlations between traits after adjusting for generation. Genetic correlation estimates that did not converged were marked as “NA”. Pearson’s correlations were calculated from phenotypic records and statistically insignificant (p-values > 0.05) results were blanked. Diagonal shows the estimated heritability for each trait from imputed data.

Table 1

Descriptive statistics for phenotypes in all generations
Trait	N	Weeks	Mean	SD	CV(%)	Min(%)	Max(%)
EN1	874	20–23	11.86	9.02	76.1	0(19.6%)	28(1.8%)
EN2	874	24–27	24.12	4.9	20.3	0(0.8%)	28(23.5%)
EN3	874	28–31	25.61	4.4	17.2	0(1.1%)	28(35.4%)
EN4	874	32–35	25.68	4.8	18.7	0(1.6%)	28(39.7%)
EN5	874	36–39	25.38	5.31	20.9	0(2.5%)	28(35.7%)
EN6	874	40–43	25.51	5.13	20.1	0(2.2%)	28(35.9%)
EN7	874	44–47	25.32	5.26	20.8	0(2.4%)	28(33.5%)
EN8	874	48–51	24.62	5.8	23.6	0(2.9%)	28(28.5%)
EN10	854	56–59	23.95	5.91	24.7	0(1.6%)	28(20.0%)
EN11	854	60–63	23.56	6.49	27.6	0(2.9%)	28(20.5%)
EN12	854	64–67	22.75	7.17	31.5	0(4.6%)	28(15.5%)
EN13	853	68–71	21.69	7.59	35.0	0(5.4%)	28(12.2%)
BW32	864	32	1786.2	207.53	11.6	1202(0.1%)	2582(0.1%)
EW30	854	30	57.66	4.54	7.9	45(0.1%)	73(0.1%)
EW40	831	40	61.39	4.51	7.4	50(0.1%)	79(0.1%)
EW50	524	50	62.1	4.72	7.6	51(0.2%)	79(0.2%)
EW70	746	70	64.02	5.03	7.9	52(0.1%)	80(0.1%)
N - number of records, SD - standard deviation, CV(%) - coefficient of variance in percentage, Min(%)/Max(%) – minimum/maximum value (proportion of individuals with that value)
Comparison between Array and imputed GWAS

All array-based and imputed GWAS results across traits and groups are presented as Manhattan plots in Supplementary Figs. 1.

Both array and imputed GWAS indicated a polygenic architecture for EN measurements across different sample groups, with many loci each contributing small effects to EN. In general, imputed GWAS identified more significant variants than array GWAS, particularly in the “FounderBC2” and “FounderBC1” groups. During the mid-to-late laying phase (EN7 – EN11), significant variants were detected across more sample groups, most notably on chr 2 and 12 (Fig. 3).

For EW30, both array and imputed GWAS revealed significant peaks on chr 1 and 4 in the "all" and "offspring" groups, with an additional peak on chr 2 detected in the “offspring” group which just below the significance threshold in the “all” group. Interestingly, in the “IC” group, a significant signal on chr 4 was detected in the imputed GWAS, which applied a stricter significance threshold, while the same signal in the array GWAS fell just below significance. Conversely, array GWAS identified significant signals on chr 4 in the "BC1" and "BC2" groups that did not surpass the more stringent significance threshold in the imputed analysis. Emerging signals on chr 1 and 2 were also observed in the “BC2” group in both GWAS approaches.

Across EW40, EW50, and EW70, both array and imputed GWAS consistently identified a significant peak on chr 4 that persisted across all EW measurements in the "all" and "offspring" groups, as well as in the "IC" group for EW40 and EW50, and partially in the “BC1” group. The significant peak on chr 1 detected for EW30 was also observed for EW40 in the "all" and "offspring" groups in both GWAS analyses, and in the "BC2" group based on imputed GWAS. Notably, for EW70 in the “BC1” and “BC2” groups, additional suggestive peaks were observed on chr 1 at a distinct position and on chr 7, respectively, both falling just below the genome-wide significance threshold in the two GWAS analyses.

For BW32, three major significant peaks on chr 1, 4 and 27 were detected by both array and imputed GWAS in the "all," "offspring," and "IC" groups, while peaks on chr 1 and 4 were identified in the "BC2" group. In the "BC1" group, array GWAS identified significant signals on chr 1 and 4, but only the chr 4 signal remained significant in the imputed GWAS. No significant variants were detected in the WL founders, although the chr 4 signal nearly reached significance in the "FounderBC2" group, and a spurious signal on chr 1 emerged in the “Founder_BC1” group, both from the imputed GWAS. Notably, the major peaks on chr 1 and 4 identified for BW32 overlapped with those associated with multiple EW measurements (Fig. 4).

Fig. 3

Manhattan plots of single-trait genome-wide association from individuals of all generations for egg number at mid-to-late stages (EN7, 44w – 48w; EN8, 48w – 52w; EN10, 56w – 60w), using array genotypes (top) and imputed genotypes (bottom). Chromosomes are on the x-axis and

$\:-{log}_{10}\left(P\right)$

on the y-axis. The horizontal blue line indicates the Bonferroni significance threshold. Variants passing the threshold were highlighted in bright colour.

Fig. 4

Manhattan plots of single-trait genome-wide association from individuals of all generations, for body weight at 32w (BW32, top) and egg weight at 30w (EW30, bottom), using array genotypes (first column), imputed genotypes (second column) and association including counterpart as covariate based on imputed data (third column). Chromosomes are on the x-axis and

$\:-{log}_{10}\left(P\right)$

on the y-axis. The horizontal blue line indicates the Bonferroni significance threshold. Variants passing the threshold were highlighted in bright colour.

Multi-trait metaGWAS for EN

Given the moderate to strong correlations among EN measurements (Fig. 2), and the polygenic background observed for EN in both array-based and imputed GWAS, we conducted a multi-trait metaGWAS to increase statistical power and identify pleiotropic loci influencing EN across the laying period. The metaGWAS identified several significant variants on chr 2 associated with EN traits from EN1 to EN13 (Fig. 5).

Fig. 5

Multi-trait genome-wide association based on results from single-trait associations, including egg number from EN1 to EN13 (EN9 excluded) from individuals of all generations using imputed data. Chromosomes are on the x-axis and

$\:-{log}_{10}\left(P\right)$

on the y-axis. The horizontal blue line indicates the Bonferroni significance threshold. Variants passing the threshold were highlighted in bright colour.

Including correlated traits as covariates in GWAS

Given the high correlations between BW32 and EW measurements, we performed both array and imputed GWAS for EW traits while including BW32 as a quantitative covariate, and vice versa. This approach provided additional insights into the genetic architecture of these traits.

When BW32 was included as a covariate in EW GWAS, signals that previously overlapped between BW32 and EW40, EW50, and EW70 on chr 1 and 4 were suppressed and no longer reached significance across any sample group in either GWAS approach. For EW30, covariate adjustment helped clarified the presence of spurious associations without markedly altering most p-values. For example, a signal on chr 2 became significant in the “all” group and remained significant in the “offspring” group in array GWAS. Meanwhile, the overlapping signal on chr 1 was strongly suppressed and disappeared across sample groups, whereas the signal on chr 4 showed greatly increased p-values and remained close to the significance threshold only in the “all” and “offspring” groups. (Fig. 4). Interestingly, for EW70, the previously suggestive peak on chr 1 at a distinct position in the “BC1” group was retained with comparable p-values, and additional suggestive peak was observed on chr 21 in the imputed GWAS, although both remained just below the genome-wide significance threshold.

Conversely, including EW measurements as covariates in the GWAS for BW32 generally increased p-values for the major signals on chr 1 and 4 (Fig. 4). The locus on chr 27 also showed increased p-values when EW50 and EW70 were included. Such reduction in statistical power sometimes preventing these signals from reaching genome-wide significance. Nevertheless, the three major peaks on chr 1, 4, and 27 were largely retained in the "all" and "offspring" groups, except for the chr 1 peak, which dropped below the significance threshold when EW50 was used as the covariate. Across different EW covariates, the chr 1 signal was suppressed in the “IC” group but retained in “BC2”, while the chr 4 signal was sometimes suppressed in the “IC” and “BC2” groups and completely lost in the “BC1” group.

All array and imputed GWAS results including covariate among groups between BW32 and EW30, EW40, EW50, EW70 are available as Manhattan plots in Supplementary Figs. 2.

Fine-mapping GWAS-identified QTLs and enrichment analysis

Fine-mapping analysis was conducted to refine the QTL regions identified through imputation-based GWAS, with the aim of pinpointing potential causal variants and candidate genes. EN traits exhibited polygenic signals, with only a few variants reaching significance and no clear peaks, making these results unreliable for QTL identification. Given the moderate to strong correlation among EN measurements, a multi-trait metaGWAS was performed to detect pleiotropic loci associated with EN1 – EN13. Fine-mapping was subsequently applied to the significant loci identified in this meta-analysis. The fine-mapping results—including the QTLs analysed for 17 traits, their estimated heritabilities, enrichment factors, and the probability of including causal variants within functional annotated categories, as well as the filtered variants and candidate genes—are summarised in Supplementary Data 2–6.

Body Weights at Week 32

The estimated

$\:{h}^{2}$

for the three QTLs associated with BW32 on chr 1, 4, and 27 were comparable, at 0.39, 0.43, and 0.38, respectively. Before functional enrichment analysis, the CS for these QTLs contained 3,616 (QTL-chr1), 87 (QTL-chr4), and 47 (QTL-chr27) variants.

For QTL-chr1, two signals were identified: one spanning the entire QTL region and another confined to a narrower region. After applying signal filtration using a p-value threshold of

$\:P<5\times\:{10}^{-5}$

, only the refined signal was retained, consisting of 160 variants. Among these, a variant located at 170,911,032 bp exhibited the highest PPC, increasing from 43% pre-enrichment to 92.3% and 43.9% post-enrichment under the “Consequence” and “Impact” categories, respectively. This variant was annotated as an intergenic variant and modifier, but without “Biotype” information. The signal highlighted three genes — FAM124A, ENSGALG00000052822, and ENSGALG00000053256 — with PPC values ranging from 3.8% to 43% before enrichment. After renormalization by “Impact”, PPC values for these genes remained largely unchanged, however, only ENSGALG00000053256 persisted in the “Consequence” category, where its PPC increased markedly to 92.3%. Conversely, only ENSGALG00000052822 was retained in the “Biotype” category, with a modest PPC increase from 37.2% to 39.1%.

For QTL-chr4, a single signal localized near the NCAPG-LCORL region and its upstream area was identified and retained after filtration, comprising 87 variants. Enrichment estimation was not feasible for the “Consequence” and “Impact” categories due to a lack of annotation variability. When enriched by the “Biotype” category, PPC values for variants annotated as lncRNA decreased from 0.3% to near zero, while those annotated as protein coding remained stable (about 1.3%). At the gene level, NCAPG and LCORL displayed high PPC values of 90.8% and 88.3%, which further increased to 95.1% and 92.5% after enrichment by “Biotype” category.

On QTL-chr27, one refined signal was detected and retained after filtration, comprising 47 variants. Due to uniform “Biotype” annotation (all variants as protein coding), enrichment estimation was not applicable for this category. Variants PPC values ranged from 0.8% to 3.8% before, and nearly zero to 4.3% after enrichment, consistently highlighted IGF2BP1 with the highest PPC values of 61.2% pre-enrichment, 69.4% (“Consequence”) and 64.4% (“Impact”) post-enrichment. Additional genes— GIP, SNF8, UBE2Z, and ENSGALG00000001525 were also identified, with PPC values ranging from 2.2% to 16.3% across categories before and after enrichment.

Egg Weight (EW)

Two QTLs were identified for EW30 on chr 1 and 4, with estimated

$\:{h}^{2}$

of 0.15 and 0.73 and CS sizes of 800 and 455 variants, respectively. On QTL-chr1, two initial signals spanning the region highlighted several genes, namely SETDB2, ENSGALG00000050514, FNDC3A, and ENSGALG00000031392, with PPC values ranging from 10.7% to 44.2%. After signal filtration, only one signal retained, comprising 655 variants, with the top PPC increasing from 1.3% to 6.2%. ENSGALG00000053103 became the top gene in the “Consequence” category, with PPC rising from 8.7% to 41.5%. ENSGALG00000050514 was retained and ranked highest across the “Impact” and “Biotype” categories, with PPC increasing from 11.8% to 11.9% and 19.1%, respectively. Notably, ENSGALG00000031392 was pinpointed by both initial signals with different PPC (44.2% and to 3.4%), but showed lower PPC in the retained signal. For QTL-chr4, two signals were detected and retained after filtration: one localized near SLIT2 and LDB2, and another in an intergenic region. The combined signal comprised 455 variants, with the top PPC of 9.9%. Enrichment analysis did not converge for the “Consequence” category and could not be evaluated for “Impact” due to limited annotation variability. The retained signal highlighted LDB2, SLIT2, ENSGALG00000054285, and ENSGALG00000032384, with PPC values unchanged after enrichment in the “Biotype” category (1.3% to 65.4%).

Two QTLs were identified for EW40 on chr 1 and 4, with estimated

$\:{h}^{2}$

of 0.06 and 0.20 and CS sizes of 3,355 and 206 variants, respectively. On QTL-chr1, two initial signals were detected, with THSD1 showing the highest PPC (21.2%). After signal filtration, one signal remained (1,245 variants), and the top PPC increased from 1.0% to 7.4%. The gene set also shifted, with ENSGALG00000050514 emerging as the top gene across the “Consequence”, “Impact” and “Biotype” categories (PPC increasing from 7.9% to 26%, 7.9%, and 12.0%, respectively). For QTL-chr4, a single retained signal localized near the NCAPG-LCORL region and its upstream area (206 variants, top PPC of 2.1%). These variants highlighted NCAPG (30%) and LCORL (25.2%) pre-enrichment. Post-enrichment PPC values dropped to 5.7% for NCAPG, and < 1% for LCORL across the “Consequence” and “Biotype” categories. The “Impact” category could not be evaluated due to lack of annotation variability.

For EW50, one QTL was identified on chr 4 (estimated

$\:{h}^{2}$

= 0.24, CS = 186 variants). A single signal near the NCAPG-LCORL region and its upstream area, as well as on LDB2, was retained after filtration. Pre-enrichment PPC values highlighted NCAPG (7%), LCORL (2.6%), and LDB2 (11.8%). Post-enrichment, only NCAPG remained, with PPC of 5.2% across the “Consequence” and “Biotype” categories (“Impact” category not evaluable).

For EW70, one QTL was identified on chr 4, overlapping the EW50 region (estimated

$\:{h}^{2}$

= 0.21, CS = 1,277 variants). Two signals were initially detected: one overlapping the EW50 locus and another spanning a downstream region. These variants highlighted several genes, notably TAPT1 (35.2%) and PROM1 (29.4%) pre-enrichment. After filtration, only the first signal was retained (176 variants), continuing to highlight NCAPG (3.7%), LCORL (1.7%), and LDB2 (4.9%) pre-enrichment. Similar to EW50, only NCAPG remained after enrichment, with PPC of 2.1% across the “Consequence” and “Biotype” categories (“Impact” category not evaluable).

Overall, three genes located on chr 1 that were previously pinpointed by BW32 were also observed for EW30 and EW40 both pre- and post-enrichment, although their QTL-chr1 regions slightly differed. Among other genes pinpointed on chr 1 for EW30 and EW40, about half overlapped pre-enrichment, and most overlapped post-enrichment. On chr 4, NCAPG and LCORL were consistently identified across BW32 and EW traits (except EW30), both pre- and post-enrichment, while LDB2 was repeatedly highlighted in EW traits (except EW40) before and after enrichment.

Egg Number (EN)

The multi-trait metaGWAS for EN1 to EN13 identified one QTL on chr 2. The estimated

$\:{h}^{2}$

for this QTL across different EN measurements was generally low, with a maximum of 0.1, and CS sizes ranging from 31 to 1,807 variants. These variants consistently highlighted ENSGALG00000049955 (PPC = 7 − 27.8% for EN3 – EN13, declining to 1.8% in EN2), and GARS (PPC = 32.5–58.3% for EN3 – EN12, declining to 2% in EN13). Several genes overlapped between the beginning and end of the laying period, including XIRP1, OXSR1, DLEC1, ENSGALG00000005884, and CTDSPL (their PPC = 1.1–18.1%), while ENSGALG00000050406 and ENSGALG00000049302 were mainly detected in the late laying period (PPC = 1.0–3.3%). Additional genes were detected specific to a period, e.g., CRHR2 for EN2, PLCD1 for EN3, and MYD88 for both EN2 and EN3.

After signal filtration, the CS sizes were reduced to 35–38 variants, with no variants retained for EN2 and EN10 – EN13. Following enrichment, GARS and ENSGALG00000049955 continued to be highlighted across EN3 – EN8. GARS showed the highest PPC under the “Impact” category (57.7–96.1%), followed by “Biotype” (1.8–48.7%), but could not be evaluated under “Consequence”. In contrast, ENSGALG00000049955 had the highest PPC under “Biotype” (39.3–96.1%), followed by “Consequence” (13.3–13.6%) and “Impact” (10.5–17.7%). Notably, three genes – ENSGALG00000005934, ENSGALG00000006018 and OXSR1 – were uniquely identified in EN1 after enrichement in the “Impact” category (PPC = 2.5–5.6%).

Discussion

Enhancing GWAS through Backcrossing Generations and Imputation

A backcrossing scheme involves the introduction of multiple recombination events that can break down long-range LD blocks, which allows for a higher resolution of QTL mapping and refinement of QTLs [23]. In this study, our results showed that GWAS identified more and stronger signals in “IC” than other generations, while GWAS including individuals from all generations thereby resulting to a larger sample size, have identified a greater number of QTLs compared to single-generation analyses. Furthermore, imputation of array-genotyped data significantly increases the density of variants across the genome, which strengthens the statistical power of QTLs that were identified in array GWAS. These findings highlight the benefits of combining breeding strategies with advanced genomics tools to enhance the resolution and detection power of GWAS.

Including correlated traits as covariates in GWAS reveals pleiotropic and trait-specific loci

Covariates play a critical role in GWAS by accounting for confounding factors that might obscure true genetic signals. Specifically, when two traits are correlated, conditioning GWAS on one trait by including it as a covariate can help to disentangle their shared and distinct genetic architectures. In this study, BW32 and EW traits exhibited strong phenotypic and genetic correlations across different time points. To clarify the genetic basis of these traits, we performed GWAS for EW traits while including BW32 as a covariate, and vice versa.

When BW32 was included as a covariate in the GWAS for EW at four time points, the previously identified QTLs on chr 1 disappeared and diminished on chr 4, suggesting that these associations were primarily driven by the effects of BW32 rather than by direct effects on EW traits. In contrast, when EW traits were included as covariates in the GWAS for BW32, the coinciding signals on chr 1 and 4 generally remained significant, although with reduced statistical power (i.e., larger p-values). This attenuation also extended to the signal on chr 27 when EW50 and EW70 were included. Together, these results indicate that the QTLs on chr 1 and 4 are pleiotropic — influencing both BW32 and EW traits — with stronger direct effects on BW32 and secondary, indirect effects on EW traits, prominently on chr 1. Meanwhile, the locus on chr 27 appears to specifically affect BW32, while also showing correlated effects with EW at later developmental stages.

Notably, signals on other chromosomes remained with comparable p-values, suggesting genomic regions that contribute specifically to the trait of interest. In some cases, covariate adjustment even helped clarify spurious associations by reducing or stabilizing p-values, as observed in the GWAS analyses of EW30 and EW70 when BW32 was included as a covariate. Further investigation into the biological mechanisms underlying both pleiotropic and trait-specific QTLs — for example, through functional annotation or gene expression analyses — may provide valuable insights into how these loci influence the traits.

These findings underscore the importance of considering correlated traits as covariate in GWAS to disentangle complex genetic relationships, refine association signals and enhance the accuracy and interpretability of GWAS results.

High variation of heritability estimation

The large variation in

$\:{h}^{2}$

estimates observed for both phenotypes and QTL regions in this backcrossing scheme reflects the complex genetic architecture underlying the traits studied, as well as their evaluation in a heterogeneous population. Overall,

$\:{h}^{2}$

tended to be higher for BW32, EW at different time points, and EN during the early stage of production (highest for EN1, moderate for EN2). In contrast, EN during the mid-to-late laying stages exhibited substantially lower

$\:{h}^{2}$

estimates (low for EN3 — EN13, moderate for EN10 and EN11), reaching zero during the peak laying period (EN4 — EN6). This decline in

$\:{h}^{2}$

with advancing age has been previously reported in chicken and is likely due to the increasing influence of environmental factors on EN over time [57, 58]. Our findings partially support this interpretation, as we observed a moderate increase in

$\:{h}^{2}$

during EN10 and EN11 compared with the surrounding low

$\:{h}^{2}$

periods. Furthermore, the absence of a clear trend of changes of environmental variation in EW with age [59] aligns with the observation of relatively stable

$\:{h}^{2}$

estimates for EW across various ages. Interestingly, GWAS results revealed more suggestive or significant variants during periods of moderate-to-low

$\:{h}^{2}$

(EN3, EN4, EN7 — EN11), while a highly polygenic architecture was observed during the period with the highest

$\:{h}^{2}$

(EN1,

$\:{h}^{2}$

= 0.40). For EN, the QTL on chr 2 derived from the multi-trait metaGWAS explained most variance for EN4, EN7, and EN8 (

$\:{h}^{2}$

= 0.03 — 0.09), which were also the stages with the suggestive or significant peak on chr 2 in GWAS, while the genome-wide

$\:{h}^{2}$

for these traits were low (0.00 — 0.06). Conversely, EN1 exhibited the highest

$\:{h}^{2}$

(0.40) but a smaller contribution from this QTL (0.02). This suggests that the genetic control of EN may shift across the laying cycle, with early production influenced by many small-effect loci, and later stages affected by a few loci with small-effect together with environmental factors.

We occasionally observed regional (QTL-based)

$\:{h}^{2}$

estimates for a trait exceeding the genome-wide trait

$\:{h}^{2}$

(e.g., for EW30, genome-wide

$\:{h}^{2}$

= 0.57, QTL-chr4

$\:{h}^{2}$

= 0.73). This likely reflects local haplotype and linkage disequilibrium in the backcross population, which can lead to inflated variance estimates when the QTL GRM captures not only the genetic effects, but also part of the background covariance due to ancestry. Nevertheless, QTL-based and trait-level

$\:{h}^{2}$

were strongly correlated (Pearson’s r = 0.74), indicating that the regional estimates broadly reflect trait genetic architecture. For EW, the QTL on chr 1 — previously reported to disappear after adjusting for BW32 — explained a smaller proportion of variance (EW30

$\:{h}^{2}$

= 0.15; EW40

$\:{h}^{2}$

= 0.06). In contrast, the QTL on chr 4, which showed attenuation in GWAS with BW32 covariate, accounted for substantial variation at the early stage (EW30

$\:{h}^{2}$

= 0.73) and remained moderately influential through later stages (EW40 — EW70

$\:{h}^{2}⩾$

0.2). For BW32, QTLs on chr 1 and 27 explained comparable proportions of variance (

$\:{h}^{2}$

around 0.38), while the QTL on chr 4 explained slightly more (

$\:{h}^{2}$

= 0.43). This pattern further supports the pleiotropic nature of QTLs on chr 1 and 4 for BW and EW traits, with the effect on chr 4 being more prominent.

Fine-mapping pinpoints promising genes for multiple traits

In this study, fine-mapping analyses using a Bayesian approach identified several promising candidate genes with pleiotropic effects across multiple traits. Notably, NCAPG and LCORL were highlighted as influencing BW32 and all EW measurements except EW30, supported by comparatively high PPC values. NCAPG encodes a subunit of the condensin complex essential for chromosomal regulation during mitosis and meiosis, while LCORL encodes a ligand-dependent nuclear receptor corepressor-like protein involved in transcriptional regulation via RNA polymerase II. Both genes have been extensively linked to growth traits across species. In cattle, the NCAPG–LCORL locus is robustly associated with growth, feed intake, and carcass traits, as supported by genomic association studies, gene expression analyses, and haplotype mapping [60–62]. In horses, this locus influences body size and conformation traits with strong GWAS signal [63, 64]. Selective sweep analysis has shown strong selection signature of LCORL associated with stature in pigs [65]. Recently, Bai et al. confirmed causal mutations in the NCAPG–LCORL region affecting body weights, through combined analyses of selective sweep across domesticated species and gene-editing studies in mice [66].

Despite strong evidence for direct effects on body weight, direct biological links between NCAPG–LCORL and egg weight remain limited. In our study, both pre- and post-enrichment PPC values for NCAPG–LCORL were consistently lower for EW than for BW, showing a sharp decline with increasing age. For example, while PPC values for EW40 reached approximately 30%, they dropped to below 7% in EW50 and EW70 before enrichment, and further declined to below 6% after signal filtration and enrichment from EW40 to EW70, highlighting a substantial reduction in their contribution. Combined with the strong phenotypic and genetic correlations observed between BW and EW, and the loss of these significant signals in GWAS when BW32 was included as a covariate, these findings suggests that NCAPG–LCORL may influence EW primarily through genetic effects mediating body weight, rather than a direct impact on egg weight itself.

Similarly, Fine-mapping of BW32 highlighted IGF2BP1 (insulin-like growth factor 2 mRNA-binding protein 1). IGF2BP1 is a well-known candidate gene associated with growth and body size across various livestock species [67–71]. In chickens, causal promoter variants in IGF2BP1 regulating body size were identified via pan-genome analysis, particularly between 3 to 12 weeks of age [72]. Additionally, IGF2BP1 has also been shown to promote fat and muscle development in chickens, especially at 8 weeks and 120 days, respectively (73, 74). It’s expression level has been reported to decline with age in poultry [69, 74, 75], which may partly explain its reduced effect on body weight at later developmental stages. Interestingly, IGF2BP1 alone has been reported to inhibit cell proliferation unless coupled with post-translational modification, suggesting that its biological activity may be context-dependent [75]. In our study, BW was measured at 32 weeks of age, which is later than in previous studies, yet IGF2BP1 was still consistently pinpointed with relatively high PPC values that even increased slightly after functional enrichment. This finding underscores the continued importance of IGF2BP1 and its potential long-term impact in influencing body weight.

Apart from IGF2BP1, Fine-mapping for BW32 also identified several nearby protein-coding genes, including GIP (involved in glucose response), SNF8 (protein transport), UBE2Z (apoptosis), and ENSGALG00000001525 (autophagosome regulation; NCBI gene name for CALCOCO2). These genes have been previously reported to be associated with breast muscle weights in chicken [76] and abdominal fat percentage [77], suggesting they may contribute to body weight variation through their roles in energy metabolism.

Fine-mapping identified several lncRNA genes on chr 1 associated with BW32, EW30 and EW40, including ENSGALG00000053256, ENSGALG00000052822, and FAM124A. These genes have been reported to be associated with body weight in chickens [78, 79], as well as with stature [80] and skeletal muscle production [81]. This suggests that their influence is primarily on body weight, which aligns with the substantially higher PPC values computed for BW32 compared to EW40 in our fine-mapping results.

On chr 4, fine-mapping pinpointed several genes associated with EW across different periods, including LDB2 and ENSGALG00000032384, as well as genes specific to EW30 (SLIT2, ENSGALG00000054285) and EW70 (TAPT1, PROM1, FBXL5, CD38). Among these, LDB2, SLIT2 and TAPT1 have been widely reported in the literature [3, 12, 24, 71, 82–88], while others such as PROM1, CD38 and ENSGALG00000054285 have been less frequently reported in chickens [82, 83]. Many of these genes are primarily linked to growth traits or body weight. Given their moderate to low PPC values, the loss of corresponding GWAS signals for EW after accounting for BW, and their biological functions — including transcription regulation (LDB2), regulation of cellular protein (FBXL5), cell signalling and regulation (SLIT2, CD38, PROM1, TAPT1) — these loci likely contribute to egg weight indirectly through body growth pathways.

For EN, fine-mapping identified several genes associated across and within specific laying periods. After signal filtration, two genes were consistently linked from EN3 to EN8, while others were unique to EN1. However, these genes generally did not overlap with findings reported in prior studies, in line with the observation that many QTLs identified for egg production in chickens re rarely replicated [1, 2, 57]. Despite this, their molecular and biological functions provide insights into potential mechanisms influencing reproductive traits.

At the early stages, functional annotations indicated involvement in oxidative stress response (OXSR1), transmembrane transport (ENSGALG00000005934, ENSGALG00000006018), and actin cytoskeleton organization (XIRP1) for EN1; cell surface receptor signalling (CRHR2) for EN2; lipid metabolism (PLCD1) for EN3; and Immune signalling (MYD88) for both EN2 and EN3.

At later stages, apart from OXSR1 and XIRP1, additional processes include defence response to tumour cells (DLEC1), protein phosphorylation (ENSGALG00000005884), and cell cycle regulation and differentiation via tumour suppressor activity (CTDSPL). Genes consistently identified across multiple periods, both before and after enrichment, were involved in protein biosynthesis (GARS) and included non-coding elements (ENSGALG00000049955).

Together, the biological processes associated with these genes suggest that reproductive performance may be influenced through diverse pathways, particularly those related to cell development, metabolism and immune regulation.

Overall, fine-mapping of GWAS-identified QTLs has successfully uncovered promising genes for multiple traits. However, it is important to keep in mind that the coverage and quality of imputed genotypes can limit the resolution of fine-mapping and downstream analyses such as signal filtration by a p-value threshold, and functional enrichment. In this study, imputation substantially increase the number of variants from ~ 46.6k to ~ 4.9M, yet the resulting dataset is still incomparable to the completeness of WGS data. Furthermore, the concordance between imputed and array genotypes did not exceed 96%, which introduces uncertainty. This limitation may lead to missing causal variants, particularly structural or rare variants, despite the efforts to extend QTL regions by 2Mb upstream and downstream to capture as many informative markers as possible. These factors should be carefully considered when interpreting fine-mapping results, especially in the context of imputed data.

Conclusions

In conclusion, our study demonstrated that GWAS achieves greater detection power and statistical strength when all individuals from a backcrossing scheme are combined and genotype imputation is applied to increase marker density. Moreover, incorporating correlated traits as covariates in GWAS offers valuable insights, helping to distinguish pleiotropic loci from trait-specific signals.

This study is the first to apply Bayesian fine-mapping to GWAS-identified QTLs in chickens, leveraging a backcrossing scheme to successfully pinpoint key genes underlying economically important traits. Our findings not only confirmed previously reported loci but also provided new insights into the complexity of trait inheritance. Nevertheless, the coverage and accuracy of genotype imputation may limit the fine-mapping resolution. Overall, this study highlights the power of Bayesian fine-mapping approaches to refine GWAS signals and deepen our understanding of the genetic architecture of complex traits, offering promising targets for future validation and breeding applications.

List of abbreviations

SNP

Single Nucleotide Polymorphism

GWAS

Genome-Wide Association Study

QTL

Quantitative Trait Locus

Egg Number

Egg Weight

Body Weight

Linkage Disequilibrium

White Layer

WGS

Whole-Genome Sequence

DR2

Dosage R-Square (Imputation Quality Metric)

IQS

Imputation Quality Score

GRM

Genetic Relationship Matrix

MAF

Minor Allele Frequency

Credible Set

PPC

Posterior Probability of Causality

VEP

Variant Effect Predictor

Coefficient of Variance

chr

Chromosome

Declarations

Ethics approval and consent to participate

This project is in accordance with the German legal and ethical requirements of appropriate procedures.

The project was reviewed and approved by the Ethics committee of the Friedrich-Loeffler-Institut (FLI), documented under number E-5-25-Nds.

All animal work followed relevant national and international animal welfare guidelines.

The animals are owned by the breeding company Lohmann Breeders GmbH. Blood samples were collected by the company’s veterinary department as part of routine health monitoring and in strict accordance with applicable animal welfare regulations. Remaining blood material was used for genotyping in this study. All other phenotypic data—daily egg number, egg weight, and a single body weight measurement at 32 weeks of age—were recorded non-invasively as part of the routine breeding program. No procedures caused harm or undue stress to the animals, and no birds were sacrificed.

Lohmann Breeders GmbH as well as H&N International GmbH are part of the EW Group GmbH. Senior scientists employed by these companies are co-authors of this study and have consented the use of the animals and their data. Informed consent was obtained from all owners. All authors revised the manuscript and approved the final version.

Data Availability

Restrictions apply to the availability of the genotype data analysed in this study, which is partially owned by Lohmann Breeders (Cuxhaven, Germany) and are available from the authors only with the permission of Lohmann Breeders upon a reasonable request.

Funding

The data of this work is supported by the EU Horizon 2020 Research and Innovation Programme IMAGE project (Innovative Management of Genetic Resources H2020: 677353). The primary author is funded by the RegioHuhn project (Project No. Ma-0103) supported by BMLEH (Bundesministeriums für Landwirtschaft, Ernährung und Heimat).

Author Contribution

SW and HS designed the backcrossing scheme. The breeding and animal management were carried out by Lohmann Breeders GmbH, who also provided the phenotypic and genotypic data. CR, JG, and SCM jointly designed the genomic analysis framework. SCM performed the genomic analyses, conducted data interpretation, and wrote the manuscript. CR contributed to GWAS analysis and manuscript writing. JB assisted with the development of the Snakemake workflow and contributed to the liftover and genotype imputation processes. All authors revised the manuscript and approved the final version.

Acknowledgement

We thank Dr. Jicai Jiang, author of the BFMAP software, for his kind and timely support in address-ing technical questions during our fine-mapping analysis.

Electronic Supplementary Material

Below is the link to the electronic supplementary material

Supplementary Material 1

References

Liu W, Li D, Liu J, Chen S, Qu L, Zheng J, et al. A genome-wide SNP scan reveals novel loci for egg production and quality traits in white leghorn and brown-egg dwarf layers. PLoS ONE. 2011;6:e28600. 10.1371/journal.pone.0028600.

Liao R, Zhang X, Chen Q, Wang Z, Wang Q, Yang C, Pan Y. Genome-wide association study reveals novel variants for growth and egg traits in Dongxiang blue-shelled and White Leghorn chickens. Anim Genet. 2016;47:588–96. 10.1111/age.12456.

Gu X, Feng C, Ma L, Song C, Wang Y, Da Y, et al. Genome-wide association study of body weight in chicken F2 resource population. PLoS ONE. 2011;6:e21872. 10.1371/journal.pone.0021872.

Xie L, Luo C, Zhang C, Zhang R, Tang J, Nie Q, et al. Genome-wide association study identified a narrow chromosome 1 region associated with chicken growth traits. PLoS ONE. 2012;7:e30910. 10.1371/journal.pone.0030910.

Sun Y, Zhao G, Liu R, Zheng M, Hu Y, Wu D, et al. The identification of 14 new genes for meat quality traits in chicken using a genome-wide association study. BMC Genomics. 2013;14:458. 10.1186/1471-2164-14-458.

Fife MS, Howell JS, Salmon N, Hocking PM, van Diemen PM, Jones MA, et al. Genome-wide SNP analysis identifies major QTL for Salmonella colonization in the chicken. Anim Genet. 2011;42:134–40. 10.1111/j.1365-2052.2010.02090.x.

Wolc A, Arango J, Jankowski T, Dunn I, Settar P, Fulton JE, et al. Genome-wide association study for egg production and quality in layer chickens. J Anim Breed Genet. 2014;131:173–82. 10.1111/jbg.12086.

Gao J, Xu W, Zeng T, Tian Y, Wu C, Liu S, et al. Genome-Wide Association Study of Egg-Laying Traits and Egg Quality in LingKun Chickens. Front Vet Sci. 2022;9:877739. 10.3389/fvets.2022.877739.

Wolc A, Arango J, Settar P, Fulton JE, O'Sullivan NP, Preisinger R, et al. Genome-wide association analysis and genetic architecture of egg weight and egg uniformity in layer chickens. Anim Genet. 2012;43(Suppl 1):87–96. 10.1111/j.1365-2052.2012.02381.x.

10.

Liu Z, Sun C, Yan Y, Li G, Wu G, Liu A, Yang N. Genome-Wide Association Analysis of Age-Dependent Egg Weights in Chickens. Front Genet. 2018;9:128. 10.3389/fgene.2018.00128.

11.

Yi G, Shen M, Yuan J, Sun C, Duan Z, Qu L, et al. Genome-wide association study dissects genetic architecture underlying longitudinal egg weights in chickens. BMC Genomics. 2015;16:746. 10.1186/s12864-015-1945-y.

12.

Zhang Y, Wang Y, Li Y, Wu J, Wang X, Bian C, et al. Genome-wide association study reveals the genetic determinism of growth traits in a Gushi-Anka F2 chicken population. Heredity (Edinb). 2021;126:293–307. 10.1038/s41437-020-00365-x.

13.

Liu R, Sun Y, Zhao G, Wang H, Zheng M, Li P, et al. Identification of loci and genes for growth related traits from a genome-wide association study in a slow- × fast-growing broiler chicken cross. Genes Genom. 2015;37:829–36. 10.1007/s13258-015-0314-1.

14.

Huang S, He Y, Ye S, Wang J, Yuan X, Zhang H, et al. Genome-wide association study on chicken carcass traits using sequence data imputed from SNP array. J Appl Genet. 2018;59:335–44. 10.1007/s13353-018-0448-3.

15.

Ye S, Chen Z-T, Zheng R, Diao S, Teng J, Yuan X, et al. New Insights From Imputed Whole-Genome Sequence-Based Genome-Wide Association Analysis and Transcriptome Analysis: The Genetic Mechanisms Underlying Residual Feed Intake in Chickens. Front Genet. 2020;11:243. 10.3389/fgene.2020.00243.

16.

Pértille F, Moreira GCM, Zanella R, Da Nunes JRS, Boschiero C, Rovadoscki GA, et al. Genome-wide association study for performance traits in chickens using genotype by sequencing approach. Sci Rep. 2017;7:41748. 10.1038/srep41748.

17.

Spencer CCA, Su Z, Donnelly P, Marchini J. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet. 2009;5:e1000477. 10.1371/journal.pgen.1000477.

18.

Wang Y, Zhang F, Mukiibi R, Chen L, Vinsky M, Plastow G, et al. Genetic architecture of quantitative traits in beef cattle revealed by genome wide association studies of imputed whole genome sequence variants: II: carcass merit traits. BMC Genomics. 2020;21:38. 10.1186/s12864-019-6273-1.

19.

Zhang F, Wang Y, Mukiibi R, Chen L, Vinsky M, Plastow G, et al. Genetic architecture of quantitative traits in beef cattle revealed by genome wide association studies of imputed whole genome sequence variants: I: feed efficiency and component traits. BMC Genomics. 2020;21:36. 10.1186/s12864-019-6362-1.

20.

Ros-Freixedes R. The contribution of whole-genome sequence data to genome-wide association studies in livestock: Outcomes and perspectives. Livest Sci. 2024;281:105430. 10.1016/j.livsci.2024.105430.

21.

Li J, Wang Z, Lubritz D, Arango J, Fulton J, Settar P, et al. Genome-wide association studies for egg quality traits in White Leghorn layers using low-pass sequencing and SNP chip data. J Anim Breed Genet. 2022;139:380–97. 10.1111/jbg.12679.

22.

Heidaritabar M, Huisman A, Krivushin K, Stothard P, Dervishi E, Charagu P, et al. Imputation to whole-genome sequence and its use in genome-wide association studies for pork colour traits in crossbred and purebred pigs. Front Genet. 2022;13:1022681. 10.3389/fgene.2022.1022681.

23.

Besnier F, Wahlberg P, Rönnegård L, Ek W, Andersson L, Siegel PB, Carlborg O. Fine mapping and replication of QTL in outbred chicken advanced intercross lines. Genet Sel Evol. 2011;43:3. 10.1186/1297-9686-43-3.

24.

Lyu S, Arends D, Nassar MK, Brockmann GA. Fine mapping of a distal chromosome 4 QTL affecting growth and muscle mass in a chicken advanced intercross line. Anim Genet. 2017;48:295–302. 10.1111/age.12532.

25.

Zan Y, Sheng Z, Lillie M, Rönnegård L, Honaker CF, Siegel PB, Carlborg Ö. Artificial Selection Response due to Polygenic Adaptation from a Multilocus, Multiallelic Genetic Architecture. Mol Biol Evol. 2017;34:2678–89. 10.1093/molbev/msx194.

26.

Wang Y, Bu L, Cao X, Qu H, Zhang C, Ren J, et al. Genetic Dissection of Growth Traits in a Unique Chicken Advanced Intercross Line. Front Genet. 2020;11:894. 10.3389/fgene.2020.00894.

27.

Sahana G, Cai Z, Sanchez MP, Bouwman AC, Boichard D. Invited review: Good practices in genome-wide association studies to identify candidate sequence variants in dairy cattle. J Dairy Sci. 2023;106:5218–41. 10.3168/jds.2022-22694.

28.

Yuan J, Wang K, Yi G, Ma M, Dou T, Sun C, et al. Genome-wide association studies for feed intake and efficiency in two laying periods of chickens. Genet Sel Evol. 2015;47:82. 10.1186/s12711-015-0161-1.

29.

Shen M, Qu L, Ma M, Dou T, Lu J, Guo J, et al. Genome-Wide Association Studies for Comb Traits in Chickens. PLoS ONE. 2016;11:e0159081. 10.1371/journal.pone.0159081.

30.

Bolormaa S, Pryce JE, Reverter A, Zhang Y, Barendse W, Kemper K, et al. A multi-trait, meta-analysis for detecting pleiotropic polymorphisms for stature, fatness and reproduction in beef cattle. PLoS Genet. 2014;10:e1004198. 10.1371/journal.pgen.1004198.

31.

Pausch H, Emmerling R, Schwarzenbacher H, Fries R. A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle. Genet Sel Evol. 2016;48:14. 10.1186/s12711-016-0190-4.

32.

Jiang J, Cole JB, Freebern E, Da Y, VanRaden PM, Ma L. Functional annotation and Bayesian fine-mapping reveals candidate genes for important agronomic traits in Holstein bulls. Commun Biol. 2019;2:212. 10.1038/s42003-019-0454-y.

33.

Fang L, Jiang J, Li B, Zhou Y, Freebern E, VanRaden PM, et al. Genetic and epigenetic architecture of paternal origin contribute to gestation length in cattle. Commun Biol. 2019;2:100. 10.1038/s42003-019-0341-6.

34.

Freebern E, Santos DJA, Fang L, Jiang J, Parker Gaddis KL, Liu GE, et al. GWAS and fine-mapping of livability and six disease traits in Holstein cattle. BMC Genomics. 2020;21:41. 10.1186/s12864-020-6461-z.

35.

Wang Z, Qu L, Yao J, Yang X, Li G, Zhang Y, et al. An EAV-HP insertion in 5' Flanking region of SLCO1B3 causes blue eggshell in the chicken. PLoS Genet. 2013;9:e1003183. 10.1371/journal.pgen.1003183.

36.

Kranis A, Gheyas AA, Boschiero C, Turner F, Le Yu, Smith S, et al. Development of a high density 600K SNP genotyping array for chicken. BMC Genomics. 2013;14:59. 10.1186/1471-2164-14-59.

37.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75. 10.1086/519795.

38.

Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–9. 10.1038/ng.608.

39.

Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.

40.

Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet. 2014;46:100–6. 10.1038/ng.2876.

41.

Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. BMJ. 1995;310:170. 10.1136/bmj.310.6973.170.

42.

Browning BL, Zhou Y, Browning SR. A One-Penny Imputed Genome from Next-Generation Reference Panels. Am J Hum Genet. 2018;103:338–48. 10.1016/j.ajhg.2018.07.015.

43.

Browning BL, Tian X, Zhou Y, Browning SR. Fast two-stage phasing of large-scale sequence data. Am J Hum Genet. 2021;108:1880–90. 10.1016/j.ajhg.2021.08.005.

44.

Smith J, Alfieri JM, Anthony N, Arensburger P, Athrey GN, Balacco J, et al. Fourth Report on Chicken Genes and Chromosomes 2022. Cytogenet Genome Res. 2022;162:405–528. 10.1159/000529376.

45.

Genovese G, Rockweiler NB, Gorman BR, Bigdeli TB, Pato MT, Pato CN, et al. BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies. Bioinformatics. 2024. 10.1093/bioinformatics/btae038.

46.

University of California, Santa Cruz (UCSC). LiftOver Howto - genomewiki. 26/04/2018. https://genomewiki.ucsc.edu/index.php/LiftOver_Howto

47.

Lin P, Hartz SM, Zhang Z, Saccone SF, Wang J, Tischfield JA, et al. A new statistic to evaluate imputation reliability. PLoS ONE. 2010;5:e9697. 10.1371/journal.pone.0009697.

48.

Jiang L, Zheng Z, Qi T, Kemper KE, Wray NR, Visscher PM, Yang J. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet. 2019;51:1749–55. 10.1038/s41588-019-0530-8.

49.

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2023.

50.

Taiyun Wei and Viliam Simko. R package 'corrplot'. Visualization of a Correlation Matrix; 2021.

51.

Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2. 10.1093/bioinformatics/bts474.

52.

Deary IJ, Yang J, Davies G, Harris SE, Tenesa A, Liewald D, et al. Genetic contributions to stability and change in intelligence from childhood to old age. Nature. 2012;482:212–5. 10.1038/nature10781.

53.

Benner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. Bioinformatics. 2016;32:1493–501. 10.1093/bioinformatics/btw018. FINEMAP: efficient variable selection using summary data from genome-wide association studies.

54.

Jicai Jiang. gemrich: Genetic Effect Mapping enRICHment_. R package version 0.1.3; 2025.

55.

McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. 10.1186/s13059-016-0974-4.

56.

Köster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics. 2012;28:2520–2. 10.1093/bioinformatics/bts480.

57.

Liu Z, Yang N, Yan Y, Li G, Liu A, Wu G, Sun C. Genome-wide association analysis of egg production performance in chickens across the whole laying period. BMC Genet. 2019;20:67. 10.1186/s12863-019-0771-7.

58.

Yuan J, Sun C, Dou T, Yi G, Qu L, Qu L, et al. Identification of Promising Mutants Associated with Egg Production Traits Revealed by Genome-Wide Association Study. PLoS ONE. 2015;10:e0140615. 10.1371/journal.pone.0140615.

59.

Engström G, Liljedahl L, Wilhelmson M, Johansson K. The pattern of genetic and environmental variation in relation to ageing in laying hens. 1992:24:265–75.

60.

Lindholm-Perry AK, Andrea K, Sexten LA, Kuehn, Timothy PL, Smith DA, King, Steven D, Shackelford et al. Association, effects and validation of polymorphisms within the NCAPG - LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle. 2011.

61.

Lindholm-Perry AK, Kuehn LA, Oliver WT, Sexten AK, Miles JR, Rempel LA, et al. Adipose and muscle tissue gene expression of two genes (NCAPG and LCORL) located in a chromosomal region associated with cattle feed intake and gain. PLoS ONE. 2013;8:e80882. 10.1371/journal.pone.0080882.

62.

Majeres LE, Dilger AC, Shike DW, McCann JC, Beever JE. PSV-8 Investigation of a haplotype and eQTL analysis of the LCORL-NCAPG locus associated with increased lean growth in beef cattle. J Anim Sci. 2024;102:512. 10.1093/jas/skae234.580.

63.

Metzger J, Schrimpf R, Philipp U, Distl O. Expression Levels of LCORLAre Associated with Body Size in Horses 2013. 10.1371/journal.pone.0056497.g001

64.

Reich P, Möller S, Stock KF, Nolte W, von Depka Prondzinski M, Reents R, et al. Genomic analyses of withers height and linear conformation traits in German Warmblood horses using imputed sequence-level genotypes. Genet Sel Evol. 2024;56:45. 10.1186/s12711-024-00914-6.

65.

Rubin C-J, Megens H-J, Martinez Barrio A, Maqbool K, Sayyab S, Schwochow D, et al. Strong signatures of selection in the domestic pig genome. Proc Natl Acad Sci U S A. 2012;109:19529–36. 10.1073/pnas.1217149109.

66.

Bai 白凤庭 F, Cai 蔡钰东 Y, Qiu 邱敏 M, Liang 梁晨 C, Pan 潘麟茜 L, Liu 刘雅怡 Y, et al. LCORL and STC2 Variants Increase Body Size and Growth Rate in Cattle and Other Animals. Genomics Proteom Bioinf. 2025. 10.1093/gpbjnl/qzaf025.

67.

Pereira AP, de Alencar MM, de Oliveira HN, Regitano LCdA. Association of GH and IGF-1 polymorphisms with growth traits in a synthetic beef cattle breed. Genet Mol Biol. 2005;28:230–6. 10.1590/S1415-47572005000200009.

68.

An X, Wang L, Hou J, Li G, Song Y, Wang J, et al. Novel polymorphisms of goat growth hormone and growth hormone receptor genes and their effects on growth traits. Mol Biol Rep. 2011;38:4037–43. 10.1007/s11033-010-0522-3.

69.

Zhou Z, Li M, Cheng H, Fan W, Yuan Z, Gao Q, et al. An intercross population study reveals genes associated with body size and plumage color in ducks. Nat Commun. 2018;9:2648. 10.1038/s41467-018-04868-4.

70.

Wang Z, Zhang X, Jiang E, Yan H, Zhu H, Chen H, et al. InDels within caprine IGF2BP1 intron 2 and the 3'-untranslated regions are associated with goat growth traits. Anim Genet. 2020;51:117–21. 10.1111/age.12871.

71.

Dou D, Shen L, Zhou J, Cao Z, Luan P, Li Y, et al. Genome-wide association studies for growth traits in broilers. BMC Genom Data. 2022;23:1. 10.1186/s12863-021-01017-7.

72.

Wang K, Hu H, Tian Y, Li J, Scheben A, Zhang C, et al. The Chicken Pan-Genome Reveals Gene Content Variation and a Promoter Region Deletion in IGF2BP1 Affecting Body Size. Mol Biol Evol. 2021;38:5066–81. 10.1093/molbev/msab231.

73.

Chen J, Ren X, Li L, Lu S, Chen T, Tan L, et al. Integrative Analyses of mRNA Expression Profile Reveal the Involvement of IGF2BP1 in Chicken Adipogenesis. Int J Mol Sci. 2019. 10.3390/ijms20122923.

74.

Luo J, Yang Z, Li X, Xiao C, Yuan H, Yang X, et al. High Muscle Expression of IGF2BP1 Gene Promotes Proliferation and Differentiation of Chicken Primary Myoblasts: Results of Transcriptome Analysis. Anim (Basel). 2024. 10.3390/ani14142024.

75.

Feng G, Zhao J, Lei J, Wang L, Wang S, Zhang T, Lu H. Identification of Igf2bp1 gene family and effect on chicken myoblast proliferation. J Appl Anim Res. 2021;49:194–202. 10.1080/09712119.2021.1932916.

76.

Tan X, Liu L, Liu X, Cui H, Liu R, Zhao G, Wen J. Large-Scale Whole Genome Sequencing Study Reveals Genetic Architecture and Key Variants for Breast Muscle Weight in Native Chickens. Genes (Basel). 2021. 10.3390/genes13010003.

77.

Zhu Y, Liu X, Wang Y, Liu L, Wang Y, Zhao G, et al. Genome-Wide Association Study Revealed the Effect of rs312715211 in ZNF652 Gene on Abdominal Fat Percentage of Chickens. Biology (Basel). 2022. 10.3390/biology11121849.

78.

Zhu X-N, Wang Y-Z, Li C, Wu H-Y, Zhang R, Hu X-X. Chicken chromatin accessibility atlas accelerates epigenetic annotation of birds and gene fine-mapping associated with growth traits. Zool Res. 2023;44:53–62. 10.24272/j.issn.2095-8137.2022.228.

79.

Ou J-H, Rönneburg T, Carlborg Ö, Honaker CF, Siegel PB, Rubin C-J. Complex genetic architecture of the chicken Growth1 QTL region. PLoS ONE. 2024;19:e0295109. 10.1371/journal.pone.0295109.

80.

Wu Z, Bortoluzzi C, Derks MFL, Liu L, Bosse M, Hiemstra SJ, et al. Heterogeneity of a dwarf phenotype in Dutch traditional chicken breeds revealed by genomic analyses. Evol Appl. 2021;14:1095–108. 10.1111/eva.13183.

81.

LI Y, BAI X, LIU X, WANG W, LI Z, Wang N, et al. Integration of genome-wide association study and selection signatures reveals genetic determinants for skeletal muscle production traits in an F2 chicken population. J Integr Agric. 2022;21:2065–75. 10.1016/S2095-3119(21)63805-4.

82.

Jiang X, Chu Q, Wei G, Gu H, Zhang X, Ren X, et al. A significant genomic region underlying growth traits in adult Beijing You chicken identified by genome-wide association analysis. Poult Sci. 2025;104:105326. 10.1016/j.psj.2025.105326.

83.

Lyu S, Arends D, Nassar MK, Weigend A, Weigend S, Preisinger R, Brockmann GA. Reducing the interval of a growth QTL on chromosome 4 in laying hens. Anim Genet. 2018;49:467–71. 10.1111/age.12685.

84.

Wang J, Liu J, Lei Q, Liu Z, Han H, Zhang S, et al. Elucidation of the genetic determination of body weight and size in Chinese local chicken breeds by large-scale genomic analyses. BMC Genomics. 2024;25:296. 10.1186/s12864-024-10185-6.

85.

Yang S, Ning C, Yang C, Li W, Zhang Q, Wang D, Tang H. Identify Candidate Genes Associated with the Weight and Egg Quality Traits in Wenshui Green Shell-Laying Chickens by the Copy Number Variation-Based Genome-Wide Association Study. Veterinary Sci. 2024;11:76. 10.3390/vetsci11020076.

86.

Li YD, Liu X, Li ZW, Wang WJ, Li YM, Cao ZP, et al. A combination of genome-wide association study and selection signature analysis dissects the genetic architecture underlying bone traits in chickens. Animal. 2021;15:100322. 10.1016/j.animal.2021.100322.

87.

Wang S, Wang Y, Li Y, Xiao F, Guo H, Gao H, et al. Genome-Wide Association Study and Selective Sweep Analysis Reveal the Genetic Architecture of Body Weights in a Chicken F2 Resource Population. Front Vet Sci. 2022;9:875454. 10.3389/fvets.2022.875454.

88.

Yang R, Xu Z, Wang Q, Di Zhu, Bian C, Ren J, et al. Genome–wide association study and genomic prediction for growth traits in yellow-plumage chicken using genotyping-by-sequencing. Genet Sel Evol. 2021;53:82. 10.1186/s12711-021-00672-9.

Supplementary Information

Supplementary Figs. 1. Combination of Manhattan plots of array and imputed GWAS across traits and groups.

A folder containing PNG files named by traits. Each PNG file shows several Manhattan plots accordingly and named by [Method]_[Trait]_[Group]. Chromosomes are on the x-axis and

$\:-{log}_{10}\left(P\right)$

on the y-axis. The horizontal blue line indicates the Bonferroni significance threshold. Variants passing the threshold were highlighted in bright colour.

Supplementary Figs. 2. Combination of Manhattan plots of array and imputed GWAS among groups, including BW32 as a covariate for EW (EW30, EW40, EW50, EW70) and vice versa.

A folder containing PNG files named by [Trait]cov[Covariate]. Each PNG file shows several Manhattan plots accordingly and named by [Method]_[Trait]cov[Covariate]_[Group]. Chromosomes are on the x-axis and

$\:-{log}_{10}\left(P\right)$

on the y-axis. The horizontal blue line indicates the Bonferroni significance threshold. Variants passing the threshold were highlighted in bright colour.

Supplementary Figs. 3. Visualization of phenotypic distribution between generations.

Page 1 – Percentage of missing data for each trait across generations.

Page 2, 3, 4 – Distribution of BW32, EW, EN, respectively, across generations.

Supplementary Data 1.

Phenotypic records and generation of individuals.

Supplementary Data 2.

QTLs of 15 traits for fine-mapping and their estimated heritability, number of variants in the credible set before signal filtration, number of variants within the QTL.

Supplementary Data 3.

Fine-mapping pinpointed genes before signal filtration with summed PPC over 0.01, ordered in descending order by chromosomes. This data is an Excel worksheet with each sheet showing results of one trait.

Supplementary Data 4.

Enrichment factor and probability of including causal variants for a functional annotated category, ordered by enrichment in descending order. This data is an Excel worksheet with each sheet showing results of one trait.

Supplementary Data 5.

Credible sets of fine-mapping pinpointed variants after signal filtration with their original and re-normalized PPC, ordered by the later in descending order. This data is an Excel worksheet with each sheet showing results of one trait.

Supplementary Data 6.

Functional enrichment of genes after signal filtration with summed PPC over 0.01, ordered in descending order by chromosomes. This data is an Excel worksheet with each sheet showing results of one trait.

Yes