Similar documents
20 similar documents found.
1.
Although genomic selection offers the prospect of improving the rate of genetic gain in meat, wool and dairy sheep breeding programs, the key constraint is likely to be the cost of genotyping. This constraint could potentially be overcome by genotyping selection candidates with a low-density (low-cost) SNP panel of sparse genome coverage and imputing a much higher density of SNP genotypes using a densely genotyped reference population. These imputed genotypes would then be used with a prediction equation to produce genomic estimated breeding values. In the future, it may also be desirable to impute very dense marker genotypes or even whole-genome re‐sequence data from moderate-density SNP panels. Such a strategy could, for example, enable accurate prediction of genomic estimated breeding values across breeds. We used genotypes for 48 640 (50K) SNPs in four sheep breeds to investigate both the accuracy of imputing the 50K SNPs from low-density SNP panels and the prospects for imputing very dense or whole-genome re‐sequence data from the 50K SNPs (by leaving out a small number of the 50K SNPs at random). Imputation accuracy was low if the sparse panel had fewer than 5000 (5K) markers. Across breeds, the accuracy of imputing from sparse marker panels to 50K was clearly higher when genetic diversity within the breed was lower, that is, when relationships among animals in that breed were higher. The accuracy of imputation from sparse genotypes to 50K genotypes was higher when imputation was performed within breed than when all the data were pooled, even though the pooled reference set was much larger. For Border Leicesters, Poll Dorsets and White Suffolks, 5K sparse genotypes were sufficient to impute 50K with 80% accuracy. For Merinos, the accuracy of imputing 50K from 5K was lower, at 71%, despite a large number of fully genotyped animals (2215) being used as a reference.
For all breeds, the relationship of individuals to the reference population explained up to 64% of the variation in imputation accuracy, demonstrating that accuracy can be increased by including sires and other ancestors of the individuals to be imputed in the reference population. Imputation accuracy could also be increased if pedigree information were available and used to track the inheritance of large chromosome segments within families. In our study, we considered only imputation methods based on population‐wide linkage disequilibrium (largely because the pedigree for some of the populations was incomplete). Finally, in the scenarios designed to mimic imputation of high-density or whole-genome re‐sequence data from the 50K panel, imputation accuracy was much higher (86–96%). This is promising and suggests that in silico genome re‐sequencing is possible in sheep if a suitable pool of key ancestors is sequenced for each breed.
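The leave-out design described above (masking SNPs of the 50K panel and imputing them back) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helpers `split_panel` and `mask_genotypes` are hypothetical, retained SNPs are drawn uniformly at random (real low-density panels are usually chosen for even spacing and high MAF), and the imputation step itself is left to a dedicated tool.

```python
import random


def split_panel(n_snps, n_keep, seed=0):
    """Hypothetical helper: choose which SNP columns stay on the sparse panel.

    Returns (kept, masked): sorted, disjoint index lists that together cover
    all n_snps columns. Masked columns are the ones to be imputed and later
    compared with their true genotypes.
    """
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(n_snps), n_keep))
    kept_set = set(kept)
    masked = [j for j in range(n_snps) if j not in kept_set]
    return kept, masked


def mask_genotypes(genotypes, masked):
    """Copy one animal's genotype row (0/1/2 codes) with masked columns set to None."""
    masked_set = set(masked)
    return [None if j in masked_set else g for j, g in enumerate(genotypes)]


# A random 5K sparse panel drawn from the 48 640 SNPs of the 50K array
kept, masked = split_panel(48_640, 5_000, seed=1)
print(len(kept), len(masked))  # 5000 43640
```

The same split run in reverse (keeping most SNPs and masking a few) mimics the "impute denser data from 50K" scenario.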

2.
The availability of high-density single nucleotide polymorphism (SNP) genotyping platforms has provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. This genomic information can be used to estimate genomic breeding values (GEBVs), which are more accurate than conventional breeding values. The superiority of genomic selection is realized only when high-density SNP panels are used to track the genes and QTLs affecting the trait. Unfortunately, even with continuously decreasing genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. Often, a larger portion of the population is genotyped with low-density, low-cost SNP panels and then imputed to a higher density. The accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of imputation errors is unavoidable. It is therefore reasonable to assume that the accuracy of GEBVs will be affected by imputation errors, especially by their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotype imputation and the reliability of the resulting GEBVs, a simulation was carried out that varied the updating of the reference population, the distance between the reference and testing sets, and the approach used to estimate GEBVs. With fixed reference populations, imputation accuracy decayed by about 0.5% per generation. After 25 generations, accuracy was only 7% lower than in the first generation. When the reference population was updated with either 1% or 5% of the top animals from previous generations, the decay in imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between the reference and testing populations is small.
As the generational interval increases, imputation accuracy decays, although not at an alarming rate. In the absence of reference-population updating, the accuracy of GEBVs decays substantially within one or two generations, at a rate of 20% to 25% per generation. When the reference population was updated by 1% or 5% every generation, the decay in accuracy after seven generations was 8% to 11% using true and imputed genotypes. These results indicate that imputed genotypes provide a viable alternative, even after several generations, as long as the reference and training populations are appropriately updated to reflect genetic change in the population.
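One generation of the reference-update scheme described above can be sketched as below. It is a hypothetical illustration, not the simulation code of the study: the function `update_reference` is invented, and ranking "top animals" by estimated breeding value is an assumption, since the abstract does not state the ranking criterion.

```python
def update_reference(reference, candidates, ebv, fraction):
    """Return a new reference list extended with the top `fraction` of candidates.

    Candidates are ranked by estimated breeding value (ebv: dict ID -> value).
    Ranking by EBV is an assumption; the abstract only says "top animals".
    """
    n_add = round(fraction * len(candidates))
    ranked = sorted(candidates, key=lambda a: ebv[a], reverse=True)
    return reference + ranked[:n_add]


# One generation of 100 hypothetical animals with toy EBVs, updated by 5%
ebv = {f"g1_{i}": i / 10 for i in range(100)}
reference = update_reference(["founder"], list(ebv), ebv, 0.05)
print(reference)  # ['founder', 'g1_99', 'g1_98', 'g1_97', 'g1_96', 'g1_95']
```

Calling this once per generation with `fraction=0.01` or `0.05` corresponds to the 1% and 5% updating scenarios; a fixed reference corresponds to never calling it.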

3.
T Druet, I M Macleod, B J Hayes. Heredity, 2014, 112(1): 39-47
Genomic prediction from whole-genome sequence data is attractive because, provided the causal mutations are in the data set, its accuracy is no longer bounded by the extent of linkage disequilibrium between DNA markers and the causal mutations affecting the trait. A cost-effective strategy could be to sequence a small proportion of the population and impute sequence data for the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. The performance of these strategies (number of variants detected and imputation accuracy) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP) that selected for sequencing the subset of individuals maximizing the number of unique haplotypes (from single-nucleotide polymorphism panel data) gave good performance across a range of variant minor allele frequencies. We then investigated the optimum trade-off between the number of individuals sequenced and fold coverage for a fixed total sequencing effort. At a total of 600-fold coverage (600×), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic prediction that could be achieved. The advantage of using imputed sequence data over dense SNP array genotypes depended strongly on the allele frequency spectrum of the causative mutations affecting the trait. When this spectrum followed a neutral distribution, the advantage of the imputed sequence data was small; however, when all causal mutations had low minor allele frequencies, using sequence data improved the accuracy of genomic prediction by up to 30%.
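The idea behind AHAP, choosing animals that together carry as many distinct haplotypes as possible, can be sketched as a greedy max-coverage heuristic. This is an assumption-laden illustration, not the published AHAP algorithm: `greedy_haplotype_selection` is hypothetical, and haplotypes are represented simply as `(window, haplotype label)` pairs obtained from phased SNP-panel data.

```python
def greedy_haplotype_selection(haplotypes, n_select):
    """Greedily pick individuals that add the most not-yet-covered haplotypes.

    haplotypes: dict mapping individual ID -> set of (window, haplotype) labels.
    A max-coverage heuristic in the spirit of AHAP, not the published method.
    Returns (chosen IDs in pick order, set of covered haplotypes).
    """
    covered, chosen = set(), []
    remaining = dict(haplotypes)
    for _ in range(min(n_select, len(remaining))):
        # Iterating over sorted IDs makes max() break ties deterministically.
        best = max(sorted(remaining), key=lambda i: len(remaining[i] - covered))
        if not remaining[best] - covered:
            break  # no remaining individual adds a new haplotype
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


haps = {
    "bull_A": {(1, "a"), (1, "b"), (2, "x")},
    "cow_B": {(1, "a"), (2, "x")},
    "cow_C": {(1, "c"), (2, "y")},
}
chosen, covered = greedy_haplotype_selection(haps, 2)
print(chosen, len(covered))  # ['bull_A', 'cow_C'] 5
```

The coverage trade-off in the abstract is simple arithmetic on a fixed budget: 75 animals at 8× use 75 × 8 = 600× in total, the same budget as, say, 30 animals at 20×.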

4.

5.
The dog is a valuable model species for the genetic analysis of complex traits, and genotype imputation will be an important tool for future studies in dogs. Given the acknowledged genetic and pedigree structure of dog breeds, it is of particular interest to analyse how factors such as the single nucleotide polymorphism (SNP) density of genotyping arrays and the relatedness between dogs affect imputation accuracy. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high-density (HighD) array; the low‐density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. For a realistic masking level of 87.5%, the correlations between true and imputed genotypes ranged from 0.92 to 0.97, depending on the scenario. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped with HighD, 87.5% of HighD SNPs masked in the LowD array), indicating that genotype imputation in Labrador Retrievers can be a valuable tool for reducing experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome‐wide association analysis and in genomic prediction in Labrador Retrievers.
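Imputation accuracy in the study above is reported as the correlation between true and imputed genotypes. A minimal pure-Python sketch of that measure follows; the function name is hypothetical, and whether the study computed it per SNP, per dog or over all masked genotypes is not stated in the abstract, so the pooling here is an assumption.

```python
from math import sqrt


def imputation_accuracy(true_geno, imputed_geno):
    """Pearson correlation between true and imputed genotypes.

    Both inputs are equal-length sequences of 0/1/2 codes (or dosages) for
    the masked positions being evaluated.
    """
    n = len(true_geno)
    if n != len(imputed_geno) or n < 2:
        raise ValueError("need two equal-length vectors with n >= 2")
    mt = sum(true_geno) / n
    mi = sum(imputed_geno) / n
    cov = sum((t - mt) * (i - mi) for t, i in zip(true_geno, imputed_geno))
    vt = sum((t - mt) ** 2 for t in true_geno)
    vi = sum((i - mi) ** 2 for i in imputed_geno)
    if vt == 0 or vi == 0:
        raise ValueError("correlation is undefined for monomorphic vectors")
    return cov / sqrt(vt * vi)


true_g = [0, 1, 2, 1, 0, 2]
imputed_g = [0, 1, 2, 2, 0, 2]  # one heterozygote imputed as a homozygote
print(round(imputation_accuracy(true_g, imputed_g), 3))  # 0.91
```

Unlike simple concordance, the correlation does not reward correct calls at nearly monomorphic SNPs, which is one reason it is a common accuracy measure.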

6.
Joint genomic prediction (GP) is an attractive way to improve the accuracy of GP by combining information from multiple populations. However, many factors can reduce the accuracy of joint GP, such as differences across populations in linkage disequilibrium phase between single nucleotide polymorphisms (SNPs) and causal variants, in minor allele frequencies and in causal variant effect sizes. The objective of this study was to investigate whether imputed high-density genotype data can improve the accuracy of joint GP using genomic best linear unbiased prediction (GBLUP), single-step GBLUP (ssGBLUP), multi-trait GBLUP (MT-GBLUP) and GBLUP based on a genomic relationship matrix that accounts for heterogeneous minor allele frequencies across populations (wGBLUP). Three traits (days to reach slaughter weight, backfat thickness and loin muscle area) were measured on 67 276 Large White pigs from two populations, of which 3334 were genotyped with a SNP array. The results showed that combining populations could substantially improve GP accuracy compared with single-population GP, especially for the smaller population. The imputed SNP data had no effect on single-population GP but yielded higher accuracy than the medium-density array data for joint GP. Of the four methods, ssGBLUP performed best, but its advantage decreased as more individuals were genotyped. In some cases, MT-GBLUP and wGBLUP performed better than GBLUP. In conclusion, our results confirmed that joint GP can benefit from imputed high-density genotype data, and that wGBLUP and MT-GBLUP are promising methods for joint GP in pig breeding.
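All the GBLUP variants above rest on a genomic relationship matrix whose centring and scaling depend on allele frequencies, which is why heterogeneous frequencies across populations matter for joint GP. The sketch below implements the widely used VanRaden (2008) method 1 matrix with frequencies estimated from the pooled rows; it is not the wGBLUP matrix of the study, whose population-specific weighting is not reproduced here.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix, VanRaden (2008) method 1.

    genotypes: list of rows (animals) of 0/1/2 allele counts, no missing values.
    Allele frequencies p_j are estimated from the same (pooled) rows:
    G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


G = vanraden_grm([[0, 1], [1, 1], [2, 1]])
print(G)  # [[1.0, 0.0, -1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]]
```

Replacing the pooled `p` with frequencies estimated within each population changes both the centring and the scale, which is the kind of adjustment a heterogeneous-frequency GRM targets.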

7.
8.
High-density single nucleotide polymorphism (SNP) platforms are currently used in genomic selection (GS) programs to enhance the selection response. However, genotyping a large number of animals with high-throughput platforms is rather expensive and may be a constraint on large-scale implementation of GS. Low-density marker (LDM) platforms could overcome this problem, but different SNP chips may be required for each trait and/or breed. In this study, an imputation strategy independent of trait and breed is proposed. A simulated population of 5865 individuals with a genome of 6000 SNPs equally distributed over six chromosomes was considered. First, reference and prediction populations were generated by mimicking high- and low-density SNP platforms, respectively. Then, partial least squares regression (PLSR) was applied to reconstruct the SNPs missing from the low-density chip. The proportion of SNPs correctly reconstructed by PLSR ranged from 0.78 to 0.97 when 90% and 50% of genotypes, respectively, were predicted. Moreover, data sets consisting either of a mixture of actual and PLSR-predicted SNPs or of actual SNPs only were used to predict genomic breeding values (GEBVs). Correlations between GEBVs and true breeding values were 0.74 and 0.76, respectively. The results indicate that PLSR can be considered a reliable computational strategy for predicting SNP genotypes in an LDM platform with reasonable accuracy.
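The "proportion of SNPs correctly reconstructed" reported above is a concordance rate. Because a regression such as PLSR returns continuous values, they must be mapped back to genotype classes before counting matches. The sketch below assumes rounding to the nearest of 0, 1 and 2, which the abstract does not specify; the PLSR fit itself is not reproduced (a multi-output regressor such as scikit-learn's `PLSRegression` could supply the predictions).

```python
def round_to_genotype(x):
    """Map a continuous prediction to the nearest genotype code 0, 1 or 2."""
    return min(2, max(0, int(x + 0.5)))


def concordance(true_geno, predicted):
    """Proportion of masked genotypes reconstructed exactly after rounding."""
    if len(true_geno) != len(predicted) or not true_geno:
        raise ValueError("need two non-empty vectors of equal length")
    hits = sum(t == round_to_genotype(p) for t, p in zip(true_geno, predicted))
    return hits / len(true_geno)


# Continuous predictions for four masked genotypes (hypothetical values)
print(concordance([0, 1, 2, 1], [0.1, 1.4, 1.6, 0.4]))  # 0.75
```

Clamping to [0, 2] matters because linear regression predictions can fall outside the valid genotype range.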

9.
Genetic selection against boar taint, which is caused by high skatole and androstenone concentrations in fat, is a more acceptable alternative to the current practice of castration. Genomic predictors offer an opportunity to overcome the limitations of such selection, which arise because the phenotype is expressed only in males at slaughter, and this study evaluated different approaches to obtaining such predictors. Samples from 1000 pigs were included in a design dominated by 421 sib pairs, each pair having one animal with high (≥0.3 μg/g) and one with low skatole concentration. All samples were measured for both skatole and androstenone and genotyped with the Illumina SNP60 porcine BeadChip for 62 153 single nucleotide polymorphisms. The accuracy of predicting phenotypes was assessed by cross‐validation using six genomic evaluation methods: genomic best linear unbiased prediction (GBLUP) and five Bayesian regression methods. These were also compared with predictions using only QTL that reached genome‐wide significance. The range of accuracies across prediction methods was narrow for androstenone, from 0.29 (Bayes Lasso) to 0.31 (Bayes B), and wider for skatole, from 0.21 (GBLUP) to 0.26 (Bayes SSVS). Relative accuracies, corrected for heritability (h²), were 0.54–0.56 and 0.75–0.94 for androstenone and skatole, respectively. The whole‐genome evaluation methods were more accurate than using only the QTL detected in the data. The results show that, for androstenone, GBLUP is the simplest genomic technology to implement and is also close to the most accurate method. More specialised models may be preferable for skatole.
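Correcting a phenotype-based accuracy for heritability is commonly done by dividing the correlation between phenotypes and genomic predictions by h = √h², because non-genetic noise caps that correlation at h. The sketch below assumes this is the correction meant by "relative accuracies"; the h² value in the example is hypothetical and is not taken from the study.

```python
from math import sqrt


def relative_accuracy(predictive_ability, h2):
    """Scale r(phenotype, GEBV) by sqrt(h2) to approximate r(TBV, GEBV).

    The phenotype contains non-genetic variance, so its correlation with a
    perfect breeding-value predictor cannot exceed h = sqrt(h2).
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    return predictive_ability / sqrt(h2)


# Hypothetical trait with h2 = 0.30 and predictive ability 0.30
print(round(relative_accuracy(0.30, 0.30), 3))  # 0.548
```

The lower the heritability, the larger the correction, which is why a low-h² trait can show modest raw accuracies but high relative accuracies.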

10.
Since the beginning of the genomic era, the number of available single nucleotide polymorphism (SNP) arrays has grown considerably. In the bovine species alone, 11 SNP chips not completely covered by intellectual property are currently available, and the number is growing. Genomic/genotype data are not standardized, which hampers their exchange and integration. In addition, the software used to analyse these data usually requires non-standard (i.e. case-specific) input files, which, given the large amount of data to be handled, require at least some programming skills to produce. In this work, we describe a software toolkit for SNP array data management, imputation, genome‐wide association studies, population genetics and genomic selection. This toolkit does not, however, solve the critical need for standardization of genotypic data and software input files. Rather, it highlights the chaotic situation each researcher faces on a daily basis and gives some helpful advice on currently available tools for navigating the complexity of SNP array data.

11.
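In recent years, with the development of gene chip technology and advances in breeding techniques, genomic selection in animals and plants has become a research hotspot. In livestock breeding, genomic selection, owing to its high accuracy, shorter generation intervals and lower breeding costs, has been applied to the selection of breeding stock in a range of economically important animals. This paper describes in detail genotyping technologies and methods for estimating genomic breeding values (least squares, RR-BLUP, GBLUP, ssGBLUP, BayesA, BayesB, etc.), and compares these methods in terms of the range of markers used, accuracy and computing speed. It summarizes the application of genomic selection to breeding-stock selection in China and other countries, together with the problems encountered, and reviews the latest research trends and progress in genomic selection in China and abroad, with the aim of providing a reference for other breeders seeking a better understanding of genomic selection.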

12.
13.
To isolate novel genes related to human hepatocellular carcinoma (HCC), we sequenced the P1-derived artificial chromosome PAC579 (D17S926 locus), which maps to the minimal LOH (loss of heterozygosity) deletion region of chromosome 17p13.3 in HCC. Four novel genes in this genomic region were isolated and cloned by wet-lab experiments, and their exons were located. The 0–60 kb segment of this genomic sequence containing the genes of interest was scanned with five computational exon prediction programs and four splice site recognition programs. By analyzing the computational predictions and comparing them with the wet-lab results, several potential exons were predicted in the genomic sequence.

14.
Genotyping sheep for genome‐wide SNPs at lower density and imputing to a higher density would enable cost‐effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low‐density (12k) SNP chip and evaluate the accuracy of imputation from 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits in sheep, including carcass, meat quality and omega fatty acid traits, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with high minor allele frequency, selected with an intermarker spacing of 50–475 kb. SNPs for parentage and horned/polled tests were also included. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three scenarios: (1) within breed, (2) single breed from a multibreed reference and (3) multibreed from a single‐breed reference. Imputation accuracy was highest in scenario 2 and, as expected, lowest in scenario 3. Under scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93, respectively. Scenario 2 was then used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes, in order to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on 12k data alone was 0.08 across traits and breeds, but accuracies varied widely.
With imputed 50k data, the mean GBLUP accuracy more than doubled, to 0.21. Accuracies of genomic prediction were very similar for imputed and real 50k genotypes. Using imputed rather than real 50k genotypes had no apparent impact on GEBV accuracy, provided imputation accuracy was >90%.
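The chip-design rules above (high-MAF SNPs with a minimum intermarker spacing) can be illustrated with a greedy filter. This is a hypothetical sketch, not the authors' design procedure: `pick_snps` is invented, it enforces only a minimum gap (the 50 kb lower bound of the reported spacing), and it ignores the chromosome-end enrichment and the parentage and horned/polled SNPs.

```python
def pick_snps(snps, min_maf, min_gap_bp):
    """Greedy sketch: keep high-MAF SNPs at least `min_gap_bp` apart.

    snps: list of (position_bp, maf) on one chromosome. Candidates are taken in
    order of decreasing MAF (ties by position) and accepted only if no accepted
    SNP lies within min_gap_bp. Returns accepted positions in genome order.
    """
    accepted = []
    for pos, maf in sorted(snps, key=lambda s: (-s[1], s[0])):
        if maf < min_maf:
            break  # all remaining candidates have even lower MAF
        if all(abs(pos - q) >= min_gap_bp for q in accepted):
            accepted.append(pos)
    return sorted(accepted)


candidates = [(0, 0.45), (20_000, 0.48), (60_000, 0.30), (130_000, 0.05), (200_000, 0.40)]
print(pick_snps(candidates, min_maf=0.1, min_gap_bp=50_000))  # [20000, 200000]
```

Prioritizing MAF before spacing favours informative markers; a real design would also cap the maximum gap so that no region is left uncovered.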

15.
Computational analysis and prediction for exons of PAC579 genomic sequence   Times cited: 1 (self-citations: 0; citations by others: 1)

16.
Single-step genomic BLUP (ssGBLUP) has been widely used in genomic evaluation owing to its relatively high prediction accuracy and ease of use. The prediction accuracy of ssGBLUP depends on the amount of genotypic and phenotypic information available. This study investigated how genotypic and phenotypic information from previous generations influences the prediction accuracy of ssGBLUP, in order to find an optimal balance between genotypic and phenotypic information for cost-effective and computationally efficient genomic evaluation. We simulated two genetically correlated traits (h² = 0.35 for trait A, h² = 0.10 for trait B, genetic correlation 0.20) in two distinct populations mimicking purebred swine. Datasets were generated by including phenotypic and genotypic information from different numbers of previous generations and by genotyping different proportions of each litter. Prediction accuracy was evaluated as the correlation between genomic estimated breeding values and true breeding values for genotyped animals in the last generation. Previous generations without genotyped animals had a negligible impact on prediction accuracy. Phenotypic and genotypic data from the most recent three to four generations, with 40% or 50% of each litter genotyped, could reach the asymptotic maximum prediction accuracy for genotyped animals in the last generation. With ssGBLUP, this combination offers an optimal balance between genotypic and phenotypic information, ensuring cost-effective and computationally efficient genomic evaluation of populations of polytocous animals such as purebred pigs.
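For reference, ssGBLUP is usually written by replacing the inverse pedigree relationship matrix in the animal-model mixed-model equations with the inverse of a combined relationship matrix H, which changes only the block for genotyped animals. The standard form from the single-step literature is shown below; whether this study scaled or blended G is not stated in the abstract. Here A is the pedigree relationship matrix, A₂₂ its block for genotyped animals, and G the genomic relationship matrix.

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
```

Because only the genotyped block is modified, the cost of the genomic part grows with the number of genotyped animals, which is one reason the genotyping rate per litter matters computationally.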

17.
Genomic information could be used efficiently to improve traits that are expensive to measure, sex-limited or expressed late in life. This study analyzed the phenotypic variation explained by major SNPs and windows for age at puberty in gilts, an indicator of reproductive longevity. A genome‐wide association study using 56 424 SNPs explained 25.2% of the phenotypic variation in age at puberty in a training set (n = 820). All SNPs from the top 10% of 1‐Mb windows explained 33.5% of the phenotypic variance, compared with 47.1% explained by the most informative markers (n = 261). In an evaluation population consisting of subsequent batches (n = 412), the predictive ability of all SNPs from the major 1‐Mb windows was higher than that of the most informative SNP from each of these windows. The phenotypic variance explained in the evaluation population ranged from 12.3% to 36.8% when all SNPs from the major windows were used, compared with 6.5–23.7% for the most informative SNPs. Correlations between phenotype and genomic prediction values were marginal when SNP effects were estimated in the training population, compared with those obtained when the effects were re-estimated in the evaluation population, using either all SNPs (0.46–0.81) or the most informative SNPs (0.30–0.65) from the major windows. Genetic gain could be increased by 20.5% if genomic selection included both sexes rather than females alone. The pleiotropic role of major genes such as AVPR1A could be exploited in selection for both age at puberty and reproductive longevity.
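Grouping SNPs into 1-Mb windows and keeping the top 10% of windows, as described above, can be sketched as follows. The helper `top_windows` is hypothetical, and ranking windows by the summed per-SNP variance explained is an assumption; the abstract does not say how window importance was measured.

```python
from collections import defaultdict


def top_windows(snp_effects, window_bp=1_000_000, top_fraction=0.10):
    """Rank fixed-size windows by summed SNP variance and keep the top share.

    snp_effects: list of (chromosome, position_bp, variance_explained).
    Returns [(chromosome, window_index, total_variance), ...], best first;
    at least one window is always returned.
    """
    windows = defaultdict(float)
    for chrom, pos, var in snp_effects:
        windows[(chrom, pos // window_bp)] += var
    ranked = sorted(windows.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, round(top_fraction * len(ranked)))
    return [(c, w, v) for (c, w), v in ranked[:n_keep]]


effects = [(1, 100, 0.5), (1, 900_000, 0.25), (1, 1_500_000, 0.5), (2, 10, 0.125)]
print(top_windows(effects))  # [(1, 0, 0.75)]
```

The "most informative SNP" alternative would keep only the single largest-effect SNP inside each selected window instead of all of its SNPs.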

18.

Background

The main goal of our study was to investigate the implementation, prospects, and limits of marker imputation for quantitative genetic studies contrasting map-independent and map-dependent algorithms. We used a diversity panel consisting of 372 European elite wheat (Triticum aestivum L.) varieties, which had been genotyped with SNP arrays, and performed intensive simulation studies.

Results

Our results clearly showed that imputation accuracy was substantially higher for map-dependent than for map-independent methods. The accuracy of marker imputation depended strongly on the linkage disequilibrium between markers in the reference panel and the markers to be imputed. Given the decay of linkage disequilibrium in European wheat, we concluded that around 45,000 markers are needed for low-cost, low-density marker profiling. This would allow high imputation accuracy, even for rare alleles. Genomic selection and diversity studies profited only marginally from imputing missing values. In contrast, the power of association mapping increased substantially when missing values were imputed.

Conclusions

Imputing missing values is especially of interest for an economic implementation of association mapping in breeding populations.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1366-y) contains supplementary material, which is available to authorized users.
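To make the map-independent versus map-dependent distinction concrete, the sketch below shows a simple map-independent approach: k-nearest-neighbour imputation that never uses marker positions. It is a generic illustration, not one of the algorithms evaluated in the wheat study (the abstract does not name them); map-dependent methods such as hidden-Markov-model imputation additionally exploit marker order and linkage. The helpers `similarity` and `knn_impute` are hypothetical.

```python
def similarity(a, b):
    """Count SNPs observed in both lines that carry the same genotype."""
    return sum(x == y for x, y in zip(a, b) if x is not None and y is not None)


def knn_impute(rows, k=1):
    """Map-independent k-nearest-neighbour imputation of missing genotypes.

    rows: one list per line of 0/1/2 codes, with None for missing calls.
    Each missing call is filled with the most common genotype among the k most
    similar lines that are observed at all of this line's missing SNPs; calls
    stay None if no such donor exists.
    """
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, g in enumerate(row) if g is None]
        if not missing:
            continue
        donors = [r for n, r in enumerate(rows)
                  if n != i and all(r[j] is not None for j in missing)]
        donors.sort(key=lambda r: similarity(row, r), reverse=True)
        for j in missing:
            votes = [d[j] for d in donors[:k]]
            if votes:
                # sorted() makes ties resolve to the smallest genotype code
                result[i][j] = max(sorted(set(votes)), key=votes.count)
    return result


lines = [[0, 2, 2, None], [0, 2, 2, 0], [2, 0, 0, 2]]
print(knn_impute(lines))  # [[0, 2, 2, 0], [0, 2, 2, 0], [2, 0, 0, 2]]
```

Because similarity is computed over the whole genome rather than locally, such methods cannot use the local linkage disequilibrium that the abstract identifies as the main driver of imputation accuracy.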

19.
This review presents a broader approach to the implementation and study of runs of homozygosity (ROH) in animal populations, focusing on identifying and characterizing ROH and their practical implications. ROH are continuous homozygous segments that are common in individuals and populations. The ability of these homozygous segments to give insight into a population's genetic events makes them a useful tool that can provide information about the demographic evolution of a population over time. Furthermore, ROH provide useful information about the genetic relatedness among individuals, helping to minimize the inbreeding rate and also helping to expose deleterious variants in the genome. The frequency, size and distribution of ROH in the genome are influenced by factors such as natural and artificial selection, recombination, linkage disequilibrium, population structure, mutation rate and inbreeding level. Calculating the inbreeding coefficient from molecular information from ROH (FROH) is more accurate for estimating autozygosity and for detecting both past and more recent inbreeding effects than are estimates from pedigree data (FPED). The better results of FROH suggest that FROH can be used to infer information about the history and inbreeding levels of a population in the absence of genealogical information. The selection of superior animals has produced large phenotypic changes and has reshaped the ROH patterns in various regions of the genome. Additionally, selection increases homozygosity around the target locus, and deleterious variants are seen to occur more frequently in ROH regions. Studies involving ROH are increasingly common and provide valuable information about how the genome's architecture can disclose a population's genetic background. 
Because genome‐wide information reveals molecular changes in populations over time, it is crucial to understanding antecedent genome architecture and, therefore, to maintaining diversity and fitness in endangered livestock breeds.
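The ROH-based inbreeding coefficient discussed above is usually computed as the total length of an individual's ROH divided by the length of the genome considered. The sketch below is deliberately simplified and hypothetical: runs may not contain any heterozygous or missing call, and the thresholds are illustrative. Dedicated tools (for example PLINK's `--homozyg`) use sliding windows and tolerate a few heterozygotes.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp) runs of consecutive homozygous SNPs.

    genotypes: 0/1/2 allele counts along one chromosome (1 = heterozygous);
    positions: matching base-pair coordinates in increasing order.
    Simplified: no heterozygous or missing calls are tolerated inside a run.
    """
    runs, start = [], None
    for idx, g in enumerate(genotypes + [1]):  # sentinel heterozygote closes the last run
        if g != 1 and start is None:
            start = idx
        elif g == 1 and start is not None:
            end = idx - 1
            length = positions[end] - positions[start]
            if end - start + 1 >= min_snps and length >= min_length_bp:
                runs.append((positions[start], positions[end]))
            start = None
    return runs


def f_roh(runs, genome_length_bp):
    """F_ROH = total ROH length / length of the genome considered."""
    return sum(end - start for start, end in runs) / genome_length_bp


geno = [0, 2, 2, 0, 1, 2, 2, 1, 0, 0, 0, 2]
pos = [i * 1_000_000 for i in range(12)]
runs = find_roh(geno, pos)
print(runs)                     # [(0, 3000000), (8000000, 11000000)]
print(f_roh(runs, 12_000_000))  # 0.5
```

The choice of denominator (for example, autosomal length covered by SNPs) is itself an assumption that varies between studies.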

20.
The purpose of this study is to review and evaluate computing methods used in genomic selection for animal breeding. Commonly used models include SNP BLUP with extensions (BayesA, etc.), genomic BLUP (GBLUP) and single-step GBLUP (ssGBLUP). These models are applied to genome-wide association studies (GWAS), genomic prediction and parameter estimation. Solving methods include direct (finite) Cholesky decomposition, possibly with a sparse implementation, and iterative Gauss–Seidel (GS) or preconditioned conjugate gradient (PCG), the last two possibly with iteration on data. Details are provided that can drastically reduce some computations. For SNP BLUP, especially with sampling and a large number of SNPs, the only choice is GS with iteration on data and adjustment of residuals. If only solutions are required, PCG with iteration on data is the clear choice. A genomic relationship matrix (GRM) has limited dimensionality due to small effective population size, resulting in an infinite number of generalized inverses of the GRM for large genotyped populations. A specific inverse called APY requires only a small fraction of the GRM, is sparse, and can be computed and stored at low cost for millions of animals. With the APY inverse and PCG iteration, GBLUP and ssGBLUP can be applied to any population. Both tools can be applied to GWAS. When the system of equations is sparse but contains dense blocks, a recently developed package for sparse Cholesky decomposition and sparse inversion called YAMS performs much better than packages that treat such blocks as sparse. With YAMS, GREML and possibly single-step GREML can be applied to populations with >50 000 genotyped animals. From a computational perspective, genomic selection is becoming a mature methodology.
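Gauss–Seidel with adjustment of residuals, recommended above for SNP BLUP, updates one SNP effect at a time and then corrects the residual vector, so the coefficient matrix Z'Z is never formed. The sketch below is a textbook version under simplifying assumptions, not the implementation evaluated in the review: `snp_blup_gs` is hypothetical, the phenotypes are assumed already corrected for fixed effects, and λ = σ²e/σ²snp is taken as known.

```python
def snp_blup_gs(Z, y, lam, iterations=200, tol=1e-10):
    """SNP-BLUP by Gauss-Seidel with residual adjustment (iteration on data).

    Solves (Z'Z + lam*I) a = Z'y for SNP effects a, assuming y is already
    corrected for fixed effects and lam = var_e / var_snp.
    Z: list of rows (animals) of genotype codes (typically centred).
    Only Z, the residuals e = y - Za and the diagonals z_j'z_j are stored.
    """
    n, m = len(Z), len(Z[0])
    a = [0.0] * m
    e = list(y)  # residuals y - Za for a = 0
    zz = [sum(Z[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(iterations):
        max_change = 0.0
        for j in range(m):
            # Right-hand side for SNP j with its own current effect added back
            rhs = sum(Z[i][j] * e[i] for i in range(n)) + zz[j] * a[j]
            new = rhs / (zz[j] + lam)
            delta = new - a[j]
            for i in range(n):
                e[i] -= Z[i][j] * delta  # keep residuals consistent with a
            a[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return a


effects = snp_blup_gs([[1, 0], [0, 1], [1, 1]], [1, 2, 3], lam=1.0)
print([round(x, 6) for x in effects])  # [0.875, 1.375]
```

For this toy system, (Z'Z + I) = [[3, 1], [1, 3]] and Z'y = [4, 5], whose exact solution is (0.875, 1.375).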
