Similar documents
20 similar documents found
1.

Background

The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.

Methods

Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.
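The core inference step can be illustrated with a minimal sketch. It scores the three possible genotypes (0, 1 or 2 copies of allele B) of an un-genotyped dam at one biallelic SNP. The inputs are the genotypes of her sire, one offspring and that offspring's sire, and the unknown maternal allele is filled in with the population allele frequency. The sketch assumes Mendelian transmission, random mating and error-free genotypes. The function names are hypothetical, and this is not the authors' implementation or either of the software packages they compared.

```python
from typing import Dict


def transmit_b(genotype: int) -> float:
    """Probability that a parent with genotype 0/1/2 (copies of allele B) transmits B."""
    return genotype / 2.0


def genotype_dist(p1: float, p2: float) -> Dict[int, float]:
    """Genotype distribution of a child whose parents transmit B with probabilities p1 and p2."""
    return {
        0: (1 - p1) * (1 - p2),
        1: p1 * (1 - p2) + (1 - p1) * p2,
        2: p1 * p2,
    }


def impute_dam(g_dam_sire: int, g_offspring: int, g_offspring_sire: int,
               freq_b: float) -> Dict[int, float]:
    """Posterior probabilities of the dam's genotype at one SNP."""
    # Prior: one allele from the dam's genotyped sire, one from her un-genotyped
    # dam, drawn at the population frequency of B (assumes random mating).
    prior = genotype_dist(transmit_b(g_dam_sire), freq_b)
    # Likelihood: probability of the observed offspring genotype for each candidate dam genotype.
    unnormalised = {
        g: pr * genotype_dist(transmit_b(g), transmit_b(g_offspring_sire))[g_offspring]
        for g, pr in prior.items()
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    return {g: v / total for g, v in unnormalised.items()}


# The dam's sire is BB and the offspring is AA, so the dam must be heterozygous.
post = impute_dam(g_dam_sire=2, g_offspring=0, g_offspring_sire=1, freq_b=0.3)
best_guess = max(post, key=post.get)
```

When the known relatives leave the dam's genotype ambiguous, the posterior falls back on the allele frequency. This plausibly explains why the allele-frequency spectrum affects imputation accuracy.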

Results

Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.

Conclusions

Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.

2.

Background

Next-generation sequencing techniques, such as genotyping-by-sequencing (GBS), provide alternatives to single nucleotide polymorphism (SNP) arrays. The aim of this work was to evaluate the potential of GBS compared to SNP array genotyping for genomic selection in livestock populations.

Methods

The value of GBS was quantified by simulation analyses in which three parameters were varied: (i) genome-wide sequence read depth (x) per individual from 0.01x to 20x or using SNP array genotyping; (ii) number of genotyped markers from 3000 to 300 000; and (iii) size of training and prediction sets from 500 to 50 000 individuals. The latter was achieved by distributing the total available x of 1000x, 5000x, or 10 000x per genotyped locus among the varying number of individuals. With SNP arrays, genotypes were called from sequence data directly. With GBS, genotypes were called from sequence reads that varied between loci and individuals according to a Poisson distribution with mean equal to x. Simulated data were analyzed with ridge regression and the accuracy and bias of genomic predictions and response to selection were quantified under the different scenarios.
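The GBS sampling scheme described above can be sketched in a few steps:

1. Read counts per locus and individual are drawn from a Poisson distribution with mean x.
2. Each read carries the alternative allele with probability equal to half the true genotype.
3. A deliberately naive caller turns the read counts into genotypes. This caller is an assumption, because the abstract does not describe the calling model.

The function names are hypothetical.

```python
import math
import random
from typing import Optional, Tuple


def poisson(mean: float, rng: random.Random) -> int:
    """Poisson draw by Knuth's multiplication method (adequate for small means such as 0.01x-20x)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_reads(genotype: int, depth: float, rng: random.Random) -> Tuple[int, int]:
    """(reference reads, alternative reads) for one individual at one locus; genotype is 0/1/2."""
    n_reads = poisson(depth, rng)
    alt = sum(rng.random() < genotype / 2 for _ in range(n_reads))
    return n_reads - alt, alt


def naive_call(ref: int, alt: int) -> Optional[int]:
    """Call without an error model: missing if no reads, heterozygous only if both alleles are seen."""
    if ref + alt == 0:
        return None
    if alt == 0:
        return 0
    if ref == 0:
        return 2
    return 1


def het_call_rate(depth: float, n: int = 10_000, seed: int = 1) -> float:
    """Fraction of true heterozygotes that the naive caller labels correctly at read depth `depth`."""
    rng = random.Random(seed)
    return sum(naive_call(*simulate_reads(1, depth, rng)) == 1 for _ in range(n)) / n
```

Under these assumptions, the expected rate is 1 - 2e^(-x/2) + e^(-x). That is about 0.15 at 1x and essentially 1 at 20x, so at low x most heterozygotes are called as homozygotes or missing. This gives one simple way to see why predictions become strongly biased at very low depth unless the uncertainty in the calls is modelled.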

Results

Accuracies of genomic predictions using GBS data or SNP array data were comparable when large numbers of markers were used and x per individual was ~1x or higher. The bias of genomic predictions was very high at a very low x. When the total available x was distributed among the training individuals, the accuracy of prediction was maximized when a large number of individuals was used that had GBS data with low x for a large number of markers. Similarly, response to selection was maximized under the same conditions due to increasing both accuracy and selection intensity.

Conclusions

GBS offers great potential for developing genomic selection in livestock populations because it makes it possible to cover large fractions of the genome and to vary the sequence read depth per individual. Thus, the accuracy of predictions is improved by increasing the size of training populations and the intensity of selection is increased by genotyping a larger number of selection candidates.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0102-z) contains supplementary material, which is available to authorized users.

3.
Accuracy of genomic selection in European maize elite breeding populations (total citations: 1; self-citations: 0; citations by others: 1)
Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping in unreplicated field trials at 3–4 locations. Because up to three generations per year are feasible in maize, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.

4.

Background

In crossbreeding programs, genomic selection offers the opportunity to make efficient use of information on crossbred (CB) individuals in the selection of purebred (PB) candidates. In such programs, reference populations often contain genotyped PB animals, although the breeding objective is usually more focused on CB performance. The question is how much would be gained by including a larger proportion of CB individuals in the reference population.

Methods

In a deterministic simulation study, we evaluated the benefit of including various proportions of CB animals in a reference population for genomic selection of PB animals in a crossbreeding program. We used a pig breeding scheme with selection for a moderately heritable trait and a size of 6000 for the reference population.

Results

When genomic selection was applied to improve the performance of CB individuals, with a genetic correlation between PB and CB performance (rPC) of 0.7, the selection accuracy of PB candidates changed as follows:

- It increased from 0.49 to 0.52 if the reference population consisted of PB individuals.
- It increased to 0.55 if the reference population consisted of the same number of CB individuals.
- It increased to 0.60 if the CB reference population was twice the size of the reference population for each PB line.

The advantage of using CB rather than PB individuals increased linearly with the proportion of CB individuals in the reference population. This advantage disappeared quickly if rPC was higher or if the breeding objective put some emphasis on PB performance. The benefit of adding CB individuals to an existing PB reference population was limited for high rPC.

Conclusions

Using CB rather than PB individuals in a reference population for genomic selection can provide substantial advantages, but only when correlations between PB and CB performances are not high and PB performance is not part of the breeding objective.

5.
6.
7.
The selection coefficient, s, quantifies the strength of selection acting on a genetic variant. Despite this parameter's central importance to population genetic models, until recently relatively little was known about the value of s in natural populations. With the development of molecular genetic techniques in the late 20th century and the sequencing technologies that followed, biologists are now able to identify genetic variants and directly relate them to organismal fitness. We reviewed the literature for published estimates of natural selection acting at the genetic level and found over 3000 estimates of selection coefficients from 79 studies. Selection coefficients were roughly exponentially distributed, suggesting that the impact of selection at the genetic level is generally weak but can occasionally be quite strong. We used both nonparametric statistics and formal random‐effects meta‐analysis to determine how selection varies across biological and methodological categories. Selection was stronger when measured over shorter timescales, with the mean magnitude of s greatest for studies that measured selection within a single generation. Our analyses found conflicting trends when considering how selection varies with the genetic scale (e.g., SNPs or haplotypes) at which it is measured, suggesting a need for further research. Besides these quantitative conclusions, we highlight key issues in the calculation, interpretation, and reporting of selection coefficients and provide recommendations for future research.
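To make the scale of s concrete, consider the simplest textbook case: haploid (genic) selection, in which the variant has relative fitness 1 + s. This model is an illustrative assumption, since the reviewed studies used various fitness models. Under it, each generation multiplies the odds p/(1 - p) of the variant by exactly (1 + s), so weak selection needs many generations to change allele frequencies appreciably. The function names are hypothetical.

```python
import math
from typing import Optional


def odds(q: float) -> float:
    """Odds q / (1 - q) of the variant."""
    return q / (1 - q)


def next_frequency(p: float, s: float) -> float:
    """One generation of haploid (genic) selection: variant fitness 1 + s, other allele fitness 1."""
    # Mean fitness is 1 + p*s, so the variant's odds are multiplied by exactly (1 + s).
    return p * (1 + s) / (1 + p * s)


def generations_to_reach(p0: float, target: float, s: float,
                         max_gen: int = 1_000_000) -> Optional[int]:
    """Generations needed to move the variant from frequency p0 to at least target (None if never)."""
    p, t = p0, 0
    while p < target:
        if t >= max_gen:
            return None
        p = next_frequency(p, s)
        t += 1
    return t


def generations_closed_form(p0: float, target: float, s: float) -> float:
    """Same quantity from odds_t = odds_0 * (1 + s)**t, before rounding up to whole generations."""
    return math.log(odds(target) / odds(p0)) / math.log(1 + s)
```

For example, moving a variant from 1% to 99% takes 97 generations with s = 0.1, but about 924 generations with s = 0.01.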

8.
This study evaluated different female-selective genotyping strategies to increase the predictive accuracy of genomic breeding values (GBVs) in populations that have a limited number of sires with a large number of progeny. A simulated dairy population was used to address the aims of the study. The following selection strategies were used: random selection, two-tailed selection by yield deviations, two-tailed selection by breeding value, top yield deviation selection and top breeding value selection. For comparison, two other strategies, genotyping of sires and pedigree indexes from traditional genetic evaluation, were included in the analysis. Two scenarios were simulated, low heritability (h2 = 0.10) and medium heritability (h2 = 0.30). GBVs were estimated using the Bayesian Lasso. The accuracy of predicted GBVs using the two-tailed strategies was higher than the accuracy obtained using other strategies. With 1000 genotyped cows, the two-tailed selection by yield deviations strategy gave accuracies of 0.50 and 0.63, and the two-tailed selection by breeding values strategy gave 0.48 and 0.63, in the low- and medium-heritability scenarios, respectively. When 996 genotyped bulls were used as the training population, the Sire strategy led to accuracies of 0.48 and 0.55 for low- and medium-heritability traits, respectively. The Random strategy required larger training populations to outperform the accuracy of the pedigree index. However, selecting females from the top of the yield deviations or breeding values of the population did not improve accuracy relative to that of the pedigree index. Bias was found for all genotyping strategies considered, although the Top strategies produced the most biased predictions. Strategies that involve genotyping cows can be implemented in breeding programs that have a limited number of sires with a reliable progeny test.
The results of this study showed that females with extreme upper and lower values within the distribution of yield deviations may be included in the training population to increase reliability in small reference populations. The strategies that selected only the females that had high estimated breeding values or yield deviations produced suboptimal results.
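The difference between the two-tailed and Top strategies comes down to which cows are sent for genotyping. The minimal sketch below assumes that each cow has a single score (yield deviation or estimated breeding value) and that an odd genotyping budget gives the extra animal to the upper tail. The cow IDs and function names are hypothetical.

```python
from typing import List, Sequence


def two_tailed(ids: Sequence[str], scores: Sequence[float], n: int) -> List[str]:
    """Choose n cows: half from the bottom and half from the top of the score distribution."""
    if not 0 <= n <= len(ids):
        raise ValueError("n must be between 0 and the number of cows")
    order = sorted(range(len(ids)), key=lambda i: scores[i])   # ascending scores
    n_low = n // 2
    low = order[:n_low]
    high = order[len(order) - (n - n_low):]
    return [ids[i] for i in low + high]


def top(ids: Sequence[str], scores: Sequence[float], n: int) -> List[str]:
    """Choose the n cows with the highest scores (the Top strategies)."""
    order = sorted(range(len(ids)), key=lambda i: scores[i], reverse=True)
    return [ids[i] for i in order[:n]]


cows = ["c1", "c2", "c3", "c4", "c5", "c6"]
yield_dev = [5.0, 1.0, 3.0, 9.0, 7.0, 2.0]
```

Only the two-tailed sample spans both extremes of the distribution. That keeps the spread of training phenotypes wide, which is one plausible reading of why the two-tailed strategies outperformed the Top strategies.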

9.
Riebler A, Held L, Stephan W. Genetics, 2008, 178(3): 1817-1829
We extend an F(st)-based Bayesian hierarchical model, implemented via Markov chain Monte Carlo, for the detection of loci that might be subject to positive selection. This model divides the F(st)-influencing factors into locus-specific effects, population-specific effects, and effects that are specific for the locus in combination with the population. We introduce a Bayesian auxiliary variable for each locus effect to automatically select nonneutral locus effects. As a by-product, the efficiency of the original approach is improved by using a reparameterization of the model. The statistical power of the extended algorithm is assessed with simulated data sets from a Wright-Fisher model with migration. We find that the inclusion of model selection suggests a clear improvement in discrimination as measured by the area under the receiver operating characteristic (ROC) curve. Additionally, we illustrate and discuss the quality of the newly developed method on the basis of an allozyme data set of the fruit fly Drosophila melanogaster and a sequence data set of the wild tomato Solanum chilense. For data sets with small sample sizes, high mutation rates, and/or long sequences, however, methods based on nucleotide statistics should be preferred.
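One common way to write this kind of F(st) decomposition places the three effects on the logit scale, with i indexing loci and j indexing populations. It is shown here only as an illustration; the exact priors and reparameterization used in the paper are not given in the abstract. The indicator δ_i is our reading of the "Bayesian auxiliary variable for each locus effect": under this reading, a locus effect enters the model only when δ_i = 1.

```latex
\log\frac{F_{ij}}{1 - F_{ij}} = \alpha_i + \beta_j + \gamma_{ij},
\qquad
\alpha_i = \delta_i \, \tilde{\alpha}_i, \quad \delta_i \in \{0, 1\}
```

Here α_i is the locus-specific effect, β_j is the population-specific effect and γ_ij is the locus-by-population effect. Loci with a high posterior probability P(δ_i = 1 | data) would be candidates for non-neutral evolution. The ROC analysis described in the abstract then measures how well such a score separates selected from neutral loci in simulations.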

10.
The choice of a probabilistic model to describe sequence evolution can and should be justified. Underfitting the data through the use of overly simplistic models may miss out on interesting phenomena and lead to incorrect inferences. Overfitting the data with models that are too complex may ascribe biological meaning to statistical artifacts and result in falsely significant findings. We describe a likelihood-based approach for evolutionary model selection. The procedure employs a genetic algorithm (GA) to quickly explore a combinatorially large set of all possible time-reversible Markov models with a fixed number of substitution rates. When applied to stem RNA data subject to well-understood evolutionary forces, the models found by the GA:

1) capture the expected overall rate patterns a priori;
2) fit the data better than the best available models based on a priori assumptions, suggesting subtle substitution patterns not previously recognized;
3) cannot be rejected in favor of the general reversible model, implying that the evolution of stem RNA sequences can be explained well with only a few substitution rate parameters; and
4) perform well on simulated data, both in terms of goodness of fit and the ability to estimate evolutionary rates.

We also investigate the utility of several distance measures for comparing and contrasting inferred evolutionary models. Using widely available small computer clusters, our approach allows us, for the first time, to evaluate the performance of existing RNA evolutionary models by comparing them with a large pool of candidate models, and to validate common modeling assumptions. In addition, the new method provides the foundation for rigorous selection and comparison of substitution models for other types of sequence data.
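The size of the search space can be made concrete under one assumption about how models are encoded. Suppose a candidate model groups the pairwise substitution rates of a time-reversible model into k shared rate classes. Then the number of candidates with exactly k rates is a Stirling number of the second kind. The code below counts them. Treating stem RNA as 16 paired-nucleotide (doublet) states is also an illustrative assumption, not a detail taken from the abstract.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to partition n labelled rate parameters into k non-empty rate classes."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def n_pair_rates(n_states: int) -> int:
    """Distinct pairwise exchangeability rates in a time-reversible model on n_states states."""
    return comb(n_states, 2)


nucleotide_rates = n_pair_rates(4)    # 6 rates: A-C, A-G, A-T, C-G, C-T, G-T
doublet_rates = n_pair_rates(16)      # 120 rates if stem pairs are modelled as 16 doublets
all_nucleotide_models = sum(stirling2(nucleotide_rates, k)
                            for k in range(1, nucleotide_rates + 1))
```

For nucleotides this gives 203 models in total (31 of them with two rate classes), which is small enough to enumerate. With 120 doublet rates, even three rate classes already give more than 10^56 models, which is why a heuristic search such as a GA is needed.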

11.
Recent years have seen a surge of interest in linking the theories of kin selection and sexual selection. In particular, there is a growing appreciation that kin selection, arising through demographic factors such as sex‐biased dispersal, may modulate sexual conflicts, including in the context of male–female arms races characterized by coevolutionary cycles. However, evolutionary conflicts of interest need not only occur between individuals, but may also occur within individuals, and sex‐specific demography is known to foment such intragenomic conflict in relation to social behavior. Whether and how this logic holds in the context of sexual conflict, and in particular in relation to coevolutionary cycles, remains obscure. We develop a kin‐selection model to investigate the interests of different genes involved in sexual and intragenomic conflict, and we show that consideration of these conflicting interests yields novel predictions concerning parent‐of‐origin specific patterns of gene expression and the detrimental effects of different classes of mutation and epimutation at loci underpinning sexually selected phenotypes.

12.
Indian demographic history includes special features such as founder effects, interpopulation segregation, complex social structure with a caste system and elevated frequency of consanguineous marriages. India also has a higher frequency of some rare Mendelian disorders and, over the last two decades, an increased prevalence of some complex disorders. Although India represents about one-sixth of the human population, in-depth genetic studies of this region have been scarce. In this study, we analyzed high-density genotyping and whole-exome sequencing data of a North and a South Indian population. Indian populations show higher levels of differentiation than those reported between populations of other continents. In this work, we have analyzed the consequences of this differentiation, specifically by assessing the transferability of genetic markers from or to Indian populations. We show that there is limited genetic marker portability from available genetic resources such as HapMap or the 1000 Genomes Project to Indian populations, which also present an excess of private rare variants. Conversely, tagSNPs show a high level of portability between the two Indian populations, in contrast to the common belief that North and South Indian populations are genetically very different. By estimating kinship from mates and consanguinity in our data from trios, we also describe different patterns of assortative mating and inbreeding in the two populations, in agreement with distinct mating preferences and social structures. In addition, this analysis has allowed us to describe genomic regions under recent adaptive selection, indicating differential adaptive histories for North and South Indian populations. Our findings highlight the importance of considering demography for design and analysis of genetic studies, as well as the need for extending human genetic variation catalogs to new populations, especially to those with distinctive demographic histories.

13.
An efficient algorithm for genomic selection of moderately sized populations based on single nucleotide polymorphism chip technology is described. A total of 995 Israeli Holstein bulls with genetic evaluations based on daughter records were genotyped with either the BovineSNP50 BeadChip or the BovineSNP50 v2 BeadChip. Milk, fat, protein, somatic cell score, female fertility, milk production persistency and herd-life were analyzed. For each trait, the 400 markers with the greatest effects were first selected by analyzing each marker individually, with the genetic evaluations of the bulls as the dependent variable. The effects of all 400 markers were then estimated jointly using a 'cow model', fitted to the data truncated to exclude lactations with freshening dates after September 2006. Genotype probabilities for each locus were computed for all animals with missing genotypes.

In Method I, genetic evaluations were computed by analysis of the truncated data set with the sum of the marker effects subtracted from each record. Genomic estimated breeding values for the young bulls with genotypes, but without daughter records, were then computed as their parent averages combined with the sum of each animal's marker effects. In Method II, genomic breeding values were computed from regressions of the estimated breeding values of bulls with daughter records on their parent averages, sum of marker effects and birth year.

For young bulls without daughter records in the truncated data set, current breeding values were more highly correlated with Method II genomic evaluations than with parent averages for fat and protein production, persistency and herd-life. Bias of evaluations, estimated as the difference between the mean current breeding value of the young bulls and the mean of their genomic evaluations, was reduced for milk production traits, persistency and herd-life.
Bias for milk production traits was slightly negative, as opposed to the positive bias of parent averages. Correlations of Method II evaluations with the means of daughter records adjusted for fixed effects were higher than those of parent averages for fat, protein, fertility, persistency and herd-life. Reducing the number of markers included in the analysis from 400 to 300 did not reduce correlations of genomic breeding values for protein with current breeding values, but did slightly reduce correlations with means of daughter records. Compared with the method of VanRaden, Method II has two advantages: genotypes of cows can be readily incorporated into the analysis, and it is more effective for moderately sized populations.
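The Method I combination step can be sketched as the parent average plus the sum of an animal's marker effects, with genotype probabilities standing in for missing genotypes. The sketch assumes additive marker effects per copy of the counted allele, and it assumes that a missing genotype enters as its expected allele count. The names are hypothetical, and the marker-selection and cow-model steps are not reproduced.

```python
from typing import Mapping, Optional, Sequence


def expected_dosage(probs: Mapping[int, float]) -> float:
    """Expected number of copies of the counted allele, from probabilities of genotypes 0, 1 and 2."""
    return sum(g * p for g, p in probs.items())


def gebv_method_i(parent_average: float,
                  genotypes: Sequence[Optional[int]],
                  effects: Sequence[float],
                  genotype_probs: Sequence[Mapping[int, float]]) -> float:
    """Parent average plus the animal's summed marker effects over the selected markers."""
    total = parent_average
    for g, effect, probs in zip(genotypes, effects, genotype_probs):
        # Use the observed genotype if present, otherwise its expected dosage.
        dosage = expected_dosage(probs) if g is None else g
        total += dosage * effect
    return total


# Two markers: the first is genotyped, the second is missing and filled from probabilities.
gebv = gebv_method_i(100.0, [1, None], [2.0, -0.5], [{}, {0: 0.25, 1: 0.5, 2: 0.25}])
```

Method II would instead enter the parent average, the marker sum and birth year as predictors in a regression fitted on bulls that already have daughter-based evaluations.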

14.
Self-fertilization (selfing) is commonly used for population development in plant breeding, and it is well established that selfing increases genetic variance between lines, thus increasing response to phenotypic selection. Furthermore, numerous studies have explored how selfing can be deployed to maximal benefit in the context of traditional plant breeding programs (Cornish in Heredity 65:201–211, 1990a, Heredity 65:213–220, 1990b; Liu et al. in Theor Appl Genet 109:370–376, 2004; Pooni and Jinks in Heredity 54:255–260, 1985). However, the impact of selfing on response to genomic selection has not been explored. In the current study we examined how selfing affects the two key aspects of genomic selection: GEBV prediction (training) and selection response. We reach the following conclusions:

(1) On average, selfing increases genomic selection gains by more than 70%.

(2) The gains in genomic selection response attributable to selfing hold over a wide range of population sizes (100–500), heritabilities (0.2–0.8), and selection intensities (0.01–0.1). However, the benefits of selfing are dramatically reduced as the number of QTLs drops below 20.

(3) The major cause of the improved response to genomic selection with selfing is an increase in the occurrence of superior genotypes, not improved GEBV predictions. While performance of the training population improves with selfing (especially with low heritability and small population sizes), the magnitude of these improvements is relatively small compared with improvements observed in the selection population.

To illustrate the value of these insights, we propose a practical genomic selection scheme that substantially reduces the number of generations required to fully capture the benefits of selfing. Specifically, we provide simulation evidence that the proposed scheme matches or exceeds the selection gains observed in advanced populations (i.e. F8 and doubled haploid) across a broad range of heritability and QTL models. Without sacrificing selection gains, we also predict that fully inbred candidates for potential commercialization can be identified as early as the F4 generation.
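The textbook reason selfing helps can be shown with a small calculation that is not taken from the paper. It assumes a cross between two inbred lines, with purely additive, unlinked loci and no selection. Under these assumptions, each generation of selfing halves heterozygosity, so the inbreeding coefficient relative to the F2 is F = 1 - (1/2)^(n-2) in generation Fn. The additive genetic variance among Fn individuals is then (1 + F) times that of the F2. The function names are hypothetical.

```python
def inbreeding_relative_to_f2(generation: int) -> float:
    """F of generation F<generation> under repeated selfing, taking the F2 as base (F = 0)."""
    if generation < 2:
        raise ValueError("generation must be 2 (the F2) or later")
    # Heterozygosity halves with each selfing generation: 1/2 in F2, 1/4 in F3, ...
    return 1.0 - 0.5 ** (generation - 2)


def additive_variance_among_individuals(sigma2_a_f2: float, f: float) -> float:
    """(1 + F) times the F2 additive variance; F = 1 for fully inbred lines such as doubled haploids."""
    return (1.0 + f) * sigma2_a_f2


# Additive variance relative to the F2 in F2, F3, F4 and F8, and among doubled haploids.
relative = {f"F{n}": additive_variance_among_individuals(1.0, inbreeding_relative_to_f2(n))
            for n in (2, 3, 4, 8)}
relative["DH"] = additive_variance_among_individuals(1.0, 1.0)
```

By F4, this simple model already releases 75% of the extra variance that full inbreeding can provide relative to the F2. That gives some intuition for why much of the benefit of selfing can be captured early.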

15.
The estimation of quantitative genetic parameters in wild populations is generally limited by the accuracy and completeness of the available pedigree information. Using relatedness at genomewide markers can potentially remove this limitation and lead to less biased and more precise estimates. We estimated heritability, maternal genetic effects and genetic correlations for body size traits in an unmanaged long‐term study population of Soay sheep on St Kilda using three increasingly complete and accurate estimates of relatedness: (i) Pedigree 1, using observation‐derived maternal links and microsatellite‐derived paternal links; (ii) Pedigree 2, using SNP‐derived assignment of both maternity and paternity; and (iii) whole‐genome relatedness at 37 037 autosomal SNPs. In initial analyses, heritability estimates were strikingly similar for all three methods, while standard errors were systematically lower in analyses based on Pedigree 2 and genomic relatedness. Genetic correlations were generally strong, differed little between the three estimates of relatedness and the standard errors declined only very slightly with improved relatedness information. When partitioning maternal effects into separate genetic and environmental components, maternal genetic effects found in juvenile traits increased substantially across the three relatedness estimates. Heritability declined compared to parallel models where only a maternal environment effect was fitted, suggesting that maternal genetic effects are confounded with direct genetic effects and that more accurate estimates of relatedness were better able to separate maternal genetic effects from direct genetic effects. We found that the heritability captured by SNP markers asymptoted at about half the SNPs available, suggesting that denser marker panels are not necessarily required for precise and unbiased heritability estimates. 
Finally, we present guidelines for the use of genomic relatedness in future quantitative genetics studies in natural populations.
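Genomic relatedness of the kind used in analysis (iii) is often computed with VanRaden's (2008) first method:

G = ZZ'/(2 Σ p_j(1 - p_j))

Here Z holds genotypes (0/1/2) centred by twice the allele frequency p_j. The abstract does not say which estimator was used, so this is one common choice rather than the study's own. In this sketch, the allele frequencies are estimated from the genotyped sample itself.

```python
from typing import List, Sequence


def vanraden_grm(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Genomic relationship matrix (individuals x individuals) from 0/1/2 genotypes (individuals x SNPs)."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each SNP, estimated from the sample.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each genotype by its expectation 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]
```

Subsetting the SNP columns before calling the function is one simple way to examine how estimates change with marker density.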

16.

Background  

Automatic protein modelling pipelines are becoming ever more accurate; this has come hand in hand with an increasingly complicated interplay between all components involved. Nevertheless, there are still potential improvements to be made in template selection, refinement and protein model selection.

17.
This study investigated the potential application of genomic selection under a multi-breed scheme in the Spanish autochthonous beef cattle populations. It used a simulation study that replicates the structure of linkage disequilibrium obtained from a sample of 25 triplets of sire/dam/offspring per population genotyped with the BovineHD BeadChip. Purebred and combined reference sets were used for the genomic evaluation, and several scenarios of different genetic architecture of the trait were investigated. The single-breed evaluations yielded the highest within-breed accuracies. Across-breed accuracies were low but positive on average, confirming the genetic connectedness between the populations. If the same genotyping effort was split across several populations, the accuracies were lower than with single-breed evaluation. Compared with small-sized purebred reference sets, however, they showed a small advantage in the accuracies of subsequent generations. In addition, the genetic architecture of the trait did not have any relevant effect on accuracy, with the exception of rare variants, which yielded slightly lower accuracies and a greater loss of predictive ability over the generations.

18.

Key message

Genomic prediction models for multi-year dry matter yield, via genotyping-by-sequencing in a composite training set, demonstrate potential for genetic gain improvement through within-half-sibling-family selection.

Abstract

Perennial ryegrass (Lolium perenne L.) is a key source of nutrition for ruminant livestock in temperate environments worldwide. Higher seasonal and annual yield of herbage dry matter (DMY) is a principal breeding objective, but the historical realised rate of genetic gain for DMY is modest. Genomic selection was investigated as a tool to enhance the rate of genetic gain. Genotyping-by-sequencing (GBS) was undertaken in a multi-population (MP) training set of five populations, phenotyped as half-sibling (HS) families in five environments over 2 years for mean herbage accumulation (HA), a measure of DMY potential. GBS using the ApeKI enzyme yielded 1.02 million single-nucleotide polymorphism (SNP) markers from a training set of n = 517. MP-based genomic prediction models for HA were effective in all five populations. Cross-validation predictive ability (PA) ranged from 0.07 to 0.43 depending on trait and target population, and from 0.40 to 0.52 for days-to-heading. Best linear unbiased predictor (BLUP)-based prediction methods were marginally superior or equal to ridge regression and random forest computational approaches. These BLUP methods included GBLUP with either a standard or a recently developed (KGD) relatedness estimation. PA arose principally from SNPs modelling genetic relationships between training and validation sets. This may limit application for long-term genomic selection because of PA decay. However, simulation using data from the training experiment indicated a twofold increase in genetic gain for HA when a prediction model with moderate PA was applied in a single selection cycle. This scheme combined among-HS family selection, based on phenotype, with within-HS family selection using genomic prediction.

19.

Key message

We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models.

Abstract

Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS), which refers to the use of genome-wide information for predicting breeding values of selection candidates, need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling by means of crop growth models (CGM) incorporating genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing the MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined for this purpose and was evaluated on simulated and real data, using wheat phenology as an example. The MET defined with OptiMET allowed the genetic parameters to be estimated with lower error, leading to higher QTL detection power and higher prediction accuracies. In terms of the quality of the parameter estimates, a MET defined with OptiMET was on average more efficient than a random MET composed of twice as many environments. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit METs and the phenotyping tools that are currently being developed.

20.
Bijma P, Woolliams JA. Genetics, 1999, 151(3): 1197-1210
A method to predict long-term genetic contributions of ancestors to future generations is studied in detail for a population with overlapping generations under mass or sib index selection. An existing method provides insight into the mechanisms determining the flow of genes through selected populations, and takes account of selection by modeling the long-term genetic contribution as a linear regression on breeding value. Total genetic contributions of age classes are modeled using a modified gene flow approach and long-term predictions are obtained assuming equilibrium genetic parameters. Generation interval was defined as the time in which genetic contributions sum to unity, which is equal to the turnover time of genes. Accurate predictions of long-term genetic contributions of individual animals, as well as total contributions of age classes were obtained. Due to selection, offspring of young parents had an above-average breeding value. Long-term genetic contributions of youngest age classes were therefore higher than expected from the age class distribution of parents, and generation interval was shorter than the average age of parents at birth of their offspring. Due to an increased selective advantage of offspring of young parents, generation interval decreased with increasing heritability and selection intensity. The method was compared to conventional gene flow and showed more accurate predictions of long-term genetic contributions.
