Similar literature
1.
Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared ($R^2$) of G-BLUP reaches trait heritability asymptotically. However, under imperfect LD between markers and QTL, prediction $R^2$ based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by $(1-b)^2$, where $b$ is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore $b$ is close to one, inducing a small decrease in $R^2$. However, with distantly related individuals $b$ reaches very low values, imposing a very low upper bound on prediction $R^2$. Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on $R^2$ may be of the order of 20% of the trait heritability. We show how PA can be enhanced with use of variable selection or differential shrinkage of estimates of marker effects.
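
To make the G-BLUP idea in this abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes markers coded as 0/1/2 allele counts, a commonly used scaling of the genomic relationship matrix, and a heritability `h2` that is treated as known; the function names are hypothetical.

```python
import numpy as np

def genomic_relationship(M):
    """Genomic relationship matrix from an n x p matrix of 0/1/2 allele counts."""
    freq = M.mean(axis=0) / 2.0              # allele frequency of each marker
    Z = M - 2.0 * freq                       # center each marker column
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

def gblup_predict(G, y_train, train, test, h2):
    """Predict genetic values of `test` individuals from phenotypes of `train`.

    h2 is an assumed (known) heritability; lam is the implied ratio
    of residual to genetic variance.
    """
    lam = (1.0 - h2) / h2
    mu = y_train.mean()
    G_tt = G[np.ix_(train, train)]           # relationships among training individuals
    G_nt = G[np.ix_(test, train)]            # relationships between test and training
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y_train - mu)
    return mu + G_nt @ alpha
```

Predictions for unphenotyped individuals borrow information only through `G_nt`, which is why the quality of marker-derived relationships (the regression $b$ above) limits the attainable $R^2$.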

2.
Prediction accuracies of estimated breeding values for economically important traits are expected to benefit from genomic information. Single nucleotide polymorphism (SNP) panels used in genomic prediction are increasing in density, but the Markov Chain Monte Carlo (MCMC) estimation of SNP effects can be quite time consuming or slow to converge when a large number of SNPs are fitted simultaneously in a linear mixed model. Here we present an EM algorithm (termed “fastBayesA”) without MCMC. This fastBayesA approach treats the variances of SNP effects as missing data and uses a joint posterior mode of effects, in contrast to the commonly used BayesA, which bases predictions on posterior means of effects. In each EM iteration, SNP effects are predicted as a linear combination of best linear unbiased predictions of breeding values from a mixed linear animal model that incorporates a weighted marker-based realized relationship matrix. Method fastBayesA converges after a few iterations to a joint posterior mode of SNP effects under the BayesA model. When applied to simulated quantitative traits with a range of genetic architectures, fastBayesA is shown to predict GEBV as accurately as BayesA but with less computing effort per SNP than BayesA. Method fastBayesA can be used as a computationally efficient substitute for BayesA, especially when an increasing number of markers brings unreasonable computational burden or slow convergence to MCMC approaches.
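
The idea of treating SNP-effect variances as missing data can be illustrated with a generic EM iteration for a scaled-t (BayesA-type) prior. This is a simplified sketch under stated assumptions, not the fastBayesA algorithm itself: it works directly on marker effects rather than through a weighted relationship matrix, it assumes a centered phenotype, and the hyperparameters `nu`, `s2` and `sigma2_e` are hypothetical fixed values.

```python
import numpy as np

def em_bayesa_mode(X, y, nu=4.0, s2=0.01, sigma2_e=1.0, n_iter=50, tol=1e-8):
    """EM search for a posterior mode of marker effects under a scaled-t prior.

    Assumed model: y = X beta + e, beta_j | sigma_j^2 ~ N(0, sigma_j^2),
    sigma_j^2 ~ scaled inverse chi-square(nu, s2), e ~ N(0, sigma2_e).
    """
    n, p = X.shape
    beta = np.zeros(p)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # E-step: expected precision 1 / sigma_j^2 given the current beta_j
        w = (nu + 1.0) / (nu * s2 + beta ** 2)
        # M-step: ridge-type solve with marker-specific shrinkage weights
        new_beta = np.linalg.solve(XtX + sigma2_e * np.diag(w), Xty)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Markers with large current effects receive small weights and are shrunk less in the next iteration, which is the marker-specific shrinkage that distinguishes BayesA-type methods from plain ridge regression.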

3.
4.
Prediction in mixed linear models by Henderson's (1972) BLUP (Best Linear Unbiased Prediction) has the property ‘best’ only if the underlying variance/covariance components are known. In breeding value prediction these parameters are generally not known. They have to be replaced by estimates, and BLUP becomes estimated BLUP (EBLUP). The aim of this investigation was to evaluate EBLUP with the help of a designed simulation experiment. The criteria used for the evaluation were the mean squared error (MSE) and the (genetic) selection differential (GSD). In addition, we give an indication of how much the accuracy of EBLUP is overestimated by the naive MSE approximation, which uses the MSE formulas of BLUP with variance component estimates in place of the unknown parameters.

5.
The availability of dense molecular markers has made possible the use of genomic selection (GS) for plant breeding. However, the evaluation of models for GS in real plant populations is very limited. This article evaluates the performance of parametric and semiparametric models for GS using wheat (Triticum aestivum L.) and maize (Zea mays) data in which different traits were measured in several environmental conditions. The findings, based on extensive cross-validations, indicate that models including marker information had higher predictive ability than pedigree-based models. In the wheat data set, and relative to a pedigree model, gains in predictive ability due to inclusion of markers ranged from 7.7 to 35.7%. Correlation between observed and predictive values in the maize data set achieved values up to 0.79. Estimates of marker effects were different across environmental conditions, indicating that genotype × environment interaction is an important component of genetic variability. These results indicate that GS in plant breeding can be an effective strategy for selecting among lines whose phenotypes have yet to be observed.

Pedigree-based prediction of genetic values based on the additive infinitesimal model (Fisher 1918) has played a central role in genetic improvement of complex traits in plants and animals. Animal breeders have used this model for predicting breeding values either in a mixed model (best linear unbiased prediction, BLUP) (Henderson 1984) or in a Bayesian framework (Gianola and Fernando 1986). More recently, plant breeders have incorporated pedigree information into linear mixed models for predicting breeding values (Crossa et al. 2006, 2007; Oakey et al. 2006; Burgueño et al. 2007; Piepho et al. 2007).

The availability of thousands of genome-wide molecular markers has made possible the use of genomic selection (GS) for prediction of genetic values (Meuwissen et al. 2001) in plants (e.g., Bernardo and Yu 2007; Piepho 2009; Jannink et al. 2010) and animals (Gonzalez-Recio et al. 2008; VanRaden et al. 2008; Hayes et al. 2009; de los Campos et al. 2009a). Implementing GS poses several statistical and computational challenges, such as how models can cope with the curse of dimensionality, collinearity between markers, or the complexity of quantitative traits. Parametric (e.g., Meuwissen et al. 2001) and semiparametric (e.g., Gianola et al. 2006; Gianola and van Kaam 2008) methods address these problems differently.

In standard genetic models, phenotypic outcomes, $y_i$, are viewed as the sum of a genetic value, $g_i$, and a model residual, $\varepsilon_i$; that is, $y_i = g_i + \varepsilon_i$. In parametric models for GS, $g_i$ is described as a regression on marker covariates $x_{ij}$ ($j = 1, \ldots, p$ molecular markers) of the form $g_i = \sum_{j=1}^{p} x_{ij}\beta_j$, such that $y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i$ (or $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, in matrix notation), where $\beta_j$ is the regression of $y_i$ on the $j$th marker covariate $x_{ij}$.

Estimation of $\boldsymbol{\beta}$ via multiple regression by ordinary least squares (OLS) is not feasible when $p > n$. A commonly used alternative is to estimate marker effects jointly using penalized methods such as ridge regression (Hoerl and Kennard 1970) or the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996) or their Bayesian counterpart. This approach yields greater accuracy of estimated genetic values and can be coupled with geostatistical techniques commonly used in plant breeding to model multienvironment trials (Piepho 2009).

In ridge regression (or its Bayesian counterpart) the extent of shrinkage is homogeneous across markers, which may not be appropriate if some markers are located in regions that are not associated with genetic variance, while markers in other regions may be linked to QTL (Goddard and Hayes 2007). To overcome this limitation, many authors have proposed methods that use marker-specific shrinkage. In a Bayesian setting, this can be implemented using priors of marker effects that are mixtures of scaled-normal densities. Examples of this are methods Bayes A and Bayes B of Meuwissen et al. (2001) and the Bayesian LASSO of Park and Casella (2008).

An alternative to parametric regressions is to use semiparametric methods such as reproducing kernel Hilbert spaces (RKHS) regression (Gianola and van Kaam 2008). The Bayesian RKHS regression regards genetic values as random variables coming from a Gaussian process centered at zero and with a (co)variance structure that is proportional to a kernel matrix $\mathbf{K}$ (de los Campos et al. 2009b); that is, $\mathrm{Cov}(g_i, g_j) \propto K(\mathbf{x}_i, \mathbf{x}_j)$, where $\mathbf{x}_i$, $\mathbf{x}_j$ are vectors of marker genotypes for the $i$th and $j$th individuals, respectively, and $K(\cdot,\cdot)$ is a positive definite function evaluated in marker genotypes. In a finite-dimensional setting this amounts to modeling the vector of genetic values, $\mathbf{g}$, as multivariate normal; that is, $\mathbf{g} \sim N(\mathbf{0}, \mathbf{K}\sigma_g^2)$, where $\sigma_g^2$ is a variance parameter. One of the most attractive features of RKHS regression is that the methodology can be used with almost any information set (e.g., covariates, strings, images, graphs). A second advantage is that with RKHS the model is represented in terms of $n$ unknowns, which gives RKHS a great computational advantage relative to some parametric methods, especially when $p \gg n$.

This study presents an evaluation of several methods for GS, using two extensive data sets. One contains phenotypic records of a series of wheat trials and recently generated genomic data. The other data set pertains to international maize trials in which different traits were measured in maize lines evaluated under severe drought and well-watered conditions.
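
The point that RKHS regression is expressed in terms of n unknowns, whatever the number of markers, can be sketched as follows. This is an illustrative example only: the Gaussian kernel, the `bandwidth` parameter and the variance ratio `lam` are assumptions chosen for the sketch, not the settings used in the study.

```python
import numpy as np

def gaussian_kernel(XA, XB, bandwidth):
    """K(x_i, x_j) = exp(-bandwidth * ||x_i - x_j||^2 / p) for rows of XA and XB."""
    p = XA.shape[1]
    sq = (np.sum(XA ** 2, axis=1)[:, None]
          + np.sum(XB ** 2, axis=1)[None, :]
          - 2.0 * XA @ XB.T)
    return np.exp(-bandwidth * np.maximum(sq, 0.0) / p)

def rkhs_fit_predict(X_train, y_train, X_test, bandwidth=1.0, lam=1.0):
    """Predict phenotypes of test lines; lam is an assumed residual/genetic variance ratio."""
    mu = y_train.mean()
    K = gaussian_kernel(X_train, X_train, bandwidth)
    # Only n coefficients are estimated, regardless of the number of markers p
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + gaussian_kernel(X_test, X_train, bandwidth) @ alpha
```

In practice the bandwidth and the variance ratio would be chosen by cross-validation or estimated within a Bayesian model; here they are simply fixed inputs.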

6.
In plant and animal breeding studies a distinction is made between the genetic value (additive plus epistatic genetic effects) and the breeding value (additive genetic effects) of an individual since it is expected that some of the epistatic genetic effects will be lost due to recombination. In this article, we argue that the breeder can take advantage of the epistatic marker effects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using genetic map information and combining local additive and epistatic effects. To this end, we have used semiparametric mixed models with multiple local genomic relationship matrices with hierarchical designs. Elastic-net postprocessing was used to introduce sparsity. Our models produce good predictive performance along with useful explanatory information.

7.
Multilocation trials are often used to analyse the adaptability of genotypes in different environments and to find for each environment the genotype that is best adapted, i.e. that is highest yielding in that environment. For this purpose, it is of interest to obtain a reliable estimate of the mean yield of a cultivar in a given environment. This article compares two different statistical estimation procedures for this task: the Additive Main Effects and Multiplicative Interaction (AMMI) analysis and Best Linear Unbiased Prediction (BLUP). A modification of a cross validation procedure commonly used with AMMI is suggested for trials that are laid out as a randomized complete block design. The use of these procedures is exemplified using five faba bean datasets from German registration trials. BLUP was found to outperform AMMI in four of five faba bean datasets.

8.
Use of Multiple Genetic Markers in Prediction of Breeding Values   Cited 13 times in total (4 self-citations, 13 by others)
Genotypes at a marker locus give information on transmission of genes from parents to offspring, and that information can be used in predicting an individual's additive genetic value at a linked quantitative trait locus (MQTL). In this paper a recursive method is presented to build the gametic relationship matrix for an autosomal MQTL, which requires knowledge of the recombination rate between the marker locus and the MQTL linked to it. A method is also presented to obtain the inverse of the gametic relationship matrix. This information can be used in a mixed linear model for simultaneous evaluation of fixed effects, gametic effects at the MQTL and additive genetic effects due to quantitative trait loci unlinked to the marker locus (polygenes). An equivalent model can be written at the animal level using the numerator relationship matrix for the MQTL and a method for obtaining the inverse of this matrix is presented. Information on several unlinked marker loci, each of them linked to a different locus affecting the trait of interest, can be used by including an effect for each MQTL. The number of equations per animal in this case is 2m + 1 where m is the number of MQTL. A method is presented to reduce the number of equations per animal to one by combining information on all MQTL and polygenes into one numerator relationship matrix. It is illustrated how the method can accommodate individuals with partial or no marker information. Numerical examples are given to illustrate the methods presented. Opportunities to use the presented model in constructing genetic maps are discussed.

9.
Maize (Zea mays L.) serves as a model plant for heterosis research and is the crop where hybrid breeding was pioneered. We analyzed genomic and phenotypic data of 1254 hybrids of a typical maize hybrid breeding program based on the important Dent × Flint heterotic pattern. Our main objectives were to investigate genome properties of the parental lines (e.g., allele frequencies, linkage disequilibrium, and phases) and examine the prospects of genomic prediction of hybrid performance. We found high consistency of linkage phases and large differences in allele frequencies between the Dent and Flint heterotic groups in pericentromeric regions. These results can be explained by the Hill–Robertson effect and support the hypothesis of differential fixation of alleles due to pseudo-overdominance in these regions. In pericentromeric regions we also found indications for consistent marker–QTL linkage between heterotic groups. With prediction methods GBLUP and BayesB, the cross-validation prediction accuracy ranged from 0.75 to 0.92 for grain yield and from 0.59 to 0.95 for grain moisture. The prediction accuracy of untested hybrids was highest if both parents were parents of other hybrids in the training set, and lowest if neither of them was involved in any training set hybrid. Optimizing the composition of the training set in terms of number of lines and hybrids per line could further increase prediction accuracy. We conclude that genomic prediction facilitates a paradigm shift in hybrid breeding by focusing on the performance of experimental hybrids rather than the performance of parental lines in testcrosses.

10.
The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest, and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.

11.
The application of quantitative genetics in plant and animal breeding has largely focused on additive models, which may also capture dominance and epistatic effects. Partitioning genetic variance into its additive and nonadditive components using pedigree-based models (pedigree-based best linear unbiased predictor, P-BLUP) is difficult with most commonly available family structures. However, the availability of dense panels of molecular markers makes possible the use of additive- and dominance-realized genomic relationships for the estimation of variance components and the prediction of genetic values (genomic best linear unbiased predictor, G-BLUP). We evaluated height data from a multifamily population of the tree species Pinus taeda with a systematic series of models accounting for additive, dominance, and first-order epistatic interactions (additive by additive, dominance by dominance, and additive by dominance), using either pedigree- or marker-based information. We show that, compared with the pedigree, use of realized genomic relationships in marker-based models yields a substantially more precise separation of additive and nonadditive components of genetic variance. We conclude that the marker-based relationship matrices in a model including additive and nonadditive effects performed better, improving breeding value prediction. Moreover, our results suggest that, for tree height in this population, the additive and nonadditive components of genetic variance are similar in magnitude. This novel result improves our current understanding of the genetic control and architecture of a quantitative trait and should be considered when developing breeding strategies.

12.
Practical application of genomic-based risk stratification to clinical diagnosis is appealing, yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into categories of higher risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.
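
The two performance measures named in this abstract, AUC and negative predictive value at a chosen cut-off, can be computed as in the following sketch. The function names and the toy scores are hypothetical and are not taken from the study; the code only illustrates how moving the threshold changes the negative predictive value.

```python
def negative_predictive_value(scores, labels, threshold):
    """Share of individuals scoring below the threshold who are truly unaffected."""
    true_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    called_negative = true_neg + false_neg
    return true_neg / called_negative if called_negative else float("nan")

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Toy example (hypothetical scores): lowering the cut-off calls fewer people
# "negative", which here removes the one false negative.
toy_scores = [0.1, 0.2, 0.3, 0.8, 0.9]
toy_labels = [0, 0, 1, 1, 1]
npv_strict = negative_predictive_value(toy_scores, toy_labels, threshold=0.25)
npv_loose = negative_predictive_value(toy_scores, toy_labels, threshold=0.5)
```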

13.
14.

Background

Repeated exposure to certain low molecular weight (LMW) chemical compounds may result in development of allergic reactions in the skin or in the respiratory tract. In most cases, a given LMW compound selectively sensitizes either the skin, giving rise to allergic contact dermatitis (ACD), or the respiratory tract, giving rise to occupational asthma (OA). To limit occurrence of allergic diseases, efforts are currently being made to develop predictive assays that accurately identify chemicals capable of inducing such reactions. However, while a few promising methods for prediction of skin sensitization have been described, to date no validated method, in vitro or in vivo, exists that is able to accurately classify chemicals as respiratory sensitizers.

Results

Recently, we presented the in vitro based Genomic Allergen Rapid Detection (GARD) assay as a novel testing strategy for classification of skin sensitizing chemicals based on measurement of a genomic biomarker signature. We have expanded the applicability domain of the GARD assay to classify also respiratory sensitizers by identifying a separate biomarker signature containing 389 differentially regulated genes for respiratory sensitizers in comparison to non-respiratory sensitizers. By using an independent data set in combination with supervised machine learning, we validated the assay, showing that the identified genomic biomarker is able to accurately classify respiratory sensitizers.

Conclusions

We have identified a genomic biomarker signature for classification of respiratory sensitizers. Combining this newly identified biomarker signature with our previously identified biomarker signature for classification of skin sensitizers, we have developed a novel in vitro testing strategy with a potent ability to predict both skin and respiratory sensitization in the same sample.

15.
Switchgrass (Panicum virgatum L.) is a perennial grass undergoing development as a biofuel feedstock. One of the most important factors hindering breeding efforts in this species is the need for accurate measurement of biomass yield on a per-hectare basis. Genomic selection on simple-to-measure traits that approximate biomass yield has the potential to significantly speed up the breeding cycle. Recent advances in switchgrass genomic and phenotypic resources are now making it possible to evaluate the potential of genomic selection of such traits. We leveraged these resources to study the ability of three widely-used genomic selection models to predict phenotypic values of morphological and biomass quality traits in an association panel consisting of predominantly northern adapted upland germplasm. High prediction accuracies were obtained for most of the traits, with standability having the highest ten-fold cross validation prediction accuracy (0.52). Moreover, the morphological traits generally had higher prediction accuracies than the biomass quality traits. Nevertheless, our results suggest that the quality of current genomic and phenotypic resources available for switchgrass is sufficiently high for genomic selection to significantly impact breeding efforts for biomass yield.

16.
Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives’ performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher’s infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with “genomic selection” is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas.

The success of breeders in effecting immense changes in domesticated animals and plants greatly influenced Darwin’s insight into the power of selection and implications to evolution by natural selection. Following the Mendelian rediscovery, attempts were soon made to accommodate within the particulate Mendelian framework the continuous nature of many traits and the observation by Galton (1889) of a linear regression of an individual’s height on that of a relative, with the slope dependent on degree of relationship. A polygenic Mendelian model was first proposed by Yule (1902) (see Provine 1971; Hill 1984). After input from Pearson, Yule again, and Weinberg (who developed the theory a long way but whose work was ignored), its first full exposition in modern terms was by Ronald A. Fisher (1918) (biography by Box 1978). His analysis of variance partitioned the genotypic variance into additive, dominance and epistatic components. Sewall Wright (biography by Provine 1986) had by then developed the path coefficient method and subsequently (Wright 1921) showed how to compute inbreeding and relationship coefficients and their consequent effects on genetic variation of additive traits. His approach to relationship in terms of the correlation of uniting gametes may be less intuitive at the individual locus level than Malécot’s (1948) subsequent treatment in terms of identity by descent, but it transfers directly to the correlation of relatives for quantitative traits with additive effects.

From these basic findings, the science of animal breeding was largely developed and expounded by Jay L. Lush (1896–1982) (see also commentaries by Chapman 1987 and Ollivier 2008). He was from a farming family and became interested in genetics as an undergraduate at Kansas State. Although his master’s degree was in genetics, his subsequent Ph.D. at the University of Wisconsin was in animal reproductive physiology. Following 8 years working in animal breeding at the University of Texas he went to Iowa State College (now University) in Ames in 1930. Wright was Lush’s hero: “I wish to acknowledge especially my indebtedness to Sewall Wright for many published and unpublished ideas upon which I have drawn, and for his friendly counsel” (Lush 1945, in the preface to his book Animal Breeding Plans). Lush commuted in 1931 to the University of Chicago to audit Sewall Wright’s course in statistical genetics and consult him. Speaking at the Poultry Breeders Roundtable in 1969, he said, “Those were by far the most fruitful 10 weeks I ever had” (Chapman 1987, quoting A. E. Freeman). Lush was also exposed to and assimilated the work and ideas of R. A. Fisher, who lectured at Iowa State through the summers of 1931 and 1936 at the behest of G. W. Snedecor.

Here I review Lush’s contributions and then discuss how animal breeding theory and methods have subsequently evolved. They have been based mainly on statistical methodology, supported to some extent by experiment and population genetic theory. Recently, the development of genomic methods and their integration into classical breeding theory has opened up ways to greatly enhance rates of genetic improvement. Lush focused on livestock improvement and spin-off into other areas was coincidental; but he had contact with corn breeders in Ames and beyond and made contributions to evolutionary biology and human genetics mainly through his developments in theory (e.g., Falconer 1965; Robertson 1966; Lande 1976, 1979; see also Hill and Kirkpatrick 2010). I make no attempt to be comprehensive, not least in choice of citations.

17.
18.
Ignacy Misztal. Genetics 2016, 202(2): 401-409
Many computations with SNP data including genomic evaluation, parameter estimation, and genome-wide association studies use an inverse of the genomic relationship matrix. The cost of a regular inversion is cubic and is prohibitively expensive for large matrices. Recent studies in cattle demonstrated that the inverse can be computed in almost linear time by recursion on any subset of ∼10,000 individuals. The purpose of this study is to present a theory of why such a recursion works and its implication for other populations. Assume that, because of a small effective population size, the additive information in a genotyped population has a small dimensionality, even with a very large number of SNP markers. That dimensionality is visible as a limited number of effective SNP effects, independent chromosome segments, or the rank of the genomic relationship matrix. Decompose a population arbitrarily into core and noncore individuals, with the number of core individuals equal to that dimensionality. Then, breeding values of noncore individuals can be derived by recursions on breeding values of core individuals, with coefficients of the recursion computed from the genomic relationship matrix. A resulting algorithm for the inversion called “algorithm for proven and young” (APY) has a linear computing and memory cost for noncore animals. Noninfinitesimal genetic architecture can be accommodated through a trait-specific genomic relationship matrix, possibly derived from Bayesian regressions. For populations with small effective population size, the inverse of the genomic relationship matrix can be computed inexpensively for a very large number of genotyped individuals.
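
The core/noncore recursion described above can be sketched as follows. This is an illustrative dense-matrix version under stated assumptions: the function name and the choice of core set are hypothetical, the genomic relationship matrix `G` is assumed positive definite, and a production implementation would store the sparse pieces rather than a full n x n array.

```python
import numpy as np

def apy_inverse(G, core):
    """Approximate inverse of G using recursion of noncore on core individuals."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])   # only the core block is inverted directly
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                 # recursion coefficients, one column per noncore individual
    # Residual variance of each noncore individual given the core: a diagonal matrix M
    m = np.diag(G)[noncore] - np.sum(Gcn * P, axis=0)
    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + (P / m) @ P.T
    Ginv[np.ix_(core, noncore)] = -P / m
    Ginv[np.ix_(noncore, core)] = (-P / m).T
    Ginv[noncore, noncore] = 1.0 / m                  # diagonal entries only
    return Ginv
```

Apart from the single inversion of the core block, the work per noncore individual is one column of `P` and one scalar in `m`, which is where the linear cost in the number of noncore animals comes from.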

19.
Structural variation (SV) has been reported to be associated with numerous diseases such as cancer. With the advent of next generation sequencing (NGS) technologies, various types of SV can be potentially identified. We propose a model based clustering approach utilizing a set of features defined for each type of SV events. Our method, termed SVMiner, not only provides a probability score for each candidate, but also predicts the heterozygosity of genomic deletions. Extensive experiments on genome-wide deep sequencing data have demonstrated that SVMiner is robust against the variability of a single cluster feature, and it significantly outperforms several commonly used SV detection programs. SVMiner can be downloaded from http://cbc.case.edu/svminer/.

20.
Germ Cell Tumors (GCT) have a high cure rate, but we currently lack the ability to accurately identify the small subset of patients who will die from their disease. We used a combined genomic and expression profiling approach to identify genomic regions and underlying genes that are predictive of outcome in GCT patients. We performed array-based comparative genomic hybridization (CGH) on 53 non-seminomatous GCTs (NSGCTs) treated with cisplatin based chemotherapy and defined altered genomic regions using Circular Binary Segmentation. We identified 14 regions associated with two year disease-free survival (2yDFS) and 16 regions associated with five year disease-specific survival (5yDSS). From corresponding expression data, we identified 101 probe sets that showed significant changes in expression. We built several models based on these differentially expressed genes, then tested them in an independent validation set of 54 NSGCTs. These predictive models correctly classified outcome in 64–79.6% of patients in the validation set, depending on the endpoint utilized. Survival analysis demonstrated a significant separation of patients with good versus poor predicted outcome when using a combined gene set model. Multivariate analysis using clinical risk classification with the combined gene model indicated that they were independent prognostic markers. This novel set of predictive genes from altered genomic regions is almost entirely independent of our previously identified set of predictive genes for patients with NSGCTs. These genes may aid in the identification of the small subset of patients who are at high risk of poor outcome.
