Similar literature
1.
The availability of high-density panels of molecular markers has prompted the adoption of genomic selection (GS) methods in animal and plant breeding. In GS, parametric, semi-parametric and non-parametric regression models are used for predicting quantitative traits. This article shows how to use neural networks with radial basis functions (RBFs) for prediction with dense molecular markers. We illustrate the use of the linear Bayesian LASSO regression model and of two non-linear regression models, reproducing kernel Hilbert spaces (RKHS) regression and radial basis function neural networks (RBFNN), on simulated data and real maize lines genotyped with 55,000 markers and evaluated for several trait-environment combinations. The empirical results of this study indicated that the three models showed similar overall prediction accuracy, with a slight and consistent superiority of RKHS and RBFNN over the additive Bayesian LASSO model. Results from the simulated data indicate that RKHS and RBFNN models captured epistatic effects; however, adding non-signal (redundant) predictors (interaction between markers) can adversely affect the predictive accuracy of the non-linear regression models.
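As an illustration of the kind of non-linear model this abstract refers to, the following is a minimal sketch of Gaussian-kernel RKHS regression written as kernel ridge regression. It is not the authors' implementation: the function names, the fixed `bandwidth` and `ridge` values, the scaling of the squared distance by the number of markers, and the simulated toy data are all assumptions made for the example. A Bayesian analysis would estimate the regularisation from the data instead of fixing it.

```python
import numpy as np


def gaussian_kernel(X1, X2, bandwidth=1.0):
    """Gaussian kernel on marker codes: exp(-bandwidth * squared distance / p)."""
    sq1 = np.sum(X1 ** 2, axis=1)[:, None]
    sq2 = np.sum(X2 ** 2, axis=1)[None, :]
    sq_dist = np.maximum(sq1 + sq2 - 2.0 * (X1 @ X2.T), 0.0)
    return np.exp(-bandwidth * sq_dist / X1.shape[1])


def fit_rkhs(X, y, bandwidth=1.0, ridge=1.0):
    """Solve (K + ridge * I) alpha = y - mean(y) for the kernel coefficients."""
    mu = float(np.mean(y))
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(len(y)), y - mu)
    return mu, alpha


def predict_rkhs(X_train, X_new, mu, alpha, bandwidth=1.0):
    """Predict as the mean plus a kernel-weighted sum over the training lines."""
    return mu + gaussian_kernel(X_new, X_train, bandwidth) @ alpha


# Toy data (assumed): 120 lines, 200 markers coded 0/1/2, one epistatic pair.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(120, 200)).astype(float)
y = X[:, :10] @ rng.normal(size=10) + 2.0 * X[:, 0] * X[:, 1] + rng.normal(size=120)

train, test = np.arange(100), np.arange(100, 120)
mu, alpha = fit_rkhs(X[train], y[train])
y_hat = predict_rkhs(X[train], X[test], mu, alpha)
accuracy = np.corrcoef(y[test], y_hat)[0, 1]  # predictive correlation
print(f"predictive correlation on held-out lines: {accuracy:.2f}")
```

Because the kernel depends on the whole marker vector of each pair of lines, interactions between markers can influence predictions without being listed explicitly as covariates.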

2.
Accuracy of prediction of yet-to-be observed phenotypes for food conversion rate (FCR) in broilers was studied in a genome-assisted selection context. Data consisted of FCR measured on the progeny of 394 sires with SNP information. A Bayesian regression model (Bayes A) and a semi-parametric approach (Reproducing kernel Hilbert Spaces regression, RKHS) using all available SNPs (p = 3481) were compared with a standard linear model in which future performance was predicted using pedigree indexes in the absence of genomic data. The RKHS regression was also tested on several sets of pre-selected SNPs (p = 400) using alternative measures of the information gain provided by the SNPs. All analyses were performed using 333 genotyped sires as training set, and predictions were made on 61 birds as testing set, which were sons of sires in the training set. Accuracy of prediction was measured as the Spearman correlation (r̄_S) between observed and predicted phenotype, with its confidence interval assessed through a bootstrap approach. A large improvement of genome-assisted prediction (up to an almost 4-fold increase in accuracy) was found relative to pedigree index. Bayes A and RKHS regression were equally accurate (r̄_S = 0.27) when all 3481 SNPs were included in the model. However, RKHS with 400 pre-selected informative SNPs was more accurate than Bayes A with all SNPs.
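The accuracy measure described in this abstract, a Spearman correlation with a bootstrap confidence interval, can be sketched as follows. The abstract does not say which bootstrap variant was used, so the percentile interval, the number of resamples, the function names and the simulated data below are assumptions. Only the test-set size of 61 is taken from the text.

```python
import numpy as np


def average_ranks(x):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def bootstrap_spearman_ci(observed, predicted, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling (observed, predicted) pairs."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(observed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample birds with replacement
        stats[b] = spearman(observed[idx], predicted[idx])
    tail = 100.0 * (1.0 - level) / 2.0
    low, high = np.percentile(stats, [tail, 100.0 - tail])
    return spearman(observed, predicted), (float(low), float(high))


# Simulated stand-in for 61 test birds (values are not from the study).
rng = np.random.default_rng(1)
observed = rng.normal(size=61)
predicted = 0.3 * observed + rng.normal(size=61)
r_s, (ci_low, ci_high) = bootstrap_spearman_ci(observed, predicted)
print(f"r_S = {r_s:.2f}, 95% bootstrap interval = ({ci_low:.2f}, {ci_high:.2f})")
```

Resampling pairs rather than observed and predicted values separately keeps each bird's prediction attached to its own phenotype.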

3.
Predictive ability of models for litter size in swine on the basis of different sources of genetic information was investigated. Data represented average litter size on 2598, 1604 and 1897 60K genotyped sows from two purebred and one crossbred line, respectively. The average correlation (r) between observed and predicted phenotypes in a 10-fold cross-validation was used to assess predictive ability. Models were: pedigree-based mixed-effects model (PED), Bayesian ridge regression (BRR), Bayesian LASSO (BL), genomic BLUP (GBLUP), reproducing kernel Hilbert spaces regression (RKHS), Bayesian regularized neural networks (BRNN) and radial basis function neural networks (RBFNN). BRR and BL used the marker matrix or its principal component scores matrix (UD) as covariates; RKHS employed a Gaussian kernel with additive codes for markers whereas neural networks employed the additive genomic relationship matrix (G) or UD as inputs. The non-parametric models (RKHS, BRNN, RBFNN) gave similar predictions to the parametric counterparts (average r ranged from 0.15 to 0.23); most of the genome-based models outperformed PED (r = 0.16). Predictive abilities of linear models and RKHS were similar over lines, but BRNN varied markedly, giving the best prediction (r = 0.31) when G was used in crossbreds, but the worst (r = 0.02) when the G matrix was used in one of the purebred lines. The r values for RBFNN ranged from 0.16 to 0.23. Predictive ability was better in crossbreds (0.26) than in purebreds (0.15 to 0.22). This may be related to family structure in the purebred lines.
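The evaluation scheme in this abstract, the average correlation between observed and predicted phenotypes over a 10-fold cross-validation, can be sketched with any fit-and-predict routine. The example below uses ordinary ridge regression with a fixed penalty as a simple stand-in for the Bayesian ridge regression named above. The penalty value `lam`, the function names and the simulated genotypes are assumptions for illustration only.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centred markers, solved in its n x n dual form."""
    x_mean = X_train.mean(axis=0)
    mu = y_train.mean()
    Z_train, Z_test = X_train - x_mean, X_test - x_mean
    alpha = np.linalg.solve(Z_train @ Z_train.T + lam * np.eye(len(y_train)),
                            y_train - mu)
    return mu + Z_test @ (Z_train.T @ alpha)


def kfold_predictive_correlation(X, y, k=10, lam=100.0, seed=0):
    """Average observed-vs-predicted correlation over k disjoint test folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correlations = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # hold this fold out of training
        y_hat = ridge_predict(X[train_mask], y[train_mask], X[test_idx], lam)
        correlations.append(float(np.corrcoef(y[test_idx], y_hat)[0, 1]))
    return float(np.mean(correlations)), correlations


# Simulated stand-in data: 200 animals, 500 markers, 20 with an effect.
rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(200, 500)).astype(float)
y = X[:, :20] @ rng.normal(size=20) + rng.normal(scale=3.0, size=200)
mean_r, fold_r = kfold_predictive_correlation(X, y)
print(f"average r over {len(fold_r)} folds: {mean_r:.2f}")
```

The dual form solves a system whose size is the number of animals rather than the number of markers, which is the cheaper choice when markers far outnumber records.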

4.
The aim of the study was to infer (co)variance components for daily milk yield, fat and protein contents, and somatic cell score (SCS) in Burlina cattle (a local breed in northeast Italy). Data consisted of 13576 monthly test-day records of 666 cows (parities 1 to 8) collected in 10 herds between 1999 and 2009. Repeatability animal models were implemented using Bayesian methods. Flat priors were assumed for systematic effects of herd test date, days in milk, and parity, as well as for permanent environmental, genetic, and residual effects. On average, Burlina cows produced 17.0 kg of milk per day, with 3.66 and 3.33% of fat and protein, respectively, and 358000 cells per mL of milk. Marginal posterior medians (highest posterior density of 95%) of heritability were 0.18 (0.09–0.28), 0.28 (0.21–0.36), 0.35 (0.25–0.49), and 0.05 (0.01–0.11) for milk yield, fat content, protein content, and SCS, respectively. Marginal posterior medians of genetic correlations between the traits were low and a 95% Bayesian confidence region included zero, with the exception of the genetic correlation between fat and protein contents. Despite the low number of animals in the population, results suggest that genetic variance for production and quality traits exists in Burlina cattle.

5.
Lifestyle and genetic factors play a large role in the development of Type 2 Diabetes (T2D). Despite the important role of genetic factors, genetic information is not incorporated into the clinical assessment of T2D risk. We assessed and compared Whole Genome Regression methods to predict the T2D status of 5,245 subjects from the Framingham Heart Study. For evaluating each method we constructed the following set of regression models: A clinical baseline model (CBM) which included non-genetic covariates only. CBM was extended by adding the first two marker-derived principal components and 65 SNPs identified by a recent GWAS consortium for T2D (M-65SNPs). Subsequently, it was further extended by adding 249,798 genome-wide SNPs from a high-density array. The Bayesian models used to incorporate genome-wide marker information as predictors were: Bayes A, Bayes Cπ, Bayesian LASSO (BL), and the Genomic Best Linear Unbiased Prediction (G-BLUP). Results included estimates of the genetic variance and heritability, genetic scores for T2D, and predictive ability evaluated in a 10-fold cross-validation. The predictive AUC estimates for CBM and M-65SNPs were: 0.668 and 0.684, respectively. We found evidence of contribution of genetic effects in T2D, as reflected in the genomic heritability estimates (0.492±0.066). The highest predictive AUC among the genome-wide marker Bayesian models was 0.681 for the Bayesian LASSO. Overall, the improvement in predictive ability was moderate and did not differ greatly among models that included genetic information. Approximately 58% of the total number of genetic variants was found to contribute to the overall genetic variation, indicating a complex genetic architecture for T2D. 
Our results suggest that the Bayes Cπ and the G-BLUP models with a large set of genome-wide markers could be used for predicting risk of T2D, as an alternative to using high-density arrays when selected markers from large consortiums for a given complex trait or disease are unavailable.

6.
Weaning weights from 83 389 Limousin calves born between 1993 and 2002 in France and the Trans-Tasman block (Australia/New Zealand) were analysed to compare different strategies for running an international genetic evaluation for the breed. These records were a subset of the complete data for both countries and comprised a sample of herds that had recorded progeny of sires used across both countries. Genetic and phenotypic parameters for weaning weight were estimated within the countries. The estimates of direct genetic heritabilities were higher in France than in the Trans-Tasman block (0.31 vs. 0.22), while direct-maternal genetic correlations were less negative in the Trans-Tasman block (-0.10) than in France (-0.21). Different strategies for an international evaluation were studied, and the correlations between the estimated breeding values (EBV) of national evaluations and these strategies were derived. The international evaluation strategies were a) an animal model on raw performance data with non unity genetic correlations and heterogeneous residual and genetic variances across countries; b) the same animal model applied to pre-corrected (for fixed effects) performance data; and c) a sire model on de-regressed proofs (MACE). Estimates of the genetic correlations between weaning weight in both countries were 0.86 (0.80) for direct (maternal) genetic effects for the first strategy. Estimation of variance components by MACE appeared to be very sensitive to the sample of bulls and their reliability approximations. Variance component estimates obtained using pre-corrected data were inconsistent with estimates on raw data. However, the EBV predicted using pre-corrected data and parameters estimated from the raw data were similar to those predicted from raw data. Correlations between national and international EBV were always high (> 0.90) for sires, whichever genetic effect (direct or maternal) or international evaluation model was considered. 
The ranking of the bulls in the top 100 is of primary interest in terms of international genetic evaluation. In this study, some re-ranking of sires was observed for the top 100 bulls between countries and between the three international evaluation models. Thus, the origin of top sires may vary according to the implemented international evaluation strategy.

7.
L D Van Vleck. Biometrics, 1978, 34(1): 123-127
The genetic fetal effects model shows that the usual sire effect is composed of one-half the direct additive genetic value and one-fourth of the fetal additive genetic value of the sire. The usual sire component of variance is actually the variance of that function. Genetic covariances between records of relatives influenced by fetuses of related sires can easily be written. If the magnitude of fetal sire effects is such that nonrandom use of fetal sires on daughters of sires being evaluated on daughter performance results in bias, the bias can be eliminated (Henderson 1975) by considering the fetal sire effects to be fixed effects. Some reduction in prediction error variance is likely by including fetal sire in the sire evaluation model.

8.
Data on doe longevity in a rabbit population were analysed using a semiparametric log-Normal animal frailty model. Longevity was defined as the time from the first positive pregnancy test to death or culling due to pathological problems. Does culled for other reasons had right censored records of longevity. The model included time dependent covariates associated with year by season, the interaction between physiological state and the number of young born alive, and between order of positive pregnancy test and physiological state. The model also included an additive genetic effect and a residual in log frailty. Properties of marginal posterior distributions of specific parameters were inferred from a full Bayesian analysis using Gibbs sampling. All of the fully conditional posterior distributions defining a Gibbs sampler were easy to sample from, either directly or using adaptive rejection sampling. The marginal posterior mean estimates of the additive genetic variance and of the residual variance in log frailty were 0.247 and 0.690.

9.
The objectives of the present study were: (1) to evaluate the importance of genotype×production environment interaction for the genetic evaluation of birth weight (BW) and weaning weight (WW) in a population of composite beef cattle in Brazil, and (2) to investigate the importance of sire×contemporary group interaction (S×CG) to model G×E and improve the accuracy of prediction in routine genetic evaluations of this population. Analyses were performed with one, two (favorable and unfavorable) or three (favorable, intermediate, unfavorable) different definitions of production environments. Thus, BW and WW records of animals in a favorable environment were assigned to either trait 1, in an intermediate environment to trait 2 or in an unfavorable environment to trait 3. The (co)variance components were estimated using Gibbs sampling in single-, bi- or three-trait animal models according to the definition of number of production environments. In general, the estimates of genetic parameters for BW and WW were similar between environments. The additive genetic correlations between production environments were close to unity for BW; however, when examining the highest posterior density intervals, the correlation between favorable and unfavorable environments reached a value of only 0.70, a fact that may lead to changes in the ranking of sires across environments. The posterior mean genetic correlation between direct effects was 0.63 in favorable and unfavorable environments for WW. When S×CG was included in two- or three-trait analyses, all direct genetic correlations were close to unity, suggesting that there was no evidence of a genotype×production environment interaction. 
Furthermore, the model including S×CG helped prevent overestimation of the accuracy of breeding values of sires, and provided a lower error of prediction for both direct and maternal breeding values, as well as lower squared bias, residual variance and deviance information criterion than the model omitting S×CG. The model that included S×CG can therefore be considered the best model on the basis of these criteria. The genotype×production environment interaction should not be neglected in the genetic evaluation of BW and WW in the present population of beef cattle. The inclusion of S×CG in the model is a feasible and plausible alternative to model the effects of G×E in the genetic evaluations.

10.
Gianola D, Fernando RL, Stella A. Genetics, 2006, 173(3): 1761-1776
Semiparametric procedures for prediction of total genetic value for quantitative traits, which make use of phenotypic and genomic data simultaneously, are presented. The methods focus on the treatment of massive information provided by, e.g., single-nucleotide polymorphisms. It is argued that standard parametric methods for quantitative genetic analysis cannot handle the multiplicity of potential interactions arising in models with, e.g., hundreds of thousands of markers, and that most of the assumptions required for an orthogonal decomposition of variance are violated in artificial and natural populations. This makes nonparametric procedures attractive. Kernel regression and reproducing kernel Hilbert spaces regression procedures are embedded into standard mixed-effects linear models, retaining additive genetic effects under multivariate normality for operational reasons. Inferential procedures are presented, and some extensions are suggested. An example is presented, illustrating the potential of the methodology. Implementations can be carried out after modification of standard software developed by animal breeders for likelihood-based or Bayesian analysis.

11.
The aim was to estimate the extent of genetic heterogeneity in environmental variance. Data comprised 99 535 records of 35-day body weights from broiler chickens reared in a controlled environment. Residual variance within dam families was estimated using ASREML, after fitting fixed effects such as genetic groups and hatches, for each of 377 genetically contemporary sires with a large number of progeny (> 100 males or females each). Residual variance was computed separately for male and female offspring, and after correction for sampling, strong evidence for heterogeneity was found, the standard deviation between sires in within variance amounting to 15–18% of its mean. Reanalysis using log-transformed data gave similar results, and elimination of 2–3% of outlier data reduced the heterogeneity but it was still over 10%. The correlation between estimates for males and females was low, however. The correlation between sire effects on progeny mean and residual variance for body weight was small and negative (-0.1). Using a data set bigger than any yet presented and on a trait measurable in both sexes, this study has shown evidence for heterogeneity in the residual variance, which could not be explained by segregation of major genes unless very few determined the trait.

12.
We investigated the effect of stage of pregnancy on estimates of breeding values for milk yield and milk persistency in Gyr and Holstein dairy cattle in Brazil. Test-day milk yield records were analyzed using random regression models with or without the effect of pregnancy. Models were compared using residual variances, heritabilities, rank correlations of estimated breeding values of bulls and cows, and number of nonpregnant cows in the top 200 for milk yield and milk persistency. The estimates of residual variance and heritabilities obtained with the models with or without the effect of pregnancy were similar for the two breeds. Inclusion of the effect of pregnancy in genetic evaluation models for these populations did not affect the ranking of cows and sires based on their predicted breeding values for 305-day cumulative milk yield. In contrast, when we examined persistency of milk yield, lack of adjustment for the effect of pregnancy overestimated breeding values of nonpregnant cows and cows with a long days open period and underestimated breeding values of cows with a short days open period. We recommend that models include the effect of days of pregnancy for estimation of adjustment factors for the effect of pregnancy in genetic evaluations of Dairy Gyr and Holstein cattle.

13.
Multi-category classification methods were used to detect SNP-mortality associations in broilers. The objective was to select a subset of whole genome SNPs associated with chick mortality. This was done by categorizing mortality rates and using a filter-wrapper feature selection procedure in each of the classification methods evaluated. Different numbers of categories (2, 3, 4, 5 and 10) and three classification algorithms (naïve Bayes classifiers, Bayesian networks and neural networks) were compared, using early and late chick mortality rates in low and high hygiene environments. Evaluation of SNPs selected by each classification method was done by predicted residual sum of squares and a significance test-related metric. A naïve Bayes classifier, coupled with discretization into two or three categories generated the SNP subset with greatest predictive ability. Further, an alternative categorization scheme, which used only two extreme portions of the empirical distribution of mortality rates, was considered. This scheme selected SNPs with greater predictive ability than those chosen by the methods described previously. Use of extreme samples seems to enhance the ability of feature selection procedures to select influential SNPs in genetic association studies.

14.
Heritability is a central element in quantitative genetics. New molecular markers to assess genetic variance and heritability are continually under development. Molecular single nucleotide polymorphism (SNP) markers can be used to estimate variance components and heritability in populations where relationship information is unknown. In this study, we evaluated the capabilities of two Bayesian genomic models to estimate heritability in simulated populations. The populations comprised different family structures of either no or a limited number of relatives, a single quantitative trait, and one of two densities of SNP markers. All individuals were both genotyped and phenotyped. Results illustrated that the two models were capable of estimating heritability when true heritability was 0.15 or higher and populations had a sample size of 400 or higher. For heritabilities of 0.05, all models had difficulties in estimating the true heritability. The two Bayesian models were compared with a restricted maximum likelihood (REML) approach using a genomic relationship matrix. The comparison showed that the Bayesian approaches performed as well as the REML approach. Differences in family structure were in general not found to influence the estimation of the heritability. For the sample sizes used in this study, a 10-fold increase of SNP density did not improve precision estimates compared with set-ups with a less dense distribution of SNPs. The methods used in this study showed that it was possible to estimate heritabilities on the basis of SNPs in animals with direct measurements. This conclusion is valuable in cases when quantitative traits are either difficult or expensive to measure.
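The abstract does not state how its genomic relationship matrix was built. One widely used construction, often attributed to VanRaden (2008), centres the 0/1/2 genotype codes by twice the allele frequency and scales the cross-product by 2Σp(1−p). The sketch below follows that construction as an assumption, estimating allele frequencies from the sample itself. The function name and the simulated genotypes are hypothetical.

```python
import numpy as np


def genomic_relationship_matrix(M):
    """G = Z Z' / (2 * sum p(1-p)), with Z = M - 2p and M coded 0/1/2."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0          # allele frequency of each SNP in the sample
    Z = M - 2.0 * p                   # centre each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")
    return Z @ Z.T / scale


# Simulated genotypes (assumed): 50 animals, 300 SNPs, allele frequency 0.3.
rng = np.random.default_rng(3)
M = rng.binomial(2, 0.3, size=(50, 300))
G = genomic_relationship_matrix(M)
print("mean diagonal of G:", round(float(np.mean(np.diag(G))), 3))
```

A matrix of this kind can take the place of the pedigree relationship matrix in a mixed model, which is how SNP data allow variance components to be estimated when pedigree relationships are unknown.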

15.
P. Uimari, G. Thaller, I. Hoeschele. Genetics, 1996, 143(4): 1831-1842
Information on multiple linked genetic markers was used in a Bayesian method for the statistical mapping of quantitative trait loci (QTL). Bayesian parameter estimation and hypothesis testing were implemented via Markov chain Monte Carlo algorithms. Variables sampled were the augmented data (marker-QTL genotypes, polygenic effects), an indicator variable for linkage or nonlinkage, and the parameters. The parameter vector included allele frequencies at the markers and the QTL, map distances of the markers and the QTL, QTL substitution effect, and polygenic and residual variances. The criterion for QTL detection was the marginal posterior probability of a QTL being located on the chromosome carrying the markers. The method was evaluated empirically by analyzing simulated granddaughter designs consisting of 2000 sons, 20 related sires, and their ancestors.

16.
A Bayesian approach to the statistical mapping of Quantitative Trait Loci (QTLs) using single markers was implemented via Markov Chain Monte Carlo (MCMC) algorithms for parameter estimation and hypothesis testing. Parameter estimators were marginal posterior means computed using a Gibbs sampler with data augmentation. Variables sampled included the augmented data (marker-QTL genotypes, polygenic effects), an indicator variable for linkage, and the parameters (allele frequency, QTL substitution effect, recombination rate, polygenic and residual variances). Several MCMC algorithms were derived for computing Bayesian tests of linkage, which consisted of the marginal posterior probability of linkage and the marginal likelihood of the QTL variance associated with the marker.

17.
A Bayesian procedure was used to estimate linear reaction norms (i.e. individual G × E plots) on 297 518 litter size records of 121 104 sows, daughters of 2040 sires, recorded on 144 farms in North and Latin America, Europe, Asia and Australia. The method allowed for simultaneous estimation of all parameters involved. The analysis was carried out on three subsets, comprising (i) parity 1 records of 33 641 sows of line B, (ii) all parity records of 52 120 sows of line B and (iii) all parity records of 121 104 sows of lines A, B and A × B. Estimated heritabilities ranged from 0.09 to 0.10 (smallest to largest subset) for the intercept of the reaction norms, and were 0.15, 0.08 and 0.02 (ditto) for the slope. Estimated genetic correlations between intercept and slope were -0.09, +0.26 and +0.69 (ditto). The three subsets therefore showed a progressively lower genetic component to environmental sensitivity, and progressively less re-ranking of genotypes across the environmental (herd-year-season) range. In a genetic evaluation that does not include reaction norms in the statistical model, part of the G × E effect remains confounded with the additive genetic effect, which may lead to errors in the estimates of the additive genetic effect; the reaction norms model removes this confounding. The intercept estimates from the largest data subset show correlations with litter size estimated breeding values (EBV) from routine genetic evaluation (without reaction norms included) of 0.78 to 0.85 for sows with one to seven litter records, and 0.75 for sires. Hence, including reaction norms in genetic evaluation would increase the reliability of the EBV of young selection candidates without own performance or progeny data by considerably more than 100 × (1/0.75-1) = 33%. Reaction norm slope estimates turn out to be very demanding statistics; environmental sensitivity must therefore be classified as a 'hard-to-measure' trait.

18.
A hierarchical animal model was developed for inference on genetic merit of livestock with uncertain paternity. Fully conditional posterior distributions for fixed and genetic effects, variance components, sire assignments and their probabilities are derived to facilitate a Bayesian inference strategy using MCMC methods. We compared this model to a model based on the Henderson average numerator relationship (ANRM) in a simulation study with 10 replicated datasets generated for each of two traits. Trait 1 had a medium heritability (h²) for each of direct and maternal genetic effects whereas Trait 2 had a high h² attributable only to direct effects. The average posterior probabilities inferred on the true sire were between 1 and 10% larger than the corresponding priors (the inverse of the number of candidate sires in a mating pasture) for Trait 1 and between 4 and 13% larger than the corresponding priors for Trait 2. The predicted additive and maternal genetic effects were very similar using both models; however, model choice criteria (Pseudo Bayes Factor and Deviance Information Criterion) decisively favored the proposed hierarchical model over the ANRM model.

19.

Background

Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual’s breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used.

Contributions

In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value.

Results

We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
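To make the NDCG measure concrete, here is a minimal sketch of NDCG at cutoff k applied to breeding values. The abstract does not give the exact formula, so the linear gain (the true value itself), the 1/log2(rank + 1) discount, the requirement that true values be non-negative, and the example numbers are assumptions. Other gain functions are also common in information retrieval.

```python
import numpy as np


def dcg_at_k(gains_in_ranked_order, k):
    """Discounted cumulative gain of the first k items, discount 1/log2(rank + 1)."""
    gains = np.asarray(gains_in_ranked_order, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    return float(np.sum(gains * discounts))


def ndcg_at_k(true_values, predicted_scores, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    true_values = np.asarray(true_values, dtype=float)
    predicted_scores = np.asarray(predicted_scores, dtype=float)
    if np.any(true_values < 0):
        raise ValueError("shift true values to be non-negative before calling")
    order = np.argsort(-predicted_scores, kind="stable")  # best predicted first
    ideal = np.sort(true_values)[::-1]                    # best true first
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        raise ValueError("ideal DCG is zero; NDCG is undefined")
    return dcg_at_k(true_values[order], k) / ideal_dcg


# Hypothetical breeding values and the scores of two hypothetical models.
breeding_values = np.array([2.1, 0.4, 1.7, 0.9, 3.0])
model_a = np.array([1.9, 0.2, 1.5, 1.0, 2.5])   # ranks the top individuals correctly
model_b = np.array([0.5, 2.0, 1.0, 1.8, 0.7])   # puts low-value individuals on top
print("NDCG@3 model A:", round(ndcg_at_k(breeding_values, model_a, 3), 3))
print("NDCG@3 model B:", round(ndcg_at_k(breeding_values, model_b, 3), 3))
```

Because the discount shrinks with rank, errors among the top-ranked individuals cost more than errors further down the list, which matches the breeder's interest in selecting the best candidates.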

20.
Fertility quantitative trait loci (QTL) are of high interest in dairy cattle since insemination failure has dramatically increased in some breeds such as Holstein. High-throughput SNP analysis and SNP microarrays give the opportunity to genotype many animals for hundreds of SNPs per chromosome. In this study, thanks to these techniques, a dense SNP marker map was used to fine map a QTL underlying nonreturn rate measured 90 days after artificial insemination previously detected with a low-density microsatellite marker map. A granddaughter design with 17 Holstein half-sib families (926 offspring) was genotyped for a set of 437 SNPs mapping to BTA3. Linkage analysis was performed by both regression and variance components analysis. An additional analysis combining both linkage analysis and linkage-disequilibrium information was applied. This method first estimated identity-by-descent probabilities among base haplotypes. These probabilities were then used to group the base haplotypes in different clusters. A QTL explaining 14% of the genetic variance was found with high significance (P < 0.001) at position 19 cM with the linkage analysis and four sires were estimated to be heterozygous (P < 0.05). Addition of linkage-disequilibrium information refined the QTL position to a set of narrow peaks. The use of the haplotypes of heterozygous sires offered the possibility to give confidence in some peaks while others could be discarded. Two peaks with high likelihood-ratio test values in the region of which heterozygous sires shared a common haplotype appeared particularly interesting. Despite the fact that the analysis did not fine map the QTL in a unique narrow region, the method proved to be able to handle efficiently and automatically a large amount of information and to refine the QTL position to a small set of narrow intervals. 
In addition, the QTL identified was confirmed to have a large effect (explaining 13.8% of the genetic variance) on dairy cow fertility as estimated by nonreturn rate at 90 days.
