Found 13 similar articles (search time: 15 ms)
1.
Genetic correlations between quantitative traits measured in many breeding programs are pervasive. These correlations indicate that measurements of one trait carry information on other traits. Current single-trait (univariate) genomic selection does not take advantage of this information. Multivariate genomic selection on multiple traits could accomplish this but has been little explored and tested in practical breeding programs. In this study, three multivariate linear models (i.e., GBLUP, BayesA, and BayesCπ) were presented and compared to univariate models using simulated and real quantitative traits controlled by different genetic architectures. We also extended BayesA with fixed hyperparameters to a full hierarchical model that estimated hyperparameters and BayesCπ to impute missing phenotypes. We found that optimal marker-effect variance priors depended on the genetic architecture of the trait so that estimating them was beneficial. We showed that the prediction accuracy for a low-heritability trait could be significantly increased by multivariate genomic selection when a correlated high-heritability trait was available. Further, multiple-trait genomic selection had higher prediction accuracy than single-trait genomic selection when phenotypes are not available on all individuals and traits. Additional factors affecting the performance of multiple-trait genomic selection were explored.
2.
A popular approach to detecting positive selection is to estimate the parameters of a probabilistic model of codon evolution and perform inference based on its maximum likelihood parameter values. This approach has been evaluated intensively in a number of simulation studies and found to be robust when the available data set is large. However, uncertainties in the estimated parameter values can lead to errors in the inference, especially when the data set is small or there is insufficient divergence between the sequences. We introduce a Bayesian model comparison approach to infer whether the sequence as a whole contains sites at which the rate of nonsynonymous substitution is greater than the rate of synonymous substitution. We incorporated this probabilistic model comparison into a Bayesian approach to site-specific inference of positive selection. Using simulated sequences, we compared this approach to the commonly used empirical Bayes approach and investigated the effect of tree length on the performance of both methods. We found that the Bayesian approach outperforms the empirical Bayes method when the amount of sequence divergence is small and is less prone to false-positive inference when the sequences are saturated, while the results are indistinguishable for intermediate levels of sequence divergence.
3.
Shizhong Xu 《Genetics》2013,195(3):1103-1115
The correct models for quantitative trait locus mapping are the ones that simultaneously include all significant genetic effects. Such models are difficult to handle for high marker density. Improving statistical methods for high-dimensional data appears to have reached a plateau. Alternative approaches must be explored to break the bottleneck of genomic data analysis. The fact that all markers are located in a few chromosomes of the genome leads to linkage disequilibrium among markers. This suggests that dimension reduction can also be achieved through data manipulation. High-density markers are used to infer recombination breakpoints, which then facilitate construction of bins. The bins are treated as new synthetic markers. The number of bins is always a manageable number, on the order of a few thousand. Using the bin data of a recombinant inbred line population of rice, we demonstrated genetic mapping, using all bins in a simultaneous manner. To facilitate genomic selection, we developed a method to create user-defined (artificial) bins, in which breakpoints are allowed within bins. Using eight traits of rice, we showed that artificial bin data analysis often improves the predictability compared with natural bin data analysis. Of the eight traits, three showed high predictability, two had intermediate predictability, and two had low predictability. A binary trait with a known gene had predictability near perfect. Genetic mapping using bin data points to a new direction of genomic data analysis.
4.
Modeling epistasis in genomic selection is impeded by a high computational load. The extended genomic best linear unbiased prediction (EG-BLUP) with an epistatic relationship matrix and the reproducing kernel Hilbert space regression (RKHS) are two attractive approaches that reduce the computational load. In this study, we proved the equivalence of EG-BLUP and genomic selection approaches, explicitly modeling epistatic effects. Moreover, we have shown why the RKHS model based on a Gaussian kernel captures epistatic effects among markers. Using experimental data sets in wheat and maize, we compared different genomic selection approaches and concluded that prediction accuracy can be improved by modeling epistasis for selfing species but may not for outcrossing species.
5.
In this article, we propose a model selection method, the Bayesian composite model space approach, to map quantitative trait loci (QTL) in a half-sib population for continuous and binary traits. In our method, the identity-by-descent-based variance component model is used. To demonstrate the performance of this model, the method was applied to map QTL underlying production traits on BTA6 in a Chinese half-sib dairy cattle population. A total of four QTLs were detected, whereas only one QTL was identified using the traditional least square (LS) method. We also conducted two simulation experiments to validate the efficiency of our method. The results suggest that the proposed method based on a multiple-QTL model is efficient in mapping multiple QTL for an outbred half-sib population and is more powerful than the LS method based on a single-QTL model.
6.
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigorous search of the tree space is run under that model to find the maximum-likelihood estimate of the tree (topology and branch lengths) and the maximum-likelihood estimates of the model parameters. In this paper, we propose two extensions to the decision-theoretic (DT) approach that relax the fixed-topology restriction. We also relax the fixed-topology restriction for the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) methods. We compare the performance of the different methods (the relaxed, restricted, and the likelihood-ratio test [LRT]) using simulated data. This comparison is done by evaluating the relative complexity of the models resulting from each method and by comparing the performance of the chosen models in estimating the true tree. We also compare the methods relative to one another by measuring the closeness of the estimated trees corresponding to the different chosen models under these methods. We show that varying the topology does not have a major impact on model choice. We also show that the outcome of the two proposed extensions is identical and is comparable to that of the BIC, Extended-BIC, and DT. Hence, using the simpler methods in choosing a model for analyzing the data is more computationally feasible, with results comparable to the more computationally intensive methods. Another outcome of this study is that earlier conclusions about the DT approach are reinforced. That is, LRT, Extended-AIC, and AIC result in more complicated models that do not contribute to the performance of the phylogenetic inference, yet cause a significant increase in the time required for data analysis.
7.
Ana I. Vazquez Gustavo de los Campos Yann C. Klimentidis Guilherme J. M. Rosa Daniel Gianola Nengjun Yi David B. Allison 《Genetics》2012,192(4):1493-1502
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
8.
M J Sillanpää 《Heredity》2011,106(4):511-519
Population-based genomic association analyses are more powerful than within-family analyses. However, population stratification (unknown or ignored origin of individuals from multiple source populations) and cryptic relatedness (unknown or ignored covariance between individuals because of their relatedness) are confounding factors in population-based genomic association analyses, which inflate the false-positive rate. As a consequence, false association signals may arise in genomic data association analyses for reasons other than true association between the tested genomic factor (marker genotype, gene or protein expression) and the study phenotype. It is therefore important to correct or account for these confounders in population-based genomic data association analyses. The common correction techniques for population stratification and cryptic relatedness problems are presented here in the phenotype–marker association analysis context, and comments on their suitability for other types of genomic association analyses (for example, phenotype–expression association) are also provided. Even though many of these techniques have originally been developed in the context of human genetics, most of them are also applicable to model organisms and breeding populations.
9.
C. S. Wang D. Gianola D. A. Sorensen J. Jensen A. Christensen J. J. Rutledge 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1994,88(2):220-230
A replicated selection experiment aimed at increasing litter size (total number of pigs born per litter) in Danish Landrace pigs was conducted from 1984 to 1991. The experiment included two selection and two control lines. In each generation, 30 and 14 first litters were produced in selection and control lines, respectively, and dams produced two litters. Each replicate, consisting of one selection and one control line, was founded from 60 families chosen randomly from the population at large. Family selection was practiced, and the criterion was the predicted breeding value for litter size computed using a repeatability animal model, and taking into account all available information. The data consisted of 947 records from 523 dams (424 dams had two litters) representing five cycles of selection of increased litter size. Data were analyzed from a Bayesian perspective, based on marginal posterior distributions of genetic parameters of interest. Marginalization was achieved using Gibbs sampling, with a single chain length of 1 205 000. After discarding the first 5 000 iterations, a sample was drawn every ten iterations, so 120 000 samples in total were saved. Densities were estimated and plotted, and summary statistics were computed from the estimated densities. The posterior means (± standard error) of heritability and repeatability were 0.22 ± 0.06 and 0.32 ± 0.05, respectively. These point estimates of genetic parameters were within the range of literature values, although on the high side. The posterior mean (± standard error) of genetic response to selection, defined as the difference between the mean breeding values of the selected lines and that of the base population, was 1.37 ± 0.43 pigs after five cycles of selection. The regression (through the origin) of breeding values in the selected lines on generation was 0.25 ± 0.08 pigs. 
Several informative priors constructed from information obtained with field data in this population were used to examine their influence on inferences. The priors were influential because of the relatively small scale of the experiment. An analysis excluding data from one of the control lines gave smaller genetic variance and heritability, and a smaller response to selection. However, it appears that selection for litter size is effective, but that the true rate of response is probably smaller than data from this experiment suggest.
10.
Data were analysed from a divergent selection experiment for an indicator of body composition in the mouse, the ratio of gonadal fat pad to body weight (GFPR). Lines were selected for 20 generations for fat (F), lean (L) or were unselected (C), with three replicates of each. Selection was within full-sib families, 16 families per replicate for the first seven generations, eight subsequently. At generation 20, GFPR in the F lines was twice and in the L lines half that of C. A log transformation removed both asymmetry of response and heterogeneity of variance among lines, and so was used throughout. Estimates of genetic variance and heritability (approximately 50%) obtained using REML with an animal model were very similar, whether estimated from the first few generations of selection, or from all 20 generations, or from late generations having fitted pedigree. The estimates were also similar when estimated from selected or control lines. Estimates from REML also agreed with estimates of realised heritability. The results all accord with expectations under the infinitesimal model, despite the four-fold changes in mean. Relaxed selection lines, derived from generation 20, showed little regression in fatness after 40 generations without selection.
11.
The rate of molecular evolution can vary among lineages. Sources of this variation have differential effects on synonymous and nonsynonymous substitution rates. Changes in effective population size or patterns of natural selection will mainly alter nonsynonymous substitution rates. Changes in generation length or mutation rates are likely to have an impact on both synonymous and nonsynonymous substitution rates. By comparing changes in synonymous and nonsynonymous rates, the relative contributions of the driving forces of evolution can be better characterized. Here, we introduce a procedure for estimating the chronological rates of synonymous and nonsynonymous substitutions on the branches of an evolutionary tree. Because the widely used ratio of nonsynonymous and synonymous rates is not designed to detect simultaneous increases or simultaneous decreases in synonymous and nonsynonymous rates, the estimation of these rates rather than their ratio can improve characterization of the evolutionary process. With our Bayesian approach, we analyze cytochrome oxidase subunit I evolution in primates and infer that nonsynonymous rates have a greater tendency to change over time than do synonymous rates. Our analysis of these data also suggests that rates have been positively correlated.
12.
Huaan Yang Jianbo Jian Xuan Li Daniel Renshaw Jonathan Clements Mark W. Sweetingham Cong Tan Chengdao Li 《BMC genomics》2015,16(1)
Background
Molecular marker-assisted breeding provides an efficient tool to develop improved crop varieties. A major challenge for the broad application of markers in marker-assisted selection is that the marker phenotypes must match plant phenotypes in a wide range of breeding germplasm. In this study, we used the legume crop species Lupinus angustifolius (lupin) to demonstrate the utility of whole genome sequencing and re-sequencing on the development of diagnostic markers for molecular plant breeding.
Results
Nine lupin cultivars released in Australia from 1973 to 2007 were subjected to whole genome re-sequencing. The re-sequencing data together with the reference genome sequence data were used in marker development, which revealed 180,596 to 795,735 SNP markers from pairwise comparisons among the cultivars. A total of 207,887 markers were anchored on the lupin genetic linkage map. Marker mining obtained an average of 387 SNP markers and 87 InDel markers for each of the 24 genome sequence assembly scaffolds bearing markers linked to 11 genes of agronomic interest. Using the R gene PhtjR conferring resistance to phomopsis stem blight disease as a test case, we discovered 17 candidate diagnostic markers by genotyping and selecting markers on a genetic linkage map. A further 243 candidate diagnostic markers were discovered by marker mining on a scaffold bearing non-diagnostic markers linked to the PhtjR gene. Nine of the ten tested candidate diagnostic markers were confirmed as truly diagnostic on a broad range of commercial cultivars. Markers developed using these strategies meet the requirements for broad application in molecular plant breeding.
Conclusions
We demonstrated that low-cost genome sequencing and re-sequencing data were sufficient and very effective in the development of diagnostic markers for marker-assisted selection. The strategies used in this study may be applied to any trait or plant species. Whole genome sequencing and re-sequencing provides a powerful tool to overcome current limitations in molecular plant breeding, which will enable plant breeders to precisely pyramid favourable genes to develop super crop varieties to meet future food demands.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1878-5) contains supplementary material, which is available to authorized users.
13.
Huaan Yang Daniel Renshaw Geoff Thomas Bevan Buirchell Mark Sweetingham 《Molecular breeding : new strategies in plant improvement》2008,21(4):473-483
A key challenge in marker-assisted selection (MAS) for molecular plant breeding is to develop markers linked to genes of interest which are applicable to multiple breeding populations. In this study, representative F2 plants from a cross Mandalup (resistant to anthracnose disease) × Quilinock (susceptible) of Lupinus angustifolius were used in DNA fingerprinting by Microsatellite-anchored Fragment Length Polymorphism (MFLP). Nine candidate MFLP markers linked to anthracnose resistance were identified, then ‘validated’ on 17 commercial cultivars. The number of “false positives” (showing the resistant-allele band but lacking the R gene) for each of the nine candidate MFLP markers on the 17 cultivars ranged from 1 to 9. The candidate marker with the fewest false positives was selected, sequenced, and converted into a co-dominant, sequence-specific, simple PCR-based marker suitable for routine implementation. Testing on 180 F2 plants confirmed that the converted marker was linked to the R gene at 5.1 centiMorgan. The banding pattern of the converted marker was consistent with the disease phenotype on 23 out of the 24 cultivars. This marker, designated “AnManM1”, is now being used for MAS in the Australian lupin breeding program. We conclude that generation of multiple candidate markers, followed by a validation step to select the best marker before conversion to an implementable form, is an efficient strategy to ensure wide applicability for MAS.