Similar articles
 Found 20 similar articles; search took 62 ms
1.
Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. We here investigate the performance of some of these new marginal likelihood estimators, specifically, path sampling (PS) and stepping-stone (SS) sampling for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed. 
The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
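
For readers unfamiliar with the harmonic mean estimator discussed in this abstract, a minimal sketch follows. It assumes the textbook definition, which the abstract itself does not spell out: the marginal likelihood is estimated as the harmonic mean of the likelihood values evaluated at samples drawn from the posterior. The function name and the input list are illustrative only and are not taken from BEAST. The computation is done in log space, since the raw likelihoods of phylogenetic data underflow.

```python
import math


def log_harmonic_mean_estimate(log_likelihoods):
    """Return the log of the harmonic mean of the likelihoods.

    `log_likelihoods` holds log-likelihood values evaluated at posterior
    samples (hypothetical input; any MCMC log file could supply them).
    """
    n = len(log_likelihoods)
    if n == 0:
        raise ValueError("need at least one sample")
    # log HME = log n - log(sum_i exp(-logL_i)), via a stable log-sum-exp.
    negated = [-value for value in log_likelihoods]
    peak = max(negated)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in negated))
    return math.log(n) - log_sum
```

Because the sum is dominated by the smallest likelihoods in the sample, a single poorly fitting draw can move the estimate substantially, which is one commonly cited reason for the instability the abstract describes.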

2.
Unbalanced repeated-measures models with structured covariance matrices   Total citations: 32 (self-citations: 0, citations by others: 32)
The question of how to analyze unbalanced or incomplete repeated-measures data is a common problem facing analysts. We address this problem through maximum likelihood analysis using a general linear model for expected responses and arbitrary structural models for the within-subject covariances. Models that can be fit include standard univariate and multivariate models with incomplete data, random-effects models, and models with time-series and factor-analytic error structures. We describe Newton-Raphson and Fisher scoring algorithms for computing maximum likelihood estimates, and generalized EM algorithms for computing restricted and unrestricted maximum likelihood estimates. An example fitting several models to a set of growth data is included.

3.
Statistical methods to map quantitative trait loci (QTL) in outbred populations are reviewed, extensions and applications to human and plant genetic data are indicated, and areas for further research are identified. Simple and computationally inexpensive methods include (multiple) linear regression of phenotype on marker genotypes and regression of squared phenotypic differences among relative pairs on estimated proportions of identity-by-descent at a locus. These methods are less suited for genetic parameter estimation in outbred populations but allow the determination of test statistic distributions via simulation or data permutation; however, further inferences including confidence intervals of QTL location require the use of Monte Carlo or bootstrap sampling techniques. A method which is intermediate in computational requirements is residual maximum likelihood (REML) with a covariance matrix of random QTL effects conditional on information from multiple linked markers. Testing for the number of QTLs on a chromosome is difficult in a classical framework. The computationally most demanding methods are maximum likelihood and Bayesian analysis, which take account of the distribution of multilocus marker-QTL genotypes on a pedigree and permit investigators to fit different models of variation at the QTL. The Bayesian analysis includes the number of QTLs on a chromosome as an unknown.  相似文献   
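
The second of the simple regression methods mentioned in this abstract (squared phenotypic differences of relative pairs regressed on the estimated proportion of alleles shared identical-by-descent) can be sketched as below. This is an illustrative ordinary least squares fit under stated assumptions, not the reviewed authors' implementation; the function names and the `(phenotype_1, phenotype_2, ibd_proportion)` tuple layout are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("IBD proportions must vary across pairs")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def squared_difference_regression(pairs):
    """Regress squared pair differences on IBD proportion at one locus.

    `pairs` is a list of (phenotype_1, phenotype_2, ibd_proportion) tuples.
    """
    ibd = [p[2] for p in pairs]
    squared_diff = [(p[0] - p[1]) ** 2 for p in pairs]
    return ols_slope_intercept(ibd, squared_diff)
```

A clearly negative slope means that pairs sharing more alleles at the locus are phenotypically more alike, which is the signal such a scan looks for; as the abstract notes, significance thresholds would still have to come from simulation or permutation.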

4.
MOTIVATION: In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However for a large number of taxa, it is not feasible to construct large and accurate trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood. We express a number of concerns about the current set of parallel phylogenetic programs which are currently severely limiting the widespread availability and use of parallel computing in maximum likelihood-based phylogenetic analysis. RESULTS: We have identified the suitability of phylogenetic analysis to large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree building program called distributed phylogeny reconstruction by maximum likelihood. It uses an already proven maximum likelihood-based tree building algorithm and a popular phylogenetic analysis library for all its likelihood calculations. It offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree building program that can achieve near-linear speedup while only using the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a 'free' ML supercomputer.  相似文献   

5.
Mapping epistatic quantitative trait loci with one-dimensional genome searches   Total citations: 14 (self-citations: 0, citations by others: 14)
Jannink JL  Jansen R 《Genetics》2001,157(1):445-454
The discovery of epistatically interacting QTL is hampered by the intractability and low power to detect QTL in multidimensional genome searches. We describe a new method that maps epistatic QTL by identifying loci of high QTL by genetic background interaction. This approach allows detection of QTL involved not only in pairwise but also higher-order interaction, and does so with one-dimensional genome searches. The approach requires large populations derived from multiple related inbred-line crosses as is more typically available for plants. Using maximum likelihood, the method contrasts models in which QTL allelic values are either nested within, or fixed over, populations. We apply the method to simulated doubled-haploid populations derived from a diallel among three inbred parents and illustrate the power of the method to detect QTL of different effect size and different levels of QTL by genetic background interaction. Further, we show how the method can be used in conjunction with standard two-locus QTL detection models that use two-dimensional genome searches and find that the method may double the power to detect first-order epistasis.  相似文献   

6.
Procedures for performing cladistic analyses can provide powerful tools for understanding the evolution of neuropeptide and polypeptide hormone coding genes. These analyses can be done on either amino acid data sets or nucleotide data sets and can utilize several different algorithms that are dependent on distinct sets of operating assumptions and constraints. In some cases, the results of these analyses can be used to gauge phylogenetic relationships between taxa. Selecting the proper cladistic analysis strategy is dependent on the taxonomic level of analysis and the rate of evolution within the orthologous genes being evaluated. For example, previous studies have shown that the amino acid sequence of proopiomelanocortin (POMC), the common precursor for the melanocortins and beta-endorphin, can be used to resolve phylogenetic relationships at the class and order level. This study tested the hypothesis that POMC sequences could be used to resolve phylogenetic relationships at the family taxonomic level. Cladistic analyses were performed on amphibian POMC sequences characterized from the marine toad, Bufo marinus (family Bufonidae; this study), the spadefoot toad, Spea multiplicatus (family Pelobatidae), the African clawed frog, Xenopus laevis (family Pipidae) and the laughing frog, Rana ridibunda (family Ranidae). In these analyses the sequence of Australian lungfish POMC was used as the outgroup. The analyses were done at the amino acid level using the maximum parsimony algorithm and at the nucleotide level using the maximum likelihood algorithm. For the anuran POMC genes, analysis at the nucleotide level using the maximum likelihood algorithm generated a cladogram with higher bootstrap values than the maximum parsimony analysis of the POMC amino acid data set. 
For anuran POMC sequences, analysis of nucleotide sequences using the maximum likelihood algorithm would appear to be the preferred strategy for resolving phylogenetic relationships at the family taxonomic level.

7.
We developed a generalized linear model of QTL mapping for discrete traits in line crossing experiments. Parameter estimation was achieved using two different algorithms, a mixture model-based EM (expectation–maximization) algorithm and a GEE (generalized estimating equation) algorithm under a heterogeneous residual variance model. The methods were developed using ordinal data, binary data, binomial data and Poisson data as examples. Applications of the methods to simulated as well as real data are presented. The two different algorithms were compared in the data analyses. In most situations, the two algorithms were indistinguishable, but when large QTL are located in large marker intervals, the mixture model-based EM algorithm can fail to converge to the correct solutions. Both algorithms were coded in C++ and interfaced with SAS as a user-defined SAS procedure called PROC QTL.

8.
A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.  相似文献   

9.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.   相似文献   
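
The growth of tree space mentioned in this abstract can be checked with the standard counting formulas for strictly bifurcating, labeled trees: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n taxa. These formulas are not given in the abstract, and the function names below are illustrative. Under them, a figure just above seven trillion equals 25!!, which is the rooted count for 14 taxa (equivalently, the unrooted count for 15 taxa); the unrooted count for 14 taxa is roughly 3.2 × 10^11.

```python
def odd_double_factorial(k):
    """Product 1 * 3 * 5 * ... * k for odd k (returns 1 when k <= 1)."""
    result = 1
    for factor in range(3, k + 1, 2):
        result *= factor
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted, strictly bifurcating, labeled trees."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa):
    """Number of rooted, strictly bifurcating, labeled trees."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n_taxa - 3)
```

Either way, exhaustive evaluation is out of reach well before the 55 taxa of the rbcL example, which is the motivation for heuristic strategies such as the genetic algorithm described here.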

10.
Likelihood applications have become a central approach for molecular evolutionary analyses since the first computationally tractable treatment two decades ago. Although Felsenstein's original pruning algorithm makes likelihood calculations feasible, it is usually possible to take advantage of repetitive structure present in the data to arrive at even greater computational reductions. In particular, alignment columns with certain similarities have components of the likelihood calculation that are identical and need not be recomputed if columns are evaluated in an optimal order. We develop an algorithm for exploiting this speed improvement via an application of graph theory. The reductions provided by the method depend on both the tree and the data, but typical savings range between 15% and 50%. Real-data examples with time reductions of 80% have been identified. The overhead costs associated with implementing the algorithm are minimal, and they are recovered in all but the smallest data sets. The modifications will provide faster likelihood algorithms, which will allow likelihood methods to be applied to larger sets of taxa and to include more thorough searches of the tree topology space.
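
The simplest form of the repetitive structure this abstract refers to is the fully identical alignment column: its site likelihood needs to be computed only once and can then be weighted by the number of occurrences. The sketch below illustrates only that special case. It is not the graph-theoretic column-ordering algorithm the authors develop, which also reuses partial calculations between columns that are merely similar. The function names and the `site_log_likelihood` callback are hypothetical.

```python
from collections import Counter


def compress_site_patterns(alignment):
    """Map each distinct alignment column to its number of occurrences.

    `alignment` is a list of equal-length sequences, one per taxon.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) yields one tuple of states per column.
    return Counter(zip(*alignment))


def total_log_likelihood(alignment, site_log_likelihood):
    """Sum per-site log-likelihoods, evaluating each distinct column once.

    `site_log_likelihood` is a caller-supplied function (for example a
    pruning-algorithm implementation) that takes one column tuple.
    """
    patterns = compress_site_patterns(alignment)
    return sum(count * site_log_likelihood(column)
               for column, count in patterns.items())
```

With three taxa and the sequences `AACA`, `AACA`, `AGCA`, the four columns collapse to three patterns, so the callback runs three times instead of four.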

11.
Higher-level relationships within, and the root of Placentalia, remain contentious issues. Resolution of the placental tree is important to the choice of mammalian genome projects and model organisms, as well as for understanding the biogeography of the eutherian radiation. We present phylogenetic analyses of 63 species representing all extant eutherian mammal orders for a new molecular phylogenetic marker, a 1.3kb portion of exon 26 of the apolipoprotein B (APOB) gene. In addition, we analyzed a multigene concatenation that included APOB sequences and a previously published data set (Murphy et al., 2001b) of three mitochondrial and 19 nuclear genes, resulting in an alignment of over 17kb for 42 placentals and two marsupials. Due to computational difficulties, previous maximum likelihood analyses of large, multigene concatenations for placental mammals have used quartet puzzling, less complex models of sequence evolution, or phylogenetic constraints to approximate a full maximum likelihood bootstrap. Here, we utilize a Unix load sharing facility to perform maximum likelihood bootstrap analyses for both the APOB and concatenated data sets with a GTR+Gamma+I model of sequence evolution, tree-bisection and reconnection branch-swapping, and no phylogenetic constraints. Maximum likelihood and Bayesian analyses of both data sets provide support for the superordinal clades Boreoeutheria, Euarchontoglires, Laurasiatheria, Xenarthra, Afrotheria, and Ostentoria (pangolins+carnivores), as well as for the monophyly of the orders Eulipotyphla, Primates, and Rodentia, all of which have recently been questioned. Both data sets recovered an association of Hippopotamidae and Cetacea within Cetartiodactyla, as well as hedgehog and shrew within Eulipotyphla. APOB showed strong support for an association of tarsier and Anthropoidea within Primates. Parsimony, maximum likelihood and Bayesian analyses with both data sets placed Afrotheria at the base of the placental radiation. 
Statistical tests that employed APOB to examine a priori hypotheses for the root of the placental tree rejected rooting on myomorphs and hedgehog, but did not discriminate between rooting at the base of Afrotheria, at the base of Xenarthra, or between Atlantogenata (Xenarthra+Afrotheria) and Boreoeutheria. An orthologous deletion of 363bp in the aligned APOB sequences proved phylogenetically informative for the grouping of the order Carnivora with the order Pholidota into the superordinal clade Ostentoria. A smaller deletion of 237-246bp was diagnostic of the superordinal clade Afrotheria.

12.
Two outlines for mixed model based approaches to quantitative trait locus (QTL) mapping in existing maize hybrid selection programs are presented: a restricted maximum likelihood (REML) and a Bayesian Markov Chain Monte Carlo (MCMC) approach. The methods use the in-silico-mapping procedure developed by Parisseaux and Bernardo (2004) as a starting point. The original single-point approach is extended to a multi-point approach that facilitates interval mapping procedures. For computational and conceptual reasons, we partition the full set of relationships from founders to parents of hybrids into two types of relations by defining so-called intermediate founders. QTL effects are defined in terms of those intermediate founders. Marker based identity by descent relationships between intermediate founders define structuring matrices for the QTL effects that change along the genome. The dimension of the vector of QTL effects is reduced by the fact that there are fewer intermediate founders than parents. Furthermore, additional reduction in the number of QTL effects follows from the identification of founder groups by various algorithms. As a result, we obtain a powerful mixed model based statistical framework to identify QTLs in genetic backgrounds relevant to the elite germplasm of a commercial breeding program. The identification of such QTLs will provide the foundation for effective marker assisted and genome wide selection strategies. Analyses of an example data set show that QTLs are primarily identified in different heterotic groups and point to complementation of additive QTL effects as an important factor in hybrid performance.  相似文献   

13.
Regression has always been an important tool for quantitative geneticists. The use of maximum likelihood (ML) has been advocated for the detection of quantitative trait loci (QTL) through linkage with molecular markers, and this approach can be very effective. However, linear regression models have also been proposed which perform similarly to ML, while retaining the many beneficial features of regression and, hence, can be more tractable and versatile than ML in some circumstances. Here, the use of linear regression to detect QTL in structured outbred populations is reviewed and its perceived shortfalls are revisited. It is argued that the approach is valuable now and will remain so in the future.

14.
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods. [Reviewing Editor: Dr. Nicolas Galtier]  相似文献   

15.
Combined analysis of multiple phylogenetic data sets can reveal emergent character support that is not evident in separate analyses of individual data sets. Previous parsimony analyses have shown that this hidden support often accounts for a large percentage of the overall phylogenetic signal in cladistic studies. Here, reanalysis of a large comparative genomic data set for yeast (genus Saccharomyces) demonstrates that hidden support can be an important factor in maximum likelihood analyses of multiple data sets as well. Emergent signal in a concatenation of 106 genes was responsible for up to 64% of the likelihood support at a particular node (the difference in log likelihood scores between optimal topologies that included and excluded a supported clade). A grouping of four yeast species (S. cerevisiae, S. paradoxus, S. mikatae, and S. kudriavzevii) was robustly supported by combined analysis of all 106 genes, but separate analyses of individual genes suggested numerous conflicts. Forty-eight genes strictly contradicted S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii in separate analyses, but combined likelihood analyses that included up to 45 of the "wrong" data sets supported this group. Extensive hidden support also emerged in a combined likelihood analysis of 41 genes that each recovered the exact same topology in separate analyses of the individual genes. These results show that isolated analyses of individual data sets can mask congruence and distort interpretations of clade stability, even in strictly model-based phylogenetic methods. Consensus and supertree procedures that ignore hidden phylogenetic signals are, at best, incomplete.  相似文献   
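
The arithmetic behind a statement such as "64% of the likelihood support" can be sketched as follows. Likelihood support at a node is taken directly from the abstract's parenthetical definition. The definition of hidden support used here (combined support minus the sum of the supports from the separate analyses) is an assumption made by analogy with hidden branch support in parsimony and is not stated in the abstract. All function names and numbers are hypothetical.

```python
def likelihood_support(lnl_with_clade, lnl_without_clade):
    """Log-likelihood difference between the best trees with and without a clade."""
    return lnl_with_clade - lnl_without_clade


def hidden_support(combined_support, separate_supports):
    """Assumed definition: combined support minus the sum of per-gene supports."""
    return combined_support - sum(separate_supports)


def hidden_fraction(combined_support, separate_supports):
    """Share of the combined support that emerges only in combined analysis."""
    if combined_support <= 0:
        raise ValueError("clade is not supported in the combined analysis")
    return hidden_support(combined_support, separate_supports) / combined_support
```

For example, a combined support of 10.0 log-likelihood units with per-gene supports of 2.0, 1.0 and 0.6 leaves 6.4 units, or 64%, attributable to signal that appears only when the genes are analyzed together.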

16.
Over the past two decades many quantitative trait loci (QTL) have been detected; however, very few have been incorporated into breeding programs. The recent development of genome-wide association studies (GWAS) in plants provides the opportunity to detect QTL in germplasm collections such as unstructured populations from breeding programs. The overall goal of the barley Coordinated Agricultural Project was to conduct GWAS with the intent to couple QTL detection and breeding. The basic idea is that breeding programs generate a vast amount of phenotypic data and combined with cheap genotyping it should be possible to use GWAS to detect QTL that would be immediately accessible and used by breeding programs. There are several constraints to using breeding program-derived phenotype data for conducting GWAS namely: limited population size and unbalanced data sets. We chose the highly heritable trait heading date to study these two variables. We examined 766 spring barley breeding lines (panel #1) grown in balanced trials and a subset of 384 spring barley breeding lines (panel #2) grown in balanced and unbalanced trials. In panel #1, we detected three major QTL for heading date that have been detected in previous bi-parental mapping studies. Simulation studies showed that population sizes greater than 384 individuals are required to consistently detect QTL. We also showed that unbalanced data sets from panel #2 can be used to detect the three major QTL. However, unbalanced data sets resulted in an increase in the false-positive rate. Interestingly, one-step analysis performed better than two-step analysis in reducing the false-positive rate. The results of this work show that it is possible to use phenotypic data from breeding programs to detect QTL, but that careful consideration of population size and experimental design are required.  相似文献   

17.
The common assumption in quantitative trait locus (QTL) linkage mapping studies that parents of multiple connected populations are unrelated is unrealistic for many plant breeding programs. We remove this assumption and propose a Bayesian approach that clusters the alleles of the parents of the current mapping populations from locus-specific identity by descent (IBD) matrices that capture ancestral marker and pedigree information. Moreover, we demonstrate how the parental IBD data can be incorporated into a QTL linkage analysis framework by using two approaches: a Threshold IBD model (TIBD) and a Latent Ancestral Allele Model (LAAM). The TIBD and LAAM models are empirically tested via numerical simulation based on the structure of a commercial maize breeding program. The simulations included a pilot dataset with closely linked QTL on a single linkage group and 100 replicated datasets with five linkage groups harboring four unlinked QTL. The simulation results show that including parental IBD data (similarly for TIBD and LAAM) significantly improves the power and particularly accuracy of QTL mapping, e.g., position, effect size and individuals’ genotype probability without significantly increasing computational demand.  相似文献   

18.
Deconvolution enhances contrast in fluorescence microscopy images, especially in low-contrast, high-background wide-field microscope images, improving characterization of features within the sample. Deconvolution can also be combined with other imaging modalities, such as confocal microscopy, and most software programs seek to improve resolution as well as contrast. Quantitative image analyses require instrument calibration and with deconvolution, necessitate that this process itself preserves the relative quantitative relationships between fluorescence intensities. To ensure that the quantitative nature of the data remains unaltered, deconvolution algorithms need to be tested thoroughly. This study investigated whether the deconvolution algorithms in AutoQuant X3 preserve relative quantitative intensity data. InSpeck Green calibration microspheres were prepared for imaging, z-stacks were collected using a wide-field microscope, and the images were deconvolved using the iterative deconvolution algorithms with default settings. Afterwards, the mean intensities and volumes of microspheres in the original and the deconvolved images were measured. Deconvolved data sets showed higher average microsphere intensities and smaller volumes than the original wide-field data sets. In original and deconvolved data sets, intensity means showed linear relationships with the relative microsphere intensities given by the manufacturer. Importantly, upon normalization, the trend lines were found to have similar slopes. In original and deconvolved images, the volumes of the microspheres were quite uniform for all relative microsphere intensities. We were able to show that AutoQuant X3 deconvolution software data are quantitative. In general, the protocol presented can be used to calibrate any fluorescence microscope or image processing and analysis procedure.  相似文献   

19.
Lide Han  Shizhong Xu 《Genetica》2010,138(9-10):1099-1109
The identity-by-descent (IBD) based variance component analysis is an important method for mapping quantitative trait loci (QTL) in outbred populations. The interval-mapping approach and various modified versions of it may have limited use in evaluating the genetic variances of the entire genome because they require evaluation of multiple models and model selection. In this study, we developed a multiple variance component model for genome-wide evaluation using both the maximum likelihood (ML) method and the MCMC implemented Bayesian method. We placed one QTL in every few cM on the entire genome and estimated the QTL variances and positions simultaneously in a single model. Genomic regions that have no QTL usually showed no evidence of QTL while regions with large QTL always showed strong evidence of QTL. While the Bayesian method produced the optimal result, the ML method is computationally more efficient than the Bayesian method. Simulation experiments were conducted to demonstrate the efficacy of the new methods.  相似文献   

20.
Genetic correlations among phenotypic characters result when two traits are influenced by the same genes or sets of genes. By reducing the degree to which traits in two environments can evolve independently (e.g., Lande 1979; Via and Lande 1985), such correlations are likely to play a central role in both the evolution of ecological specialization and in its link to speciation. For example, negative genetic correlations between fitness traits in different environments (i.e., genetic trade-offs) are thought to influence the evolution of specialization, while positive genetic correlations between performance and characters influencing assortative mating can accelerate the evolution of reproductive isolation between ecologically specialized populations. We first discuss how the genetic architecture of a suite of traits may affect the evolutionary role of genetic correlations among them and review how the mechanisms of correlations can be analyzed using quantitative trait locus (QTL) mapping. We then consider the implications of such data for understanding the evolution of specialization and its link to speciation. We illustrate this approach with a QTL analysis of key characters in two races of pea aphids that are highly specialized on different host plants and partially reproductively isolated. Our results suggest that antagonism among QTL effects on performance in the two environments leads to a genetic trade-off in this system. We also found evidence for parallel QTL effects on host-plant acceptance and fecundity on the accepted host, which could produce assortative mating. These results suggest that the genetic architecture of traits associated with host use may have played a central role in the evolution of specialization and reproductive isolation in pea aphids.  相似文献   
