20 similar documents found (search time: 15 ms)
1.
A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data (total citations: 4; self-citations: 2; citations by others: 4)
Phylogeny reconstruction is a difficult computational problem, because the
number of possible solutions increases with the number of included taxa.
For example, for only 14 taxa, there are more than seven trillion possible
rooted phylogenetic trees. For this reason, phylogenetic inference
methods commonly use clustering algorithms (e.g., the neighbor-joining
method) or heuristic search strategies to minimize the amount of time spent
evaluating nonoptimal trees. Even heuristic searches can be painfully slow,
especially when computationally intensive optimality criteria such as
maximum likelihood are used. I describe here a different approach to
heuristic searching (using a genetic algorithm) that can tremendously
reduce the time required for maximum-likelihood phylogenetic inference,
especially for data sets involving large numbers of taxa. Genetic
algorithms are simulations of natural selection in which individuals are
encoded solutions to the problem of interest. Here, labeled phylogenetic
trees are the individuals, and differential reproduction is effected by
allowing the number of offspring produced by each individual to be
proportional to that individual's rank likelihood score. Natural selection
increases the average likelihood in the evolving population of phylogenetic
trees, and the genetic algorithm is allowed to proceed until the likelihood
of the best individual ceases to improve over time. An example is presented
involving rbcL sequence data for 55 taxa of green plants. The genetic
algorithm described here required only 6% of the computational effort
required by a conventional heuristic search using tree
bisection/reconnection (TBR) branch swapping to obtain the same
maximum-likelihood topology.
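The abstract's two quantitative ideas, the growth of tree space and rank-proportional reproduction, can be sketched in Python. This is a minimal sketch under stated assumptions, not the published implementation:

- Individuals are bit strings rather than labeled trees.
- `toy_score` is a hypothetical stand-in for a tree's log-likelihood.
- Offspring arise by mutation only.
- Keeping the best individual unchanged each generation (elitism) is an added assumption.

`count_unrooted_trees` uses the standard (2n - 5)!! count of fully resolved unrooted trees. For 14 taxa it gives 316,234,143,225 (about 3.2 × 10^11). For 15 taxa it gives 7,905,853,580,625 (about 7.9 × 10^12), which is also the number of rooted trees for 14 taxa.

```python
import random


def count_unrooted_trees(n: int) -> int:
    """Fully resolved unrooted trees for n labeled taxa: (2n - 5)!! = 3 * 5 * ... * (2n - 5)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    total = 1
    for k in range(3, 2 * n - 4, 2):
        total *= k
    return total


def toy_score(bits):
    """HYPOTHETICAL stand-in for a tree's log-likelihood (higher is better, maximum 0)."""
    target = [1, 0] * (len(bits) // 2)
    return -sum(b != t for b, t in zip(bits, target))


def rank_weights(scores):
    """Weight each individual by its rank: the worst gets 1, the best gets len(scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    weights = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return weights


def genetic_algorithm(length=20, pop_size=30, mutation_rate=0.05, patience=25, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=toy_score)
    stale = 0  # generations since the best score last improved
    while stale < patience:
        weights = rank_weights([toy_score(ind) for ind in population])
        # Expected number of offspring per individual is proportional to its rank.
        parents = rng.choices(population, weights=weights, k=pop_size - 1)
        children = [[1 - b if rng.random() < mutation_rate else b for b in p] for p in parents]
        population = [best[:]] + children  # elitism: carry the best forward unchanged
        candidate = max(population, key=toy_score)
        if toy_score(candidate) > toy_score(best):
            best, stale = candidate, 0
        else:
            stale += 1
    return best, toy_score(best)


print(count_unrooted_trees(14))  # 316234143225
print(genetic_algorithm())
```

The loop stops once the best score has not improved for `patience` generations. This mirrors the abstract's rule of running until the best likelihood ceases to improve.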
2.
Comparison of Bayesian and maximum-likelihood inference of population genetic parameters (total citations: 9; self-citations: 0; citations by others: 9)
Beerli P. Bioinformatics (Oxford, England), 2006, 22(3): 341-345
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescent theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: the parameter proposal distribution and the maximization of the likelihood function. On simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some parameter values the two approaches perform equally well.
MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with an appropriate prior distribution is able to remedy some of these problems.
RESULTS: The program MIGRATE was extended to allow not only maximum likelihood (ML) estimation of population genetic parameters but also a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.
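To make the ML-versus-Bayesian contrast concrete, the sketch below estimates one parameter both ways from a single likelihood. It is a toy under stated assumptions and does not use MIGRATE's coalescent model:

- The parameter is the Jukes-Cantor (JC69) distance d between two aligned sequences with k differences at n sites.
- The exponential prior with mean 0.1 is an arbitrary choice.

With zero observed differences (sparse data), the ML estimate sits on the boundary at d = 0, whereas the posterior mean stays positive.

```python
import math


def jc_p_diff(d):
    """JC69 probability that two sequences differ at a site, given distance d."""
    return -0.75 * math.expm1(-4.0 * d / 3.0)


def log_likelihood(d, k, n):
    """Binomial log-likelihood (up to a constant) of k differences at n sites."""
    p = jc_p_diff(d)
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(p) + (n - k) * math.log1p(-p)


def ml_distance(k, n):
    """Closed-form ML estimate of d; infinite when k / n >= 3/4 (saturation)."""
    if k == 0:
        return 0.0  # the maximum lies on the boundary d = 0
    p_hat = k / n
    if p_hat >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


def posterior_mean_distance(k, n, prior_mean=0.1, d_max=5.0, steps=20000):
    """Posterior mean of d under an exponential prior, by trapezoidal integration on a grid."""
    h = d_max / steps
    grid = [i * h for i in range(steps + 1)]
    log_post = [log_likelihood(d, k, n) - d / prior_mean for d in grid]
    top = max(log_post)
    w = [math.exp(lp - top) for lp in log_post]  # rescale to avoid underflow
    w[0] *= 0.5
    w[-1] *= 0.5
    return sum(d * wi for d, wi in zip(grid, w)) / sum(w)


print(ml_distance(0, 20), posterior_mean_distance(0, 20))      # sparse data
print(ml_distance(30, 300), posterior_mean_distance(30, 300))  # more data
```

With more data, the two estimates converge as the likelihood dominates the prior.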
3.
Only recently has Bayesian inference of phylogeny been proposed. The method is now a practical alternative to other methods of phylogenetic inference; indeed, it appears to possess advantages over them in terms of its ability to use complex models of evolution, the ease of interpretation of its results, and computational efficiency. However, the method should be used cautiously. The results of a Bayesian analysis should be examined for their sensitivity to the priors used and for the reliability of the Markov chain Monte Carlo approximation of the tree probabilities.
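The two checks recommended above can be sketched with a random-walk Metropolis sampler. To keep it short, the sketch samples a single parameter rather than trees: the JC69 distance between two sequences with k = 3 differences at n = 50 sites, under exponential priors. It does two things:

- It runs two chains from very different starting values. Agreement between them suggests, but does not prove, adequate convergence.
- It repeats the analysis under a more diffuse prior to gauge prior sensitivity.

The step size, chain length, 20% burn-in and prior means are arbitrary assumptions.

```python
import math
import random


def log_posterior(d, k, n, prior_mean):
    """Unnormalized log posterior of the JC69 distance d with an exponential prior."""
    if d <= 0.0:
        return -math.inf
    p = -0.75 * math.expm1(-4.0 * d / 3.0)
    return k * math.log(p) + (n - k) * math.log1p(-p) - d / prior_mean


def metropolis(k, n, prior_mean, start, iters=20000, step=0.05, seed=0):
    """Random-walk Metropolis sampler; discards the first 20% of draws as burn-in."""
    rng = random.Random(seed)
    d, lp = start, log_posterior(start, k, n, prior_mean)
    samples = []
    for _ in range(iters):
        proposal = d + rng.gauss(0.0, step)  # symmetric proposal, so no Hastings correction
        lp_new = log_posterior(proposal, k, n, prior_mean)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            d, lp = proposal, lp_new
        samples.append(d)
    return samples[iters // 5:]


def mean(xs):
    return sum(xs) / len(xs)


k, n = 3, 50
chain_a = metropolis(k, n, prior_mean=0.1, start=0.01, seed=1)
chain_b = metropolis(k, n, prior_mean=0.1, start=1.0, seed=2)
vague = metropolis(k, n, prior_mean=1.0, start=0.01, seed=3)
print("chain agreement:", mean(chain_a), mean(chain_b))
print("prior sensitivity:", mean(chain_a), mean(vague))
```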
4.
SUMMARY: BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies. AVAILABILITY: Software is available for download at http://www.biomath.ucla.edu/msuchard/bali-phy.
5.
Rannala B. Systematic Biology, 2002, 51(5): 754-760
Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.
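A minimal example of the non-identifiability described above rests on one deliberately simple assumption: substitutions along a single branch are Poisson with mean rate × time × sites. The abstract's examples involve rates that evolve over time; this is only the simplest case of the same problem.

Every (rate, time) pair with the same product has the same likelihood, so the data cannot separate the two. The product itself is identifiable, with ML estimate k / sites. The numbers used are hypothetical.

```python
import math


def log_likelihood(rate, time, k, n_sites):
    """Poisson log-likelihood of observing k substitutions over n_sites.

    The expected count is rate * time * n_sites, so rate and time enter
    only through their product."""
    lam = rate * time * n_sites
    return k * math.log(lam) - lam - math.lgamma(k + 1)


k, n_sites = 12, 1000
PAIRS = [(0.001, 12.0), (0.002, 6.0), (0.004, 3.0)]  # all have rate * time = 0.012
for rate, time in PAIRS:
    print(rate, time, log_likelihood(rate, time, k, n_sites))

# The product is identifiable: its ML estimate is k / n_sites, however it is split.
print("ML estimate of rate * time:", k / n_sites)
```

Because the likelihood is flat along the curve rate = constant / time, a posterior built on it shows the strong negative rate-time correlation the abstract mentions.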
6.
Evaluating the performance of a successive-approximations approach to parameter optimization in maximum-likelihood phylogeny estimation (total citations: 8; self-citations: 1; citations by others: 8)
Almost all studies that estimate phylogenies from DNA sequence data under the maximum-likelihood (ML) criterion employ an approximate approach. Most commonly, model parameters are estimated on some initial phylogenetic estimate derived using a rapid method (neighbor-joining or parsimony). Parameters are then held constant during a tree search, and ideally, the procedure is repeated until convergence is achieved. However, the effectiveness of this approximation has not been formally assessed, in part because doing so requires computationally intensive, full-optimization analyses. Here, we report both indirect and direct evaluations of the effectiveness of successive approximations. We obtained an indirect evaluation by comparing the results of replicate runs on real data that use random trees to provide initial parameter estimates. For six real data sets taken from the literature, all replicate iterative searches converged to the same joint estimates of topology and model parameters, suggesting that the approximation is not starting-point dependent, as long as the heuristic searches of tree space are rigorous. We conducted a more direct assessment using simulations in which we compared the accuracy of phylogenies estimated using full optimization of all model parameters on each tree evaluated to the accuracy of trees estimated via successive approximations. There is no significant difference between the accuracy of the approximation searches and that of full-optimization searches. Our results demonstrate that successive approximation is reliable and provide reassurance that this much faster approach is safe to use for ML estimation of topology.
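The two strategies compared in the abstract can be contrasted on a toy problem. In the sketch below, each candidate tree's log-likelihood is a hypothetical quadratic function of one model parameter θ, standing in for, say, a gamma shape or transition/transversion ratio. "Tree search" is an exhaustive pass over three trees rather than a real heuristic.

- Successive approximation alternates between choosing the best tree with θ fixed and re-estimating θ on that tree. It stops when the tree no longer changes.
- Full optimization re-estimates θ on every tree.

On these made-up surfaces, both strategies reach T2 from every starting tree, mirroring the abstract's finding. In general, though, alternation of this kind can stall at a tree that is only locally consistent.

```python
def make_surface(peak, optimum, curvature):
    """HYPOTHETICAL log-likelihood of one tree as a function of a model parameter theta."""
    return lambda theta: peak - curvature * (theta - optimum) ** 2


TREES = {
    "T1": make_surface(-1000.0, 0.9, 200.0),
    "T2": make_surface(-995.0, 1.0, 200.0),
    "T3": make_surface(-1003.0, 1.1, 50.0),
}
GRID = [i / 1000 for i in range(1, 5001)]  # candidate theta values in (0, 5]


def optimize_theta(tree):
    """Grid search for the theta that maximizes this tree's log-likelihood."""
    return max(GRID, key=TREES[tree])


def successive_approximation(start_tree, max_rounds=20):
    tree, theta = start_tree, optimize_theta(start_tree)
    for _ in range(max_rounds):
        new_tree = max(TREES, key=lambda t: TREES[t](theta))  # tree search, theta held fixed
        theta = optimize_theta(new_tree)                        # re-estimate theta on that tree
        if new_tree == tree:
            break
        tree = new_tree
    return tree, theta


def full_optimization():
    """Optimize theta separately on every tree, then keep the best tree."""
    best = max(TREES, key=lambda t: TREES[t](optimize_theta(t)))
    return best, optimize_theta(best)


for start in TREES:
    print(start, successive_approximation(start))
print("full optimization:", full_optimization())
```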
7.
Sridhar S, Lam F, Blelloch GE, Ravi R, Schwartz R. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008, 5(3): 323-331
Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to continue making effective use of the rapidly growing stores of variation data now being gathered. In this paper, we present two integer linear programming (ILP) formulations to find the most parsimonious phylogenetic tree from a set of binary variation data. One method uses a flow-based formulation that can produce exponential numbers of variables and constraints in the worst case. The method has, however, proven extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods, solving several large mtDNA and Y-chromosome instances within a few seconds and giving provably optimal results in times competitive with fast heuristics that cannot guarantee optimality. An alternative formulation establishes that the problem can be solved with a polynomial-sized ILP. We further present a web server, based on the exponential-sized ILP, that performs fast maximum parsimony inferences and serves as a front end to a database of precomputed phylogenies spanning the human genome.
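The ILP formulations themselves need a solver and are not reproduced here. As a smaller illustration of the objective being minimized, the sketch below scores one fixed rooted binary tree on binary characters with Fitch's algorithm. Fitch's algorithm returns the minimum number of state changes that tree requires. A maximum-parsimony method, whether ILP-based or heuristic, searches for the tree with the lowest such score. The four-taxon matrix and the nested-tuple tree encoding are hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree (Fitch)."""
    def visit(node):
        if isinstance(node, str):  # leaf: its observed state
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1  # one change needed here
    return visit(tree)[1]


def parsimony_score(tree, matrix):
    """Sum of Fitch scores over all characters (columns) of a taxon -> state-string matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_score(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


matrix = {"A": "1100", "B": "1110", "C": "0011", "D": "0001"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, matrix), parsimony_score(tree2, matrix))  # 5 8
```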
8.
Gene sequences contain a gold mine of phylogenetic information. Unfortunately for taxonomists, however, this information does not tell only the story of the species from which it was collected. Genes have their own complex histories, which record speciation events, of course, but also many other events. Among them, gene duplications, transfers and losses are especially important to identify. These events are crucial to account for when reconstructing the history of species, and they play a fundamental role in the evolution of genomes, the diversification of organisms and the emergence of new cellular functions. We review reconciliations between gene and species trees, which are rigorous approaches for identifying the duplications, transfers and losses that mark the evolution of a gene family. Existing reconciliation models and algorithms are reviewed and difficulties in modeling gene transfers are discussed. We also compare different reconciliation programs along with their advantages and disadvantages.
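Of the reconciliation approaches reviewed, the simplest is the classic LCA (least common ancestor) mapping, sketched below under stated assumptions:

- Each gene-tree node is mapped to the smallest species-tree clade containing all species below it.
- A node is labelled a duplication when it maps to the same species node as one of its children.

This sketch counts duplications only. It does not count losses, and transfers need a richer model. Tree encodings and gene names are hypothetical.

```python
def leaf_set(tree):
    """Leaves below a node of a rooted tree given as nested tuples of names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clusters(tree):
    """Leaf sets of every node in the tree (one per clade)."""
    if isinstance(tree, str):
        return [leaf_set(tree)]
    out = [leaf_set(tree)]
    for child in tree:
        out.extend(all_clusters(child))
    return out


def lca(species_clusters, taxa):
    """Smallest species clade containing all the given species."""
    return min((c for c in species_clusters if taxa <= c), key=len)


def count_duplications(gene_tree, species_tree, gene_to_species):
    clusters = all_clusters(species_tree)
    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(clusters, frozenset([gene_to_species[node]]))
        child_maps = [visit(child) for child in node]
        here = lca(clusters, frozenset().union(*child_maps))
        if any(m == here for m in child_maps):  # maps to the same species node as a child
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


species = ("A", ("B", "C"))
genes = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
gene_tree = (("a1", ("b1", "c1")), ("a2", ("b2", "c2")))  # two copies after a root duplication
print(count_duplications(gene_tree, species, genes))
```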
9.
Genetic algorithms and evolution (total citations: 1; self-citations: 0; citations by others: 1)
The genetic algorithm (GA) as developed by Holland (1975, Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press) is an optimization technique based on natural selection. We use a modified version of this technique to investigate which aspects of natural selection make it an efficient search procedure. Our main modification to Holland's GA is the subdividing of the population into semi-isolated demes. We consider two examples. One is a fitness landscape with many local optima. The other is a model of singing in birds that has been previously analysed using dynamic programming. Both examples have epistatic interactions. In the first example we show that the GA can find the global optimum and that its success is improved by subdividing the population. In the second example we show that GAs can evolve to the optimal policy found by dynamic programming.
10.
Background
The rapid accumulation of whole-genome data has renewed interest in using gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements.
Results
MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility.
Conclusions
To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php.
11.
12.
We used simulated data to investigate a number of properties of
maximum-likelihood (ML) phylogenetic tree estimation for the case of four taxa.
Simulated data were generated under a broad range of conditions, including
wide variation in branch lengths, differences in the ratio of transition
and transversion substitutions, and the absence or presence of
gamma-distributed site-to-site rate variation. Data were analyzed in the ML
framework with two different substitution models, and we compared the
ability of the two models to reconstruct the correct topology. Although
both models were inconsistent for some branch-length combinations in the
presence of site-to-site variation, the models were efficient predictors of
topology under most simulation conditions. We also examined the performance
of the likelihood ratio (LR) test for significant positive interior branch
length. This test was found to be misleading under many simulation
conditions, rejecting too often under some simulation conditions. Under the
null hypothesis of a zero-length internal branch, LR statistics are assumed
to be asymptotically distributed as χ²₁ (chi-squared with one degree of
freedom); with limited data, the distribution of LR statistics under the
null hypothesis departs from χ²₁.
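The test's mechanics fit in a few lines. The sketch below computes the LR statistic and its upper-tail probability under the asymptotic χ²₁ approximation, using P(χ²₁ > x) = erfc(√(x/2)). The log-likelihood values are hypothetical.

The abstract reports that this approximation can mislead with limited data, so the p-value should be read as approximate. Simulating the null distribution (a parametric bootstrap) is a common alternative.

```python
import math


def lr_statistic(lnl_null, lnl_alt):
    """Likelihood-ratio statistic: 2 * (lnL under the alternative - lnL under the null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_1_sf(x):
    """P(X > x) for X ~ chi-squared with 1 df, via P(Z^2 > x) = erfc(sqrt(x / 2))."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical log-likelihoods: interior branch fixed at zero (null) vs. freely estimated.
stat = lr_statistic(lnl_null=-2105.3, lnl_alt=-2102.1)
print(stat, chi2_1_sf(stat))
```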
13.
Perception is about making sense, that is, understanding what events in the outside world caused the sensory observations. Consistent with this intuition, many aspects of human behavior confronting noise and ambiguity are well explained by principles of causal inference. Extending these insights, recent studies have applied the same powerful set of tools to perceptual processing at the neural level. According to these approaches, microscopic neural structures solve elementary probabilistic tasks and can be combined to construct hierarchical predictive models of the sensory input. This framework suggests that variability in neural responses reflects the inherent uncertainty associated with sensory interpretations and that sensory neurons are active predictors rather than passive filters of their inputs. Causal inference can account parsimoniously and quantitatively for non-linear dynamical properties in single synapses, single neurons and sensory receptive fields.
14.
Xiao-Guang Yang. Biologia, 2009, 64(4): 811-818
The phylogeny of Cetacea (whales, dolphins, porpoises) has long attracted the interest of biologists and has been investigated
by many researchers based on different datasets. However, some phylogenetic relationships within Cetacea remain controversial.
In this study, Bayesian analyses were performed, for the first time, to infer the phylogeny of 25 representative cetacean species
based on their mitochondrial genomes. The analyses recovered the clades resolved by previous studies and strongly
supported most of the current cetacean classifications, such as the monophyly of Odontoceti (toothed whales) and Mysticeti
(baleen whales). The analyses provided a reliable and comprehensive phylogeny of Cetacea that can serve as a foundation for
further exploration of cetacean ecology, conservation and biology. The results also showed that (i) the mitochondrial genomes
were very informative for inferring the phylogeny of Cetacea, and (ii) the Bayesian analyses outperformed other phylogenetic methods
in inferring the mitochondrial genome-based phylogeny of Cetacea.
15.
Adipokinetic neuropeptides from the corpora cardiaca of 17 species of Odonata, mainly from the families Corduliidae and Libellulidae, were isolated and structurally elucidated using liquid chromatography coupled with ion trap electrospray ionization mass spectrometry. All species of the family Corduliidae studied express the peptide code-named Libau-AKH (pGlu-Val-Asn-Phe-Thr-Pro-Ser-Trp amide). This peptide is also present in all but one libellulid species, Erythemis simplicicollis, which expresses Erysi-AKH (pGlu-Leu-Asn-Phe-Thr-Pro-Ser-Trp amide). This divergence from all other libellulids is due to a nonsynonymous (missense) single nucleotide polymorphism (SNP) in the coding sequence (CDS) of prepro-AKH, and it supports the polyphyletic nature of Sympetrinae and other subfamilies of libellulids. Despite this exception, these findings support the hypothesis, stated in most phylogenies, that Corduliidae and Libellulidae are closely related. The presence of Anaim-AKH (pGlu-Val-Asn-Phe-Ser-Pro-Ser-Trp amide) in Macromiidae likely distinguishes species in this family from Corduliidae. Current molecular genetic phylogenies and our AKH findings suggest that Syncordulia gracilis, which expresses Anaim-AKH, does not belong in Corduliidae. The evolution of AKHs in anisopteran Odonata is likely due to nucleotide substitutions involving nonsynonymous missense SNPs in the CDS of prepro-AKH.
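The claim that single nucleotide changes explain the AKH differences can be checked against the standard genetic code. The sketch below does not use the species' actual prepro-AKH sequences, which the abstract does not give. Instead it does two things:

- It locates the residue differences between the three peptides, ignoring the C-terminal amide.
- For each substituted pair of amino acids, it computes the fewest nucleotide changes between any of their codons under NCBI translation table 1.

A result of 1 means a single SNP suffices. The names in the code are illustrative.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
ONE_LETTER = {"Val": "V", "Leu": "L", "Asn": "N", "Phe": "F",
              "Thr": "T", "Pro": "P", "Ser": "S", "Trp": "W"}

# Peptides from the abstract; pGlu is kept as a label and the C-terminal amide is omitted.
LIBAU_AKH = "pGlu-Val-Asn-Phe-Thr-Pro-Ser-Trp".split("-")
ERYSI_AKH = "pGlu-Leu-Asn-Phe-Thr-Pro-Ser-Trp".split("-")
ANAIM_AKH = "pGlu-Val-Asn-Phe-Ser-Pro-Ser-Trp".split("-")


def differences(a, b):
    """1-based positions where two equal-length peptides differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def min_nucleotide_changes(aa1, aa2):
    """Fewest nucleotide differences between any codon for aa1 and any codon for aa2."""
    codons1 = [c for c, aa in CODON_TABLE.items() if aa == ONE_LETTER[aa1]]
    codons2 = [c for c, aa in CODON_TABLE.items() if aa == ONE_LETTER[aa2]]
    return min(sum(x != y for x, y in zip(c1, c2)) for c1 in codons1 for c2 in codons2)


for other in (ERYSI_AKH, ANAIM_AKH):
    for pos, x, y in differences(LIBAU_AKH, other):
        print(pos, x, "->", y, "min nucleotide changes:", min_nucleotide_changes(x, y))
```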
16.
Summary: A maximum likelihood method for inferring protein phylogeny was developed. It is based on a Markov model that takes into account the unequal transition probabilities among pairs of amino acids and does not assume a constant rate among different lineages. Therefore, this method is expected to be powerful in inferring phylogeny among distantly related proteins, either orthologous or paralogous, where the evolutionary rate may deviate from constancy. Not only amino acid substitutions but also insertion/deletion events during evolution were incorporated into the Markov model. A simple method was also developed for estimating the bootstrap probability of the maximum likelihood tree among alternatives without performing a maximum likelihood estimation for each resampled data set. These methods were applied to amino acid sequence data of a photosynthetic membrane protein, psbA, from photosystem II, and the phylogeny of this protein was discussed in relation to the origin of chloroplasts.
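The bootstrap shortcut described above appears to be what is generally known as RELL (resampling of estimated log-likelihoods). Per-site log-likelihoods are computed once for each candidate tree at its ML estimates, and bootstrap replicates resample those site values instead of re-optimizing every tree. The sketch assumes that reading. The per-site values are hypothetical and would in practice come from an ML program.

```python
import random


def rell_bootstrap(site_lnl, replicates=1000, seed=0):
    """RELL bootstrap: resample per-site log-likelihoods instead of re-optimizing trees.

    site_lnl maps each tree name to its per-site log-likelihoods, computed once
    at that tree's ML estimates on the original alignment."""
    rng = random.Random(seed)
    trees = list(site_lnl)
    n_sites = len(next(iter(site_lnl.values())))
    wins = dict.fromkeys(trees, 0)
    for _ in range(replicates):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]  # resample sites with replacement
        totals = {t: sum(site_lnl[t][i] for i in idx) for t in trees}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / replicates for t in trees}


SITE_LNL = {  # hypothetical per-site log-likelihoods for three candidate trees
    "T1": [-5.1, -3.2, -7.8, -4.4, -6.0, -2.9, -5.5, -3.7],
    "T2": [-5.3, -3.2, -7.1, -4.9, -6.4, -3.0, -5.9, -3.9],
    "T3": [-6.0, -3.5, -8.2, -4.6, -6.1, -3.3, -5.6, -4.2],
}
support = rell_bootstrap(SITE_LNL)
print(support)
```

Because T3 is worse than T1 at every site, it can never win a replicate. T2 wins only when the resample over-represents the one site where it beats T1.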
17.
18.
Background
While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations by which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but they are limited by the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state, but at the cost of losing information about intra-tumor heterogeneity.
19.
20.