Similar articles (20 found)
1.
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.
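
As a rough illustration of the two ingredients this abstract combines, the sketch below scores a tree under maximum parsimony with Fitch's algorithm and applies a generic Metropolis-style annealing acceptance rule. It is not the tree sampler or the GGS from the paper; the nested-tuple tree encoding and the names `fitch_score`, `parsimony_length`, and `accept` are assumptions made for this example.

```python
import math
import random

def fitch_score(tree, states):
    """Return (state_set, cost) for one character on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right);
    `states` maps each leaf name to its observed state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_length(tree, alignment):
    """Sum Fitch costs over all columns; `alignment` maps leaf -> sequence."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {leaf: seq[i] for leaf, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total

def accept(current_len, proposed_len, temperature, rng=random):
    """Annealing rule: always accept improvements, sometimes accept worse trees."""
    if proposed_len <= current_len:
        return True
    return rng.random() < math.exp((current_len - proposed_len) / temperature)
```

For example, with the toy alignment `{"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}`, the tree `(("A", "B"), ("C", "D"))` has a shorter parsimony length than `(("A", "C"), ("B", "D"))`, so a move from the second tree to the first would always be accepted.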

2.
We tested whether it is beneficial for the accuracy of phylogenetic inference to sample characters that are evolving under different sets of parameters, using both Bayesian MCMC (Markov chain Monte Carlo) and parsimony approaches. We examined differential rates of evolution among characters, differential character-state frequencies and character-state space, and differential relative branch lengths among characters. We also compared the relative performance of parsimony and Bayesian analyses by progressively incorporating more of these heterogeneous parameters and progressively increasing the severity of this heterogeneity. Bayesian analyses performed better than parsimony when heterogeneous simulation parameters were incorporated into the substitution model. However, parsimony outperformed Bayesian MCMC when heterogeneous simulation parameters were not incorporated into the Bayesian substitution model. The higher the rate of evolution simulated, the better parsimony performed relative to Bayesian analyses. Bayesian and parsimony analyses converged in their performance as the number of simulated heterogeneous model parameters increased. Up to a point, rate heterogeneity among sites was generally advantageous for phylogenetic inference using both approaches. In contrast, branch-length heterogeneity was generally disadvantageous for phylogenetic inference using both parsimony and Bayesian approaches. Parsimony was found to be more conservative than Bayesian analyses, in that it resolved fewer incorrect clades.
© The Willi Hennig Society 2006.

3.
In a recent paper, I proposed that natural selection should act to increase offspring number when diversification bet hedging is favoured. The simple underlying reasoning is that a target diversification strategy is more reliably generated with increasing sample size. The intention of opening a discussion has been realized; recent criticisms of the idea argue that selection does not act to increase offspring number when population size is large or infinite. Here I agree that criticisms have merit; indeed they are largely confined to the caveats discussed in my original paper. The critique, however, implies a verdict of outright rejection of the idea of selection on offspring number, which would be erroneous. Contrary to the assertions of the criticism, then, the importance of selection acting directly on offspring number remains an open question.

4.
One of the lasting controversies in phylogenetic inference is the degree to which specific evolutionary models should influence the choice of methods. Model‐based approaches to phylogenetic inference (likelihood, Bayesian) are defended on the premise that without explicit statistical models there is no science, and parsimony is defended on the grounds that it provides the best rationalization of the data, while refraining from assigning specific probabilities to trees or character‐state reconstructions. Authors who favour model‐based approaches often focus on the statistical properties of the methods and models themselves, but this is of only limited use in deciding the best method for phylogenetic inference; such a decision also requires considering the conditions of evolution that prevail in nature. Another approach is to compare the performance of parsimony and model‐based methods in simulations, which traditionally have been used to defend the use of models of evolution for DNA sequences. Some recent papers, however, have promoted the use of model‐based approaches to phylogenetic inference for discrete morphological data as well. These papers simulated data under models already known to be unfavourable to parsimony, and modelled morphological evolution as if it evolved just like DNA, with probabilities of change for all characters changing in concert along tree branches. The present paper discusses these issues, showing that under reasonable and less restrictive models of evolution for discrete characters, equally weighted parsimony performs as well as or better than model‐based methods, and that parsimony under implied weights clearly outperforms all other methods.

5.
MOTIVATION: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis coupled MCMC, (MC)³, a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. RESULTS: This paper presents a parallel algorithm for (MC)³. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets.
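
To make the chain-coupling step concrete, here is a minimal serial sketch of the state exchange between two heated chains in Metropolis-coupled MCMC. It assumes chain i targets the posterior raised to the power beta_i and uses an incremental heating scheme with a hypothetical default `delta`; the function names are illustrative, and the sketch does not reproduce the parallel algorithm described in the abstract.

```python
import math
import random

def heats(n_chains, delta=0.1):
    """Incremental heating (assumed scheme): beta_i = 1 / (1 + delta * i).

    Chain 0 is the cold chain (beta = 1) that samples the posterior itself.
    """
    return [1.0 / (1.0 + delta * i) for i in range(n_chains)]

def swap_log_ratio(logp_i, logp_j, beta_i, beta_j):
    """Log acceptance ratio for exchanging the states of chains i and j.

    Derived from pi(x_j)^beta_i * pi(x_i)^beta_j
               / (pi(x_i)^beta_i * pi(x_j)^beta_j).
    """
    return (beta_i - beta_j) * (logp_j - logp_i)

def try_swap(states, logps, betas, i, j, rng=random):
    """Swap the states of chains i and j in place; return True if swapped."""
    log_r = swap_log_ratio(logps[i], logps[j], betas[i], betas[j])
    if log_r >= 0 or rng.random() < math.exp(log_r):
        states[i], states[j] = states[j], states[i]
        logps[i], logps[j] = logps[j], logps[i]
        return True
    return False
```

In a parallel setting, each chain would typically run on its own process or thread, and only the log posterior values and heats involved in `swap_log_ratio` need to be communicated to decide an exchange.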

6.
The application of mixed nucleotide/doublet substitution models has recently received attention in RNA‐based phylogenetics. Within a Bayesian approach, it was shown that mixed models outperformed analyses relying on simple nucleotide models. We analysed an mt RNA data set of dragonflies representing all major lineages of Anisoptera plus outgroups, using a mixed model in a Bayesian and parsimony (MP) approach. We used a published mt 16S rRNA secondary consensus structure model and inferred consensus models for the mt 12S rRNA and tRNA valine. Secondary structure information was used to set data partitions for paired and unpaired sites on which doublet or nucleotide models were applied, respectively. Several different doublet models are currently available of which we chose the most appropriate one by a Bayes factor test. The MP reconstructions relied on recoded data for paired sites in order to account for character covariance and an application of the ratchet strategy to find most parsimonious trees. Bayesian and parsimony reconstructions are partly differently resolved, indicating sensitivity of the reconstructions to model specification. Our analyses depict a tree in which the damselfly family Lestidae is sister group to a monophyletic clade Epiophlebia + Anisoptera, contradicting recent morphological and molecular work. In Bayesian analyses, we found a deep split between Libelluloidea and a clade ‘Aeshnoidea’ within Anisoptera largely congruent with Tillyard’s early ideas of anisopteran evolution, which had been based on evidently plesiomorphic character states. However, parsimony analysis did not support a clade ‘Aeshnoidea’, but instead, placed Gomphidae as sister taxon to Libelluloidea. Monophyly of Libelluloidea is only modestly supported, and many inter‐family relationships within Libelluloidea do not receive substantial support in Bayesian and parsimony analyses. 
We checked whether high Bayesian node support was inflated owing to either: (i) wrong secondary consensus structures; (ii) under‐sampling of the MCMC process, thereby missing other local maxima; or (iii) unrealistic prior assumptions on topologies or branch lengths. We found that different consensus structure models exert strong influence on the reconstruction, which demonstrates the importance of taxon‐specific realistic secondary structure models in RNA phylogenetics.

7.
Null community is a spatio‐temporal abstraction of an initial regional species pool from which local species pools and actual community assemblages are organized. Any process that causes joint responses of species with similar susceptibilities affects community assembly. Through time, sequential assembly processes change the composition of a species pool in a way analogous to the one in which evolutionary processes promote character changes from an ancestor to current species. The segregation of species occurrences in an actual community suggests that assembly processes non‐randomly structured the observed community assemblages. However, going backwards to imply the causes of a particular arrangement of species is a non‐trivial challenge. I merge these premises with the philosophical and methodological foundations of cladistics. I propound parsimony analysis of species co‐occurrences as an outstanding means of devising operational hypotheses about the assembly of any non‐randomly structured set of actual community assemblages related to a common species pool. To explore this approach, I used field data gathered in a suite of 10 wetland assemblages. First, I tested independence of 101 plant species occurrences by a null model. As significant non‐random species co‐occurrence was detected, I applied a parsimony analysis taking the species occurrences as attributes, the assemblages as terminal units, and a putative null community constituted by all the present local species as the root of the assembly suite. The analysis produced four most parsimonious trees of assembly relationships. These trees maximize the number of similarities among community assemblages that can be explained by the sole fact of sharing a common regional species pool. One most parsimonious spatio‐temporal arrangement of species occurrence changes was reconstructed on one of the trees. 
I interpret this reconstruction in terms of assembly events, species exclusions and recruitments, showing the potentialities of this analysis to formulate operational hypotheses about community organization.

8.
It is a tremendous honor for my group and me to receive the recognition of the 2014 Women in Cell Biology Junior Award. I would like to take the opportunity of this essay to describe my scientific journey, discuss my philosophy about running a group, and propose what I think is a generalizable model to efficiently establish an academic laboratory. This essay is about my view on the critical components that go into establishing a highly functional academic laboratory during the current tough, competitive times.

9.
Comparisons are made of the accuracy of the restricted maximum-likelihood, Wagner parsimony, and UPGMA (unweighted pair-group method using arithmetic averages) clustering methods to estimate phylogenetic trees. Data matrices were generated by constructing simulated stochastic evolution in a multidimensional gene-frequency space using a simple genetic-drift model (Brownian-motion, random-walk) with constant rates of divergence in all lineages. Ten different phylogenetic tree topologies of 20 operational taxonomic units (OTUs), representing a range of tree shapes, were used. Felsenstein's restricted maximum-likelihood method, Wagner parsimony, and UPGMA clustering were used to construct trees from the resulting data matrices. The computations for the restricted maximum-likelihood method were performed on a Cray-1 supercomputer since the required calculations (especially when optimized for the vector hardware) are performed substantially faster than on more conventional computing systems. The overall level of accuracy of tree reconstruction depends on the topology of the true phylogenetic tree. The UPGMA clustering method, especially when genetic-distance coefficients are used, gives the most accurate estimates of the true phylogeny (for our model with constant evolutionary rates). For large numbers of loci, all methods give similar results, but trends in the results imply that the restricted maximum-likelihood method would produce the most accurate trees if sample sizes were large enough.
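
Since UPGMA is the simplest of the three methods compared here, a short sketch may help. It assumes the input is a dictionary of pairwise distances between OTU names (how those distances are derived from gene frequencies is outside the sketch), and the function name `upgma` and the nested-tuple output are choices made for this example.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    `dist[(a, b)]` holds the distance for each unordered pair of names.
    Returns a nested tuple (left_subtree, right_subtree, node_height).
    """
    # Active clusters: key -> (subtree, number of leaves in the cluster).
    clusters = {name: (name, 1) for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        height = d[frozenset((a, b))] / 2.0
        merged = (a, b)
        # Distance to the new cluster is the size-weighted arithmetic average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b, height), size_a + size_b)
    (tree, _), = clusters.values()
    return tree
```

Because every join places the new node at half the inter-cluster distance, the resulting tree is ultrametric, which matches the constant-rate assumption under which the abstract reports UPGMA performing best.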

10.
Mardulyn P. Molecular Ecology, 2012, 21(14): 3385-3390
Phylogenetic trees and networks are both used in the scientific literature to display DNA sequence variation at the intraspecific level. Should we rather use trees or networks? I argue that the process of inferring the most parsimonious genealogical relationships among a set of DNA sequences should be dissociated from the problem of displaying this information in a graph. A network graph is probably more appropriate than a strict consensus tree if many alternative, equally most parsimonious, genealogies are to be included. Within the maximum parsimony framework, current phylogenetic inference and network‐building algorithms are both unable to guarantee the finding of all most parsimonious (MP) connections. In fact, each approach can find MP connections that the other does not. Although it should be possible to improve at least the maximum parsimony approach, current implementations of these algorithms are such that it is advisable to use both approaches to increase the probability of finding all possible MP connections among a set of DNA sequences.

11.
DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. Availability: DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
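
The reconciliation cost that gene tree parsimony minimizes can be sketched with the standard LCA mapping: a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children. The code below is an illustrative Python sketch, not DupTree's algorithm or interface; the nested-tuple trees and the `species_gene` leaf-naming convention (for example `a_1`) are assumptions made for this example.

```python
def clades(tree):
    """Return the leaf sets of all nodes of a nested-tuple tree, root last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    below = [clades(child) for child in tree]
    here = frozenset().union(*(sub[-1] for sub in below))
    return [c for sub in below for c in sub] + [here]

def count_duplications(gene_tree, species_tree,
                       species_of=lambda gene: gene.split("_")[0]):
    """Count gene duplications implied by reconciling one gene tree."""
    species_clades = clades(species_tree)

    def lca(species_set):
        # Smallest species-tree clade that contains every species in the set.
        return min((c for c in species_clades if species_set <= c), key=len)

    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:
            # Node maps to the same species-tree node as a child: duplication.
            duplications += 1
        return here

    visit(gene_tree)
    return duplications
```

A species-tree search would sum this count over all gene trees for each candidate species tree and keep the candidate with the lowest total.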

12.
MOTIVATION: Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation has been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas in bioinformatics probabilistic models have proven to be both more realistic and powerful than parsimony models. For instance, they allow for assessing solution reliability and consideration of alternative solutions in a uniform way. There is also an added benefit in making model assumptions explicit and therefore making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available. RESULTS: We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves 'inside' a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch's original definition, and more generally for reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. The main algorithmic contribution presented here consists of an algorithm for computing the likelihood of a given reconciliation. 
To the best of our knowledge, this is the first successful introduction of this type of probabilistic method, which flourishes in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although it is not yet in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.

13.
Tree search and its more complicated variant, tree search and simultaneous multiple DNA sequence alignment, are difficult NP-complete optimization problems, which require the application of advanced computational techniques, if large data sets are to be solved within reasonable computation times. Traditionally tree search has been attacked with a search strategy that is best described as multistart hill-climbing; local search by branch swapping has been performed on several different starting trees. Recently a different tree search strategy was tested in the Parsigal parsimony program, which used a combination of evolutionary optimization and local search. Evolutionary optimization algorithms use principles adopted from biological evolution to solve technical optimization tasks. Evolutionary optimization is a stochastic global search method, which means that the method is able to escape local optima, and is in principle able to produce any solution in the search space (although this may take a long time). Local search techniques, such as branch swapping, employ a completely different search strategy; they exploit local information maximally in order to achieve quick improvement in the value of the objective function. However, local search algorithms lack the ability to escape from local optima, which is a fundamental requirement for any search algorithm that aims to be able to discover the global optimum of a multimodal optimization problem. Hence it seems that an optimization strategy combining the good properties of both evolutionary algorithms and local search would be ideal. In this study, aspects of global optimization and local search are discussed, and the method of simulated evolutionary optimization is reviewed in detail. The application of simulated evolutionary optimization to tree search in Parsigal is then reviewed briefly.

14.
In intraspecific studies, reticulated graphs are valuable tools for visualization, within a single figure, of alternative genealogical pathways among haplotypes. As available software packages implementing the global maximum parsimony (MP) approach only give the possibility to merge resulting topologies into less-resolved consensus trees, MP has often been neglected as an alternative approach to purely algorithmic (i.e., methods defined solely on the basis of an algorithm) "network" construction methods. Here, we propose to search tree space using the MP criterion and present a new algorithm for uniting all equally most parsimonious trees into a single (possibly reticulated) graph. Using simulated sequence data, we compare our method with three purely algorithmic and widely used graph construction approaches (minimum-spanning network, statistical parsimony, and median-joining network). We demonstrate that the combination of MP trees into a single graph provides a good estimate of the true genealogy. Moreover, our analyses indicate that, when internal node haplotypes are not sampled, the median-joining and MP methods provide the best estimate of the true genealogy whereas the minimum-spanning algorithm performs very poorly.

15.
The purpose of Reflections articles, it seems, is to give elderly scientists a chance to write about the "good old days," when everyone walked to school in the snow. They enjoy this activity so much that your editor, Martha Fedor, must have known that I would accept her invitation to write such an article, no matter how much I demurred at first. As everyone knows, flattery will get you everywhere. It may comfort the apprehensive reader to learn that there is not going to be much walking to school in the snow in this story. On the contrary, rather than thinking how hard I had it during my scientific career, I find it inconceivable that anyone could have had a smoother ride. At the time I began my career, science was an expanding enterprise in the United States that welcomed the young. Only in such an opportunity-rich environment would someone like me have stood a chance. The contrast between that world and the dog-eat-dog world young scientists confront today is stark.

16.

Background

Phylogenetic networks are generalizations of phylogenetic trees that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.

Results

In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network, and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as the Sankoff and Fitch algorithms, extend naturally to networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched the exhaustively determined optimum parsimony scores.

Conclusion

The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.
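
For orientation, the tree version of the Sankoff algorithm that this abstract builds on can be sketched as follows. The code handles trees only, with a single cost matrix shared by all edges; the extension to reticulate vertices and edge-specific costs described in the paper is not attempted. The nested-tuple tree encoding and the function names are assumptions made for this example.

```python
import math

def sankoff(tree, leaf_state, alphabet, cost):
    """Return {state: minimum cost of the subtree if its root has that state}.

    `tree` is a leaf name (str) or a tuple of child subtrees;
    `cost[s][t]` is the substitution cost from state s to state t.
    """
    if isinstance(tree, str):
        # A leaf is free in its observed state and impossible in any other.
        return {s: (0.0 if s == leaf_state[tree] else math.inf) for s in alphabet}
    child_tables = [sankoff(child, leaf_state, alphabet, cost) for child in tree]
    return {
        s: sum(min(cost[s][t] + table[t] for t in alphabet) for table in child_tables)
        for s in alphabet
    }

def sankoff_score(tree, leaf_state, alphabet, cost):
    """Optimum parsimony score of one character on the tree."""
    return min(sankoff(tree, leaf_state, alphabet, cost).values())
```

With unit costs this reproduces the Fitch score; with unequal costs (for example transversions costing twice as much as transitions) the optimum can differ, which is the situation the general cost matrix in the abstract is meant to cover.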

17.
I would like to express my thanks to all those who have helped me in the preparation of this article, in particular my mother, Mrs K. Brett, for supplying information about the Kirby family; my cousin, Johanna Meyer, for information about the Kappel family; my son, Geoffrey Dommett, for his computer expertise; and Gina Douglas, librarian, for making available to me the archives of the Linnean Society.

18.
19.
I am not big on celebrations, nor do I accept many invitations to receive awards. There is much work to be done, and the reward is in the doing. I learned this lesson early from my parents, Martha and Robert Guyden. However, I am humbled that anyone would even mention my name in association with E. E. Just. I, like he, was born into a segregated America, and somehow we both found biology. I think Just's life story instigates a discussion on diversity in science, as well it should. However, after reading Tyrone Hayes' (2010 E. E. Just Award recipient) essay from last year, "Diversifying the Biological Sciences: Past Efforts and Future Challenges" (Hayes, 2010), I have little to add on the subject. His words gave voice to my thoughts. That being said, I would like to use these pages to describe my journey into the "Cell" and the people who "hoed the row ahead of me."

20.
Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Among MCMC methods, scalar-Gibbs is the easiest to implement, and it is used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. Joint sampling of genotypes has been proposed as a strategy to overcome these problems. An algorithm that combines the Elston-Stewart algorithm and iterative peeling (ESIP sampler) to sample genotypes jointly from the entire pedigree is used in this study. Here, it is shown that the ESIP sampler yields an irreducible Markov chain, regardless of the number of alleles at a locus. Further, results obtained by ESIP sampler are compared with other methods in the literature. Of the methods that are guaranteed to be irreducible, ESIP was the most efficient.
