Similar Articles
20 similar articles found (search time: 15 ms)
1.
Tree search and its more complicated variant, tree search with simultaneous multiple DNA sequence alignment, are difficult NP-complete optimization problems that require advanced computational techniques if large data sets are to be solved within reasonable computation times. Traditionally, tree search has been attacked with a search strategy best described as multistart hill-climbing: local search by branch swapping has been performed on several different starting trees. Recently a different tree search strategy was tested in the Parsigal parsimony program, which used a combination of evolutionary optimization and local search. Evolutionary optimization algorithms use principles adopted from biological evolution to solve technical optimization tasks. Evolutionary optimization is a stochastic global search method, which means that it can escape local optima and is in principle able to produce any solution in the search space (although this may take a long time). Local search techniques, such as branch swapping, employ a completely different search strategy: they exploit local information maximally in order to achieve quick improvement in the value of the objective function. However, local search algorithms lack the ability to escape from local optima, which is a fundamental requirement for any search algorithm that aims to discover the global optimum of a multimodal optimization problem. Hence an optimization strategy combining the good properties of both evolutionary algorithms and local search would seem ideal. In this study, aspects of global optimization and local search are discussed, and the method of simulated evolutionary optimization is reviewed in detail. The application of simulated evolutionary optimization to tree search in Parsigal is then reviewed briefly.
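The hybrid strategy described above, evolutionary variation for global exploration plus hill-climbing for local refinement, is often called a memetic algorithm. The following is a minimal generic sketch of that idea; the callback names (`fitness`, `mutate`, `local_search`) are hypothetical placeholders, not Parsigal's actual API.

```python
import random

def memetic_search(init_pop, fitness, mutate, local_search, generations=100):
    """Generic sketch of an evolutionary algorithm hybridized with local
    search: every candidate produced by stochastic variation is immediately
    hill-climbed to its nearest local optimum before selection."""
    pop = [local_search(ind) for ind in init_pop]
    for _ in range(generations):
        # global step: stochastic variation can jump out of local optima
        children = [mutate(random.choice(pop)) for _ in pop]
        # local step: exploit local information for quick improvement
        children = [local_search(c) for c in children]
        # survivor selection: keep the best individuals of parents + children
        pop = sorted(pop + children, key=fitness, reverse=True)[:len(pop)]
    return max(pop, key=fitness)
```

On a tree-search problem, individuals would be tree topologies, `mutate` a random topological rearrangement, and `local_search` branch swapping to a local optimum.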

2.
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.

3.
The maximum likelihood (ML) method of phylogenetic tree construction is not as widely used as other tree construction methods (e.g., parsimony, neighbor-joining) because of the prohibitive amount of time required to find the ML tree when the number of sequences under consideration is large. To overcome this difficulty, we propose a stochastic search strategy for estimation of the ML tree that is based on a simulated annealing algorithm. The algorithm works by moving through tree space by way of a "local rearrangement" strategy so that topologies that improve the likelihood are always accepted, whereas those that decrease the likelihood are accepted with a probability that is related to the proportionate decrease in likelihood. Besides greatly reducing the time required to estimate the ML tree, the stochastic search strategy is less likely to become trapped in local optima than are existing algorithms for ML tree estimation. We demonstrate the success of the modified simulated annealing algorithm by comparing it with two existing algorithms (Swofford's PAUP* and Felsenstein's DNAMLK) for several theoretical and real data examples.
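The acceptance rule described in this abstract (always accept improvements, accept worse topologies with a probability tied to how much worse they are) is the standard Metropolis criterion used in simulated annealing. A minimal sketch, not the authors' exact cooling schedule:

```python
import math
import random

def accept(curr_lnL, prop_lnL, temperature):
    """Metropolis-style acceptance for simulated annealing over tree space.
    Improvements in log-likelihood are always accepted; a worse proposal is
    accepted with probability exp(delta / T), where delta < 0 shrinks the
    probability as the likelihood drop grows or the temperature T cools."""
    if prop_lnL >= curr_lnL:
        return True
    return random.random() < math.exp((prop_lnL - curr_lnL) / temperature)
```

As the temperature is lowered over the run, the sampler accepts fewer downhill moves and settles into a high-likelihood region.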

4.
MOTIVATION: Maximum likelihood (ML) methods have become very popular for constructing phylogenetic trees from sequence data. However, despite noticeable recent progress, with large and difficult datasets (e.g. multiple genes with conflicting signals) current ML programs still require huge computing time and can become trapped in bad local optima of the likelihood function. When this occurs, the resulting trees may still show some of the defects (e.g. long branch attraction) of starting trees obtained using fast distance or parsimony programs. METHODS: Subtree pruning and regrafting (SPR) topological rearrangements are usually sufficient to intensively search the tree space. Here, we propose two new methods to make SPR moves more efficient. The first method uses a fast distance-based approach to detect the least promising candidate SPR moves, which are then simply discarded. The second method locally estimates the change in likelihood for any remaining potential SPRs, as opposed to globally evaluating the entire tree for each possible move. These two methods are implemented in a new algorithm with a sophisticated filtering strategy, which efficiently selects potential SPRs and concentrates most of the likelihood computation on the promising moves. RESULTS: Experiments with real datasets comprising 35-250 taxa show that, while indeed greatly reducing the amount of computation, our approach provides likelihood values at least as good as those of the best-known ML methods so far and is very robust to poor starting trees. Furthermore, combining our new SPR algorithm with local moves such as PHYML's nearest neighbor interchanges, the time needed to find good solutions can sometimes be reduced even more.
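The two-stage filtering strategy in this abstract (a cheap proxy score prunes unpromising SPR candidates, an expensive evaluation is reserved for the survivors) can be sketched generically as below. The callbacks `cheap_score` and `exact_score` are hypothetical stand-ins for the paper's distance-based filter and local likelihood estimate.

```python
def best_filtered_move(moves, cheap_score, exact_score, keep_frac=0.2):
    """Two-stage candidate filtering: rank all moves by a fast proxy,
    discard the least promising, then pay the expensive evaluation
    only for the surviving fraction."""
    ranked = sorted(moves, key=cheap_score, reverse=True)
    survivors = ranked[:max(1, int(keep_frac * len(ranked)))]
    return max(survivors, key=exact_score)
```

The filter is safe as long as the proxy rarely ranks the truly best move below the cutoff; the paper's contribution is a distance-based proxy accurate enough for SPR moves.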

5.
This study describes novel algorithms for searching for most parsimonious trees. These algorithms are implemented as a parsimony computer program, PARSIGAL, which performs well even with difficult data sets. For high level search, PARSIGAL uses an evolutionary optimization algorithm, which feeds good tree candidates to a branch-swapping local search procedure. This study also describes an extremely fast method of recomputing state sets for binary characters (additive or nonadditive characters with two states), based on packing 32 characters into a single memory word and recomputing the tree simultaneously for all 32 characters using fast bitwise logical operations. The operational principles of PARSIGAL are quite different from those previously published for other parsimony computer programs. Hence it is conceivable that PARSIGAL may be able to locate islands of trees that are different from those that are easily located with existing parsimony computer programs.
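The word-packing trick can be illustrated with a Fitch-style preliminary pass for binary characters. Each of two bitmasks records, per character, whether state 0 or state 1 belongs to a child's state set, so one round of bitwise operations processes 32 characters at once. This is a sketch of the general idea, not PARSIGAL's actual code.

```python
def fitch_binary_packed(l0, l1, r0, r1, mask=0xFFFFFFFF):
    """Fitch preliminary pass for up to 32 binary characters in parallel.
    l0/l1 (r0/r1): bit i set means state 0 (state 1) is in the left
    (right) child's state set for character i. Returns the parent's
    packed state sets and the number of extra steps incurred."""
    inter0, inter1 = l0 & r0, l1 & r1
    conflict = ~(inter0 | inter1) & mask       # characters with empty intersection
    n0 = inter0 | (conflict & (l0 | r0))       # conflict -> take the union
    n1 = inter1 | (conflict & (l1 | r1))
    steps = bin(conflict).count("1")           # one added step per conflict
    return n0, n1, steps
```

A 64-bit word doubles the throughput with no change to the logic; modern implementations also use a hardware popcount instead of `bin(...).count`.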

6.
Kang S, Tang J, Schaeffer SW, Bader DA. PLoS One 2011;6(8):e22483
Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally expensive, especially for a large number of species. Several disk-covering methods (DCMs), which decompose the input species to multiple overlapping subgroups (or disks), have been proposed to solve the problem in a divide-and-conquer way. We design a new DCM based on the spectral method and also develop the COGNAC (Comparing Orders of Genes using Novel Algorithms and high-performance Computers) software package. COGNAC uses the new DCM to reduce the phylogenetic tree search space and selects an output tree from the reduced search space based on the MP principle. We test the new DCM using gene order data and inversion distance. The new DCM not only reduces the number of candidate tree topologies but also excludes erroneous tree topologies which can be selected by original MP methods. Initial labeling of internal genomes affects the accuracy of MP methods using gene order data, and the new DCM enables more accurate initial labeling as well. COGNAC demonstrates superior accuracy as a consequence. We compare COGNAC with FastME and the combination of the state-of-the-art DCM (Rec-I-DCM3) and GRAPPA. COGNAC clearly outperforms FastME in accuracy. COGNAC, using the new DCM, also reconstructs a much more accurate tree in significantly shorter time than GRAPPA with Rec-I-DCM3.

7.
Abstract: The effectiveness and efficiency of J. S. Farris' new microcomputer parsimony program (Hennig86, version 1.5) are evaluated with reference to 60 data sets, including those used to benchmark earlier mainframe and microcomputer packages. By overcoming the arbitrary resolution and consequent redundancy problems that have plagued previously available microcomputer programs, as well as their limitations on data set size, cladogram storage space, and execution speed, Hennig86 advances enormously the accuracy and ease with which cladistic analyses can be conducted. Hennig86 has such an impressive edge in both effectiveness and efficiency that earlier parsimony programs (including those by Farris) have essentially been rendered obsolete. For exact analyses, both exhaustive and minimal options are provided; of the options available for approximate analyses, the branch breaker (bb) used in conjunction with the mhennig* and tread commands performed best.

8.
The accuracy of phylogenetic methods is reinvestigated for the four-taxon case with a two-edge rate and a three-edge rate. Unlike previous studies involving computer simulations, the two-edge rate relates to branches that are sister taxa in the model tree. As with previous studies, certain methods are found to behave inaccurately in a portion of the parameter space where the two-edge rate is proportionally large. This phenomenon, to which parsimony is immune, is termed "long-branch repulsion" and the region of poor performance is called the Farris Zone. Maximum likelihood methods are shown to be particularly prone to failure when closely related taxa have long branches. Long-branch repulsion is demonstrated with an empirical case involving Strepsiptera and Diptera.

9.
Within phylogenetics, two methods are known to implement cladistics: parsimony or maximum parsimony (MP) and three-item analysis (3ia). Despite the lack of suitable software, 3ia is occasionally used in systematics and, more regularly, in historical biogeography. Here, we present LisBeth, the first and only freely available phylogenetic/biogeographic program that uses the 3ia approach, and offer some insights into its theoretical propositions. LisBeth does not rely on the conventional taxon/character matrix. Instead, characters are represented as rooted trees. LisBeth performs 3ia analyses based on maximum congruence of three-item statements and calculates the intersection tree (which differs from the usual consensus). In biogeography, it applies the transparent method to handle widespread taxa and implements paralogy-free subtree analysis to remove redundant distributions. For the sake of interoperability, LisBeth may import/export characters from/to matrices in NEXUS format, allowing comparison with other cladistic programs. LisBeth also imports phylogenetic characters from Xper2 knowledge bases.

10.
A phylogenetic method is a consistent estimator of phylogeny if and only if it is guaranteed to give the correct tree, given that sufficient (possibly infinite) independent data are examined. The following methods are examined for consistency: UPGMA (unweighted pair-group method, averages), NJ (neighbor joining), MF (modified Farris), and P (parsimony). A two-parameter model of nucleotide sequence substitution is used, and the expected distribution of character states is calculated. Without perfect correction for superimposed substitutions, all four methods may be inconsistent if there is but one branch evolving at a faster rate than the other branches. Partial correction of observed distances improves the robustness of the NJ method to rate variation, and perfect correction makes the NJ method a consistent estimator for all combinations of rates that were examined. The sensitivity of all the methods to unequal rates varies over a wide range, so relative-rate tests are unlikely to be a reliable guide for accepting or rejecting phylogenies based on parsimony analysis.
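The "correction for superimposed substitutions" under a two-parameter model refers to the standard Kimura two-parameter distance, which converts observed transition and transversion proportions into an estimate of the true number of substitutions per site. Shown here for illustration of the correction step, assuming the usual K2P formula:

```python
import math

def k2p_distance(P, Q):
    """Kimura two-parameter corrected distance from the observed
    proportions of transitions (P) and transversions (Q). Undoes the
    saturation caused by multiple hits at the same site, which is the
    correction that restores NJ's consistency in the study above."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

Note that the corrected distance always exceeds the raw proportion of differing sites (P + Q), and the formula diverges as the observed differences approach saturation.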

11.
12.
A naturalistic account of the strengths and limitations of cladistic practice is offered. The success of cladistics is claimed to be largely rooted in the parsimony-implementing congruence test. Cladists may use the congruence test to iteratively refine assessments of homology, and thereby increase the odds of reliable phylogenetic inference under parsimony. This explanation challenges alternative views which tend to ignore the effects of parsimony on the process of character individuation in systematics. In a related theme, the concept of homeostatic property cluster natural kinds is used to explain why cladistics is well suited to provide a traditional, verbal reference system for the evolutionary properties of species and clades. The advantages of more explicitly probabilistic approaches to phylogenetic inference appear to manifest themselves in situations where evolutionary homeostasis has for the most part broken down, and predictive classifications are no longer possible.

13.
Analyzing Large Data Sets in Reasonable Times: Solutions for Composite Optima
New methods for parsimony analysis of large data sets are presented. The new methods are sectorial searches, tree-drifting, and tree-fusing. For Chase et al.'s 500-taxon data set these methods (on a 266-MHz Pentium II) find a shortest tree in less than 10 min (i.e., over 15,000 times faster than PAUP and 1000 times faster than PAUP*). Making a complete parsimony analysis requires hitting minimum length several times independently, but not necessarily all "islands"; for Chase et al.'s data set, this can be done in 4 to 6 h. The new methods also perform well in other cases analyzed (which range from 170 to 854 taxa).

14.
DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. Availability: DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree

15.
Martin FN, Tooley PW. Mycologia 2003;95(2):269-284
The phylogenetic relationships of 51 isolates representing 27 species of Phytophthora were assessed by sequence alignment of 568 bp of the mitochondrially encoded cytochrome oxidase II gene. A total of 1299 bp of the cytochrome oxidase I gene also were examined for a subset of 13 species. The cox II gene trees constructed by a heuristic search, based on maximum parsimony for a bootstrap 50% majority-rule consensus tree, revealed 18 species grouping into seven clades and nine species unaffiliated with a specific clade. The phylogenetic relationships among species observed on cox II gene trees did not exhibit consistent similarities in groupings for morphology, pathogenicity, host range or temperature optima. The topology of cox I gene trees, constructed by a heuristic search based on maximum parsimony for a bootstrap 50% majority-rule consensus tree for 13 species of Phytophthora, revealed 10 species grouping into three clades and three species unaffiliated with a specific clade. The groupings in general agreed with what was observed in the cox II tree. Species relationships observed for the cox II gene tree were in agreement with those based on ITS regions, with several notable exceptions. Some of these differences were noted in species in which the same isolates were used for both ITS and cox II analysis, suggesting either a differential rate of evolutionary divergence for these two regions or incorrect assumptions about alignment of ITS sequences. Analysis of combined data sets of ITS and cox II sequences generated a tree that did not differ substantially from analysis of ITS data alone; however, the results of a partition homogeneity test suggest that combining data sets may not be valid.

16.
Accuracy of estimated phylogenetic trees from molecular data
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large, the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.

17.

Background

The abundance of new genomic data provides the opportunity to map the location of gene duplication and loss events on a species phylogeny. The first methods for mapping gene duplications and losses were based on a parsimony criterion, finding the mapping that minimizes the number of duplication and loss events. Probabilistic modeling of gene duplication and loss is relatively new and has largely focused on birth-death processes.

Results

We introduce a new maximum likelihood model that estimates the speciation and gene duplication and loss events in a gene tree within a species tree with branch lengths. We also provide an, in practice, efficient algorithm that computes optimal evolutionary scenarios for this model. We implemented the algorithm in the program DrML and verified its performance with empirical and simulated data.

Conclusions

In test data sets, DrML finds optimal gene duplication and loss scenarios within minutes, even when the gene trees contain sequences from several hundred species. In many cases, these optimal scenarios differ from the lca-mapping that results from a parsimony gene tree reconciliation. Thus, DrML provides a new, practical statistical framework on which to study gene duplication.

18.
In response to comments by J. S. Farris (2000, Cladistics 16, 403-410) on the strongest evidence (SE) approach to phylogenetic analysis, I examine the concepts on which it is founded and reevaluate its merits. SE's null model of signal absence in characters is not treated as background knowledge, but as a reference point for evaluating a data set's phylogenetic signal in a tree-specific manner. In simulation tests, the SE methods perform reasonably well; although parsimony is generally more accurate and less biased than SE, SE is distinctly more accurate in some circumstances. Simulations further indicate that jackknifing is often beneficial in both SE and parsimony analyses. Iterative fixation of splits shows promise as an auxiliary procedure for SE and other methods that weight according to apparent homoplasy.

19.
The S-system model is one of the nonlinear differential equation models of gene regulatory networks, and it can describe various dynamics of the relationships among genes. If we successfully infer rigorous S-system model parameters that describe a target gene regulatory network, we can simulate gene expression mathematically. However, the problem of finding an optimal S-system model parameter set is too complex to be solved analytically. Thus, heuristic search methods that offer approximate solutions are needed to reduce the computational time. In previous studies, several heuristic search methods such as Genetic Algorithms (GAs) have been applied to the parameter search of the S-system model. However, they have not achieved sufficient estimation accuracy. One conceivable reason is that GAs lack effective mechanisms to escape local optima. We applied an Immune Algorithm (IA) to search for the S-system parameters. IA is also a heuristic search method, inspired by the biological mechanism of acquired immunity. Compared to GA, IA is able to search a large solution space, thereby avoiding local optima, and to maintain multiple candidate solutions. These features work well for searching the S-system model. Indeed, our algorithm showed higher performance than GA for both simulation and real data analyses.
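For reference, the S-system model writes each gene's rate of change as a production power-law minus a degradation power-law. A minimal sketch of the right-hand side that the GA/IA parameter search must fit (parameter names follow the standard S-system notation):

```python
import math

def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of the S-system ODEs:
        dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]
    x: current expression levels; alpha/beta: rate constants;
    g/h: kinetic-order matrices (the exponents encode regulation)."""
    n = len(x)
    dx = []
    for i in range(n):
        prod_g = math.prod(x[j] ** g[i][j] for j in range(n))  # production term
        prod_h = math.prod(x[j] ** h[i][j] for j in range(n))  # degradation term
        dx.append(alpha[i] * prod_g - beta[i] * prod_h)
    return dx
```

The search problem is hard precisely because for n genes there are 2n rate constants plus 2n² kinetic orders, and the objective surface over them is highly multimodal.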

20.
The Parsimony Ratchet, a New Method for Rapid Parsimony Analysis
The Parsimony Ratchet is presented as a new method for analysis of large data sets. (The method was originally presented at the Numerical Cladistics Symposium at the American Museum of Natural History, New York, in May 1998 (see Horovitz, 1999) and at the Meeting of the Willi Hennig Society (Hennig XVII) in September 1998 in São Paulo, Brazil.) The method can be easily implemented with existing phylogenetic software by generating batch command files. Such an approach has been implemented in the programs DADA (Nixon, 1998) and Winclada (Nixon, 1999). The Parsimony Ratchet has also been implemented in the most recent versions of NONA (Goloboff, 1998). These implementations of the ratchet use the following steps: (1) Generate a starting tree (e.g., a "Wagner" tree followed by some level of branch swapping or not). (2) Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character). (3) Perform branch swapping (e.g., "branch-breaking" or TBR) on the current tree using the reweighted matrix, keeping only one (or a few) trees. (4) Reset all character weights to the "original" weights (typically, equal weights). (5) Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3), holding one (or a few) trees. (6) Return to step 2. Steps 2-6 are considered to be one iteration, and typically 50-200 or more iterations are performed. The number of characters to be sampled for reweighting in step 2 is determined by the user; I have found that between 5 and 25% of the characters provide good results in most cases. The performance of the ratchet for large data sets is outstanding, and the results of analyses of the 500-taxon seed plant rbcL data set (Chase et al., 1993) are presented here. A separate analysis of a three-gene data set for 567 taxa will be presented elsewhere (Soltis et al., in preparation) demonstrating the same extraordinary power. With the 500-taxon data set, shortest trees are typically found within 22 h (four runs of 200 iterations) on a 200-MHz Pentium Pro.
These analyses indicate efficiency increases of 20x-80x over "traditional methods" such as varying taxon order randomly and holding few trees (followed by more complete analyses of the best trees found), and speedups of thousands of times over nonstrategic searches with PAUP. Because the ratchet samples many tree islands with fewer trees from each island, it provides much more accurate estimates of the "true" consensus than collecting many trees from few islands. With the ratchet, Goloboff's NONA, and existing computer hardware, data sets that were previously intractable or required months or years of analysis with PAUP* can now be adequately analyzed in a few hours or days.
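The six-step ratchet loop described above can be sketched compactly. `swap` (branch swapping under given character weights) and `tree_length` (parsimony score under weights) are hypothetical callbacks standing in for the phylogenetic software the ratchet drives.

```python
import random

def parsimony_ratchet(start_tree, n_chars, swap, tree_length,
                      iterations=100, sample_frac=0.15):
    """Sketch of the Parsimony Ratchet. Each iteration perturbs the
    search by upweighting a random ~5-25% of characters, swaps under
    the perturbed weights, then swaps again under the original weights."""
    weights = [1.0] * n_chars
    best = current = swap(start_tree, weights)            # step 1: starting tree
    for _ in range(iterations):
        perturbed = weights[:]                            # step 2: reweight subset
        for i in random.sample(range(n_chars), max(1, int(sample_frac * n_chars))):
            perturbed[i] += 1.0
        current = swap(current, perturbed)                # step 3: swap, reweighted
        current = swap(current, weights)                  # steps 4-5: reset + swap
        if tree_length(current, weights) < tree_length(best, weights):
            best = current                                # keep the shortest tree
    return best                                           # step 6 is the loop itself
```

The reweighted swap deliberately degrades the current optimum under the original weights, which is what lets the search hop between tree islands that plain TBR cannot leave.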
