Found 20 similar articles (search time: 0 ms)
1.
The Parsimony Ratchet, a New Method for Rapid Parsimony Analysis (total citations: 26, self-citations: 2, citations by others: 26)
Kevin C. Nixon 《Cladistics : the international journal of the Willi Hennig Society》1999,15(4):407-414
The Parsimony Ratchet is presented as a new method for analysis of large data sets. The method can be easily implemented with existing phylogenetic software by generating batch command files. Such an approach has been implemented in the programs DADA (Nixon, 1998) and Winclada (Nixon, 1999). The Parsimony Ratchet has also been implemented in the most recent versions of NONA (Goloboff, 1998). These implementations of the ratchet use the following steps:
(1) Generate a starting tree (e.g., a “Wagner” tree followed by some level of branch swapping or not).
(2) Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character).
(3) Perform branch swapping (e.g., “branch-breaking” or TBR) on the current tree using the reweighted matrix, keeping only one (or few) tree.
(4) Set all weights for the characters to the “original” weights (typically, equal weights).
(5) Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3) holding one (or few) tree.
(6) Return to step 2.
Steps 2–6 are considered to be one iteration, and typically, 50–200 or more iterations are performed. The number of characters to be sampled for reweighting in step 2 is determined by the user; I have found that between 5 and 25% of the characters provide good results in most cases. The performance of the ratchet for large data sets is outstanding, and the results of analyses of the 500 taxon seed plant rbcL data set (Chase et al., 1993) are presented here. A separate analysis of a three-gene data set for 567 taxa will be presented elsewhere (Soltis et al., in preparation) demonstrating the same extraordinary power. With the 500-taxon data set, shortest trees are typically found within 22 h (four runs of 200 iterations) on a 200-MHz Pentium Pro. These analyses indicate efficiency increases of 20×–80× over “traditional methods” such as varying taxon order randomly and holding few trees, followed by more complete analyses of the best trees found, and thousands of times faster than nonstrategic searches with PAUP. Because the ratchet samples many tree islands with fewer trees from each island, it provides much more accurate estimates of the “true” consensus than collecting many trees from few islands. With the ratchet, Goloboff's NONA, and existing computer hardware, data sets that were previously intractable or required months or years of analysis with PAUP* can now be adequately analyzed in a few hours or days.
Footnote 1: This method, the Parsimony Ratchet, was originally presented at the Numerical Cladistics Symposium at the American Museum of Natural History, New York, in May 1998 (see Horovitz, 1999) and at the Meeting of the Willi Hennig Society (Hennig XVII) in September 1998 in São Paulo, Brazil.
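The six ratchet steps in the abstract above can be sketched in a few lines of Python. This is only an illustrative toy, not the DADA, Winclada, or NONA implementation: the function names (`fitch_length`, `neighbors`, `hill_climb`, `ratchet`), the nested-tuple tree encoding, the caterpillar starting tree (standing in for a Wagner tree), the rooted nearest-neighbor-interchange swapper (standing in for TBR), and the example matrix are all assumptions made for the sake of a short runnable example.

```python
import random

def fitch_length(tree, matrix, weights):
    """Weighted Fitch parsimony length of a rooted binary tree (nested tuples)."""
    def walk(node, col):
        if isinstance(node, str):                      # leaf: observed state
            return {matrix[node][col]}, 0
        (ls, lc), (rs, rc) = walk(node[0], col), walk(node[1], col)
        common = ls & rs
        if common:
            return common, lc + rc
        return ls | rs, lc + rc + 1                    # one extra step needed
    return sum(w * walk(tree, c)[1] for c, w in enumerate(weights))

def neighbors(tree):
    """Rooted NNI rearrangements (a simple stand-in for TBR swapping)."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    if isinstance(left, tuple):
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    if isinstance(right, tuple):
        a, b = right
        yield ((left, a), b)
        yield ((left, b), a)
    for n in neighbors(left):
        yield (n, right)
    for n in neighbors(right):
        yield (left, n)

def hill_climb(tree, matrix, weights):
    """Branch swapping that keeps a single tree (steps 3 and 5)."""
    best = fitch_length(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(tree):
            score = fitch_length(cand, matrix, weights)
            if score < best:
                tree, best, improved = cand, score, True
                break
    return tree

def ratchet(matrix, iterations=50, fraction=0.2, seed=0):
    rng = random.Random(seed)
    taxa = sorted(matrix)
    n_chars = len(matrix[taxa[0]])
    base = [1] * n_chars                               # "original" equal weights
    tree = taxa[0]                                     # step 1: starting tree
    for taxon in taxa[1:]:
        tree = (tree, taxon)
    tree = hill_climb(tree, matrix, base)
    best_tree, best_len = tree, fitch_length(tree, matrix, base)
    k = max(1, round(fraction * n_chars))
    for _ in range(iterations):
        weights = base[:]
        for c in rng.sample(range(n_chars), k):        # step 2: upweight a subset
            weights[c] += 1
        tree = hill_climb(tree, matrix, weights)       # step 3: swap, reweighted
        tree = hill_climb(tree, matrix, base)          # steps 4-5: reset, swap
        length = fitch_length(tree, matrix, base)
        if length < best_len:
            best_tree, best_len = tree, length
    return best_tree, best_len                         # step 6 is the loop itself

# Hypothetical six-taxon binary matrix, for illustration only.
matrix = {"A": "11000", "B": "11001", "C": "00110",
          "D": "00111", "E": "00000", "F": "10010"}
best_tree, best_len = ratchet(matrix, iterations=20, seed=1)
print(best_tree, best_len)
```

The sketch carries the current tree from one iteration into the next, as the steps describe, and separately records the shortest tree seen under the original weights.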
2.
Fischer M 《Journal of mathematical biology》2012,65(2):293-308
In this paper, we investigate a conjecture by Arndt von Haeseler concerning the Maximum Parsimony method for phylogenetic estimation, which was published by the Newton Institute in Cambridge on a list of open phylogenetic problems in 2007. This conjecture deals with the question whether Maximum Parsimony trees are hereditary. The conjecture suggests that a Maximum Parsimony tree for a particular (DNA) alignment necessarily has subtrees of all possible sizes which are most parsimonious for the corresponding subalignments. We answer the conjecture affirmatively for binary alignments on 5 taxa but also show how to construct examples for which Maximum Parsimony trees are not hereditary. Apart from showing that a most parsimonious tree cannot generally be reduced to a most parsimonious tree on fewer taxa, we also show that compatible most parsimonious quartets do not have to provide a most parsimonious supertree. Last, we show that our results can be generalized to Maximum Likelihood for certain nucleotide substitution models.
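To make the "hereditary" property concrete, the following Python sketch checks it by brute force for one tree and one subset of taxa: it restricts the tree to the subset and compares its Fitch parsimony score with the minimum over all trees on that subset. It is not taken from the paper; the helper names (`fitch_score`, `restrict`, `all_trees`, `is_hereditary_for`), the nested-tuple tree encoding, and the toy alignment are assumptions, and the exhaustive enumeration is only feasible for a handful of taxa.

```python
from itertools import combinations

def fitch_score(tree, alignment):
    """Unweighted Fitch parsimony score of a rooted binary tree (nested tuples)."""
    def walk(node, col):
        if isinstance(node, str):
            return {alignment[node][col]}, 0
        (ls, lc), (rs, rc) = walk(node[0], col), walk(node[1], col)
        common = ls & rs
        if common:
            return common, lc + rc
        return ls | rs, lc + rc + 1
    n_cols = len(next(iter(alignment.values())))
    return sum(walk(tree, c)[1] for c in range(n_cols))

def restrict(tree, keep):
    """Prune the tree to the taxa in `keep`, suppressing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    left, right = restrict(tree[0], keep), restrict(tree[1], keep)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def all_trees(taxa):
    """Enumerate every rooted binary tree on the given list of taxa."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    first, rest = taxa[0], taxa[1:]
    for k in range(len(rest)):                 # right side is never empty
        for chosen in combinations(rest, k):
            left = [first, *chosen]
            right = [t for t in rest if t not in chosen]
            for lt in all_trees(left):
                for rt in all_trees(right):
                    yield (lt, rt)

def is_hereditary_for(tree, alignment, subset):
    """True if the induced subtree is most parsimonious for the subalignment."""
    sub = {t: alignment[t] for t in subset}
    best = min(fitch_score(t, sub) for t in all_trees(sorted(subset)))
    return fitch_score(restrict(tree, set(subset)), sub) == best

# Hypothetical binary alignment on 5 taxa, for illustration only.
alignment = {"A": "10", "B": "10", "C": "01", "D": "01", "E": "00"}
tree = ((("A", "B"), ("C", "D")), "E")
print(fitch_score(tree, alignment))
print(is_hereditary_for(tree, alignment, ["A", "B", "C", "D"]))
```

Because the Fitch score does not depend on where the root is placed, enumerating rooted trees visits each unrooted topology several times but yields the same minimum.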
3.
Background
Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them.
4.
5.
Background
Phylogenetic networks are generalizations of phylogenetic trees that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past.
Results
In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network, and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as the Sankoff and Fitch algorithms, extend naturally to networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched the exhaustively determined optimum parsimony scores.
Conclusion
The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.
6.
A. E. FRIDAY 《Zoological Journal of the Linnean Society》1982,74(3):329-335
The role of a parsimony principle is unclear in most methods which have been claimed to be valid for the reconstruction of evolutionary kinship. There appear to be two reasons for this: first, the role of parsimony is generally uncertain in scientific method; second, the majority of methods proposed transform data and order them, but are not appropriate to the reconstruction of phylogeny. Commitment to a probabilistic model of evolutionary processes seems to be the essential component which may enable us justifiably to estimate phylogeny. An example is provided which emphasizes the importance of knowledge about the nature of the process before undertaking estimation of the pattern of kinship.
7.
8.
Alon Noga Chor Benny Pardi Fabio Rapoport Anat 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2010,7(1):183-187
We explore the maximum parsimony (MP) and ancestral maximum likelihood (AML) criteria in phylogenetic tree reconstruction. Both problems are NP-hard, so we seek approximate solutions. We formulate the two problems as Steiner tree problems under appropriate distances. The gist of our approach is the succinct characterization of Steiner trees for a small number of leaves for the two distances. This enables the use of known Steiner tree approximation algorithms. The approach leads to a 16/9 approximation ratio for AML and asymptotically to a 1.55 approximation ratio for MP.
9.
10.
Ensemble clustering methods have become increasingly important to ease the task of choosing the most appropriate cluster algorithm for a particular data analysis problem. The consensus clustering (CC) algorithm is a recognized ensemble clustering method that uses an artificial intelligence technique to optimize a fitness function. We formally prove the existence of a subspace of the search space for CC, which contains all solutions of maximal fitness, and suggest two greedy algorithms to search this subspace. We evaluate the algorithms on two gene expression data sets and one synthetic data set, and compare the results with those of other ensemble clustering approaches.
11.
Jan De Laet Erik Smets 《Cladistics : the international journal of the Willi Hennig Society》1998,14(4):363-381
The following three basic defects for which three-taxon analysis has been rejected as a method for biological systematics are reviewed: (1) character evolution is a priori assumed to be irreversible; (2) basic statements that are not logically independent are treated as if they are; (3) three-taxon statements that are considered as independent support for a given tree may be mutually exclusive on that tree. It is argued that these criticisms only relate to the particular way the three-taxon approach was originally implemented. Four-taxon analysis, an alternative implementation that circumvents these problems, is derived. Four-taxon analysis is identical to standard parsimony analysis except for an unnatural restriction on the maximum amount of homoplasy that may be concentrated in a single character state. This restriction follows directly from the basic tenet of the three-taxon approach, that character state distributions should be decomposed into basic statements that are, in themselves, still informative with respect to relationships. A reconsideration of what constitutes an elementary relevant statement in systematics leads to a reformulation of standard parsimony as two-taxon analysis and to a rejection of four-taxon analysis as a method for biological systematics.
12.
13.
Jan De Laet Erik Smets 《Cladistics : the international journal of the Willi Hennig Society》1998,14(3):239-248
Standard parsimony analysis has recently been described in a “three-taxon-like” way (the three-taxa statements for contiguous series–four-taxa statements for contiguous series, or TTSC–FTSC procedure) in order to clarify the differences between the standard approach and three-taxon analysis. It is shown that the alleged equivalence of standard parsimony analysis and the TTSC–FTSC procedure does not hold. Some minor defects of the procedure can be fixed within the TTSC–FTSC logic, but no solution is available for two basic problems: (1) the elementary three-taxon-like statements of the TTSC–FTSC procedure are highly artificial; and (2) the equivalence with standard parsimony depends on an incomplete correction for nonindependence between these statements. However, these findings do not invalidate the reported superiority of standard parsimony as a method for biological systematics.
14.
Background
Maximum parsimony is one of the most commonly used criteria for reconstructing phylogenetic trees. Recently, Nakhleh and co-workers extended this criterion to enable reconstruction of phylogenetic networks, and demonstrated its application to detecting reticulate evolutionary relationships. However, one of the major problems with this extension has been that it favors more complex evolutionary relationships over simpler ones, thus having the potential for overestimating the amount of reticulation in the data. An ad hoc solution to this problem that has been used entails inspecting the improvement in the parsimony length as more reticulation events are added to the model, and stopping when the improvement is below a certain threshold.
15.
Alberto Romagnoni Jérôme Ribot Daniel Bennequin Jonathan Touboul 《PLoS computational biology》2015,11(11)
The layout of sensory brain areas is thought to subtend perception. The principles shaping these architectures and their role in information processing are still poorly understood. We investigate mathematically and computationally the representation of orientation and spatial frequency in cat primary visual cortex. We prove that two natural principles, local exhaustivity and parsimony of representation, would constrain the orientation and spatial frequency maps to display a very specific pinwheel-dipole singularity. This is particularly interesting since recent experimental evidence shows dipolar structures of the spatial frequency map co-localized with pinwheels in cat. These structures have important consequences for information processing capabilities. In particular, we show using a computational model of visual information processing that this architecture allows a trade-off in the local detection of orientation and spatial frequency, but this property occurs for spatial frequency selectivity sharper than reported in the literature. We validated this sharpening on high-resolution optical imaging experimental data. These results shed new light on the principles at play in the emergence of functional architecture of cortical maps, as well as their potential role in processing information.
16.
Adam Eyre-Walker 《Journal of molecular evolution》1998,47(6):686-690
Parsimony is commonly used to infer the direction of substitution and mutation. However, it is known that parsimony is biased when the base composition of the DNA sequence is skewed. Here I quantify this effect for several simple cases. The analysis demonstrates that parsimony can be misleading even when levels of sequence divergence are as low as 10%; parsimony incorrectly infers an excess of common to rare changes. Caution must therefore be exercised in the use of parsimony.
Received: 13 November 1997 / Accepted: 18 June 1998
17.
Urokinase-type plasminogen activator (uPA) and its receptor (uPAR) are instrumental in cellular activities during inflammation, angiogenesis and tumor metastasis. Recent studies suggest that uPA might exert its function on cell proliferation and migration in a uPAR-independent manner or through an adaptor to the uPA-uPAR system. By applying phage display technology, we have identified a putative uPA-binding consensus sequence BXXSSXXB (where B represents a basic amino acid and X represents any amino acid), which has no apparent sequence correlation to uPAR. This uPA-binding motif apparently recognizes the kringle domain of the protease and has an agonistic effect on uPA binding to immobilized uPAR, thereby possibly serving as part of an adaptor component for uPAR signaling. As a result of protein database searches, this motif was found in the extracellular domain of several cell surface proteins, some of which were proposed to be associated with the uPA-uPAR system. Among these, gp130, a common signal transducer for cytokines, was identified as a uPA-binding protein. The specificity of this interaction was demonstrated by inhibition of uPA binding to immobilized gp130 with soluble gp130. Furthermore, the binding could be partially inhibited by a uPA-binding consensus sequence-containing fusion protein in a dose-dependent manner, with an IC50 of approximately 1 μM, indicating that the uPA-binding motif is apparently involved in the uPA-gp130 interaction. The association of gp130 with uPA may link the uPA-uPAR system to various signal transduction pathways.
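A database search for the BXXSSXXB consensus described above can be approximated with a regular expression. The sketch below is not the authors' search procedure; it assumes that "basic amino acid" means lysine (K), arginine (R), or histidine (H) in one-letter code, and the function name `find_motif` and the example sequence are hypothetical.

```python
import re

# B = basic residue (assumed here to be K, R, or H); X = any residue.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=([KRH][A-Z]{2}SS[A-Z]{2}[KRH]))")

def find_motif(sequence):
    """Return (0-based start, matched 8-mer) for each BXXSSXXB occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]

# Hypothetical peptide, for illustration only (not a real gp130 fragment).
print(find_motif("MAKLLSSTQRGG"))  # [(2, 'KLLSSTQR')]
```

Whether histidine should count as basic is a judgment call; dropping `H` from both character classes gives the stricter K/R-only reading.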
18.
The epithelial mitogen keratinocyte growth factor binds to collagens via the consensus sequence glycine-proline-hydroxyproline (total citations: 1, self-citations: 0, citations by others: 1)
Ruehl M Somasundaram R Schoenfelder I Farndale RW Knight CG Schmid M Ackermann R Riecken EO Zeitz M Schuppan D 《The Journal of biological chemistry》2002,277(30):26872-26878
The binding of certain growth factors and cytokines to components of the extracellular matrix can regulate their local availability and modulate their biological activities. We show that mesenchymal cell-derived keratinocyte growth factor (KGF), a key stimulator of epithelial cell proliferation during wound healing, preferentially binds to collagens I, III, and VI. Binding is inhibited in a dose-dependent manner by denatured single collagen chains and collagen cyanogen bromide peptides. This interaction is saturable with dissociation constants of approximately 10^-8 to 10^-9 M and estimated molar ratios of up to three molecules of KGF bound to one molecule of triple helical collagen. Furthermore, collagen-bound KGF stimulated the proliferation of transformed keratinocyte or HaCaT cells. Ligand blotting of collagen-derived peptides points to a limited set of collagenous consensus sequences that bind KGF. By using synthetic collagen peptides, we defined the consensus sequence (Gly-Pro-Hyp)(n) as the collagen binding motif. We conclude that the preferential binding of KGF to the abundant collagens leads to a spatial pattern of bioavailable KGF that is dictated by the local organization of the collagenous extracellular matrix. The defined collagenous consensus peptide or its analogue may be useful in wound healing by increasing KGF bioactivity and thus modulating local epithelial remodeling and regeneration.
19.
20.
The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted more and more attention in recent years due to its importance in the analysis of many fine-scale genetic data. Its application fields range from mapping complex disease genes to inferring population histories, passing through designing drugs, functional genomics and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotype problem under Uncertain Data (PPH-UD). This version mainly arises when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms are missing or affected by errors. We propose an exact approach to solving PPH-UD based on an extended version of the class representative model for PPH of Catanzaro et al.[1], currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any solver for mixed integer programming, and usable in all those cases for which the parsimony criterion is well suited for haplotype estimation.
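For intuition about what the basic PPH problem asks, here is a brute-force Python sketch for tiny instances. It is not the integer programming model discussed in the abstract and does not handle the uncertain-data (PPH-UD) variant. It assumes a common encoding in which each genotype is a string over `0`, `1`, `2`, with `0` and `1` marking homozygous sites and `2` a heterozygous site; the function names and the example genotypes are hypothetical.

```python
from itertools import combinations, product

def explains(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) resolves the genotype."""
    for a, b, site in zip(h1, h2, genotype):
        if site == "2":
            if a == b:                 # heterozygous site needs differing alleles
                return False
        elif not (a == b == site):     # homozygous site needs both alleles equal
            return False
    return True

def candidates(genotypes):
    """All haplotypes compatible with at least one genotype."""
    found = set()
    for g in genotypes:
        options = [("0", "1") if site == "2" else (site,) for site in g]
        found.update("".join(p) for p in product(*options))
    return sorted(found)

def pure_parsimony(genotypes):
    """Smallest haplotype set explaining every genotype (exhaustive search)."""
    pool = candidates(genotypes)
    for size in range(1, len(pool) + 1):
        for subset in combinations(pool, size):
            if all(any(explains(h1, h2, g) for h1 in subset for h2 in subset)
                   for g in genotypes):
                return list(subset)
    return None

# Hypothetical genotypes on two SNP sites, for illustration only.
print(pure_parsimony(["02", "20", "22"]))  # ['00', '01', '10']
```

The search is exponential in the number of candidate haplotypes, which is consistent with the problem being NP-hard and is why exact approaches rely on integer programming for realistic inputs.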