首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
The problem of determining an optimal phylogenetic tree from a set of data is an example of the Steiner problem in graphs. There is no efficient algorithm for solving this problem with reasonably large data sets. In the present paper an approach is described that proves in some cases that a given tree is optimal without testing all possible trees. The method first uses a previously described heuristic algorithm to find a tree of relatively small total length. The second part of the method independently analyses subsets of sites to determine a lower bound on the length of any tree. We simultaneously attempt to reduce the total length of the tree and increase the lower bound. When these are equal it is not possible to make a shorter tree with a given data set and given criterion. An example is given where the only two possible minimal trees are found for twelve different mammalian cytochrome c sequences. The criterion of finding the smallest number of minimum base changes was used. However, there is no general method of guaranteeing that a solution will be found in all cases and in particular better methods of improving the estimate of the lower bound need to be developed.  相似文献   

2.

Background  

Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated to those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, which, given the accelerated increase in the size of the protein sequence databases, is a situation that is becoming common.  相似文献   

3.
Summary The problem of determining the minimal phylogenetic tree is discussed in relation to graph theory. It is shown that this problem is an example of the Steiner problem in graphs which is to connect a set of points by a minimal length network where new points can be added. There is no reported method of solving realistically-sized Steiner problems in reasonable computing time. A heuristic method of approaching the phylogenetic problem is presented, together with a worked example with 7 mammalian cytochrome c sequences. It is shown in this case that the method develops a phylogenetic tree that has the smallest possible number of amino acid replacements. The potential and limitations of the method are discussed. It is stressed that objective methods must be used for comparing different trees. In particular it should be determined how close a given tree is to a mathematically determined lower bound. A theorem is proved which is used to establish a lower bound on the length of any tree and if a tree is found with a length equal to the lower bound, then no shorter tree can exist.  相似文献   

4.
We have developed a rapid parsimony method for reconstructing ancestral nucleotide states that allows calculation of initial branch lengths that are good approximations to optimal maximum-likelihood estimates under several commonly used substitution models. Use of these approximate branch lengths (rather than fixed arbitrary values) as starting points significantly reduces the time required for iteration to a solution that maximizes the likelihood of a tree. These branch lengths are close enough to the optimal values that they can be used without further iteration to calculate approximate maximum-likelihood scores that are very close to the "exact" scores found by iteration. Several strategies are described for using these approximate scores to substantially reduce times needed for maximum-likelihood tree searches.  相似文献   

5.
6.
Summary We have recently described a method of building phylogenetic trees and have outlined an approach for proving whether a particular tree is optimal for the data used. In this paper we describe in detail the method of establishing lower bounds on the length of a minimal tree by partitioning the data set into subsets. All characters that could be involved in duplications in the data are paired with all other such characters. A matching algorithm is then used to obtain the pairing of characters that reveals the most duplications in the data. This matching may still not account for all nucleotide substitutions on the tree. The structure of the tree is then used to help select subsets of three or more. characters until the lower bound found by partitioning is equal to the length of the tree. The tree must then be a minimal tree since no tree can exist with a length less than that of the lower bound.The method is demonstrated using a set of 23 vertebrate cytochrome c sequences with the criterion of minimizing the total number of nucleotide substitutions. There are 131130 7045768798 9603440625 topologically distinct trees that can be constructed from this data set. The method described in this paper does identify 144 minimal tree variants. The method is general in the sense that it can be used for other data and other criteria of length. It need not however always be possible to prove a tree minimal but the method will give an upper and lower bound on the length of minimal trees.  相似文献   

7.
Many different phylogenetic clustering techniques are used currently. One approach is to first determine the topology with a common clustering method and then calculate the branch lengths of the tree. If the resulting tree is not optimal exchanging tree branches can make some local changes in the tree topology. The whole process can be iterated until a satisfactory result has been obtained. The efficiency of this method fully depends on the initially generated tree. Although local changes are made, the optimal tree will never be found if the initial tree is poorly chosen. In this article, genetic algorithms are applied such that the optimal tree can be found even with a bad initial tree topology. This tree generating method is tested by comparing its results with the results of the FITCH program in the PHYLIP software package. Two simulated data sets and a real data set are used.  相似文献   

8.
We have developed a new tool, called fastDNAml, for constructingphylogenetic trees from DNA sequences. The program can be runon a wide variety of computers ranging from Unix workstationsto massively parallel systems, and is available from the RibosomalDatabase Project (RDP) by anonymous FTP. Our program uses amaximum likelihood approach and is based on version 3.3 of Felsenstein'sdnaml program. Several enhancements, including algorithmic changes,significantly improve performance and reduce memory usage, makingit feasible to construct even very large trees. Trees containing40–100 taxa have been easily generated, and phylogeneticestimates are possible even when hundreds of sequences exist.We are currently using the tool to construct a phylogenetictree based on 473 small subunit rRNA sequences from prokaryotes.  相似文献   

9.
10.
SUMMARY: MAC5 implements MCMC sampling of the posterior distribution of tree topologies from DNA sequences containing gaps by using a five state model of evolution (the four nucleotides and the gap character).  相似文献   

11.
An important issue in the phylogenetic analysis of nucleotide sequence data using the maximum likelihood (ML) method is the underlying evolutionary model employed. We consider the problem of simultaneously estimating the tree topology and the parameters in the underlying substitution model and of obtaining estimates of the standard errors of these parameter estimates. Given a fixed tree topology and corresponding set of branch lengths, the ML estimates of standard evolutionary model parameters are asymptotically efficient, in the sense that their joint distribution is asymptotically normal with the variance–covariance matrix given by the inverse of the Fisher information matrix. We propose a new estimate of this conditional variance based on estimation of the expected information using a Monte Carlo sampling (MCS) method. Simulations are used to compare this conditional variance estimate to the standard technique of using the observed information under a variety of experimental conditions. In the case in which one wishes to estimate simultaneously the tree and parameters, we provide a bootstrapping approach that can be used in conjunction with the MCS method to estimate the unconditional standard error. The methods developed are applied to a real data set consisting of 30 papillomavirus sequences. This overall method is easily incorporated into standard bootstrapping procedures to allow for proper variance estimation.  相似文献   

12.
We describe a confidence test for branching order that can aid protein phylogeny reconstruction as well as the evaluation of the optimal tree. It is proposed that the process resulting in the observed amino acid residue differences, which is the basis for the identification of the order and relative times of divergence events, is appropriately described by a modification of the negative binomial distribution. The relative total numbers of mutations (accepted and nonaccepted), which result in a given number of amino acid differences, may be obtained as the expectation of this distribution. The associated variances enable significant differences in tree branching order to be established. If the total rates of mutation of the genes encoding the compared proteins are equal, the expected total mutations and their associated variances map identically to their relative times of divergence. In addition, significantly different rates of change (due to differences in total mutation rate and/or acceptance rate) may be identified without the requirement of outlying reference group. The method is equally applicable to phylogenies derived from DNA or RNA sequence information.  相似文献   

13.
In recent years, the emphasis of theoretical work on phylogenetic inference has shifted from the development of new tree inference methods to the development of methods to measure the statistical support for the topologies. This paper reviews 3 approaches to assign support values to branches in trees obtained in the analysis of molecular sequences: the bootstrap, the Bayesian posterior probabilities for clades, and the interior branch tests. In some circumstances, these methods give different answers. It should not be surprising: their assumptions are different. Thus the interior branch tests assume that a given topology is true and only consider if a particular branch length is longer than zero. If a tree is incorrect, a wrong branch (a low bootstrap or Bayesian support may be an indication) may have a non-zero length. If the substitution model is oversimplified, the length of a branch may be overestimated, and the Bayesian support for the branch may be inflated. The bootstrap, on the other hand, approximates the variance of the data under the real model of sequence evolution, because it involves direct resampling from this data. Thus the discrepancy between the Bayesian support and the bootstrap support may signal model inaccuracy. In practical application, use of all 3 methods is recommended, and if discrepancies are observed, then a careful analysis of their potential origins should be made.  相似文献   

14.
Parsimony methods infer phylogenetic trees by minimizing number of character changes required to explain observed character states. From the perspective of applicability of parsimony methods, it is important to assess whether the characters used to infer phylogeny are likely to provide a correct tree. We introduce a graph theoretical characterization that helps to assess whether given set of characters is appropriate to use with parsimony methods. Given a set of characters and a set of taxa, we construct a network called character overlap graph. We show that the character overlap graph for characters that are appropriate to use in parsimony methods is characterized by significant under-representation of subnetworks known as holes, and provide a validation for this observation. This characterization explains success in constructing evolutionary trees using parsimony method for some characters (e.g., protein domains) and lack of such success for other characters (e.g., introns). In the latter case, the understanding of obstacles to applying parsimony methods in a direct way has lead us to a new approach for detecting inconsistent and/or noisy data. Namely, we introduce the concept of stable characters which is similar but less restrictive than the well known concept of pairwise compatible characters. Application of this approach to introns produces the evolutionary tree consistent with the Coelomata hypothesis.  相似文献   

15.
To find out the evolutionary relationships among different tRNA sequences of 21 amino acids, 22 networks are constructed. One is constructed from whole tRNAs, and the other 21 networks are constructed from the tRNAs which carry the same amino acids. A new method is proposed such that the alignment scores of any two amino acids groups are determined by the average degree and the average clustering coefficient of their networks. The anticodon feature of isolated tRNA and the phylogenetic trees of 21 group networks are discussed. We find that some isolated tRNA sequences in 21 networks still connect with other tRNAs outside their group, which reflects the fact that those tRNAs might evolve by intercrossing among these 21 groups. We also find that most anticodons among the same cluster are only one base different in the same sites when S ≥ 70, and they stay in the same rank in the ladder of evolutionary relationships. Those observations seem to agree on that some tRNAs might mutate from the same ancestor sequences based on point mutation mechanisms.  相似文献   

16.
A recently developed mathematical model for the analysis of phylogenetic trees is applied to comparative data for 48 species. The model represents a return to fundamentals and makes no hypothesis with respect to the reversibility of the process. The species have been analysed in all subsets of three, and a measure of reliability of the results is provided. The numerical results of the computations on 17,296 triples of species are made available on the Internet. These results are discussed and the development of reliable tree structures for several species is illustrated. It is shown that, indeed, the Markov model is capable of considerably more interesting predictions than has been recognized to date.  相似文献   

17.
Bootstrap method of interior-branch test for phylogenetic trees   总被引:5,自引:2,他引:5  
Statistical properties of the bootstrap test of interior branch lengths of phylogenetic trees have been studied and compared with those of the standard interior-branch test in computer simulations. Examination of the properties of the tests under the null hypothesis showed that both tests for an interior branch of a predetermined topology are quite reliable when the distribution of the branch length estimate approaches a normal distribution. Unlike the standard interior-branch test, the bootstrap test appears to retain this property even when the substitution rate varies among sites. In this case, the distribution of the branch length estimate deviates from a normal distribution, and the standard interior-branch test gives conservative confidence probability values. A simple correction method was developed for both interior- branch tests to be applied for testing the reliability of tree topologies estimated from sequence data. This correction for the standard interior-branch test appears to be as effective as that obtained in our previous study, though it is much simpler. The bootstrap and standard interior-branch tests for estimated topologies become conservative as the number of sequence groups in a star-like tree increases.   相似文献   

18.
We present an efficient algorithm for statistical multiple alignment based on the TKF91 model of Thorne, Kishino, and Felsenstein (1991) on an arbitrary k-leaved phylogenetic tree. The existing algorithms use a hidden Markov model approach, which requires at least O( radical 5(k)) states and leads to a time complexity of O(5(k)L(k)), where L is the geometric mean sequence length. Using a combinatorial technique reminiscent of inclusion/exclusion, we are able to sum away the states, thus improving the time complexity to O(2(k)L(k)) and considerably reducing memory requirements. This makes statistical multiple alignment under the TKF91 model a definite practical possibility in the case of a phylogenetic tree with a modest number of leaves.  相似文献   

19.
Deciphering the network of protein interactions that underlines cellular operations has become one of the main tasks of proteomics and computational biology. Recently, a set of bioinformatics approaches has emerged for the prediction of possible interactions by combining sequence and genomic information. Even though the initial results are very promising, the current methods are still far from perfect. We propose here a new way of discovering possible protein-protein interactions based on the comparison of the evolutionary distances between the sequences of the associated protein families, an idea based on previous observations of correspondence between the phylogenetic trees of associated proteins in systems such as ligands and receptors. Here, we extend the approach to different test sets, including the statistical evaluation of their capacity to predict protein interactions. To demonstrate the possibilities of the system to perform large-scale predictions of interactions, we present the application to a collection of more than 67 000 pairs of E.coli proteins, of which 2742 are predicted to correspond to interacting proteins.  相似文献   

20.
MOTIVATION: The study and comparison of mutational spectra is an important problem in molecular biology, because these spectra often reveal important features of the action of various mutagens and the functioning of repair/replication enzymes. As is known, mutability varies significantly along nucleotide sequences: mutations often concentrate at certain positions in a sequence, otherwise termed 'hotspots'. RESULTS: Herein, we propose a regression analysis method based on the use of regression trees in order to analyse the influence of nucleotide context on the occurrence of such hotspots. The REGRT program developed has been tested on simulated and real mutational spectra. For the G:C-->T:A mutational spectra induced by Sn1 alkylating agents (nine spectra), the prediction accuracy was 0. 99. AVAILABILITY: The REGRT program is available upon request from V.Berikov.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号