Similar Articles
20 similar articles found
1.
FastJoin, an improved neighbor-joining algorithm
Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving it are evaluated on two criteria: accuracy and efficiency. Neighbor-joining reconstructs phylogenetic trees by iteratively picking a pair of nodes to merge into a new node until only one node remains; owing to its good accuracy and speed, it has been widely adopted by the phylogenetics community. With the advent of large data sets, faster and more precise methods for reconstructing evolutionary trees have become necessary. We improved the neighbor-joining algorithm by iteratively picking two pairs of nodes and merging them into two new nodes until only one node remains. We found that, in each iteration of the clustering procedure on a purely additive tree, a second pair of true neighbors can be merged into a new node in addition to the pair chosen by the standard neighbor-joining criterion; this second pair is the one the standard method would select in its next iteration. Picking two pairs of nodes per iteration therefore yields an improved neighbor-joining algorithm that constructs the same phylogenetic tree as classic neighbor-joining on the same input data. By combining this improved algorithm with the upper-bound computation optimization of RapidNJ and the external-storage strategy of ERapidNJ, we propose a new tree-reconstruction method, FastJoin. Experiments on several data sets showed that the new algorithm yields a significant speed-up over classic neighbor-joining, demonstrating empirically that FastJoin outperforms almost all other neighbor-joining implementations.
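A minimal sketch of the two-pairs-per-iteration idea described above, assuming the standard neighbor-joining criterion Q(i, j) = (n - 2)d(i, j) - Σ_k d(i, k) - Σ_k d(j, k); the function name and the toy distance matrix are illustrative, not taken from the FastJoin implementation.

```python
# Hedged sketch: pick the usual NJ pair, then a second, disjoint pair by the
# same criterion, so two merges can be done per iteration as described above.

def pick_two_disjoint_pairs(D):
    n = len(D)
    r = [sum(row) for row in D]  # row sums of the distance matrix
    # score every pair by the neighbor-joining criterion (lower is better)
    scored = sorted(
        ((n - 2) * D[i][j] - r[i] - r[j], (i, j))
        for i in range(n) for j in range(i + 1, n)
    )
    first = scored[0][1]
    for _, pair in scored[1:]:
        if not set(pair) & set(first):  # second pair must be disjoint
            return first, pair
    return (first,)  # too few nodes for a second disjoint pair

# additive distances on the tree ((A,B),(C,D)): A,B and C,D are true neighbors
D = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 2],
     [6, 6, 2, 0]]
print(pick_two_disjoint_pairs(D))  # → ((0, 1), (2, 3))
```

On this additive example both true neighbor pairs are recovered in a single pass, which is the source of the speed-up the abstract reports.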

2.
In the reconstruction of a large phylogenetic tree, the most difficult part is usually exploring the topology space to find the optimal topology. We have developed a "divide-and-conquer" heuristic algorithm in which an initial neighbor-joining (NJ) tree is divided into subtrees at internal branches having bootstrap values higher than a threshold. The topology search is then conducted by using the maximum-likelihood method to reevaluate all branches with a bootstrap value lower than the threshold while keeping the other branches intact. Extensive simulation showed that our simple method, the neighbor-joining maximum-likelihood (NJML) method, is highly efficient in improving NJ trees. Furthermore, the performance of the NJML method is nearly equal to or better than existing time-consuming heuristic maximum-likelihood methods. Our method is suitable for reconstructing relatively large molecular phylogenetic trees (number of taxa ≥ 16).

3.

Background  

The neighbor-joining method of Saitou and Nei is a widely used method for constructing phylogenetic trees. The formulation of the method gives rise to a canonical Θ(n³) algorithm upon which all existing implementations are based.

4.
5.
On the optimality of coarse behavior rules
Animal behavior can be characterized by its degree of responsiveness to variations in the environment. Some behavior rules lead to fine-tuned responses that carefully adjust to environmental cues, while other rules fail to discriminate as carefully and lead to more inflexible responses. In this paper we seek to explain such inflexible behavior. We show that coarse behavior, behavior that appears rule-bound and inflexible and fails to adapt to predictable changes in the environment, is an optimal response to a particular type of uncertainty we call extended uncertainty. We show that the very variability and unpredictability arising from extended uncertainty lead to more rigid and possibly more predictable behavior. We relate coarse behavior to failures to meet optimality conditions in animal behavior, most notably in foraging behavior, and also address the implications of extended uncertainty and coarse behavior rules for some results in experimental versus naturalistic approaches to ethology.

6.
7.

Background

The standard genetic code (SGC) is a unique set of rules that assigns amino acids to codons. Similar amino acids tend to have similar codons, indicating that the code evolved to minimize the costs of amino acid replacements in proteins caused by mutations or translational errors. However, if such optimization in fact occurred, many different properties of amino acids must have been taken into account during the code's evolution. The problem can therefore be reformulated as a multi-objective optimization task, in which the selection constraints are represented by measures based on various amino acid properties.

Results

To study the optimality of the SGC we applied a multi-objective evolutionary algorithm, using the representatives of eight clusters that grouped over 500 indices describing various physicochemical properties of amino acids. This allowed us to avoid an arbitrary choice of amino acid features as optimization criteria and to conduct a more general study of the properties of the SGC than those presented so far in other papers on this topic. We considered two models of the genetic code, one preserving the characteristic codon-block structure of the SGC and the other without this restriction. The results revealed that the SGC could be significantly improved in terms of error minimization and hence is not fully optimized. Its structure differs significantly from the structure of codes optimized to minimize the costs of amino acid replacements. On the other hand, using newly defined quality measures that place the SGC in the global space of theoretical genetic codes, we showed that the SGC is definitely closer to the codes that minimize the costs of amino acid replacements than to those maximizing them.

Conclusions

The standard genetic code most likely represents an only partially optimized system, which emerged under the influence of many different factors. Our findings can be useful to researchers involved in modifying the genetic code of living organisms and designing artificial ones.
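The multi-objective framing above can be illustrated with a Pareto-dominance check: one code dominates another if it is no worse on every error-cost criterion and strictly better on at least one. This is a generic sketch with made-up scores, not the paper's evolutionary algorithm or data.

```python
# Illustrative sketch of multi-objective comparison of candidate codes.
# Cost tuples are made-up placeholders (lower is better on each criterion).

def dominates(a, b):
    # a dominates b: no worse everywhere, strictly better somewhere
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(codes):
    # codes: dict name -> tuple of costs; keep the non-dominated ones
    return {n for n, s in codes.items()
            if not any(dominates(t, s) for m, t in codes.items() if m != n)}

codes = {
    "code1": (1.0, 2.0),
    "code2": (2.0, 1.0),
    "code3": (2.5, 2.5),   # dominated by both code1 and code2
}
print(sorted(pareto_front(codes)))  # → ['code1', 'code2']
```

A code like the SGC would be assessed by where its cost tuple falls relative to this front in the space of theoretical codes.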

8.
What proportion of the traits of individuals has been optimally shaped by natural selection, and what proportion has not? Here, we estimate the maximal number of such traits using a mathematical model of natural selection in multitrait organisms. The model represents the most ideal conditions for natural selection: a simple genotype–phenotype map and independent variation between traits. The model is also used to disentangle the influence of fitness functions and of the number of traits, n, per se on the efficiency of natural selection. We also allow n to evolve. Our simulations show that, for all fitness functions and even in the best conditions, optimal phenotypes are rarely encountered (only for n = 1) and that a large proportion of traits are always far from their optimum, especially for large n. This happens to different degrees depending on the fitness function (additive linear, additive nonlinear, Gaussian, or multiplicative). The traits that arise earlier in evolution account for a larger proportion of the absolute fitness of individuals. Thus, complex phenotypes have, in proportion, more traits that are far from optimal, and the closeness to the optimum correlates with the age of the trait. Based on estimated population sizes, mutation rates and selection coefficients, we provide an upper estimate of the number of traits that can become and remain adapted by direct natural selection.

9.
Using analytical methods, we show that under a variety of model misspecifications, Neighbor-Joining, minimum evolution, and least squares estimation procedures are statistically inconsistent. Failure to correctly account for differing rates-across-sites processes, failure to correctly model rate matrix parameters, and failure to adjust for parallel rates-across-sites changes (a rates-across-subtrees process) are all shown to lead to a "long branch attraction" form of inconsistency. In addition, failure to account for rates-across-sites processes is also shown to result in underestimation of evolutionary distances for a wide variety of substitution models, generalizing an earlier analytical result for the Jukes-Cantor model reported in Golding and a similar bias result for the GTR or REV model in Kelly and Rice (1996). Although standard rates-across-sites models can be employed in many of these cases to restore consistency, current models cannot account for other kinds of misspecification. We examine an idealized but biologically relevant case, where parallel changes in rates at sites across subtrees are shown to give rise to inconsistency. This changing rates-across-subtrees type of model misspecification cannot be adjusted for with conventional methods or without carefully considering the rate variation in the larger tree. The results are presented for four-taxon trees, but the expectation is that they have implications for larger trees as well. To illustrate this, a simulated 42-taxon example is given in which the microsporidia, an enigmatic group of eukaryotes, are incorrectly placed at the archaebacteria-eukaryotes split because of incorrectly specified pairwise distances. The analytical nature of the results lends insight into the reasons that long branch attraction tends to be a common form of inconsistency and the reasons that other forms of inconsistency like "long branches repel" can arise in some settings.
In many of the cases of inconsistency presented, a particular incorrect topology is estimated with probability converging to one, the implication being that measures of uncertainty like bootstrap support will be unable to detect that there is a problem with the estimation. The focus is on distance methods, but previous simulation results suggest that the zones of inconsistency for distance methods contain the zones of inconsistency for maximum likelihood methods as well.

10.
Much recent progress has been made to understand the impact of proteome allocation on bacterial growth; much less is known about the relationship between the abundances of the enzymes and their substrates, which jointly determine metabolic fluxes. Here, we report a correlation between the concentrations of enzymes and their substrates in Escherichia coli. We suggest this relationship to be a consequence of optimal resource allocation, subject to an overall constraint on the biomass density: For a cellular reaction network composed of effectively irreversible reactions, maximal reaction flux is achieved when the dry mass allocated to each substrate is equal to the dry mass of the unsaturated (or “free”) enzymes waiting to consume it. Calculations based on this optimality principle successfully predict the quantitative relationship between the observed enzyme and metabolite abundances, parameterized only by molecular masses and enzyme–substrate dissociation constants (Km). The corresponding organizing principle provides a fundamental rationale for cellular investment into different types of molecules, which may aid in the design of more efficient synthetic cellular systems.

This study shows that in E. coli, the cellular mass of each metabolite approximately equals the combined mass of the free enzymes waiting to consume it; this simple relationship arises from the optimal utilization of cellular dry mass, and quantitatively describes available experimental data.
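The stated optimality principle can be checked numerically for a single irreversible Michaelis–Menten reaction: maximizing flux under a fixed dry-mass budget makes the substrate mass equal the mass of the unsaturated ("free") enzyme. All parameter values below are illustrative, not taken from the paper.

```python
# Hedged sketch: grid-search the mass split between enzyme and substrate for
# one irreversible Michaelis-Menten reaction and check that, at the flux
# optimum, substrate mass ~ mass of free enzyme. Parameters are made up.

def flux(E, S, kcat=10.0, Km=2.0):
    return kcat * E * S / (Km + S)

def free_enzyme(E, S, Km=2.0):
    # unsaturated enzyme: the fraction Km/(Km+S) not bound to substrate
    return E * Km / (Km + S)

m_E, m_S = 40.0, 1.0   # molecular masses (arbitrary units)
M = 100.0              # total dry-mass budget for this reaction

best = None
n = 100000
for i in range(1, n):
    mass_E = M * i / n                     # dry mass allocated to enzyme
    E, S = mass_E / m_E, (M - mass_E) / m_S
    v = flux(E, S)
    if best is None or v > best[0]:
        best = (v, E, S)

v, E, S = best
# the two masses agree at the flux optimum
print(m_S * S, m_E * free_enzyme(E, S))
```

The agreement follows from the first-order optimality condition of the constrained maximization, which is the single-reaction version of the organizing principle the abstract describes.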

11.
We have developed a phylogenetic tree reconstruction method that detects and reports multiple topologically distant low-cost solutions. Our method is a generalization of the neighbor-joining method of Saitou and Nei and affords a more thorough sampling of the solution space by keeping track of multiple partial solutions during its execution. The scope of the solution-space sampling is controlled by a pair of user-specified parameters, the total number of alternate solutions and the number of alternate solutions that are randomly selected, effecting a smooth trade-off between run time and solution quality and diversity. This method can discover topologically distinct low-cost solutions. In tests on biological and synthetic data sets using either the least-squares distance or minimum-evolution criterion, the method consistently performed as well as, or better than, both the neighbor-joining heuristic and the PHYLIP implementation of the Fitch-Margoliash distance measure. In addition, the method identified alternative tree topologies with costs within 1% or 2% of the best, but with topological distances of 9 or more partitions from the best solution (16 taxa); with 32 taxa, topologies were obtained 17 (least-squares) and 22 (minimum-evolution) partitions from the best topology when 200 partial solutions were retained. Thus, the method can find lower-cost tree topologies and near-best tree topologies that are significantly different from the best topology.

12.
The "neighbor-joining algorithm" is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the neighbor-joining transformation, which uses estimates of phylogenetic diversity rather than pairwise distances in the tree. This leads to an improved neighbor-joining algorithm whose total running time is still polynomial in the number of taxa. On simulated data, the method outperforms other distance-based methods. We have implemented neighbor-joining for subtree weights in a program called MJOIN, which is freely available under the GNU Public License at http://bio.math.berkeley.edu/mjoin/.

13.
Goodarzi H, Nejad HA, Torabi N. BioSystems 2004;77(1-3):163-173
The existence of nonrandom patterns in codon assignments is supported by many statistical and biochemical studies. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations: when an error converts one amino acid into another, the biochemical properties of the resulting amino acid are usually very similar to those of the original. Prior studies include many attempts to quantitatively estimate the fraction of randomly generated codes that, based on load minimization, score higher than the canonical genetic code. In this study, we took into consideration both the relative frequencies of amino acids and nonsense mistranslations, factors that had previously been ignored. Incorporating these parameters resulted in a fitness function (φ) under which the canonical genetic code is highly optimized with respect to load minimization. Considering termination codons, we applied a biosynthetic version of the coevolution theory, although with low significance. We employed a revised cost for the precursor–product pairs of amino acids and showed that the significance of this approach depends on the cost-measure matrix used; we therefore compared two prominent matrices, the point accepted mutation matrix PAM(74-100) and the mutation matrix, in our study.
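An illustrative, much simplified version of a load-minimization score like those discussed above: average the squared change in a single amino-acid property over all single-nucleotide codon changes, weighted by amino-acid frequency. The property values, frequencies, and toy codes are made up; this is not the paper's φ.

```python
# Hedged sketch of a load score: codes whose single-mutation neighbors map to
# similar (or identical) amino acids score lower. Toy data, not the paper's.
import itertools
import random

BASES = "UCAG"
CODONS = ["".join(c) for c in itertools.product(BASES, repeat=3)]

def neighbors(codon):
    # all codons one point mutation away
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                yield codon[:pos] + b + codon[pos + 1:]

def load(code, prop, freq):
    # code: codon -> amino acid; prop: aa -> property; freq: aa -> weight
    total, weight = 0.0, 0.0
    for c in CODONS:
        for m in neighbors(c):
            a, b = code[c], code[m]
            w = freq[a]
            total += w * (prop[a] - prop[b]) ** 2
            weight += w
    return total / weight

# toy alphabet of 4 "amino acids" with made-up property values and frequencies
aas = "WXYZ"
prop = {"W": 0.0, "X": 1.0, "Y": 2.0, "Z": 3.0}
freq = {"W": 4.0, "X": 2.0, "Y": 1.0, "Z": 1.0}

random.seed(0)
random_code = {c: random.choice(aas) for c in CODONS}
# a "block" code where only the second base matters, so mutations at the
# first and third positions are synonymous
block_code = {c: aas[BASES.index(c[1])] for c in CODONS}

print(load(block_code, prop, freq) < load(random_code, prop, freq))  # → True
```

The block-structured code scores lower than a random assignment, the same qualitative comparison the load-minimization literature makes for the canonical code.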

14.
Modern materialist rationalism is the doctrine that principles governing behaviorally important aspects of the world have become implicit in the structure of purpose-specific information-processing mechanisms through evolution by natural selection. These principles are mostly, but not entirely, mathematical. Because the evolutionary process tends to optimize, the computations performed by these mechanisms tend to approximate the optimal computation. This doctrine does not imply that animals always make rational and/or optimal choices.

15.
The method of minimum evolution reconstructs a phylogenetic tree T for n taxa given dissimilarity data d. In principle, for every tree W with these n leaves an estimate of the total length of W is made, and T is selected as the W that yields the minimum total length. Suppose that the ordinary least-squares formula S_W(d) is used to estimate the total length of W. A theorem of Rzhetsky and Nei shows that when d is positively additive on a completely resolved tree T, then for all W ≠ T it will be true that S_W(d) > S_T(d). The same will be true if d is merely sufficiently close to an additive dissimilarity function. This paper proves that as n grows large, even if the shortest branch length in the true tree T remains constant and d is additive on T, the difference S_W(d) - S_T(d) can go to zero. It is also proved that, as n grows large, there is a tree T with n leaves, an additive distance function d_T on T with shortest edge ε, a distance function d, and a tree W with the same n leaves such that d differs from d_T by only approximately ε/4, yet minimum evolution incorrectly selects the tree W over the tree T. This result contrasts with the method of neighbor-joining, for which Atteson showed that incorrect selection of W requires a deviation of at least ε/2. It follows that, for large n, minimum evolution with ordinary least squares can be only half as robust as neighbor-joining.
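A minimal sketch of the quantity S_W(d) discussed above, for the fixed 4-taxon topology W = ((A,B),(C,D)): the five branch lengths are fit to the six pairwise distances by ordinary least squares and summed. The branch-length values are illustrative.

```python
# Hedged sketch: ordinary least-squares total-length estimate S_W(d) for the
# quartet topology ((A,B),(C,D)). Toy distances, not data from the paper.
import numpy as np

# rows: pairs AB, AC, AD, BC, BD, CD
# cols: external branches a, b, c, d and internal branch e
X = np.array([
    [1, 1, 0, 0, 0],  # AB path: a + b
    [1, 0, 1, 0, 1],  # AC path: a + c + e
    [1, 0, 0, 1, 1],  # AD path: a + d + e
    [0, 1, 1, 0, 1],  # BC path: b + c + e
    [0, 1, 0, 1, 1],  # BD path: b + d + e
    [0, 0, 1, 1, 0],  # CD path: c + d
], dtype=float)

def ols_total_length(d):
    # d: the six pairwise distances, in the row order above
    lengths, *_ = np.linalg.lstsq(X, np.asarray(d, dtype=float), rcond=None)
    return lengths.sum()

# additive distances generated by branch lengths a=b=c=d=1, e=3
d = [2, 5, 5, 5, 5, 2]
print(ols_total_length(d))  # recovers the true total length 7 on additive data
```

Minimum evolution would evaluate this sum for every candidate topology W and keep the smallest; the paper's point is how little perturbation of d it takes for the wrong W to win.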

16.
The neighbor-joining method: a new method for reconstructing phylogenetic trees
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs, starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
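The pair-selection step of the method can be sketched with the now-standard Q-criterion formulation, Q(i, j) = (n - 2)d(i, j) - Σ_k d(i, k) - Σ_k d(j, k), which is minimized by a pair of true neighbors on an additive tree. The toy distance matrix is illustrative.

```python
# Minimal sketch of one neighbor-joining clustering step: pick the pair of
# OTUs minimizing the Q-criterion. Toy matrix, not data from the paper.

def nj_pick_pair(D):
    n = len(D)
    r = [sum(row) for row in D]  # net divergence of each OTU
    best, pair = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - r[i] - r[j]
            if best is None or q < best:
                best, pair = q, (i, j)
    return pair

# additive distances on the tree ((A,B),(C,D)) with unit external branches
# and a long internal branch
D = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 2],
     [6, 6, 2, 0]]
print(nj_pick_pair(D))  # → (0, 1)
```

A full implementation would then merge the chosen pair into a new node, recompute distances to it, and repeat until the star tree is fully resolved.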

17.
Phylogeny reconstruction is the process of inferring evolutionary relationships from molecular sequences, and methods that are expected to accurately reconstruct trees from sequences of reasonable length are highly desirable. To formalize this concept, the property of fast-convergence has been introduced to describe phylogeny reconstruction methods that, with high probability, recover the true tree from sequences that grow polynomially in the number of taxa n. While provably fast-converging methods have been developed, the neighbor-joining (NJ) algorithm of Saitou and Nei remains one of the most popular methods used in practice. This algorithm is known to converge for sequences that are exponential in n, but no lower bound for its convergence rate has been established. To address this theoretical question, we analyze the performance of the NJ algorithm on a type of phylogeny known as a 'caterpillar tree'. We find that, for sequences of polynomial length in the number of taxa n, the variability of the NJ criterion is sufficiently high that the algorithm is likely to fail even in the first step of the phylogeny reconstruction process, regardless of the degree of polynomial considered. This result demonstrates that, for general n-taxa trees, the exponential bound cannot be improved.

18.
19.
This paper discusses problems associated with the use of optimality models in human behavioral ecology. Optimality models are used in both human and non-human animal behavioral ecology to test hypotheses about the conditions generating and maintaining behavioral strategies in populations via natural selection. The way optimality models are currently used in behavioral ecology faces significant problems, which are exacerbated by employing the so-called 'phenotypic gambit': that is, the bet that the psychological and inheritance mechanisms responsible for behavioral strategies will be straightforward. I argue that each of several different possible ways we might interpret how optimality models are being used for humans face similar and additional problems. I suggest some ways in which human behavioral ecologists might adjust how they employ optimality models; in particular, I urge the abandonment of the phenotypic gambit in the human case.

20.
Mortality is U-shaped with age for many species, declining from birth to sexual maturity, then rising in adulthood, sometimes with postreproductive survival. We show analytically why the optimal life history of a species with determinate growth is likely to have this shape. An organism allocates energy among somatic growth, fertility and maintenance/survival at each age. Adults may transfer energy to juveniles, who can then use more energy than they produce. Optimal juvenile mortality declines from birth to maturity, either to protect the increasingly valuable cumulative investments by adults in juveniles or to exploit the compounding effects of early investment in somatic growth, since early growth raises subsequent energy production, which in turn supports further growth. Optimal adult mortality rises after maturity as expected future reproduction declines, as in Hamilton, but intergenerational transfers lead to postreproductive survival, as in Lee. Here the Hamilton and transfer effects are divided by probabilities of survival, in contrast to the fitness-impact measures relevant for mutation-selection balance. If energetic efficiency rises strongly with adult experience, then adult mortality could initially be flat or declining.
