共查询到20条相似文献,搜索用时 31 毫秒
1.
Our ability to construct very large phylogenetic trees is becoming more important as vast amounts of sequence data are becoming
readily available. Neighbor joining (NJ) is a widely used distance-based phylogenetic tree construction method that has historically
been considered fast, but it is prohibitively slow for building trees from increasingly large datasets. We developed a fast
variant of NJ called relaxed neighbor joining (RNJ) and performed experiments to measure the speed improvement over NJ. Since
repeated runs of the RNJ algorithm generate a superset of the trees that repeated NJ runs generate, we also assessed tree
quality. RNJ is dramatically faster than NJ, and the quality of resulting trees is very similar for the two algorithms. The
results indicate that RNJ is a reasonable alternative to NJ and that it is especially well suited for uses that involve large
numbers of taxa or highly repetitive procedures such as bootstrapping.
[Reviewing Editor: Dr. James Bull] 相似文献
2.
通过对类人猿亚目中部分种类的孕激素受体基因进行分析,重建类人猿亚目的 系统发育关系.扩增并测定了来源于14个属的类人猿亚目物种的孕激素受体编码区序列,并基于这一序列数据,分别采用邻接法、最大简约法和最大似然法重建了系统发育关系.除了阔鼻下目,3种方法构建的系统发生树的拓扑结构类似且各节点支持率高.重建的人猿超科和猴超科内部亲缘关系支持多数人所认可的分类系统.本研究为黑猩猩和人的姐妹群关系提供了证据,提示黑猩猩比大猩猩或其他猿猴更接近人类.阔鼻下目中蜘蛛猴科、卷尾猴科和僧面猴科三者之间的系统发育关系在本研究中未得到很好辨析. 相似文献
3.
Comprehensive phylogenetic trees are essential tools to better understand evolutionary processes. For many groups of organisms or projects aiming to build the Tree of Life, comprehensive phylogenetic analysis implies sampling hundreds to thousands of taxa. For the tree of all life this task rises to a highly conservative 13 million. Here, we assessed the performances of methods to reconstruct large trees using Monte Carlo simulations with parameters inferred from four large angiosperm DNA matrices, containing between 141 and 567 taxa. For each data set, parameters of the HKY85+G model were estimated and used to simulate 20 new matrices for sequence lengths from 100 to 10,000 base pairs. Maximum parsimony and neighbor joining were used to analyze each simulated matrix. In our simulations, accuracy was measured by counting the number of nodes in the model tree that were correctly inferred. The accuracy of the two methods increased very quickly with the addition of characters before reaching a plateau around 1000 nucleotides for any sizes of trees simulated. An increase in the number of taxa from 141 to 567 did not significantly decrease the accuracy of the methods used, despite the increase in the complexity of tree space. Moreover, the distribution of branch lengths rather than the rate of evolution was found to be the most important factor for accurately inferring these large trees. Finally, a tree containing 13,000 taxa was created to represent a hypothetical tree of all angiosperm genera and the efficiency of phylogenetic reconstructions was tested with simulated matrices containing an increasing number of nucleotides up to a maximum of 30,000. Even with such a large tree, our simulations suggested that simple heuristic searches were able to infer up to 80% of the nodes correctly. 相似文献
4.
以系统发育树构建的原有距离方法为基础,吸取了NJ法和FM法中的部分理论,提出了以节点引入为手段的新的简易方法,通过该方法构建了分子系统发育树,结果表明这种方法更加快捷,而且所得结果与FM法完全一致。 相似文献
5.
Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of edge-weighted trees by distances to LCAs (Least Common Ancestors). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighbor-joining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou and Nei's NJ. One member of this family is used to provide a new simple version of the 3-approximation algorithm for the closest additive metric under the iota (infinity) norm. A byproduct of our work is a novel technique which yields a time optimal O (n (2)) implementation of common clustering algorithms such as UPGMA. 相似文献
6.
V Makarenkov 《Bioinformatics (Oxford, England)》2001,17(7):664-668
T-REX (tree and reticulogram reconstruction) is an application to reconstruct phylogenetic trees and reticulation networks from distance matrices. The application includes a number of tree fitting methods like NJ, UNJ or ADDTREE which have been very popular in phylogenetic analysis. At the same time, the software comprises several new methods of phylogenetic analysis such as: tree reconstruction using weights, tree inference from incomplete distance matrices or modeling a reticulation network for a collection of objects or species. T-REX also allows the user to visualize obtained tree or network structures using Hierarchical, Radial or Axial types of tree drawing and manipulate them interactively. AVAILABILITY: T-REX is a freeware package available online at: http://www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex 相似文献
7.
We reconstructed a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by profile neighbor joining (PNJ), an automated computational method that inherits the efficiency of the neighbor joining algorithm. This tree supports the one proposed in the latest review on metazoan phylogeny. Our main goal is not to discuss aspects of the phylogeny itself, but rather to point out that PNJ can be a valuable tool when the basal branching pattern of a large phylogenetic tree must be estimated, whereas traditional methods would be computationally impractical. 相似文献
8.
Balanced minimum evolution (BME) is a statistically consistent distance-based method to reconstruct a phylogenetic tree from
an alignment of molecular data. In 2000, Pauplin showed that the BME method is equivalent to optimizing a linear functional
over the BME polytope, the convex hull of the BME vectors obtained from Pauplin’s formula applied to all binary trees. The
BME method is related to the Neighbor Joining (NJ) Algorithm, now known to be a greedy optimization of the BME principle.
Further, the NJ and BME algorithms have been studied previously to understand when the NJ Algorithm returns a BME tree for
small numbers of taxa. In this paper we aim to elucidate the structure of the BME polytope and strengthen knowledge of the
connection between the BME method and NJ Algorithm. We first prove that any subtree-prune-regraft move from a binary tree
to another binary tree corresponds to an edge of the BME polytope. Moreover, we describe an entire family of faces parameterized
by disjoint clades. We show that these clade-faces are smaller dimensional BME polytopes themselves. Finally, we show that for any order of joining nodes to form a tree, there
exists an associated distance matrix (i.e., dissimilarity map) for which the NJ Algorithm returns the BME tree. More strongly,
we show that the BME cone and every NJ cone associated to a tree T have an intersection of positive measure. 相似文献
9.
RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees 总被引:1,自引:0,他引:1
MOTIVATION: The computation of large phylogenetic trees with statistical models such as maximum likelihood or bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree or a tree which is topologically closer to the true tree more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity the size of trees which can be computed on a Biologist's PC workstation within reasonable time is limited to trees containing approximately 100 taxa. RESULTS: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees which allows for computation of 1.000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data it clearly outperforms both programs on all real data alignments used in terms of speed and final likelihood values. Availability SUPPLEMENTARY INFORMATION: RAxML-III including all alignments and final trees mentioned in this paper is freely available as open source code at http://wwwbode.cs.tum/~stamatak CONTACT: stamatak@cs.tum.edu. 相似文献
10.
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals. 相似文献
11.
Murayama M Tazumi A Hayashi K Nakanishi S Tasaki E Ueno H Nakajima T Matsubara K Moore JE Millar BC Matsuda M 《Folia microbiologica》2011,56(5):397-406
Molecular cloning, nucleotide sequencing, and characterization of the flaA gene from additional isolates of urease-positive thermophilic Campylobacter (UPTC) were performed. These isolates were obtained from the natural environment in Northern Ireland (n?=?9 from mussels) and in England (n?=?1 from sea water). All isolates carried the shorter flaA gene, [open reading frames (ORFs), 1,461 to 1,503?base pairs], without any internal termination codons, and did not carry any flaA pseudogenes. The UPTC isolates were well discriminated by the neighbor joining (NJ) phylogenetic tree constructed based on the putative flaA genes ORFs nucleotide sequence information. In addition, the NJ tree constructed based on the flaA-short variable region sequence information discriminated the Campylobacter lari isolates with a similar degree of discrimination power. 相似文献
12.
Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. 总被引:15,自引:0,他引:15
The Minimum Evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the Neighbor Joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n(3)) time, where n is the number of taxa, while the current implementations of the second are in O(p n(3)) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n(2)) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n(2) + p n), i.e., O(n(2)) in practice because p is always much smaller than n. The Greedy Minimum Evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard "unweighted" OLS, where all taxa have the same weight so that the weight of a subtree is equal to the number of its taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version, requiring O(n(2) x diam(T)) operations to build the starting tree and O(p n x diam(T)) to perform the NNIs, where diam(T) is the topological diameter of the output tree. In the usual Yule-Harding distribution on phylogenetic trees, the diameter expectation is in log(n), so our algorithms are in practice faster that NJ. Moreover, this BME scheme yields a very significant improvement over NJ and other distance-based algorithms, especially with large trees, in terms of topological accuracy. 相似文献
13.
Sequence alignments of multiple genes are routinely used to infer phylogenetic relationships among species. The analysis of their concatenation is more likely to give correct results under an assumption of homotachy (i.e., the evolutionary rates within lineages in each of the concatenated genes are constant during evolution). Here, we examine how the violation of homotachy (i.e., presence of within-site rate variation, called heterotachy) distorts species phylogenies. A theoretical examination has been conducted using a four taxon case and the neighbor joining (NJ) method, concluding that NJ recovers the incorrect tree when concatenated genes exhibit heterotachy. The application of average and weighted-average distance approaches, where gene boundaries are kept intact, overcomes the detrimental effect of heterotachy in multigene analysis using the NJ method. 相似文献
14.
The consistency of several phylogeny-inference methods under varying evolutionary rates. 总被引:7,自引:0,他引:7
R W DeBry 《Molecular biology and evolution》1992,9(3):537-551
A phylogenetic method is a consistent estimator of phylogeny if and only if it is guaranteed to give the correct tree, given that sufficient (possibly infinite) independent data are examined. The following methods are examined for consistency: UPGMA (unweighted pair-group method, averages), NJ (neighbor joining), MF (modified Farris), and P (parsimony). A two-parameter model of nucleotide sequence substitution is used, and the expected distribution of character states is calculated. Without perfect correction for superimposed substitutions, all four methods may be inconsistent if there is but one branch evolving at a faster rate than the other branches. Partial correction of observed distances improves the robustness of the NJ method to rate variation, and perfect correction makes the NJ method a consistent estimator for all combinations of rates that were examined. The sensitivity of all the methods to unequal rates varies over a wide range, so relative-rate tests are unlikely to be a reliable guide for accepting or rejecting phylogenies based on parsimony analysis. 相似文献
15.
Absolute fast converging phylogenetic reconstruction methods are provably guaranteed to recover the true tree with high probability from sequences that grow only polynomially in the number of leaves, once the edge lengths are bounded arbitrarily from above and below. Only a few methods have been determined to be absolute fast converging; these have all been developed in just the last few years, and most are polynomial time. In this paper, we compare pre-existing fast converging methods as well as some new polynomial time methods that we have developed. Our study, based upon simulating evolution under a wide range of model conditions, establishes that our new methods outperform both neighbor joining and the previous fast converging methods, returning very accurate large trees, when these other methods do poorly. 相似文献
16.
17.
为探明长江中下游不同湖泊中短颌鲚(Coilia brachygnathus)遗传多样性水平和遗传分化程度,以洞庭湖、长湖、巢湖3个地理群体作为研究对象,采用线粒体控制区序列为分子标记,分别应用软件Dna SP 5.0、Arlequin3.1.1、MEGA5.0和Network 5.1进行了遗传参数统计和单倍型间分子变异分析(AMOVA),构建邻接系统树及单倍型网络图。对长江中下游短颌鲚野生群体的遗传多样性和遗传结构进行分析。结果显示,用来分析的1 236 bp D-loop区序列中共90个变异位点,54个简约信息位点。长江中下游3个地理群体中共发现58个单倍型,单倍型多样性(h)范围0.949~0.982,核苷酸多样性范围0.004 99~0.006 21,说明长江中下游3个湖泊短颌鲚地理群体具有较高的遗传多样性水平。3个短颌鲚地理群体遗传分化指数(Fst)为0.265 95,呈现出中等程度的分化水平,主要表现在巢湖群体与其他群体之间处于中等程度分化水平。依据遗传距离构建系统发育树及单倍型网络图也出现相类似结果。 相似文献
18.
Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny 总被引:24,自引:6,他引:18
The relative efficiencies of different protein-coding genes of the
mitochondrial genome and different tree-building methods in recovering a
known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum,
chicken, frog, and three bony fish species) was evaluated. The
tree-building methods examined were the neighbor joining (NJ), minimum
evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and
both nucleotide sequences and deduced amino acid sequences were analyzed.
Generally speaking, amino acid sequences were better than nucleotide
sequences in obtaining the true tree (topology) or trees close to the true
tree. However, when only first and second codon positions data were used,
nucleotide sequences produced reasonably good trees. Among the 13 genes
examined, Nd5 produced the true tree in all tree-building methods or
algorithms for both amino acid and nucleotide sequence data. Genes Cytb and
Nd4 also produced the correct tree in most tree-building algorithms when
amino acid sequence data were used. By contrast, Co2, Nd1, and Nd41 showed
a poor performance. In general, large genes produced better results, and
when the entire set of genes was used, all tree-building methods generated
the true tree. In each tree-building method, several distance measures or
algorithms were used, but all these distance measures or algorithms
produced essentially the same results. The ME method, in which many
different topologies are examined, was no better than the NJ method, which
generates a single final tree. Similarly, an ML method, in which many
topologies are examined, was no better than the ML star decomposition
algorithm that generates a single final tree. In ML the best substitution
model chosen by using the Akaike information criterion produced no better
results than simpler substitution models. These results question the
utility of the currently used optimization principles in phylogenetic
construction. Relatively simple methods such as the NJ and ML star
decomposition algorithms seem to produce as good results as those obtained
by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML
methods in obtaining the correct tree were nearly the same when amino acid
sequence data were used. The most important factor in constructing reliable
phylogenetic trees seems to be the number of amino acids or nucleotides
used.
相似文献
19.
为了探究进化模型对DNA条形码分类的影响, 本研究以雾灵山夜蛾科44个种的标本为材料, 获得COI基因序列。使用邻接法(neighbor-joining)、 最大简约法(maximum parsimony)、 最大似然法(maximum likelihood)以及贝叶斯法(Bayesian inference)构建系统发育树, 并且对邻接法的12种模型、 最大似然法的7种模型、 贝叶斯法的2种模型进行模型成功率的评估。结果表明, 邻接法的12种模型成功率相差不大, 较稳定; 最大似然法及贝叶斯法的不同模型成功率存在明显差异, 不稳定; 最大简约法不基于模型, 成功率比较稳定。邻接法及最大似然法共有6种相同的模型, 这6种模型在不同的方法中成功率存在差异。此外, 分子数据中存在单个物种仅有一条序列的情况, 显著降低了模型成功率, 表明在DNA条形码研究中, 每个物种需要有多个样本。 相似文献
20.
Using simulated data, we compared five methods of phylogenetic tree
estimation: parsimony, compatibility, maximum likelihood, Fitch-
Margoliash, and neighbor joining. For each combination of substitution
rates and sequence length, 100 data sets were generated for each of 50
trees, for a total of 5,000 replications per condition. Accuracy was
measured by two measures of the distance between the true tree and the
estimate of the tree, one measure sensitive to accuracy of branch lengths
and the other not. The distance-matrix methods (Fitch- Margoliash and
neighbor joining) performed best when they were constrained from estimating
negative branch lengths; all comparisons with other methods used this
constraint. Parsimony and compatibility had similar results, with
compatibility generally inferior; Fitch- Margoliash and neighbor joining
had similar results, with neighbor joining generally slightly inferior.
Maximum likelihood was the most successful method overall, although for
short sequences Fitch- Margoliash and neighbor joining were sometimes
better. Bias of the estimates was inferred by measuring whether the
independent estimates of a tree for different data sets were closer to the
true tree than to each other. Parsimony and compatibility had particular
difficulty with inaccuracy and bias when substitution rates varied among
different branches. When rates of evolution varied among different sites,
all methods showed signs of inaccuracy and bias.
相似文献