Found 20 similar articles (search time: 15 ms)
1.
Summary Three new methods for constructing evolutionary trees from molecular sequence data are presented. These methods are based on a theory for correcting for non-constant evolutionary rates (Klotz et al. 1979; Klotz and Blanken 1981). Extensive computer simulations were run to compare these new methods to the commonly used criteria of Dayhoff (1978) and Fitch and Margoliash (1967). The results of these simulations showed that two of the new methods performed as well as Dayhoff's criterion, significantly better than that of Fitch and Margoliash, and as well as a simple variation of the latter (Prager and Wilson 1978) where any topology containing negative branch mutations is discarded. However, no method yielded the correct topology all of the time, which demonstrated the need to determine confidence estimates in a particular result when evolutionary trees are determined from sequence data.
2.
Ivar Heuch 《American journal of human genetics》1976,28(4):428-429
3.
From the measures of evolutionary distance between pairs of sequences in a set, it is possible to infer the genetic tree or trees which best fit these known data. DENDRON is a new program, written in FORTRAN 66, which computes an initial tree from the bottom up, then searches among increasingly divergent trees for a better fit. As a check on the consistency of the measures, the program tests all triplets for the triangle inequality. DENDRON also calculates a single top-down tree, progressing from the trunk to the twigs, for comparison with the bottom-up trees.
Received on August 17, 1987; accepted on June 1, 1988
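The triplet consistency check mentioned in this abstract can be sketched as follows. This is only an illustration in Python, not DENDRON's FORTRAN 66 code; the function name `triangle_violations`, the tolerance `tol`, and the two small example matrices are assumptions made for the sketch.

```python
from itertools import combinations

def triangle_violations(dist, tol=1e-9):
    """Return every triplet (i, j, k) whose distances break the triangle inequality.

    dist is a symmetric square matrix (list of lists) of pairwise distances.
    """
    violations = []
    for i, j, k in combinations(range(len(dist)), 3):
        a, b, c = dist[i][j], dist[i][k], dist[j][k]
        longest = max(a, b, c)
        # The longest side must not exceed the sum of the other two sides.
        if longest > (a + b + c - longest) + tol:
            violations.append((i, j, k))
    return violations

# Hypothetical 3 x 3 matrices: the first is consistent, the second is not.
consistent = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
inconsistent = [[0, 1, 1], [1, 0, 5], [1, 5, 0]]
print(triangle_violations(consistent))
print(triangle_violations(inconsistent))
```

The check visits all triplets, so its cost grows with the cube of the number of sequences.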
4.
An important issue in the phylogenetic analysis of nucleotide sequence data using the maximum likelihood (ML) method is the
underlying evolutionary model employed. We consider the problem of simultaneously estimating the tree topology and the parameters
in the underlying substitution model and of obtaining estimates of the standard errors of these parameter estimates. Given
a fixed tree topology and corresponding set of branch lengths, the ML estimates of standard evolutionary model parameters
are asymptotically efficient, in the sense that their joint distribution is asymptotically normal with the variance–covariance
matrix given by the inverse of the Fisher information matrix. We propose a new estimate of this conditional variance based
on estimation of the expected information using a Monte Carlo sampling (MCS) method. Simulations are used to compare this
conditional variance estimate to the standard technique of using the observed information under a variety of experimental
conditions. In the case in which one wishes to estimate simultaneously the tree and parameters, we provide a bootstrapping
approach that can be used in conjunction with the MCS method to estimate the unconditional standard error. The methods developed
are applied to a real data set consisting of 30 papillomavirus sequences. This overall method is easily incorporated into
standard bootstrapping procedures to allow for proper variance estimation.
5.
MacT is a set of programs for the Apple Macintosh to construct and evaluate unrooted trees derived from amino acid sequences using a distance matrix method. Programs are designed on a one-program, one-task basis for (i) determining the branching order in trees consisting of four or five species and calculating various statistical measures, (ii) calculating statistical measures for all possible topologies of unrooted trees and (iii) generating and evaluating trees derived from bootstrapped samples. With four auxiliary programs, unrooted trees can be built for a maximum of 26 species, and the robustness of topologies can be tested by bootstrapping.
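Generating a bootstrapped sample, as in item (iii), usually means resampling alignment columns with replacement. The sketch below shows that general idea in Python; it is not MacT's implementation, and the function name `bootstrap_alignment` and the toy amino acid alignment are assumptions.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement.

    seqs is a list of aligned, equal-length sequence strings.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in columns) for s in seqs]

# Hypothetical three-sequence alignment.
alignment = ["MKTAY", "MKSAY", "MRTAF"]
replicate = bootstrap_alignment(alignment, random.Random(0))
print(replicate)
```

Each replicate would then be passed to the tree-building step, and the fraction of replicates that recover a given grouping serves as its support.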
6.
The paper concerns the practical realization of the maximum topologic similarity principle for phylogenetic reconstruction. This novel principle is described in the accompanying paper. Two algorithms embodied in a computer program find the unique tree when the source data admit the existence of such a tree. When numerous parallel mutations make such an exact realization impossible, the algorithms yield approximations to the maximum topologic similarity trees with high computational efficiency. Examples illustrating the use of these algorithms are presented, together with a discussion of the biological consistency of the novel concept.
7.
Walter M. Fitch 《Journal of molecular evolution》1981,18(1):30-37
Summary A procedure is presented that forms an unrooted tree-like structure from a matrix of pairwise differences. The tree is not formed a portion at a time, as methods now in use generally do, but is formed in toto without intervening estimates of branch lengths. The method is based on a relaxed additivity (four-point metric) constraint. From the tree, a classification may be formed.
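The four-point condition that underlies this constraint can be illustrated on a single quartet: of the three ways to pair four taxa, the two largest sums of within-pair distances are equal for an additive (tree) metric, and the smallest sum identifies the pairing the tree supports. The Python sketch below shows only that textbook condition, not Fitch's procedure; the function names and the example matrix are assumptions.

```python
def quartet_sums(dist, a, b, c, d):
    """Return the three pairings of four taxa, sorted by their summed distances."""
    return sorted([
        (dist[a][b] + dist[c][d], ((a, b), (c, d))),
        (dist[a][c] + dist[b][d], ((a, c), (b, d))),
        (dist[a][d] + dist[b][c], ((a, d), (b, c))),
    ])

def satisfies_four_point(dist, a, b, c, d, tol=1e-9):
    """True if the two largest pairing sums agree to within tol."""
    sums = quartet_sums(dist, a, b, c, d)
    return abs(sums[2][0] - sums[1][0]) <= tol

def supported_pairing(dist, a, b, c, d):
    """Pairing with the smallest sum, i.e. the split the distances favour."""
    return quartet_sums(dist, a, b, c, d)[0][1]

# Hypothetical additive distances for the tree ((0,1),(2,3)):
# every pendant branch has length 1 and the internal branch has length 2.
dist = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
print(satisfies_four_point(dist, 0, 1, 2, 3))
print(supported_pairing(dist, 0, 1, 2, 3))
```

A relaxed version of the constraint would accept a quartet whenever the two largest sums differ by less than some chosen `tol` rather than requiring exact equality.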
8.
9.
SUMMARY: QDist is a program for computing the quartet distance between two unrooted trees, i.e. the number of quartet topology differences between the trees, where a quartet topology is the topological subtree induced by four species. The program is based on an algorithm with running time O(n log^2 n), which makes it practical to compare large trees. Available under GNU license. AVAILABILITY: http://www.birc.dk/Software/QDist
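To make the definition concrete, the quartet distance can be computed naively by enumerating every set of four species and comparing the induced topologies. The Python sketch below does exactly that and is far slower than the algorithm QDist is based on. It assumes each tree is given as a collection of its internal splits, each split written as the set of leaves on one side; that representation and the function names are choices made for this illustration.

```python
from itertools import combinations

def induced_quartet(splits, quartet):
    """Return the pairing of four leaves induced by a tree, or None if unresolved.

    splits holds one frozenset of leaf names per internal edge (one side of the split).
    """
    members = set(quartet)
    for side in splits:
        inside = members & side
        # A split that puts two quartet leaves on each side fixes the topology.
        if len(inside) == 2:
            return frozenset([frozenset(inside), frozenset(members - inside)])
    return None

def quartet_distance(splits1, splits2, leaves):
    """Count the four-leaf subsets on which the two trees disagree (naive enumeration)."""
    return sum(
        induced_quartet(splits1, q) != induced_quartet(splits2, q)
        for q in combinations(sorted(leaves), 4)
    )

# Hypothetical five-leaf trees ((a,b),c,(d,e)) and ((a,c),b,(d,e)).
leaves = {"a", "b", "c", "d", "e"}
tree1 = [frozenset("ab"), frozenset("de")]
tree2 = [frozenset("ac"), frozenset("de")]
print(quartet_distance(tree1, tree2, leaves))
```

The two example trees disagree only on the quartets {a, b, c, d} and {a, b, c, e}.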
10.
Since the initial work of Jukes and Cantor (1969), a number of procedures
have been developed to estimate the expected number of nucleotide
substitutions corresponding to a given observed level of nucleotide
differentiation assuming particular evolutionary models. Unlike the
proportion of different sites, the expected number of substitutions that
would have occurred grows linearly with time and therefore has had great
appeal as an evolutionary distance. Recently, however, a number of authors
have tried to develop improved statistical approaches for generating and
evaluating evolutionary distances (Schoniger and von Haeseler 1993;
Goldstein and Pollock 1994; Tajima and Takezaki 1994). These studies clearly
show that the estimated number of nucleotide substitutions is generally not
the best estimator for use in reconstruction of phylogenetic relationships.
The reason for this is that there is often a large error associated with
the estimation of this number. Therefore, even though its expectation is
correct (i.e., on average the expected number of substitutions is
proportional to time, but see Tajima 1993), it is not expected to be as
useful as estimators designed to have a lower variance.
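For reference, the Jukes-Cantor estimate that this abstract starts from, together with its large-sample variance, can be sketched in a few lines of Python. The formulas are the standard ones: d = -(3/4) ln(1 - 4p/3), with variance p(1 - p) / (L (1 - 4p/3)^2) for L sites. The function names and the example values are assumptions for illustration.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same aligned length")
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)

def jukes_cantor(p):
    """Estimated number of substitutions per site under the Jukes-Cantor model."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)

def jukes_cantor_variance(p, n_sites):
    """Large-sample (delta method) variance of the Jukes-Cantor estimate."""
    return p * (1 - p) / (n_sites * (1 - 4 * p / 3) ** 2)

p = p_distance("ACGT", "ACGA")
print(p, jukes_cantor(p))
# The variance grows quickly as p approaches saturation.
print(jukes_cantor_variance(0.1, 100), jukes_cantor_variance(0.6, 100))
```

The rapid growth of the variance at high divergence is the kind of estimation error the abstract refers to.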
11.
The challenge of constructing large phylogenetic trees (total citations: 3; self-citations: 0; citations by others: 3)
The amount of sequence data available to reconstruct the evolutionary history of genes and species has increased 20-fold in the past decade. Consequently the size of phylogenetic analyses has grown as well, and phylogenetic methods, algorithms and their implementations have struggled to keep pace. Computational and other challenges raised by this burgeoning database emerge at several stages of analysis, from the optimal assembly of large data matrices from sequence databases, to the efficient construction of trees from these large matrices and the piece-wise assembly of 'supertrees' from those trees in turn. A final challenge is posed by the difficulty of visualizing and making inferences from trees that might soon routinely contain thousands of species.
12.
The most commonly used measure of evolutionary distance in molecular
phylogenetics is the number of nucleotide substitutions per site. However,
this number is not necessarily most efficient for reconstructing a
phylogenetic tree. In order to evaluate the accuracy of evolutionary
distance, D(t), for obtaining the correct tree topology, an accuracy index,
A(t), was proposed. This index is defined as D'(t)/square root of V[D(t)],
where D'(t) is the first derivative of D(t) with respect to evolutionary
time and V[D(t)] is the sampling variance of evolutionary distance. Using
A(t), namely, finding the condition under which A(t) gives the maximum
value, we can obtain an evolutionary distance which is efficient for
obtaining the correct topology. Under the assumption that the
transversional changes do not occur as frequently as the transitional
changes, we obtained the evolutionary distances which are expected to give
the correct topology more often than are the other distances.
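Written out in LaTeX, the accuracy index defined in this abstract reads as follows, where D(t) is the evolutionary distance, D'(t) its first derivative with respect to evolutionary time, and V[D(t)] its sampling variance. The typesetting is ours; the symbols are the abstract's own.

```latex
A(t) = \frac{D'(t)}{\sqrt{V[D(t)]}},
\qquad
D'(t) = \frac{\mathrm{d}D(t)}{\mathrm{d}t}
```

A distance measure is efficient in the abstract's sense when it makes A(t) as large as possible: a steep response to time in the numerator and a small sampling error in the denominator.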
13.
To refine the location of a disease gene within the bounds provided by linkage analysis, many scientists use the pattern of linkage disequilibrium between the disease allele and alleles at nearby markers. We describe a method that seeks to refine location by analysis of "disease" and "normal" haplotypes, thereby using multivariate information about linkage disequilibrium. Under the assumption that the disease mutation occurs in a specific gap between adjacent markers, the method first combines parsimony and likelihood to build an evolutionary tree of disease haplotypes, with each node (haplotype) separated, by a single mutational or recombinational step, from its parent. If required, latent nodes (unobserved haplotypes) are incorporated to complete the tree. Once the tree is built, its likelihood is computed from probabilities of mutation and recombination. When each gap between adjacent markers is evaluated in this fashion and these results are combined with prior information, they yield a posterior probability distribution to guide the search for the disease mutation. We show, by evolutionary simulations, that an implementation of these methods, called "FineMap," yields substantial refinement and excellent coverage for the true location of the disease mutation. Moreover, by analysis of hereditary hemochromatosis haplotypes, we show that FineMap can be robust to genetic heterogeneity.
14.
Evolutionary branching, a coevolutionary phenomenon in which two or more distinctive traits develop from a single trait in a population, is the subject of recent studies on adaptive dynamics. Previous studies found that trait variance is a minimum requirement for evolutionary branching, and that it does not play an important role in the formation of an evolutionary pattern of branching. Here we demonstrate that trait evolution exhibits various evolutionary branching paths, starting from an identical initial trait and leading to different evolutionary terminus traits, determined solely by changing the assumption of trait variance. The key feature of this phenomenon is the topological configuration of equilibria and the initial point in the manifold of dimorphism from which dimorphic branches develop. This suggests that the existing monomorphic or polymorphic set in a population is not a unique, inevitable consequence of an identical initial phenotype.
15.
Estimating the reliability of evolutionary trees (total citations: 9; self-citations: 1; citations by others: 8)
Six protein sequences from the same 11 mammalian taxa were used to estimate
the accuracy and reliability of phylogenetic trees using real, rather than
simulated, data. A tree comparison metric was used to measure the increase
in similarity of minimal trees as larger, randomly selected subsets of
nucleotide positions were taken. The ratio of the observed to the expected
number of incompatibilities for each nucleotide position (character) is a
good predictor of the number of changes required at that position on the
minimal (most-parsimonious) tree. This allows a higher weighting of
nucleotide positions that have changed more slowly and should result in the
minimal length tree converging to the correct tree as more sequences are
obtained. An estimate was made of the smallest subset of trees that need to
be considered to include the actual historical tree for a given set of
data. It was concluded that it is possible to give a reasonable estimate of
the reliability of the final tree, at least when several sequences are
combined. With the present data, resolving the rodent-primate-lagomorph
(rabbit) trichotomy is the least certain aspect of the final tree, followed
then by establishing the position of dog. In our opinion, it is
unreasonable to publish an evolutionary tree derived from sequence data
without giving an idea of the reliability of the tree.
16.
Liang Liu, Lili Yu, Laura Kubatko, Dennis K. Pearl, Scott V. Edwards 《Molecular phylogenetics and evolution》2009,53(1):320-328
We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilize two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.
17.
Miklós Csűrös 《Journal of computational biology》2002,9(2):277-297
We present a novel distance-based algorithm for evolutionary tree reconstruction. Our algorithm reconstructs the topology of a tree with n leaves in O(n^2) time using O(n) working space. In the general Markov model of evolution, the algorithm recovers the topology successfully with probability 1 - o(1) from sequences with polynomial length in n. Moreover, for almost all trees, our algorithm achieves the same success probability on polylogarithmic sample sizes. The theoretical results are supported by simulation experiments involving trees with 500, 1,895, and 3,135 leaves. The topologies of the trees are recovered with high success from 2,000 bp DNA sequences.
18.
Wang BF, Lin CH 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(5):1258-1272
A gene team is a set of genes that appear in two or more species, possibly in a different order yet with the distance between adjacent genes in the team for each chromosome always no more than a certain threshold δ. A gene team tree is a succinct way to represent all gene teams for every possible value of δ. In this paper, improved algorithms are presented for the problem of finding the gene teams of two chromosomes and the problem of constructing a gene team tree of two chromosomes. For the problem of finding gene teams, Beal et al. had an O(n lg^2 n)-time algorithm. Our improved algorithm requires O(n lg t) time, where t ≤ n is the number of gene teams. For the problem of constructing a gene team tree, Zhang and Leong had an O(n lg^2 n)-time algorithm. Our improved algorithm requires O(n lg n lg lg n) time. Similar to Beal et al.'s gene team algorithm and Zhang and Leong's gene team tree algorithm, our improved algorithms can be extended to k chromosomes with the time complexities increased only by a factor of k.
19.
In a previous paper (Klotz et al., 1979) we described a method for determining evolutionary trees from sequence data when rates of evolution of the sequences might differ greatly. It was shown theoretically that the method always gave the correct topology and root when the exact number of mutation differences between sequences and from their common ancestor was known. However, the method is impractical to use in most situations because it requires some knowledge of the ancestor. In this present paper we describe another method, related to the previous one, in which a present-day sequence can serve temporarily as an ancestor for purposes of determining the evolutionary tree regardless of the rates of evolution of the sequences involved. This new method can be carried out with high precision without the aid of a computer, and it does not increase in difficulty rapidly as the number of sequences involved in the study increases, unlike other methods.
20.
Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites (total citations: 1; self-citations: 0; citations by others: 1)
The general Markov plus invariable sites (GM+I) model of biological sequence evolution is a two-class model in which an unknown proportion of sites are not allowed to change, while the remainder undergo substitutions according to a Markov process on a tree. For statistical use it is important to know if the model is identifiable; can both the tree topology and the numerical parameters be determined from a joint distribution describing sequences only at the leaves of the tree? We establish that for generic parameters both the tree and all numerical parameter values can be recovered, up to clearly understood issues of 'label swapping'. The method of analysis is algebraic, using phylogenetic invariants to study the variety defined by the model. Simple rational formulas, expressed in terms of determinantal ratios, are found for recovering numerical parameters describing the invariable sites.