Similar articles
Found 20 similar articles (search time: 203 ms)
1.
ESTIMATING CHARACTER WEIGHTS DURING TREE SEARCH   Total citations: 9, self-citations: 2, citations by others: 7
Abstract: A new method for weighting characters according to their homoplasy is proposed; the method is non-iterative and does not require independent estimations of weights. It is based on searching trees with maximum total fit, with character fits defined as a concave function of homoplasy. Then, when comparing trees, differences in steps occurring in characters which show more homoplasy on the trees are less influential. The reliability of the characters is estimated, during the analysis, as a logical implication of the trees being compared. The "fittest" trees imply that the characters are maximally reliable and, given character conflict, have fewer steps for the characters which fit the tree better. If other trees save steps in some characters, it will be at the expense of gaining them in characters with less homoplasy.
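The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes the concave fit function k / (k + h) for a character with h extra steps; the abstract gives neither the functional form nor a value for the constant k, so both are assumptions here, and the function names and the two example trees are hypothetical.

```python
def character_fit(extra_steps, k=3.0):
    """Fit of one character with `extra_steps` steps of homoplasy.

    ASSUMPTION: the concave form k / (k + extra_steps) and the constant k
    are illustrative choices, not taken from the abstract.
    """
    return k / (k + extra_steps)


def total_fit(extra_steps_per_character, k=3.0):
    """Sum of character fits for one tree; a higher value means a fitter tree."""
    return sum(character_fit(h, k) for h in extra_steps_per_character)


# Two hypothetical trees, each requiring 6 extra steps in total.
tree_a = [0, 0, 6]  # all homoplasy falls in a single character
tree_b = [2, 2, 2]  # the same homoplasy spread over three characters

print(total_fit(tree_a), total_fit(tree_b))
```

Under equal weights the two trees tie at 6 extra steps. With a concave fit, each additional step in an already homoplastic character costs less, so `tree_a` gets the higher total fit.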

2.
Summary: The statistical properties of three molecular tree construction methods, namely the unweighted pair-group arithmetic average clustering (UPG), Farris, and modified Farris methods, are examined under the neutral mutation model of evolution. The methods are compared for accuracy in construction of the topology and estimation of the branch lengths, using statistics of these two aspects. The distribution of the statistic concerning topological construction is shown to be as important as its mean and variance for the comparison. Of the three methods, the UPG method constructs the tree topology with the least variation. The modified Farris method, however, gives the best performance when the two aspects are considered simultaneously. It is also shown that a topology based on two genes is much more accurate than that based on one gene. There is a tendency to accept published molecular trees, but uncritical acceptance may lead one to spurious conclusions. It should always be kept in mind that a tree is a statistical result that is affected strongly by the stochastic error of nucleotide substitution and the error intrinsic to the tree construction method itself.

3.
Accuracy of estimated phylogenetic trees from molecular data   Total citations: 2, self-citations: 0, citations by others: 2
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.

4.
Accuracy of estimated phylogenetic trees from molecular data   Total citations: 27, self-citations: 0, citations by others: 27
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f theta, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. 
For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.

5.
The new procedure presented for constructing a Wagner network differs from Farris's (1970) method in that the amount of computation required is reduced. The usefulness of this procedure was examined by applying it to the 20 characters considered in a recent monograph of the seven OTUs of the genus Pentachaeta. A single network was derived from some 945 or more networks possible for this group. A comparison of the network constructed by this simplified method to that constructed by Farris's procedure revealed no differences. An attempt to reconstruct the cladistic history of this group by generating a Wagner tree based on the network resulted in four equally possible trees, suggesting that further data are needed before cladogenesis in this group is resolved.

6.
Concerted homoplasy is a concern in phylogenetics because of its potential to generate strong but misleading evidence of relationships. Marques and Gnaspini (Cladistics 17, 371–381, 2001) proposed a method of recoding characters suspected of concerted homoplasy that avoids problems associated with their wholesale exclusion or inclusion. Here we show that the proposed method is analytically equivalent to excluding the recoded characters.

7.
The construction and interpretation of gene trees is fundamental in molecular systematics. If the gene is defined in a historical (coalescent) sense, there can be multiple gene trees within the single contiguous set of nucleotides, and attempts to construct a single tree for such a sequence must deal with homoplasy created by conflict among divergent histories. On a larger scale, incongruence is expected among gene tree topologies at different loci of individuals within sexually reproducing species, and it has been suggested that this discordance can be used to delimit species. A practical concern for such topological methods is that polymorphisms may be maintained through numerous cladogenic events; this polymorphism problem is less of a concern for nontopological approaches to species delimitation using molecular data. Although a central theoretical concern in molecular systematics is discordance between a given gene tree and the true "species tree," the primary empirical problem faced in reconstructing taxic phylogeny is incongruence among the trees inferred from different sequences. Linkage relationships limit character independence and thus have important implications for handling multiple data sets in phylogenetic analysis, particularly at the species level, where incongruence among different historically associated loci is expected. Gene trees can also be reconstructed for loci that influence phenotypic characters, but there is at best a tenuous relationship between phenotypic homoplasy and homoplasy in such gene trees. Nevertheless, expression patterns and orthology relationships of genes involved in the expression of phenotypes can in theory provide criteria for homology assessment of morphological characters.

8.
The neighbor-joining method: a new method for reconstructing phylogenetic trees   Total citations: 673, self-citations: 29, citations by others: 673
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
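One clustering stage of neighbor joining can be sketched as follows. The sketch uses the Q-criterion, a commonly used formulation that is understood to select the same pair as minimizing the total branch length; the abstract does not state this form, so treat it as an assumption. The function name and the 5-OTU distance matrix are made up for illustration.

```python
def nj_pick_pair(d):
    """Pick the pair of OTUs to join at one stage of neighbor joining.

    `d` is a symmetric distance matrix (list of lists) with at least 3 OTUs.
    Returns (i, j, branch_i, branch_j): the indices of the chosen neighbors
    and the branch lengths from each of them to the new internal node.
    ASSUMPTION: the Q-criterion is used instead of the explicit
    total-branch-length sum described in the abstract.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    branch_i = d[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    branch_j = d[i][j] - branch_i
    return i, j, branch_i, branch_j


# Hypothetical distances among five OTUs (a, b, c, d, e).
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(distances))
```

A full run would replace the joined pair with the new node, recompute distances to it, and repeat until three nodes remain; that loop is omitted to keep the sketch short.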

9.
HOMOPLASY AND THE CHOICE AMONG CLADOGRAMS   Total citations: 6, self-citations: 0, citations by others: 6
Abstract: Cladistic data are more decisive when the possible trees differ more in tree length. When all the possible dichotomous trees have the same length, no one tree is better supported than the others, and the data are completely undecisive. From a rule for recursively generating undecisive matrices for different numbers of taxa, formulas to calculate consistency, rescaled consistency and retention indices in undecisive matrices are derived. The least decisive matrices are not the matrices with the lowest possible consistency, rescaled consistency or retention indices (on the most parsimonious trees); those statistics do not directly vary with decisiveness. Decisiveness can be measured with a newly proposed statistic, $DD = (\bar{S} - S)/(\bar{S} - M)$ (where $S$ = length of the most parsimonious cladogram, $\bar{S}$ = mean length of all the possible cladograms for the data set and $M$ = observed variation). For any data set, $\bar{S}$ can be calculated exactly with simple formulas; it depends on the types of characters present, and not on their congruence. Despite some recent assertions to the contrary, the consistency index is an appropriate measure of homoplasy (= deviation from hierarchy). The retention index seems more appropriate for comparing the fit of different trees for the same data set.
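A small worked sketch of the decisiveness statistic, assuming it has the form DD = (mean length − shortest length) / (mean length − observed variation), which is how the symbols defined in the abstract appear to fit together. The function name and the numbers are hypothetical.

```python
def data_decisiveness(mean_length, shortest_length, observed_variation):
    """DD under the assumed form (mean - shortest) / (mean - observed variation).

    mean_length        -- mean length of all possible cladograms
    shortest_length    -- length of the most parsimonious cladogram (S)
    observed_variation -- M in the abstract
    """
    denominator = mean_length - observed_variation
    if denominator == 0:
        raise ValueError("mean length equals observed variation; DD is undefined")
    return (mean_length - shortest_length) / denominator


# Hypothetical data set: mean length 20, shortest tree 15, observed variation 10.
print(data_decisiveness(20.0, 15.0, 10.0))
```

When every possible tree has the same length, the shortest length equals the mean and the function returns 0, which matches the abstract's description of completely undecisive data.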

10.
A method is presented for removing recent homoplastic events from a phylogenetic tree. This “topiary pruning” method produces a series of progressively modified duplicates of the original set of data, from which more and more of the most recent substitutions have been removed. The edited sets of data have increased amounts of information per remaining taxon, while similar but randomized data sets subjected to topiary pruning do not. The ability of topiary pruning to “unscramble” artificial data sets that have high levels of homoplasy is demonstrated, and is shown to be similar in its effects to the weighting method of Kluge and Farris (1969), although with the additional advantage of reducing the number of taxa to the point where bootstrapping is feasible. Pruning and weighting used together produce closer approximations to the “true” tree than either method used separately. It is further shown that in these artificial data sets midpoint rooting is more likely to be accurate than outgroup rooting. When pruning and weighting are applied to the extensive sets of mitochondrial DNA data of Cann et al. (1987) and Vigilant et al. (1991), trees result that have deep branch points, some of which lead to entirely African branches. In the case of the Vigilant et al. data, the three African branches have bootstrap values between 0.94 and 1.0, and the consensus and bootstrap midpoint roots also have high bootstrap values and occur on these African branches near their junction. An African origin of the human mitochondrial tree is not proved by this approach, particularly since sequences from non-African groups are underrepresented in current data sets, but it is rendered more likely.

11.
Measuring Topological Congruence by Extending Character Techniques   Total citations: 1, self-citations: 0, citations by others: 1
A measure of topological congruence which is an extension of the Mickevich–Farris character incongruence metric (i.e., ILD; Mickevich and Farris, 1981) is proposed. Group inclusion characters (1 = member of a clade; 0 = not a member) are constructed for each topology to be considered. The sets of characters derived from the topologies are then compared for character incongruence due to data set combination. Each homoplasy signifies a disagreement among topological statements. The value is normalized for potential maximum incongruence to adjust values for unresolved topologies. This measure is compared to other topological and character congruence techniques and explored in test data.
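The group inclusion coding described in this abstract can be sketched as below. The encoding of a topology as a list of clades (sets of taxon names) and the function name are assumptions for illustration; the incongruence calculation itself requires a parsimony search and is not shown.

```python
def group_inclusion_matrix(taxa, clades):
    """Code each clade of a topology as one binary character.

    Returns a dict mapping each taxon to a list of 0/1 states, one per clade:
    1 = member of the clade, 0 = not a member.
    ASSUMPTION: a topology is supplied as a list of clades (sets of taxa).
    """
    return {taxon: [1 if taxon in clade else 0 for clade in clades]
            for taxon in taxa}


# Hypothetical topology (((A, B), C), D) has two informative clades.
taxa = ["A", "B", "C", "D"]
clades = [{"A", "B"}, {"A", "B", "C"}]
matrix = group_inclusion_matrix(taxa, clades)
for taxon in taxa:
    print(taxon, matrix[taxon])
```

Matrices built this way from two topologies can be analyzed separately and in combination, so that extra steps in the combined analysis reflect disagreement between the topologies.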

12.
Abstract: Inspection of trees of varying lengths (by the option ALL TREES, which produces a histogram for tree lengths in PAUP 3.0) has been used to evaluate cladistic data and results. For example, a result may be judged more effective if the groups supported in the most parsimonious tree are preserved in trees that require increasingly greater amounts of homoplasy. Evaluation of grouping purely on the basis of this stability criterion ignores other highly relevant aspects of cladistic results. In particular, some data sets may incorporate additional taxa that introduce homoplasy to the shortest tree in a manner that concurrently allows for a revised understanding of character optimization patterns. These taxa may render groups preserved in the shortest tree less stable, but this result is not necessarily deficient if the homoplasy underlying such instability reveals possible character state changes for the given taxa irretrievable from the original matrix. The hypothetical example described here is relevant to so-called "stem", "basal" or "plesiomorphic sister" taxa that are commonly considered in studies of both fossil and extant taxa.

13.
While previous workers have argued persuasively that ammonoid workers should use cladistic approaches to reconstruct phylogeny, relatively few cladistic studies have been published to date. An essential yet challenging part of cladistic analysis is the selection of characters. Are certain types of characters more likely to show homoplasy? Are certain aspects of shell anatomy more likely to contain phylogenetically informative characters? Are datasets with more characters inherently better? To answer these questions, a meta-analysis of character data from published ammonoid phylogenies was performed. I compiled 14 datasets, published between 1989 and 2007, representing parsimony-based phylogenetic analyses of ammonoids. These studies defined a combined total of 323 characters, which were grouped into categories reflecting different aspects of anatomy: shell size and shape, ornament, suture, early ontogeny, body chamber and apertural modifications. Tree searches were re-run to determine overall tree statistics, parsimony permutation tail probability (PTP) tests were calculated to assess the phylogenetic information content of the matrices, and retention and rescaled consistency indices for each character were calculated. My analyses revealed that studies with higher character/taxon ratios did not necessarily produce trees with more information content and less homoplasy, as measured by retention or rescaled consistency indices, because additional characters were often parsimony-uninformative. Rather, studies with relatively few characters could produce high-quality trees if the characters were well-chosen and character states carefully defined. Characters related to the body chamber and adult aperture typically had retention indices of either 0 or 1, rarely in between, indicating that they either worked perfectly or not at all. 
Suture characters tended to have higher indices than shell shape or ornament characters, suggesting more phylogenetic information and less homoplasy in the suture line than in shell traits. These results should aid in the selection of characters for future cladistic studies of ammonoids.

14.
Given the importance of phylogenetic trees to understanding common ancestry and evolution, they are a necessary part of the undergraduate biology curriculum. However, a number of common misconceptions, such as reading across branch tips and understanding homoplasy, can pose difficulties in student understanding. Students also may take phylogenetic trees to be fact, instead of hypotheses. Below we outline a case study that we have used in upper-level undergraduate evolution and ichthyology courses that utilizes shark teeth (representing fossils), body characters, and mitochondrial genes. Students construct their own trees using freely available software, and are prompted to compare their trees with a series of questions. Finally, students explore homoplasy, polytomies, and trees as hypotheses during a class discussion period. This case study gives students practice with tree-thinking, as well as demonstrating that tree topology is reliant on which characters and tree-building algorithms are used.

15.
Summary: The methods of Fitch and Margoliash and of Farris for the construction of phylogenetic trees were compared. A phenetic clustering technique, the UPGMA method, was also considered. The three methods were applied to difference matrices obtained from comparison of macromolecules by immunological, DNA hybridization, electrophoretic, and amino acid sequencing techniques. To evaluate the results, we used the goodness-of-fit criterion. In some instances, the F-M and Farris methods gave a comparably good fit of the output to the input data, though in most cases the F-M procedure gave a much better fit. By the fit criterion, the UPGMA procedure was on the average better than the Farris method but not as good as the F-M procedure. On the basis of the results given in this report and the goodness-of-fit criterion, it is suggested that where input data are likely to include overestimates as well as true estimates and underestimates of the actual distances between taxonomic units, the F-M method is the most reasonable to use for constructing phylogenies from distance matrices. Immunological, DNA hybridization, and electrophoretic data fall into this category. By contrast, where it is known that each input datum is indeed either a true estimate or an underestimate of the actual distance between 2 taxonomic units, the Farris procedure appears, on theoretical grounds, to be the matrix method of choice. Amino acid and nucleotide sequence data are in this category. The following abbreviations are used in this work: F-M, Fitch-Margoliash; UPGMA, unweighted pair-group method using arithmetic averages; SD, percent standard deviation.

16.
This paper examines the efficiency of the incongruence length difference test (ILD) proposed by Farris et al. (1994) for assessing the incongruence between sets of characters. DNA sequences were simulated under various evolutionary conditions: (1) following symmetric or asymmetric trees, (2) with various mutation rates, (3) with constant or variable evolutionary rates along the branches, and (4) with different among-site substitution rates. We first compared two sets of sequences generated along the same tree and under the same evolutionary conditions. The probability of a Type-I error (wrongly rejecting the true hypothesis of congruence) was substantially below the standard 5% level of significance given by the ILD test; this finding indicates that the choice of the 5% level is rather conservative in this case. We then compared two data sets, still generated along the same tree, but under different evolutionary conditions (constant vs. variable evolutionary rate, homogeneity vs. heterogeneity rate of substitution). Under these conditions, the probability of rejecting the true hypothesis of congruence was greater than the 5% given by the ILD test and increased with the number of sites and the degree to which the tree was asymmetric. Finally, the comparison of the two data sets, simulated under contrasting tree structures (symmetric vs. asymmetric) but under the same evolutionary conditions, led us to reject the hypothesis of congruence, albeit weakly, particularly when the number of informative sites was low and among-site substitution rate heterogeneous. We conclude that the ILD test has only limited power to detect incongruence caused by differences in the evolutionary conditions or in the tree topology, except when numerous characters are present and the substitution rate is homogeneous from site to site.

17.
The emerging molecular evolutionary tree for placental mammals differs greatly from morphological trees, leading to repeated suggestions that morphology is uninformative at this level. This view is here refuted empirically, using an extensive morphological and molecular dataset totalling 17 431 characters. When analysed alone, morphology indeed is highly misleading, contradicting nearly every clade in the preferred tree (obtained from the molecular or the combined data). Widespread homoplasy overrides historical signal. However, when added to the molecular data, morphology surprisingly increases support for most clades in the preferred tree. The homoplasy in the morphology is incongruent with all aspects of the molecular signal, while the historical signal in the morphology is congruent with (and amplifies) the historical signal in the molecular data. Thus, morphology remains relevant in the genomic age, providing vital independent corroboration of the molecular tree of mammals.

18.
Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions which may bias the choice of polymorphisms and result in computational complexity which limits the analysis to a few samples/polymorphisms. For example, parsimony minimizes the number of mutations, which biases the results to minimizing homoplasy events. Such biases may miss the global structure of the polymorphisms altogether, with the risk of identifying a "common" polymorphism as ancient without an internal check on whether it either is homoplasic or is identified as ancient because of sampling bias (from oversampling the population with the polymorphism). A signature of this problem is that different methods applied to the same data or the same method applied to different datasets results in different tree topologies. When the results of such analyses are combined, the consensus trees have a low internal branch consensus. We determine human mtDNA phylogeny from 1737 complete sequences using a new, direct method based on principal component analysis (PCA) and unsupervised consensus ensemble clustering. PCA identifies polymorphisms representing robust variations in the data and consensus ensemble clustering creates stable haplogroup clusters. The tree is obtained from the bifurcating network obtained when the data are split into k = 2,3,4,...,kmax clusters, with equal sampling from each haplogroup. Our method assumes only that the data can be clustered into groups based on mutations, is fast, is stable to sample perturbation, uses all significant polymorphisms in the data, works for arbitrary sample sizes, and avoids sample choice and haplogroup size bias. The internal branches of our tree have a 90% consensus accuracy. 
In conclusion, our tree recreates the standard phylogeny of the N, M, L0/L1, L2, and L3 clades, confirming the African origin of modern humans and showing that the M and N clades arose in almost coincident migrations. However, the N clade haplogroups split along an East-West geographic divide, with a "European R clade" containing the haplogroups H, V, H/V, J, T, and U and a "Eurasian N subclade" including haplogroups B, R5, F, A, N9, I, W, and X. The haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M are placed in nonnearest locations in agreement with their expected large TMRCA from studies of their migrations into Japan. For comparison, we also construct consensus maximum likelihood, parsimony, neighbor joining, and UPGMA-based trees using the same polymorphisms and show that these methods give consistent results only for the clade tree. For recent branches, the consensus accuracy for these methods is in the range of 1-20%. From a comparison of our haplogroups to two chimp and one bonobo sequences, and assuming a chimp-human coalescent time of 5 million years before present, we find a human mtDNA TMRCA of 206,000 +/- 14,000 years before present.

19.
The cladistic literature does not always specify the kind of multistate character treatment that is applied for an analysis. Characters can be treated either as unordered transformation series or as rooted [three‐item analysis (3ia)] or unrooted state trees (ordered characters). We aimed to measure the impact of these character treatments on phylogenetic inference. Discrete characters can be represented either as rows or columns in matrices (e.g. for parsimony) or as hierarchies for 3ia. In the present study, we use simulated and empirical examples to assess the relative merits of each method considering both the character treatment and representation. We measure two parameters (resolving power and artefactual resolution) using a new tree comparison metric, ITRI (inter‐tree retention index). Our results suggest that the hierarchical character representation not only results (with our simulation settings) in the greatest resolving power, but also in the highest artefactual resolution. Our empirical examples provide equivocal results. Parsimony unordered states yield less resolving power and more artefactual resolutions than parsimony ordered states, both with our simulated and empirical data. Relationships between three operational taxonomic units (OTUs), irrespective of their relationships with other OTUs, are called three‐item statements (3is). We compare the intersection tree (which reconstructs a single tree from all of the common 3is of source trees) with the traditional strict consensus and show that the intersection tree retains more of the information contained in the source trees. © 2013 The Linnean Society of London, Biological Journal of the Linnean Society, 2013, 110, 914–930.
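The notion of three-item statements shared by source trees can be illustrated with a minimal sketch. It assumes rooted trees written as nested tuples of taxon names, and reads a 3is as "a and b are more closely related to each other than either is to c"; the representation and function names are hypothetical, and building the intersection tree from the shared statements is not shown.

```python
from itertools import combinations


def leaves(tree):
    """Set of taxon names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def three_item_statements(tree):
    """All 3is implied by a rooted tree, as (frozenset({a, b}), c) pairs."""
    all_taxa = leaves(tree)
    statements = set()

    def visit(node):
        if isinstance(node, str):
            return
        inside = leaves(node)
        outside = all_taxa - inside
        # Every pair inside a clade is closer to each other than to any outsider.
        for a, b in combinations(sorted(inside), 2):
            for c in outside:
                statements.add((frozenset((a, b)), c))
        for child in node:
            visit(child)

    visit(tree)
    return statements


# Two hypothetical source trees on the same four taxa.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ((("A", "C"), "B"), "D")
common = three_item_statements(tree_1) & three_item_statements(tree_2)
print(len(common))
```

In this example the two trees disagree about whether A is closer to B or to C, but they share the three statements that place D outside every pair of the other taxa.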

20.
New examples are presented, showing that supertree methods such as matrix representation with parsimony, minimum flip trees, and compatibility analysis of the matrix representing the input trees, produce supertrees that cannot be interpreted as displaying the groups present in the majority of the input trees. These methods may produce a supertree displaying some groups present in the minority of the trees, and contradicted by the majority. Of the three methods, compatibility analysis is the least used, but it seems to be the one that differs the least from majority rule consensus. The three methods are similar in that they choose the supertree(s) that best fit the set of input trees (quantified as some measure of the fit to the matrix representation of the input trees); in the case of complete trees, it is argued that, for a supertree method to be equivalent to majority rule or frequency difference consensus, two necessary (but not sufficient) conditions must be met. First, the measure of fit between a supertree and an input tree must be symmetrical. Second, the fit for a character representing a group must be measured as absolute: either it fits or it does not fit. In the restricted case of complete and equally resolved input trees, compatibility analysis (unlike MRP and minimum flipping) fulfils these two conditions: it is symmetrical (i.e., as long as the trees have the same taxon sets and are equally resolved, the number of characters in the matrix representation of tree A that require homoplasy in tree B is always the same as the number of characters in the matrix representation of tree B that require homoplasy in tree A) and it measures fit as all‐or‐none. In the case of just two complete and equally resolved input trees, the two conditions (symmetry and absolute fit) are necessary and sufficient, which explains why the compatibility analysis of such trees behaves as majority consensus. 
With more than two such trees, these conditions are still necessary but no longer sufficient for the equivalence; in such cases, the compatibility supertree may differ significantly from the majority rule consensus, even when these conditions apply (as shown by example). MRP and minimum flipping are asymmetric and measure various degrees of fit for each character, which explains why they often behave very differently from majority rule procedures, and why they are very likely to have groups contradicted by each of the input trees, or groups supported by a minority of the input trees. © The Willi Hennig Society 2005.
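For reference, the majority rule consensus that the supertree methods are compared against can be sketched as follows. The sketch assumes complete input trees on the same taxa, each encoded as a set of groups (frozensets of taxon names); the encoding, the function name, and the three example trees are illustrative only.

```python
from collections import Counter


def majority_rule_groups(trees):
    """Groups present in more than half of the input trees.

    ASSUMPTION: each tree is given as a set of groups (frozensets of taxa),
    and all trees contain the same taxa.
    """
    counts = Counter(group for tree in trees for group in set(tree))
    return {group for group, count in counts.items() if count > len(trees) / 2}


# Three hypothetical input trees on taxa A, B, C, D.
tree_1 = {frozenset("AB"), frozenset("ABC")}
tree_2 = {frozenset("AB"), frozenset("ABD")}
tree_3 = {frozenset("AC"), frozenset("ABC")}
consensus = majority_rule_groups([tree_1, tree_2, tree_3])
print(sorted(sorted(group) for group in consensus))
```

Each group either appears in an input tree or it does not, which is the all-or-none notion of fit that the abstract identifies as one of the two necessary conditions.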

