Similar documents
 Found 20 similar documents (search time: 31 ms)
1.
New examples are presented, showing that supertree methods such as matrix representation with parsimony, minimum flip trees, and compatibility analysis of the matrix representing the input trees, produce supertrees that cannot be interpreted as displaying the groups present in the majority of the input trees. These methods may produce a supertree displaying some groups present in the minority of the trees, and contradicted by the majority. Of the three methods, compatibility analysis is the least used, but it seems to be the one that differs the least from majority rule consensus. The three methods are similar in that they choose the supertree(s) that best fit the set of input trees (quantified as some measure of the fit to the matrix representation of the input trees); in the case of complete trees, it is argued that, for a supertree method to be equivalent to majority rule or frequency difference consensus, two necessary (but not sufficient) conditions must be met. First, the measure of fit between a supertree and an input tree must be symmetrical. Second, the fit for a character representing a group must be measured as absolute: either it fits or it does not fit. In the restricted case of complete and equally resolved input trees, compatibility analysis (unlike MRP and minimum flipping) fulfils these two conditions: it is symmetrical (i.e., as long as the trees have the same taxon sets and are equally resolved, the number of characters in the matrix representation of tree A that require homoplasy in tree B is always the same as the number of characters in the matrix representation of tree B that require homoplasy in tree A) and it measures fit as all‐or‐none. In the case of just two complete and equally resolved input trees, the two conditions (symmetry and absolute fit) are necessary and sufficient, which explains why the compatibility analysis of such trees behaves as majority consensus. 
With more than two such trees, these conditions are still necessary but no longer sufficient for the equivalence; in such cases, the compatibility supertree may differ significantly from the majority rule consensus, even when these conditions apply (as shown by example). MRP and minimum flipping are asymmetric and measure various degrees of fit for each character, which explains why they often behave very differently from majority rule procedures, and why they are very likely to have groups contradicted by each of the input trees, or groups supported by a minority of the input trees. © The Willi Hennig Society 2005.
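As a point of reference for the comparison drawn in this abstract, majority rule consensus on complete trees can be sketched as follows. This is a minimal illustration and not code from the paper; representing each tree as a set of clades (frozensets of taxon names) and the name `majority_rule_clades` are assumptions made for the example.

```python
from collections import Counter

def majority_rule_clades(trees):
    """Return the clades found in more than half of the input trees.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    This representation is hypothetical and chosen only for illustration.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    threshold = len(trees) / 2
    return {clade for clade, n in counts.items() if n > threshold}

# Three complete trees on taxa A-D, described by their non-trivial clades.
t1 = {frozenset("AB"), frozenset("ABC")}
t2 = {frozenset("AB"), frozenset("ABD")}
t3 = {frozenset("AC"), frozenset("ABC")}
consensus = majority_rule_clades([t1, t2, t3])
```

Here the fit of a clade to a tree is all-or-none (the clade is either present or absent), which is the second of the two conditions the abstract lists.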

2.
Three measures intended to assess the fit of stratigraphic age to the fossil record have been suggested previously: the Spearman Rank Correlation (SRC), the Stratigraphic Consistency Index (SCI) and the Relative Completeness Index (RCI). The original formulation of SRC is intractable to all but pectinate trees and the corrective pruning procedure that circumvents this precludes whole-tree estimates of fit. SCI, though it has been claimed otherwise, is strongly biased by tree shape, particularly as one adds more information. RCI is a measure of the amount of gap in the fossil record but has awkward consequences for evolutionary biology when it is maximized. A new approach, the Manhattan Stratigraphic Measure, uses the Manhattan distance between stratigraphic ages to determine fit to a tree. It is not biased by tree shape, it is sensitive to the magnitude of age discrepancy and there is an obvious significance test.

3.
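A study of individual-tree models for larch plantations   Total citations: 16, self-citations: 1, citations by others: 15
Based on data measured by the Songjianghe Forestry Bureau, Jilin Province, in larch (Larix olgensis) plantations, comprising 66 temporary sample plots, 18 permanent sample plots and 8 clustered branch-analysis plots, we analyzed the growth of dominant trees and the structure and dynamics of tree crowns in the stand. From this analysis we propose the Korf equation as suitable for tree growth and use it to construct the potential growth function of individual trees. The stand density index (SDI) was chosen as the indicator of average crowding among trees in the stand. In choosing an individual-tree competition index, we introduced crown factors and, after comparison with traditional competition indices, de-emphasized the role of distance: the relative crown surface area of dominant trees was used to construct a distance-independent individual-tree competition index. On this basis, an individual-tree growth model for larch plantations was established.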
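The Korf growth function mentioned in this abstract can be sketched in one common parameterization, size = a · exp(−b · age^(−c)). The abstract does not give the fitted form or coefficients, so the parameterization and every parameter value below are assumptions for illustration only.

```python
import math

def korf(age, a, b, c):
    """Korf growth function: size = a * exp(-b * age**(-c)).

    a is the asymptote; b and c control the shape of the curve.
    """
    return a * math.exp(-b * age ** (-c))

# Hypothetical parameters, not fitted values from the study.
sizes = [korf(t, a=30.0, b=5.0, c=0.8) for t in (5, 10, 20, 40)]
```

With positive b and c the curve rises monotonically with age and approaches the asymptote a, which is why the function is a natural candidate for a potential (competition-free) growth curve.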

4.
Predictive accuracy and explained variation in Cox regression   Total citations: 6, self-citations: 0, citations by others: 6
Schemper M, Henderson R. Biometrics, 2000, 56(1): 249-255
We suggest a new measure of the proportion of the variation of possibly censored survival times explained by a given proportional hazards model. The proposed measure, termed V, shares several favorable properties with an earlier V1 but also improves the handling of censoring. The statistic contrasts distance measures between individual 1/0 survival processes and fitted survival curves with and without covariate information. These distance measures, Dx and D, respectively, are themselves informative as summaries of absolute rather than relative predictive accuracy. We recommend graphical comparisons of survival curves for prognostic index groups to improve the understanding of obtained values for V, Dx, and D. Their use and interpretation are exemplified for a Yorkshire lung cancer study on survival. From this and an overview for several well-known clinical data sets, we show that the likely amount of relative or absolute predictive accuracy is often low even if there are highly significant and relatively strong prognostic factors.

5.
We studied the factors affecting the accuracy of the neighbor-joining (NJ) method for estimating phylogenies by simulating character change under different evolutionary models applied to twenty different 8-OTU tree topologies that varied widely with respect to tree imbalance and stemminess. The models incorporated three evolutionary rates—constant, varying among lineages, varying among characters—and three evolutionary contexts concerning patterns of character change relative to speciation events—phyletic, speciational, and punctuational. All combinations of the rate and context models were studied. In addition, three different absolute rates of change were investigated. To measure the accuracy, the strict consensus index was computed between the estimated tree and the tree topology along which the data had been generated. The results were analyzed by analysis of variance and compared to a previous study that evaluated UPGMA clustering and maximum parsimony (MP) as phylogenetic estimation techniques. We found evolutionary context and tree imbalance to be the most important factors affecting the accuracy of the NJ method. NJ was more accurate than UPGMA or MP in terms of the average strict consensus index over all treatments. However, no one method was more accurate than the other two for all combinations of treatments. Higher absolute rate of change generally resulted in higher accuracy for all three methods.

6.
MOTIVATION: The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. RESULTS: We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The inputs to the TSP algorithm are the pairwise distances of the sequences, and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree; for instance, it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than [formula image not preserved], where [symbol image not preserved] is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally, simulations and experiments with real data are shown.

7.
The most commonly used measure of evolutionary distance in molecular phylogenetics is the number of nucleotide substitutions per site. However, this number is not necessarily most efficient for reconstructing a phylogenetic tree. In order to evaluate the accuracy of evolutionary distance, D(t), for obtaining the correct tree topology, an accuracy index, A(t), was proposed. This index is defined as $A(t) = D'(t)/\sqrt{V[D(t)]}$, where D'(t) is the first derivative of D(t) with respect to evolutionary time and V[D(t)] is the sampling variance of evolutionary distance. Using A(t), namely, finding the condition under which A(t) gives the maximum value, we can obtain an evolutionary distance which is efficient for obtaining the correct topology. Under the assumption that the transversional changes do not occur as frequently as the transitional changes, we obtained the evolutionary distances which are expected to give the correct topology more often than are the other distances.

8.
Accuracy of estimated phylogenetic trees from molecular data   Total citations: 27, self-citations: 0, citations by others: 27
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f_θ, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. 
For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.
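Three of the distance measures named in this abstract can be sketched from allele frequencies using their textbook definitions. This is an illustrative sketch rather than the simulation code of the study: it ignores sample-size corrections, and the data layout (one list of allele frequencies per locus) and all function names are assumptions.

```python
import math

def nei_distances(pop_x, pop_y):
    """Return (standard, minimum) Nei distances between two populations.

    pop_x and pop_y are lists of loci; each locus is a list of allele
    frequencies in the same allele order (hypothetical layout).
    """
    n_loci = len(pop_x)
    jx = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    jy = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    jxy = sum(
        sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    standard = -math.log(jxy / math.sqrt(jx * jy))  # D = -ln(I)
    minimum = (jx + jy) / 2 - jxy                   # D_m
    return standard, minimum

def rogers_distance(pop_x, pop_y):
    """Rogers' distance: per-locus sqrt(0.5 * sum of squared differences), averaged."""
    per_locus = [
        math.sqrt(0.5 * sum((p - q) ** 2 for p, q in zip(lx, ly)))
        for lx, ly in zip(pop_x, pop_y)
    ]
    return sum(per_locus) / len(per_locus)

# Two loci with two alleles each; frequencies are made up for the example.
x = [[0.5, 0.5], [1.0, 0.0]]
y = [[0.5, 0.5], [0.5, 0.5]]
standard, minimum = nei_distances(x, y)
rogers = rogers_distance(x, y)
```

The standard distance is undefined when the two populations share no alleles at any locus (the logarithm of zero), a case the sketch does not handle.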

9.
We fitted spatial autocorrelation functions to distance-based data for assemblages of birds and for three attributes of birds' habitats at 140 locations, separated by up to 65 km, in the Great Basin (Nevada, USA). The three habitat characteristics were taxonomic composition of the vegetation, physical structure of the vegetation, and a measure of primary productivity, the normalized difference vegetation index, estimated from satellite imagery. We found that a spherical model was the best fit to data for avifaunal composition, vegetation composition, and primary productivity, but the distance at which spatial correlation effectively was zero differed substantially among data sets (c. 30 km for birds, 20 km for vegetation composition, and 60 km for primary productivity). A power-law function was the best fit to data for vegetation structure, indicating that the structure of vegetation differed by similar amounts irrespective of distance between locations (up to the maximum distance measured). Our results suggested that the spatial structure of bird assemblages is more similar to vegetation composition than to either vegetation structure or primary productivity, but is autocorrelated over larger distances. We believe that the greater mobility of birds compared with plants may be responsible for this difference.
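The spherical model has a simple closed form, sketched below as a correlation function that falls to zero at a finite range. The standard geostatistical form is assumed here; the abstract does not state the exact parameterization the authors fitted. The 30 km range is the approximate value reported for birds, and the function name is hypothetical.

```python
def spherical_correlation(h, range_km):
    """Spherical correlation model: 1 - 1.5*(h/a) + 0.5*(h/a)**3 for h < a, else 0."""
    if h >= range_km:
        return 0.0
    r = h / range_km
    return 1.0 - 1.5 * r + 0.5 * r ** 3

# Correlation between bird assemblages at increasing separation (range c. 30 km).
bird_curve = [spherical_correlation(h, 30.0) for h in (0, 15, 30, 60)]
```

Unlike the spherical model, a power-law function has no finite range, which fits the abstract's observation that vegetation structure differed by similar amounts irrespective of distance.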

10.
Phylogenetic trees based on gene content   Total citations: 2, self-citations: 0, citations by others: 2
Comparing gene content between species can be a useful approach for reconstructing phylogenetic trees. In this paper, we derive a maximum-likelihood estimation of evolutionary distance between species under a simple model of gene genesis and gene loss. Using simulated data on a biological tree with 107 taxa (and on a number of randomly generated trees), we compare the accuracy of tree reconstruction using this ML distance measure to an earlier ad hoc distance. We then compare these distance-based approaches to a character-based tree reconstruction method (Dollo parsimony) which seems well suited to the analysis of gene content data. To simplify simulations, we give a formal proof of the well-known 'fact' that the Dollo parsimony score is independent of the choice of root. Our results show a consistent trend, with the character-based method and ML distance measure outperforming the earlier ad hoc distance method. AVAILABILITY: http://www.ab.informatik.uni-tuebingen.de/software/genecontent/welcome_en.html

11.
We present a Monte-Carlo simulation analysis of the statistical properties of absolute genetic distance and of Nei's minimum and standard genetic distances. The estimation of distances (bias) and of their variances is analysed as well as the distributions of distance and variance estimators, taking into account both gamete and locus samplings. Both of Nei's statistics are non-linear when distances are small and consequently the distributions of their estimators are extremely asymmetrical. It is difficult to find theoretical laws that fit such asymmetrical distributions. Absolute genetic distance is linear and its distributions are better fit by a normal distribution. When distances are medium or large, minimum distance and absolute distance distributions are close to a normal distribution, but those of the standard distance can never be considered as normal. For large distances the jack-knife estimator of the standard distance variance performs poorly; another standard distance estimator is suggested. Absolute distance, which has the best mathematical properties, is particularly interesting for small distances if the gamete sample size is large, even when the number of loci is small. When both distance and gamete sample size are small, this statistic is biased.

12.
BRANCH SUPPORT AND TREE STABILITY   Total citations: 38, self-citations: 1, citations by others: 37
Abstract— Branch support is quantified as the extra length needed to lose a branch in the consensus of near-most-parsimonious trees. This approach is based solely on the original data, as opposed to the data perturbation used in the bootstrap procedure. If trees have been generated by Farris's successive approximations approach to character weighting, branch support should be examined in terms of weighted extra length needed to lose a branch. The sum of all branch support values over the tree divided by the length of the most parsimonious tree[s] provides a new index, the total support index. This index is a measure of tree stability in terms of supported resolutions, which is of prime importance in cladistic analysis.
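The total support index described here is a single ratio, which the following sketch makes concrete. The branch support values and tree length are made-up numbers, and the function name is an assumption; computing the branch support values themselves requires a parsimony search and is not shown.

```python
def total_support_index(branch_supports, tree_length):
    """Sum of branch support values divided by the most parsimonious tree length."""
    if tree_length <= 0:
        raise ValueError("tree_length must be positive")
    return sum(branch_supports) / tree_length

# Hypothetical tree: four resolved branches, most parsimonious length of 50 steps.
ti = total_support_index([3, 1, 2, 4], tree_length=50)
```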

13.
This study aimed at comparing six patch connectivity measures by fitting them to field data. We used occupancy data for eight beetle and two pseudoscorpion species from 281 hollow oaks in southeast Sweden. Species occupancy was modelled in relation to tree characteristics and one measure of patch connectivity at a time. For each connectivity measure we searched for the spatial scale that generated the best fit to field data. Connectivity measures that only include occupied patches provided better model fits than those that include all patches. When occupancy data are absent for surrounding habitat patches, information that reflects occurrence probabilities can be included in the connectivity measure. However, in this study incorporation of such information resulted in only a slight improvement of model fit. A frequently used connectivity measure based on the negative exponential function was relatively poor in explaining species’ occurrence; for eight species out of nine a buffer measure was better. A better fit was obtained when the negative exponential function was modified to take into account that habitat patches may “compete” for the immigrants. The spatial scale with the best fit tended to be larger when we used connectivity measures in which dispersal sources are identified with lower precision. Thus, the outcomes from different multiple‐scale studies are not directly comparable if the density of dispersal sources is not measured in the same way. Overall we conclude that buffer measures are useful, as they give good predictions and are easy to understand and use. If a biologically more realistic measure is needed, one that up‐weights the closest patches should be used. Finally, the possibility that habitat patches may compete with each other for immigrants should be considered when selecting a connectivity measure.
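Two of the connectivity measures compared in this abstract, the buffer measure and the negative exponential measure, can be sketched in their simplest unweighted forms. The exact definitions used in the study (for example weighting by patch size, or the modification for competing patches) are not given in the abstract, so the forms, names and coordinates below are assumptions.

```python
import math

def buffer_connectivity(focal, sources, radius):
    """Number of source patches within `radius` of the focal patch."""
    return sum(1 for s in sources if math.dist(focal, s) <= radius)

def neg_exp_connectivity(focal, sources, alpha):
    """Sum of exp(-alpha * distance) over source patches; closer patches count more."""
    return sum(math.exp(-alpha * math.dist(focal, s)) for s in sources)

# Hypothetical coordinates (arbitrary units) of a focal tree and three occupied trees.
focal = (0.0, 0.0)
occupied = [(3.0, 4.0), (0.0, 1.0), (10.0, 0.0)]
n_in_buffer = buffer_connectivity(focal, occupied, radius=6.0)
s_neg_exp = neg_exp_connectivity(focal, occupied, alpha=1.0)
```

Passing only occupied patches as `sources` corresponds to the variant that the abstract reports as giving better model fits than measures that include all patches.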

14.
HOMOPLASY AND THE CHOICE AMONG CLADOGRAMS   Total citations: 6, self-citations: 0, citations by others: 6
Abstract— Cladistic data are more decisive when the possible trees differ more in tree length. When all the possible dichotomous trees have the same length, no one tree is better supported than the others, and the data are completely undecisive. From a rule for recursively generating undecisive matrices for different numbers of taxa, formulas to calculate consistency, rescaled consistency and retention indices in undecisive matrices are derived. The least decisive matrices are not the matrices with the lowest possible consistency, rescaled consistency or retention indices (on the most parsimonious trees); those statistics do not directly vary with decisiveness. Decisiveness can be measured with a newly proposed statistic, $DD = (\bar{S} - S)/(\bar{S} - M)$ (where $S$ = length of the most parsimonious cladogram, $\bar{S}$ = mean length of all the possible cladograms for the data set and $M$ = observed variation). For any data set, $\bar{S}$ can be calculated exactly with simple formulas; it depends on the types of characters present, and not on their congruence. Despite some recent assertions to the contrary, the consistency index is an appropriate measure of homoplasy (= deviation from hierarchy). The retention index seems more appropriate for comparing the fit of different trees for the same data set.
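The decisiveness statistic is a simple ratio once its three ingredients are known. The sketch below assumes the form DD = (S̄ − S)/(S̄ − M), with S the length of the most parsimonious cladogram, S̄ the mean length over all possible cladograms and M the observed variation. That form is a reading of the definitions given in the abstract, and the function name and numbers are hypothetical.

```python
def data_decisiveness(s_min, s_mean, m):
    """DD = (mean length - shortest length) / (mean length - observed variation)."""
    if s_mean == m:
        raise ValueError("mean length equals observed variation; DD is undefined")
    return (s_mean - s_min) / (s_mean - m)

# Hypothetical data set: shortest tree 40 steps, mean tree length 55, observed variation 30.
dd = data_decisiveness(s_min=40, s_mean=55, m=30)

# Completely undecisive data: every tree has the same length, so the mean equals the minimum.
dd_undecisive = data_decisiveness(s_min=55, s_mean=55, m=30)
```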

15.

Background

This paper is devoted to distance measures for leaf-labelled trees on a free leafset. A leaf-labelled tree is a special type of tree in which only the leaves (terminal nodes) are labelled. This data structure is used in bioinformatics to model the evolutionary history of genes and species, and in linguistics to model the evolutionary history of languages. Many domain-specific problems arise that need to be solved with tree postprocessing techniques such as distance measures.

Results

Here we introduce a tree edit distance designed for leaf-labelled trees on a free leafset, which turns out to be a metric. It is presented together with the notion of a tree edit consensus tree. We provide a statistical evaluation of the proposed measure, using R-F, MAST and frequent-subsplit-based dissimilarity measures as reference measures.

Conclusions

The tree edit distance was proven to be a metric and has the advantage of using different costs for contraction and pruning, so its properties can be tuned to the needs of the user. Two of the presented variants have the most interesting properties. E(3,1) is very discriminative (having a wide range of values) and has a very regular distance distribution, similar in shape to a normal distribution, which works well for both similar and dissimilar trees. NFC(2,1), on the other hand, is proportional or nearly proportional to the number of mutation operations used, irrespective of their type.
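Of the reference measures named in the Results, the R-F (Robinson-Foulds) distance is the simplest to sketch. The version below covers only the classical case of two unrooted trees on the same leaf set, not the free-leafset setting that the paper addresses, and it is not the tree edit distance itself. Describing each tree by the leaf sets on one side of its internal edges is an assumption made for the example.

```python
def canonical_split(side, leaves):
    """Represent a bipartition by the side that does not contain a fixed reference leaf."""
    side = frozenset(side)
    reference = min(leaves)
    return side if reference not in side else frozenset(leaves) - side

def robinson_foulds(splits_a, splits_b, leaves):
    """Number of non-trivial splits found in exactly one of the two trees."""
    a = {canonical_split(s, leaves) for s in splits_a}
    b = {canonical_split(s, leaves) for s in splits_b}
    return len(a ^ b)

leaves = {"A", "B", "C", "D", "E"}
tree_1 = [{"A", "B"}, {"D", "E"}]  # splits AB|CDE and ABC|DE
tree_2 = [{"A", "C"}, {"D", "E"}]  # splits AC|BDE and ABC|DE
rf = robinson_foulds(tree_1, tree_2, leaves)
```

Because R-F counts every differing split equally, it cannot assign different costs to contraction and pruning, which is the tunability the Conclusions highlight for the tree edit distance.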

16.
It has been claimed that blending processes such as trade and exchange have always been more important in the evolution of cultural similarities and differences among human populations than the branching process of population fissioning. In this paper, we report the results of a novel comparative study designed to shed light on this claim. We fitted the bifurcating tree model that biologists use to represent the relationships of species to 21 biological data sets that have been used to reconstruct the relationships of species and/or higher level taxa and to 21 cultural data sets. We then compared the average fit between the biological data sets and the model with the average fit between the cultural data sets and the model. Given that the biological data sets can be confidently assumed to have been structured by speciation, which is a branching process, our assumption was that, if cultural evolution is dominated by blending processes, the fit between the bifurcating tree model and the cultural data sets should be significantly worse than the fit between the bifurcating tree model and the biological data sets. Conversely, if cultural evolution is dominated by branching processes, the fit between the bifurcating tree model and the cultural data sets should be no worse than the fit between the bifurcating tree model and the biological data sets. We found that the average fit between the cultural data sets and the bifurcating tree model was not significantly different from the fit between the biological data sets and the bifurcating tree model. This indicates that the cultural data sets are not less tree-like than are the biological data sets. As such, our analysis does not support the suggestion that blending processes have always been more important than branching processes in cultural evolution. 
We conclude from this that, rather than deciding how cultural evolution has proceeded a priori, researchers need to ascertain which model or combination of models is relevant in a particular case and why.

17.
The modification of typical age-related growth by environmental changes is poorly understood, in part because there is a lack of consensus at individual tree level regarding age-dependent growth responses to climate warming as stands develop. To increase our current understanding about how multiple drivers of environmental change can modify growth responses as trees age, we used tree ring data of a mountain subtropical pine species along an altitudinal gradient covering more than 2,200 m of altitude. We applied mixed-linear models to determine how absolute and relative age-dependent growth varies depending on stand development, and to quantify the relative importance of tree age and climate on individual tree growth responses. Tree age was the most important factor for tree growth in models parameterised using data from all forest developmental stages. In contrast, the relationship found between tree age and growth became non-significant in models parameterised using data corresponding to mature stages. These results suggest that, although absolute tree growth can increase continuously with tree size, age had no effect on growth once trees reached maturity. Tree growth was strongly reduced under increased annual temperature, leading to more constant age-related growth responses. Furthermore, young trees were the most sensitive to reductions in relative growth rates, but absolute growth was strongly reduced under increased temperature in old trees. Our results help to reconcile previous contrasting findings of age-related growth responses at the individual tree level, suggesting that the sign and magnitude of age-related growth responses vary with stand development. The different responses found to climate for absolute and relative growth rates suggest that young trees are particularly vulnerable under warming climate, but reduced absolute growth in old trees could alter the species’ potential as a carbon sink in the future.

18.
A method is described that allows the assessment of treelikeness of phylogenetic distance data before tree estimation. This method is related to statistical geometry as introduced by Eigen, Winkler-Oswatitsch, and Dress (1988 [Proc. Natl. Acad. Sci. USA. 85:5913-5917]), and in essence, displays a measure for treelikeness of quartets in terms of a histogram that we call a delta plot. This allows identification of nontreelike data and analysis of noisy data sets arising from processes such as, for example, parallel evolution, recombination, or lateral gene transfer. In addition to an overall assessment of treelikeness, individual taxa can be ranked by reference to the treelikeness of the quartets to which they belong. Removal of taxa on the basis of this ranking results in an increase in accuracy of tree estimation. Recombinant data sets are simulated, and the method is shown to be capable of identifying single recombinant taxa on the basis of distance information alone, provided the parents of the recombinant sequence are sufficiently divergent and the mixture of tree histories is not strongly skewed toward a single tree. Delta plots and taxon rankings are applied to three biological data sets using distances derived from sequence alignment, gene order, and fragment length polymorphism.
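A quartet treelikeness score of the kind summarized in a delta plot can be sketched as follows. The abstract does not state the formula, so the sketch assumes a commonly used definition: for each quartet, sort the three pairwise distance sums and take (largest − middle)/(largest − smallest), which is 0 when the four-point condition holds. The function names and the two toy distance sets are hypothetical.

```python
from itertools import combinations

def make_distances(pairs):
    """Build a symmetric dict-of-dicts from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d

def quartet_delta(d, x, y, u, v):
    """Delta for one quartet: 0 for tree-like distances, up to 1 for maximal conflict."""
    sums = sorted([d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]])
    if sums[2] == sums[0]:
        return 0.0
    return (sums[2] - sums[1]) / (sums[2] - sums[0])

def delta_values(d, taxa):
    """Delta for every quartet of taxa; a histogram of these values is the delta plot."""
    return [quartet_delta(d, *q) for q in combinations(taxa, 4)]

# Distances that fit the tree ((a,b),(c,d)) exactly.
tree_like = make_distances({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 4,
                            ("a", "d"): 4, ("b", "c"): 4, ("b", "d"): 4})
# Distances with conflicting signal.
conflicting = make_distances({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 3,
                              ("b", "d"): 3, ("a", "d"): 4, ("b", "c"): 4})
deltas_tree = delta_values(tree_like, "abcd")
deltas_conflict = delta_values(conflicting, "abcd")
```

Averaging the delta values of all quartets that contain a given taxon would give one possible per-taxon ranking of the kind the abstract describes.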

19.
Using a simple example and simulations, we explore the impact of input tree shape upon a broad range of supertree methods. We find that input tree shape can affect how conflict is resolved by several supertree methods and that input tree shape effects may be substantial. Standard and irreversible matrix representation with parsimony (MRP), MinFlip, duplication-only Gene Tree Parsimony (GTP), and an implementation of the average consensus method have a tendency to resolve conflict in favor of relationships in unbalanced trees. Purvis MRP and the average dendrogram method appear to have an opposite tendency. Biases with respect to tree shape are correlated with objective functions that are based upon unusual asymmetric tree-to-tree distance or fit measures. Split, quartet, and triplet fit, most similar supertree, and MinCut methods (provided the latter are interpreted as Adams consensus-like supertrees) are not revealed to have any bias with respect to tree shape by our example, but whether this holds more generally is an open problem. Future development and evaluation of supertree methods should consider explicitly the undesirable biases and other properties that we highlight. In the meantime, use of a single, arbitrarily chosen supertree method is discouraged. Use of multiple methods and/or weighting schemes may allow practical assessment of the extent to which inferences from real data depend upon methodological biases with respect to input tree shape or size.
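The matrix representation that standard MRP analyses start from can be sketched as below. It assumes the usual binary coding, in which each clade of each input tree becomes one character scored 1 for members, 0 for other taxa in that tree and ? for taxa absent from the tree. The input layout and function name are hypothetical, and the parsimony search on the resulting matrix is not shown.

```python
def mrp_matrix(trees, all_taxa):
    """Build an MRP character matrix.

    trees: list of (taxa_in_tree, clades) pairs, where clades is a list of sets.
    Returns {taxon: string of '0', '1', '?'} with one column per input clade.
    """
    rows = {taxon: [] for taxon in all_taxa}
    for taxa, clades in trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in taxa:
                    rows[taxon].append("?")   # taxon missing from this input tree
                elif taxon in clade:
                    rows[taxon].append("1")   # member of the clade
                else:
                    rows[taxon].append("0")   # in the tree but outside the clade
    return {taxon: "".join(chars) for taxon, chars in rows.items()}

# Two small input trees with partially overlapping taxon sets.
tree_1 = ({"A", "B", "C"}, [{"A", "B"}])
tree_2 = ({"B", "C", "D"}, [{"C", "D"}])
matrix = mrp_matrix([tree_1, tree_2], all_taxa=["A", "B", "C", "D"])
```

Larger clades contribute characters with more 1 entries, one way in which the shape of an input tree can influence how conflict among characters is resolved.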

20.
On dendrogram-based measures of functional diversity   Total citations: 4, self-citations: 1, citations by others: 3
János Podani, Dénes Schmera. Oikos, 2006, 115(1): 179-185
Euclidean distance is commonly involved in calculating functional diversity (FD), for example, in measures based on dendrogram branch lengths. We point out that this function is inappropriate in many cases and that the choice of clustering method is more crucial than earlier thought. Gower's formula and UPGMA clustering are suggested here as a standard combination of techniques for calculating FD. The advantage of Gower's measure is its suitability to a mixture of scale types and its tolerance to missing values. Examples demonstrate that UPGMA clustering is more robust and has a better goodness of fit to dissimilarities than complete and single linkage classifications. In addition, we propose that the effect of individual species on FD is best evaluated by species removals and subsequent comparisons of tree length values. The influence of each functional trait is optimally judged by considering both dendrogram length and topological changes.
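The two properties of Gower's measure singled out in this abstract, handling mixed scale types and tolerating missing values, are visible in a short sketch. It covers only numeric and nominal traits with equal weights, which is a simplification of the general formula; the trait layout, names and values are hypothetical, and the UPGMA step is not shown.

```python
def gower_dissimilarity(a, b, kinds, ranges):
    """Gower dissimilarity between two species' trait vectors.

    kinds[i] is "num" or "cat"; ranges[i] is the trait range for numeric traits.
    Traits with a missing value (None) in either species are skipped.
    """
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue
        if kind == "num":
            total += abs(x - y) / rng if rng else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    if used == 0:
        raise ValueError("no traits available for comparison")
    return total / used

# Traits: body mass (numeric), diet (nominal), clutch size (numeric, missing for species 1).
kinds = ("num", "cat", "num")
ranges = (40.0, None, 5.0)
species_1 = (10.0, "carnivore", None)
species_2 = (20.0, "herbivore", 3.0)
d12 = gower_dissimilarity(species_1, species_2, kinds, ranges)
```

Dividing by the number of traits actually compared, rather than by the total number of traits, is what keeps the value on a 0 to 1 scale when data are missing.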
