Similar articles
20 similar articles found
1.
The great increase in the number of phylogenetic studies of a wide variety of organisms in recent decades has focused considerable attention on the balance of phylogenetic trees—the degree to which sister clades within a tree tend to be of equal size—for at least two reasons: (1) the degree of balance of a tree may affect the accuracy of estimates of it; (2) the degree of balance, or imbalance, of a tree may reveal something about the macroevolutionary processes that produced it. In particular, variation among lineages in rates of speciation or extinction is expected to produce trees that are less balanced than those that result from phylogenetic evolution in which each extant species of a group has the same probability of speciation or extinction. Several coefficients for measuring the balance or imbalance of phylogenetic trees have been proposed. I focused on Colless's coefficient of imbalance (I) because of its mathematical tractability and ease of interpretation. Earlier work on this statistic produced exact methods only for calculating the expected value. In those studies, the variance and confidence limits, which are necessary for testing the departure of observed values of I from the expected value, were estimated by Monte Carlo simulation. I developed recursion equations that allow exact calculation of the mean, variance, skewness, and complete probability distribution of I for two different probability-generating models for bifurcating tree shapes. The Equal-Rates Markov (ERM) model assumes that trees grow by the random speciation and extinction of extant species, with all species that are extant at a given time having the same probability of speciation or extinction. The Equal Probability (EP) model assumes that all possible labeled trees for a given number of terminal taxa have the same probability of occurring.
Examples illustrate how these theoretically derived probabilities and parameters may be used to test whether the evolution of a monophyletic group or set of monophyletic groups has proceeded according to a Markov model with equal rates of speciation and extinction among species, that is, whether there has been significant variation among lineages in expected rates of speciation or extinction.
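A minimal Python sketch of an exact recursion for the ERM case is given below. It assumes two things the abstract does not spell out: that I is the sum of |n_left − n_right| over all internal nodes (the usual definition of Colless's index), and that under ERM the root splits n tips into k and n − k tips with probability 1/(n − 1) for each k. It illustrates the idea only; it is not the author's published recursion and it omits the EP model.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def colless_distribution_erm(n):
    """Exact distribution {I value: probability} of Colless's I for n tips under ERM.

    Assumes the root split (k, n - k) is uniform over k = 1, ..., n - 1.
    """
    if n <= 2:
        return {0: Fraction(1)}  # one tip or a cherry: no imbalance
    dist = {}
    p_split = Fraction(1, n - 1)
    for k in range(1, n):
        root_term = abs(k - (n - k))
        # I is additive over nodes and the two subtrees are independent under ERM,
        # so the distribution is a convolution of the subtree distributions.
        for i_left, p_left in colless_distribution_erm(k).items():
            for i_right, p_right in colless_distribution_erm(n - k).items():
                value = root_term + i_left + i_right
                dist[value] = dist.get(value, 0) + p_split * p_left * p_right
    return dist


def moments(dist):
    """Exact mean and variance of a discrete distribution."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var


def upper_tail_p(n, observed):
    """One-sided exact P(I >= observed) under ERM."""
    return sum(p for v, p in colless_distribution_erm(n).items() if v >= observed)
```

For four tips the sketch gives I = 0 (balanced tree) with probability 1/3 and I = 3 (caterpillar) with probability 2/3, so `upper_tail_p(4, 3)` is 2/3. Exact fractions are used so the probabilities sum to exactly 1.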

2.
3.
Random trees and random characters can be used in null models for testing phylogenetic hypotheses. We consider three interpretations of random trees: first, that trees are selected from the set of all possible trees with equal probability; second, that trees are formed by random speciation or coalescence (the two are equivalent); and third, that trees are formed by a series of random partitions of the taxa. We consider two interpretations of random characters: first, that the number of taxa with each state is held constant, but the states are randomly reshuffled among the taxa; and second, that the probability each taxon is assigned a particular state is constant from one taxon to the next. Under null models representing various combinations of randomizations of trees and characters, exact recursion equations are given to calculate the probability distribution of the number of character state changes required by a phylogenetic tree. Possible applications of these probability distributions are discussed. They can be used, for example, to test for a panmictic population structure within a species or to test phylogenetic inertia in a character's evolution. Whether and how a null model incorporates tree randomness makes little difference to the probability distribution in many but not all circumstances. The null model's sense of character randomness appears more critical. The difficult issue of choosing a null model is discussed.
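To make the "number of character state changes" and the first notion of character randomness concrete, here is a small Python sketch. It counts changes with Fitch parsimony on a fixed rooted binary tree (nested tuples of taxon names, an assumed representation) and gets the null distribution by brute-force enumeration of all reshufflings of the observed states. This enumeration is a swapped-in stand-in for the paper's exact recursions and is only feasible for a handful of taxa.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def fitch_changes(tree, states):
    """Minimum number of state changes on a rooted binary tree (Fitch parsimony).

    tree: a taxon name (leaf) or a 2-tuple of subtrees.
    states: dict mapping taxon name -> character state.
    """
    def visit(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1  # one extra change
    return visit(tree)[1]


def leaves(tree):
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def reshuffle_null(tree, observed_states):
    """Exact null distribution of Fitch changes when the observed states are
    reshuffled among taxa (state counts fixed), by enumerating all permutations."""
    taxa = leaves(tree)
    values = [observed_states[t] for t in taxa]
    counts = Counter()
    total = 0
    for perm in permutations(values):
        counts[fitch_changes(tree, dict(zip(taxa, perm)))] += 1
        total += 1
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}
```

For the tree ((A,B),(C,D)) with states A = B = 0 and C = D = 1, the observed character needs one change, and `reshuffle_null` gives P(1 change) = 1/3 and P(2 changes) = 2/3, so the observed fit would not be unusual under this null.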

4.
Estimating the reliability of evolutionary trees (total citations: 9; self-citations: 1; citations by others: 8)
Six protein sequences from the same 11 mammalian taxa were used to estimate the accuracy and reliability of phylogenetic trees using real, rather than simulated, data. A tree comparison metric was used to measure the increase in similarity of minimal trees as larger, randomly selected subsets of nucleotide positions were taken. The ratio of the observed to the expected number of incompatibilities for each nucleotide position (character) is a good predictor of the number of changes required at that position on the minimal (most-parsimonious) tree. This allows a higher weighting of nucleotide positions that have changed more slowly and should result in the minimal-length tree converging to the correct tree as more sequences are obtained. An estimate was made of the smallest subset of trees that needs to be considered to include the actual historical tree for a given set of data. It was concluded that it is possible to give a reasonable estimate of the reliability of the final tree, at least when several sequences are combined. With the present data, resolving the rodent-primate-lagomorph (rabbit) trichotomy is the least certain aspect of the final tree, followed by establishing the position of dog. In our opinion, it is unreasonable to publish an evolutionary tree derived from sequence data without giving an idea of the reliability of the tree.
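The observed number of incompatibilities per character can be illustrated with the pairwise compatibility (four-gamete) test for binary characters. The Python sketch below assumes 0/1 characters in a taxa-by-characters matrix; nucleotide positions have four states and need a more general compatibility test, and the "expected" count in the ratio depends on a null model that this sketch does not implement.

```python
from itertools import combinations


def incompatible(char_a, char_b):
    """Two binary characters are incompatible (no tree fits both without
    homoplasy) when all four state combinations 00, 01, 10, 11 occur."""
    return len(set(zip(char_a, char_b))) == 4


def incompatibility_counts(matrix):
    """matrix: list of taxon rows of 0/1 states.
    Returns, per character (column), how many other characters it conflicts with."""
    columns = list(zip(*matrix))
    counts = [0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        if incompatible(columns[i], columns[j]):
            counts[i] += 1
            counts[j] += 1
    return counts


matrix = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
print(incompatibility_counts(matrix))  # [2, 2, 2, 0]
```

Here the first three characters all conflict with each other, while the last one, which marks a single taxon, is compatible with everything.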

5.
Noise     
The proliferation of DNA sequence data has generated a concern about the effects of "noise" on phylogeny reconstruction. This concern has led to various recommendations for weighting schemes and for separating data types prior to analysis. A new technique is explored to examine directly how noise influences the stability of parsimony reconstruction. By appending purely random characters onto a matrix of pure signal, or by replacing characters in a matrix of signal by random states, one can measure the degree to which a matrix is robust against noise. Reconstructions were sensitive to tree topology and clade size when noise was added, but were less so when character states were replaced with noise. When a signal matrix is complemented with a noise matrix of equal size, parsimony will trace the original signal about half the time when there is only one synapomorphy per node, and about 90% of the time when there are three synapomorphies per node. Similar results obtain when 20% of a matrix is replaced by noise. Successive weighting does not improve performance. Adding noise to only some taxa is more damaging, but replacing characters in only some taxa is less so. The bootstrap and g1 (tree skewness) statistics are shown to be uninterpretable measures of noise or departures from randomness. Empirical data sets illustrate that commonly recommended schemes of differential weighting (e.g. downweighting third positions) are not well supported from the point of view of reducing the influence of noise, nor are noisier data sets likely to degrade signal found in less noisy data sets.
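The two noise manipulations can be sketched in a few lines of Python. The sketch assumes binary states, a matrix stored as a list of taxon rows, and that "replacing characters" means overwriting whole columns; the paper may have replaced individual cells or used other state sets.

```python
import random


def append_noise(matrix, n_noise, states=(0, 1), seed=None):
    """Copy of `matrix` with `n_noise` purely random characters appended to every row."""
    rng = random.Random(seed)
    return [list(row) + [rng.choice(states) for _ in range(n_noise)] for row in matrix]


def replace_with_noise(matrix, fraction, states=(0, 1), seed=None):
    """Copy of `matrix` in which a random `fraction` of the characters (columns)
    is overwritten with random states (assumed column-wise replacement)."""
    rng = random.Random(seed)
    n_chars = len(matrix[0])
    chosen = set(rng.sample(range(n_chars), round(fraction * n_chars)))
    return [[rng.choice(states) if j in chosen else x for j, x in enumerate(row)]
            for row in matrix]
```

Appending a noise block as wide as the signal matrix corresponds to the "equal size" experiment above, and `replace_with_noise(matrix, 0.2)` to replacing 20% of the characters.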

6.
The maximum parsimony (MP) method for inferring phylogenies is widely used, but little is known about its limitations in non-asymptotic situations. This study employs large-scale computations with simulated phylogenetic data to estimate the probability that MP succeeds in finding the true phylogeny for up to twelve taxa and 256 characters. The set of candidate phylogenies is taken to be the unrooted binary trees; for each simulated data set, the tree lengths of all (2n − 5)!! candidates are computed to evaluate quantities related to the performance of MP, such as the probability of finding the true phylogeny, the probability that the tree with the shortest length is unique, the probability that the true phylogeny has the shortest tree length, and the expected inverse of the number of trees sharing the shortest length. The tree length distributions are also used to evaluate and extend the skewness test of Hillis for distinguishing between random and phylogenetic data. The results indicate, for example, that the critical point after which MP achieves a success probability of at least 0.9 is roughly 128 characters. The skewness test is found to perform well on simulated data, and the study extends its scope to up to twelve taxa.
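The size of the exhaustive search follows directly from the (2n − 5)!! formula. A short Python check:

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_binary_trees(n):
    """Number of distinct unrooted binary trees on n >= 3 labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


print(n_unrooted_binary_trees(12))  # 654729075
```

With twelve taxa every simulated data set requires scoring 654,729,075 candidate trees, which helps explain why exhaustive evaluation of this kind stops at about that size.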

7.
Nonparametric bootstrapping methods may be useful for assessing confidence in a supertree inference. We examined the performance of two supertree bootstrapping methods on four published data sets that each include sequence data from more than 100 genes. In "input tree bootstrapping," input gene trees are sampled with replacement and then combined in replicate supertree analyses; in "stratified bootstrapping," trees from each gene's separate (conventional) bootstrap tree set are sampled randomly with replacement and then combined. Generally, support values from both supertree bootstrap methods were similar to or slightly lower than corresponding bootstrap values from a total-evidence, or supermatrix, analysis. Yet supertree bootstrap support also exceeded supermatrix bootstrap support for a number of clades. There was little overall difference in support scores between the input tree and stratified bootstrapping methods. Results from supertree bootstrapping methods, when compared to results from corresponding supermatrix bootstrapping, may provide insights into patterns of variation among genes in genome-scale data sets.
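The two resampling schemes can be sketched in Python as below. Trees are represented (by assumption) as frozensets of clades, each clade a frozenset of taxa. For stratified bootstrapping the sketch draws one tree per gene from that gene's bootstrap set in each replicate, which is one reading of the description above. `majority_clades` is a hypothetical toy stand-in for a real supertree method such as MRP, used only so the example runs.

```python
import random


def input_tree_bootstrap(gene_trees, rng):
    """One input-tree replicate: resample the gene trees with replacement."""
    return [rng.choice(gene_trees) for _ in gene_trees]


def stratified_bootstrap(gene_bootstrap_sets, rng):
    """One stratified replicate: for each gene, draw one tree at random
    from that gene's own bootstrap tree set (assumed interpretation)."""
    return [rng.choice(boot_trees) for boot_trees in gene_bootstrap_sets]


def majority_clades(trees):
    """Toy stand-in for a supertree method: clades found in more than half the trees."""
    counts = {}
    for tree in trees:
        for clade in tree:
            counts[clade] = counts.get(clade, 0) + 1
    return {c for c, k in counts.items() if k > len(trees) / 2}


def clade_support(replicates, combine, clade):
    """Fraction of replicates whose combined tree contains `clade`."""
    return sum(1 for rep in replicates if clade in combine(rep)) / len(replicates)


rng = random.Random(1)
AB, AC = frozenset("AB"), frozenset("AC")
genes = [frozenset({AB}), frozenset({AB}), frozenset({AC})]
replicates = [input_tree_bootstrap(genes, rng) for _ in range(200)]
print(clade_support(replicates, majority_clades, AB))
```

Swapping `combine` for an actual supertree routine would turn this into the replicate loop described in the abstract; the resampling logic is independent of how the trees are combined.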

8.
We studied the influence of seven habitat variables, including tree species, on nest-site selection by the Black-faced Ibis (Theristicus melanopis melanopis) in an urban area of southern Chile. Variables were compared between 30 trees with nests and 30 randomly selected trees without nests. Nests were found in big trees with large diameters and heights. However, the only variable found to have a significant effect on site selection was tree species, which explained 57.9% of data variability (deviance) and indicated a preference for exotic conifers, mainly Douglas fir (Pseudotsuga menziesii). Tree species and tree diameter also had significant effects on the number of nests per tree, jointly explaining 68.9% of data deviance. Our results suggest that in urban environments the Black-faced Ibis uses larger trees that provide greater nest stability and protection.
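"Percentage of deviance explained" for a nest/no-nest outcome can be illustrated with a logistic model that has one categorical predictor. With a single factor, the maximum-likelihood fitted probability for each level is simply that level's observed proportion, so no iterative fitting is needed. This Python sketch assumes that setup; the authors' actual model and data are not reproduced here, and the example values are invented.

```python
from collections import defaultdict
from math import log


def binomial_deviance(outcomes, fitted):
    """Deviance -2 * log-likelihood of 0/1 outcomes given fitted probabilities."""
    d = 0.0
    for y, p in zip(outcomes, fitted):
        d += -2 * (log(p) if y == 1 else log(1 - p))
    return d


def deviance_explained_by_factor(outcomes, groups):
    """Proportion of null deviance explained by one categorical predictor.

    Requires at least one 0 and one 1 among the outcomes (else null deviance is 0).
    """
    overall = sum(outcomes) / len(outcomes)
    null_dev = binomial_deviance(outcomes, [overall] * len(outcomes))
    totals, positives = defaultdict(int), defaultdict(int)
    for y, g in zip(outcomes, groups):
        totals[g] += 1
        positives[g] += y
    fitted = [positives[g] / totals[g] for g in groups]  # ML fit = group proportions
    return 1 - binomial_deviance(outcomes, fitted) / null_dev


# Invented toy data: 1 = tree with a nest.
print(deviance_explained_by_factor([1, 1, 0, 0], ["fir", "fir", "oak", "oak"]))  # 1.0
```

If species separates nest trees perfectly, all deviance is explained; if every tree belongs to the same species, none is.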

9.
Signal, noise, and reliability in molecular phylogenetic analyses. (total citations: 38; self-citations: 0; citations by others: 38)
DNA sequences and other molecular data compared among organisms may contain phylogenetic signal, or they may be randomized with respect to phylogenetic history. Some method is needed to distinguish phylogenetic signal from random noise to avoid analysis of data that have been randomized with respect to the historical relationships of the taxa being compared. We analyzed 8,000 random data matrices consisting of 10-500 binary or four-state characters and 5-25 taxa to study several options for detecting signal in systematic databases. Analysis of random data often yields a single most-parsimonious tree, especially if the number of characters examined is large and the number of taxa examined is small (both often true in molecular studies). The most-parsimonious tree inferred from random data may also be considerably shorter than the second-best alternative. The distribution of tree lengths of all tree topologies (or a random sample thereof) provides a sensitive measure of phylogenetic signal: data matrices with phylogenetic signal produce tree-length distributions that are strongly skewed to the left, whereas those composed of random noise are closer to symmetrical. In simulations of phylogeny with varying rates of mutation (up to levels that produce random variation among taxa), the skewness of tree-length distributions is closely related to the success of parsimony in finding the true phylogeny. Tables of critical values of a skewness test statistic, g1, are provided for binary and four-state characters for 10-500 characters and 5-25 taxa. These tables can be used in a rapid and efficient test for significant structure in data matrices for phylogenetic analysis.
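The skewness statistic can be computed from a list of tree lengths (for example, lengths of all topologies or of a random sample of them). The Python sketch below uses the plain moment estimator g1 = m3 / m2^(3/2). Published critical-value tables may rely on a bias-adjusted variant, so check which form they assume before comparing against them.

```python
def g1(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 of a tree-length distribution.

    Negative values mean a long left tail (a few unusually short trees),
    the pattern expected when the data contain phylogenetic signal.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


print(g1([1, 2, 3]))              # 0.0, a symmetric distribution
print(g1([10, 20, 20, 20, 20]) < 0)  # True, one very short tree gives left skew
```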

10.
Summary: The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted trees) from DNA sequence data was studied. Although there is a theoretical problem in comparing ML values that are conditional on each topology, it is possible to make a heuristic argument to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented. It is shown that under the assumption of a constant rate of evolution, the ML method and UPGMA always give the same rooted tree for the case of three operational taxonomic units (OTUs). This also seems to hold approximately for the case with four OTUs. When we consider unrooted trees with the assumption of a varying rate of nucleotide substitution, the efficiency of the ML method in obtaining the correct tree is similar to those of the maximum parsimony method and distance methods. The ML method was applied to Brown et al.'s data, and the tree topology obtained was the same as that found by the maximum parsimony method, but it was different from those obtained by distance methods.
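For reference, here is a compact Python sketch of UPGMA, the clustering method the ML result is compared with. Distances are assumed to be given as a dict keyed by `frozenset({a, b})`, and the tree is returned as nested tuples `(left, right, height)`. With three OTUs, UPGMA simply joins the closest pair first, which is the rooted topology the abstract says ML also recovers under a constant rate.

```python
from itertools import combinations


def upgma(labels, distances):
    """UPGMA clustering.

    labels: list of OTU names.
    distances: dict mapping frozenset({a, b}) -> distance between OTUs a and b.
    Returns nested tuples (left, right, height); height is half the joining distance.
    """
    # Active clusters by integer id: id -> (subtree, number of OTUs in it).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    dist = {frozenset((i, j)): distances[frozenset((labels[i], labels[j]))]
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        i, j = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        d_ij = dist[frozenset((i, j))]
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted average.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, d_ij / 2), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


d = {frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6}
print(upgma(["A", "B", "C"], d))  # ('C', ('A', 'B', 1.0), 3.0)
```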

11.
We investigate some discrete structural properties of evolutionary trees generated under simple null models of speciation, such as the Yule model. These models have been used as priors in Bayesian approaches to phylogenetic analysis, and also to test hypotheses concerning the speciation process. In this paper we describe new results for three properties of trees generated under such models. First, for a rooted tree generated by the Yule model we describe the probability distribution on the depth (number of edges from the root) of the most recent common ancestor of a random subset of k species. Next we show that, for trees generated under the Yule model, the approximate position of the root can be estimated from the associated unrooted tree, even for trees with a large number of leaves. Finally, we analyse a biologically motivated extension of the Yule model and describe its distribution on tree shapes when speciation occurs in rapid bursts.
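The first property can be explored numerically. The Python sketch below grows Yule trees (at each step a uniformly chosen leaf splits in two) and records the depth of the most recent common ancestor of a random k-subset of leaves. This is a Monte Carlo approximation, not the paper's analytical distribution, and the parent-pointer tree representation is an assumption of the sketch.

```python
import random


def simulate_yule(n_leaves, rng):
    """Grow a rooted binary tree under the Yule model.
    Returns parent pointers (root is node 0, parent None) and the list of leaves."""
    parent = {0: None}
    leaves = [0]
    next_id = 1
    while len(leaves) < n_leaves:
        node = leaves.pop(rng.randrange(len(leaves)))  # uniformly chosen leaf splits
        for _ in range(2):
            parent[next_id] = node
            leaves.append(next_id)
            next_id += 1
    return parent, leaves


def depth(node, parent):
    """Number of edges from `node` up to the root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def mrca_depth(subset, parent):
    """Depth of the most recent common ancestor of the nodes in `subset`."""
    def ancestors(node):  # the node itself and all nodes above it
        path = []
        while node is not None:
            path.append(node)
            node = parent[node]
        return path
    common = set(ancestors(subset[0]))
    for leaf in subset[1:]:
        common &= set(ancestors(leaf))
    return max(depth(a, parent) for a in common)  # deepest shared ancestor


def mrca_depth_distribution(n, k, reps, seed=0):
    """Estimated distribution of MRCA depth for a random k-subset of n Yule leaves."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(reps):
        parent, leaves = simulate_yule(n, rng)
        d = mrca_depth(rng.sample(leaves, k), parent)
        counts[d] = counts.get(d, 0) + 1
    return {d: c / reps for d, c in sorted(counts.items())}
```

For three leaves and k = 2, the pair is the cherry (depth 1) about one time in three and otherwise meets at the root (depth 0); taking all leaves always gives depth 0.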

12.
Recently published data on stem diameters in experimental Pinus radiata plantations show a skewness that is initially zero, first becomes negative, and later reverses direction, becoming positive and increasing indefinitely. This and other behaviour are explained using a zone-of-influence model based entirely on competition between neighbouring trees. Negative skewness can be identified with the early stages of competition, when only the largest trees compete. The model also generates bimodal distributions when competition is intense, as observed experimentally in annual plants. Further modes are generated as competition is increased further.

13.
Abstract— Protein variation among 37 species of carcharhiniform sharks was examined at 17 presumed loci. Evolutionary trees were inferred from these data using both a cladistic character analysis and a distance Wagner analysis. Initial cladistic character analysis resulted in more than 30 000 equally parsimonious tree arrangements. Randomization tests designed to evaluate the phylogenetic information content of the data suggest the data are highly significantly different from random in spite of the large number of parsimonious trees produced. Different starting seed trees were found to influence the kinds of tree topologies discovered by the heuristic branch-swapping algorithm used. The trees generated during the early phases of branch swapping on a single seed tree were found to be topologically similar to those generated throughout the course of branch swapping. Successive weighting increased the frequency and the consistency with which certain clades were found during the course of branch swapping, causing the semi-strict consensus to be more resolved. Successive weighting also appeared resilient to the bias associated with the choice of initial seed tree, causing analyses seeded with different trees to converge on identical final character weights and the same semi-strict consensus tree.
The summary cladistic character analysis and the distance Wagner analysis both support the monophyly of two major clades, the genus Rhizoprionodon and the genus Sphyrna. The distance Wagner analysis also supports the monophyly of the genus Carcharhinus. However, the cladistic analysis suggests that Carcharhinus is a paraphyletic group that includes the blue shark Prionace glauca.
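A semi-strict consensus keeps every clade that appears in at least one tree and conflicts with no clade in any tree, which is why it can be more resolved than a strict consensus. The Python sketch below assumes rooted trees given as sets of clades (each clade a frozenset of taxon names); two clades conflict when they overlap without one containing the other.

```python
def compatible(c1, c2):
    """Two clades are compatible if they are disjoint or nested."""
    return not (c1 & c2) or c1 <= c2 or c2 <= c1


def semi_strict_consensus(trees):
    """Clades present in at least one tree and contradicted by no clade in any tree."""
    all_clades = set().union(*trees)
    return {c for c in all_clades
            if all(compatible(c, other) for t in trees for other in t)}


def strict_consensus(trees):
    """Clades present in every tree."""
    return set.intersection(*map(set, trees))


AB, ABC, CD = frozenset("AB"), frozenset("ABC"), frozenset("CD")
print(semi_strict_consensus([{AB}, set()]) == {AB})  # True: an unresolved tree does not veto AB
print(strict_consensus([{AB}, set()]) == set())      # True: strict consensus drops it
```

When two trees contain conflicting clades (for instance ABC and CD), both are dropped, while a clade such as AB that conflicts with nothing survives.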

14.
Accuracy of estimated phylogenetic trees from molecular data (total citations: 2; self-citations: 0; citations by others: 2)
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, the negative binomial distribution, or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large, the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method performs better than the Farris method. When the coefficient of variation of branch length is small, by contrast, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.
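Pairwise substitution estimates are typically obtained by correcting the observed proportion of differing sites. The Python sketch below uses the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), as a common example. The abstract does not say which estimator the authors used, so treat this choice as an assumption.

```python
from math import log


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which the two sequences differ."""
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor estimate of substitutions per site: d = -3/4 ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * log(1 - 4 * p / 3)


print(round(jukes_cantor("AAAA", "AAAC"), 4))  # 0.3041
```

The corrected distance (about 0.304) exceeds the raw proportion of differences (0.25) because it allows for multiple substitutions at the same site.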

15.
Gene coexpression networks inferred by correlation from high-throughput profiling such as microarray data represent simple but effective structures for discovering and interpreting linear gene relationships. In recent years, several approaches have been proposed to tackle the problem of deciding when the resulting correlation values are statistically significant. This is most crucial when the number of samples is small, yielding a non-negligible chance that even high correlation values are due to random effects. Here we introduce a novel hard-thresholding solution based on the assumption that a coexpression network inferred from randomly generated data is expected to be empty. The threshold is derived analytically and, as a deterministic independent null model, depends only on the dimensions of the starting data matrix, with assumptions on the skewness of the data distribution compatible with the structure of gene expression data. We show, on synthetic and array datasets, that the proposed threshold is effective in eliminating all false-positive links, with an offsetting cost in terms of false-negative edges.
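The idea of a threshold chosen so that random data yield an empty network can be illustrated with a permutation heuristic. The Python sketch below is a swapped-in stand-in: it takes the largest |r| seen after shuffling each gene's profile independently. It is not the paper's analytic threshold, which depends only on the matrix dimensions. Profiles are assumed to be non-constant lists of equal length.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles, threshold):
    """Gene pairs whose |r| exceeds a hard threshold."""
    genes = sorted(profiles)
    return {(g, h) for i, g in enumerate(genes) for h in genes[i + 1:]
            if abs(pearson(profiles[g], profiles[h])) > threshold}


def permutation_threshold(profiles, n_perm=20, seed=0):
    """Stand-in threshold: the largest |r| observed after independently shuffling
    every profile, so the network built from the shuffled data would be empty."""
    rng = random.Random(seed)
    genes = list(profiles)
    best = 0.0
    for _ in range(n_perm):
        shuffled = {g: rng.sample(profiles[g], len(profiles[g])) for g in genes}
        for i, g in enumerate(genes):
            for h in genes[i + 1:]:
                best = max(best, abs(pearson(shuffled[g], shuffled[h])))
    return best


profiles = {"g1": [1, 2, 3, 4, 5, 6], "g2": [2, 4, 6, 8, 10, 12], "g3": [1, -1, 1, -1, 1, -1]}
print(coexpression_edges(profiles, 0.9))  # {('g1', 'g2')}
```

With very few samples, shuffled profiles can still correlate strongly, so a threshold of this kind is high; this mirrors the abstract's trade-off between removing false positives and losing true edges.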

16.
A simulation study was carried out to investigate the relative importance of tree topology (both balance and stemminess), evolutionary rates (constant, varying among characters, and varying among lineages), and evolutionary models in determining the accuracy with which phylogenetic trees can be estimated. The three evolutionary context models were phyletic (characters can change at each simulated time step), speciational (changes are possible only at the time of speciation into two daughter lineages), and punctuational (changes occur at the time of speciation but only in one of the daughter lineages). UPGMA clustering and maximum parsimony (“Wagner trees”) methods for estimating phylogenies were compared. All trees were based on eight recent OTUs. The three evolutionary context models were found to have the largest influence on accuracy of estimates by both methods. The next most important effect was that of the stemminess × context interaction. Maximum parsimony and UPGMA performed worst under the punctuational models. Under the phyletic model, trees with high stemminess values could be estimated more accurately and balanced trees were slightly easier to estimate than unbalanced ones. Overall, maximum parsimony yielded more accurate trees than UPGMA—but that was expected for these simulations since many more characters than OTUs were used. Our results suggest that the great majority of estimated phylogenetic trees are likely to be quite inaccurate; they underscore the inappropriateness of characterizing current phylogenetic methods as being for reconstruction rather than for estimation.

17.
The mutualistic symbiosis between forest trees and ectomycorrhizal fungi (EMF) is among the most ubiquitous and successful interactions in terrestrial ecosystems. Specific species of EMF are known to colonize specific tree species, benefitting from their carbon source, and in turn, improving their access to soil water and nutrients. EMF also form extensive mycelial networks that can link multiple root‐tips of different trees. Yet the number of tree species connected by such mycelial networks, and the traffic of material across them, are just now under study. Recently we reported substantial belowground carbon transfer between Picea, Pinus, Larix and Fagus trees in a mature forest. Here, we analyze the EMF community of these same individual trees and identify the most likely taxa responsible for the observed carbon transfer. Among the nearly 1,200 EMF root‐tips examined, 50%–70% belong to operational taxonomic units (OTUs) that were associated with three or four tree host species, and 90% of all OTUs were associated with at least two tree species. Sporocarp 13C signals indicated that carbon originating from labelled Picea trees was transferred among trees through EMF networks. Interestingly, phylogenetically more closely related tree species exhibited more similar EMF communities and exchanged more carbon. Our results show that belowground carbon transfer is well orchestrated by the evolution of EMFs and tree symbiosis.

18.
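Genomic methylation is influenced by environmental factors. In epigenetic research, of which methylation studies are representative, there is as yet no systematic understanding of how to minimize the effect of storage conditions on samples collected away from the laboratory so as to improve the accuracy and rigor of the whole experiment. This study compared five commonly used methods for preserving samples after collection (freezing in liquid nitrogen, freezing at -20 °C, drying with color-indicating silica gel, sealing in a sealed bag, and immersion in 75% ethanol), analyzing F-MSAP data from Castanea henryi (锥栗) at the South China Botanical Garden with Wilcoxon signed-rank tests and UPGMA cluster analysis, with the aim of identifying the best preservation method. In addition, the F-MSAP system was optimized using an orthogonal experimental design, nine primer pairs were selected (E3 H/M2; E5 H/M2; E6 H/M1; E6 H/M5; E8 H/M1; E8 H/M5; E9 H/M2; E11 H/M5; E14 H/M1), and the methylation levels and genetic diversity of C. henryi at different developmental stages were examined. The results show that, in the F-MSAP study of C. henryi, the Wilcoxon signed-rank tests and the UPGMA cluster analysis reached the same conclusion: storage in sealed bags worked best. Mature leaves had higher hemimethylation (27.83%) and total methylation (51.13%) rates than young leaves (21.35% and 45.90%, respectively), but a lower full-methylation rate (23.30% vs. 24.55%). The mean percentage of polymorphic loci was 39.60% and Shannon's information index was 0.207 ± 0.002, indicating relatively high levels of methylation and genetic diversity.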
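The Wilcoxon signed-rank statistic used to compare preservation methods can be computed as below. The abstract does not state what quantities were paired, so this Python sketch only shows the statistic for two generic paired samples: zero differences are dropped and tied absolute differences receive average ranks. For P-values, a library routine such as `scipy.stats.wilcoxon` can be used instead.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped; tied |differences| get average (1-based) ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


print(wilcoxon_signed_rank([1, 0, 2], [0, 1, 0]))  # (4.5, 1.5)
```

In the example the two differences of size 1 share rank 1.5 and the difference of 2 gets rank 3.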

19.
The effect of six resemblance coefficients (taxonomic distance, Manhattan distance, correlations, cosines, and two new general dissimilarity coefficients) on the character stability of classifications based on six data sets was evaluated. The six data sets represent a variety of organisms and of ratios of number of characters to number of OTUs, and were randomly bipartitioned 100 times. The results of matrix correlations, cophenetic correlations and two consensus measures indicate that no one resemblance coefficient is uniformly better than all others when evaluated in terms of the stability of a classification, although taxonomic distance and Manhattan distance produce relatively more stable classifications than the other resemblance coefficients. An index of dimensionality, the stemminess, and the cophenetic correlations of the classifications were calculated for the six data sets and also for 20 data sets analyzed in an earlier study. Regression analysis of stability on the ratio of number of characters to number of OTUs, dimensionality, stemminess, and cophenetic correlations explained more than 70% of the variance in stability. Of the four factors, the ratio was by far the most important. Stemminess and dimensionality contributed little when considered singly, and did not add appreciably to the variance explained by ratio and cophenetic correlations.
Dedicated to the memory of Prof. J. S. L. Gilmour. His insightful writings on naturalness in classifications paved the way for the development of numerical phenetics.
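Four of the resemblance coefficients can be written in a few lines each. The Python sketch below assumes two OTUs described by equal-length lists of numeric character values. Conventions vary (for example, Manhattan distance is sometimes averaged over characters, and characters are often standardized first), and the two new coefficients from the paper are not reproduced.

```python
from math import sqrt


def taxonomic_distance(x, y):
    """Average taxonomic distance: square root of the mean squared character difference."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def manhattan_distance(x, y):
    """Sum of absolute character differences (unaveraged form)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    """Cosine of the angle between the two character vectors (non-zero vectors only)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def correlation(x, y):
    """Pearson correlation across characters: the cosine of the centred vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return cosine([a - mx for a in x], [b - my for b in y])


print(manhattan_distance([0, 0], [3, 4]), round(taxonomic_distance([0, 0], [3, 4]), 4))  # 7 3.5355
```

Writing correlation as the cosine of centred vectors makes the relationship between the two similarity coefficients explicit.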

20.
In this article the question of reconstructing a phylogeny from additive distance data is addressed. Previous algorithms used the complete distance matrix of the n OTUs (operational taxonomic units), which correspond to the tips of the tree, and therefore required O(n²) computing time. It is shown that this is wasteful for biologically reasonable trees: if the internal nodes of the tree have bounded degree, an O(n log n) algorithm is possible. It is also shown that if the nodes can have unbounded degree, the problem has a lower bound of Ω(n²).
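"Additive" distance data are those that fit a tree exactly, and the classical test is the four-point condition. For every quartet, the two largest of d(a,b)+d(c,e), d(a,c)+d(b,e) and d(a,e)+d(b,c) must be equal. The Python sketch below checks this by brute force over all quartets in O(n⁴) time. It assumes the input is already a metric, with distances keyed by `frozenset({u, v})`. It is only a validity check, not the O(n log n) reconstruction algorithm described in the abstract.

```python
from itertools import combinations


def satisfies_four_point(labels, d, tol=1e-9):
    """True if every quartet satisfies the four-point condition (d assumed metric)."""
    def dist(u, v):
        return d[frozenset((u, v))]
    for a, b, c, e in combinations(labels, 4):
        sums = sorted([dist(a, b) + dist(c, e),
                       dist(a, c) + dist(b, e),
                       dist(a, e) + dist(b, c)])
        if abs(sums[2] - sums[1]) > tol:  # the two largest sums must tie
            return False
    return True


# Distances from the tree ((A:1, B:2):3, (C:4, D:5)) are additive.
tree_d = {frozenset("AB"): 3, frozenset("CD"): 9, frozenset("AC"): 8,
          frozenset("AD"): 9, frozenset("BC"): 9, frozenset("BD"): 10}
print(satisfies_four_point("ABCD", tree_d))  # True
```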


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)