1.

Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent

Elizabeth S. Allman James H. Degnan John A. Rhodes 《Journal of mathematical biology》2011,62(6):833-862

Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
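The four-species case can be illustrated with a short calculation. The sketch below is not taken from the paper; it assumes the standard coalescent result that, with one gene per species, the unrooted gene tree matches the unrooted species tree with probability 1 - (2/3)e^(-t), where t is the length of the single internal edge of the unrooted species tree in coalescent units. The function names are hypothetical. Because only t enters the formula, rooted species trees that share the same unrooted tree and the same t give the same distribution, which is one way to see why the root cannot be located from four taxa.

```python
import math


def quartet_probabilities(t):
    """Unrooted gene tree probabilities for species quartet ab|cd.

    t is the internal edge length of the unrooted species tree,
    measured in coalescent units (an assumption of this sketch).
    """
    miss = math.exp(-t)  # probability that no coalescence occurs on the edge
    return {
        "ab|cd": 1.0 - 2.0 * miss / 3.0,  # matches the species tree
        "ac|bd": miss / 3.0,
        "ad|bc": miss / 3.0,
    }


def internal_length_from_frequency(p_match):
    """Invert the formula: recover t from the matching-topology frequency."""
    if not (1.0 / 3.0 <= p_match < 1.0):
        raise ValueError("matching frequency must lie in [1/3, 1)")
    return -math.log(1.5 * (1.0 - p_match))
```

Given observed quartet frequencies, the most frequent topology estimates the unrooted species tree and `internal_length_from_frequency` estimates the one identifiable edge length.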

2.

Effects of branch length uncertainty on Bayesian posterior probabilities for phylogenetic hypotheses (total citations: 1; self-citations: 0; citations by others: 1)

Kolaczkowski B Thornton JW 《Molecular biology and evolution》2007,24(9):2108-2118

In Bayesian phylogenetics, confidence in evolutionary relationships is expressed as posterior probability--the probability that a tree or clade is true given the data, evolutionary model, and prior assumptions about model parameters. Model parameters, such as branch lengths, are never known in advance; Bayesian methods incorporate this uncertainty by integrating over a range of plausible values given an assumed prior probability distribution for each parameter. Little is known about the effects of integrating over branch length uncertainty on posterior probabilities when different priors are assumed. Here, we show that integrating over uncertainty using a wide range of typical prior assumptions strongly affects posterior probabilities, causing them to deviate from those that would be inferred if branch lengths were known in advance; only when there is no uncertainty to integrate over does the average posterior probability of a group of trees accurately predict the proportion of correct trees in the group. The pattern of branch lengths on the true tree determines whether integrating over uncertainty pushes posterior probabilities upward or downward. The magnitude of the effect depends on the specific prior distributions used and the length of the sequences analyzed. Under realistic conditions, however, even extraordinarily long sequences are not enough to prevent frequent inference of incorrect clades with strong support. We found that across a range of conditions, diffuse priors--either flat or exponential distributions with moderate to large means--provide more reliable inferences than small-mean exponential priors. An empirical Bayes approach that fixes branch lengths at their maximum likelihood estimates yields posterior probabilities that more closely match those that would be inferred if the true branch lengths were known in advance and reduces the rate of strongly supported false inferences compared with fully Bayesian integration.

3.

Molecular claims of Gondwanan age for Australian agamid lizards are untenable

Hugall AF Lee MS 《Molecular biology and evolution》2004,21(11):2102-2110

A recent mtDNA study proposes a surprisingly deep (approximately 150 MYA) divergence between SE Asian and Australasian agamid lizards, consistent with ancient Gondwanan vicariance rather than dispersal across the Indonesian Archipelago. However, the analysis contains a fundamental error: use of rates of molecular evolution inferred from uncorrected sequence divergence to put a time frame on a tree with branch lengths greatly elongated by complex likelihood and rate-smoothing models. Furthermore, this date implies that basal splits within agamids occurred implausibly early, at least 300 MYA (100 Myr before the first fossil lizards and coincident with the earliest fossil reptiles). Analyses of the mtDNA data using more appropriate methods and new information from nuclear (c-mos) sequences suggest a much more recent divergence between SE Asian and Australian agamids (around 30 MYA). Using two fossil boundary dates, bootstrapping the c-mos data gives a 95% confidence interval for this divergence time that is sufficiently recent (14-41 MYA) to exclude an ancient Gondwanan vicariance and is more consistent with Miocene over-water dispersal. As with the mtDNA, the c-mos data imply implausibly old basal divergences among agamids if a Gondwanan age is assumed for the Australasian clade. The analyses also highlight how methods for creating ultrametric trees (especially nonparametric rate smoothing) can greatly modify branch lengths and, thus, always require internal calibrations. The errors associated with inferred dates in the previous study (inferred through parametric bootstrapping) were also unjustifiably low, as this method only considers stochasticity in the substitution model and ignores much larger sources of uncertainty, such as variation in character sampling, tree topology, and calibration accuracy.

4.

Estimating divergence times in large phylogenetic trees

Britton T Anderson CL Jacquet D Lundqvist S Bremer K 《Systematic biology》2007,56(5):741-752

A new method, PATHd8, for estimating ultrametric trees from trees with edge (branch) lengths proportional to the number of substitutions is proposed. The method allows for an arbitrary number of reference nodes for time calibration, each defined either as absolute age, minimum age, or maximum age, and the tree need not be fully resolved. The method is based on estimating node ages by mean path lengths from the node to the leaves but correcting for deviations from a molecular clock suggested by reference nodes. As opposed to most existing methods allowing substitution rate variation, the new method smooths substitution rates locally, rather than simultaneously over the whole tree, thus allowing for analysis of very large trees. The performance of PATHd8 is compared with other frequently used methods for estimating divergence times. In analyses of three separate data sets, PATHd8 gives similar divergence times to other methods, the largest difference being between crown group ages, where unconstrained nodes get younger ages when analyzed with PATHd8. Overall, chronograms obtained from other methods appear smoother, whereas PATHd8 preserves more of the heterogeneity seen in the original edge lengths. Divergence times are most evenly spread over the chronograms obtained from the Bayesian implementation and the clock-based Langley-Fitch method, and these two methods produce very similar ages for most nodes. Evaluations of PATHd8 using simulated data suggest that PATHd8 is slightly less precise compared with penalized likelihood, but it gives more sensible answers for extreme data sets. A clear advantage of PATHd8 is that it is more or less instantaneous even with trees having several thousand leaves, whereas other programs often run into problems when analyzing trees with hundreds of leaves. PATHd8 is implemented in freely available software.
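The mean-path-length idea at the core of the method can be sketched in a few lines. This is an illustration only, not the PATHd8 software: it computes the uncalibrated mean path length from a node to its descendant leaves and omits the corrections at reference nodes that the abstract describes. The tree encoding (a leaf is a string, an internal node is a list of (child, branch_length) pairs) and the function names are assumptions made for this example.

```python
def leaf_path_lengths(node):
    """Return the path length from `node` to each of its descendant leaves."""
    if isinstance(node, str):  # a leaf is zero distance from itself
        return [0.0]
    lengths = []
    for child, branch in node:
        # Every leaf below `child` is `branch` further away from `node`.
        lengths.extend(branch + d for d in leaf_path_lengths(child))
    return lengths


def mean_path_length(node):
    """Mean path length to the leaves, a clock-like proxy for relative node age."""
    paths = leaf_path_lengths(node)
    return sum(paths) / len(paths)


# Hypothetical example tree ((A:1,B:3):1,C:4) with lengths in substitutions.
cherry = [("A", 1.0), ("B", 3.0)]
root = [(cherry, 1.0), ("C", 4.0)]
```

Dividing each node's value by the root's value gives relative ages; turning them into absolute ages requires at least one calibrated reference node.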

5.

Estimating species trees using approximate Bayesian computation

Fan HH Kubatko LS 《Molecular phylogenetics and evolution》2011,59(2):354-363

Development of methods for estimating species trees from multilocus data is a current challenge in evolutionary biology. We propose a method for estimating the species tree topology and branch lengths using approximate Bayesian computation (ABC). The method takes as data a sample of observed rooted gene tree topologies, and then iterates through the following sequence of steps: First, a randomly selected species tree is used to compute the distribution of rooted gene tree topologies. This distribution is then compared to the observed gene topology frequencies, and if the fit between the observed and the predicted distributions is close enough, the proposed species tree is retained. Repeating this many times leads to a collection of retained species trees that are then used to form the estimate of the overall species tree. We test the performance of the method, which we call ST-ABC, using both simulated and empirical data. The simulation study examines both symmetric and asymmetric species trees over a range of branch lengths and sample sizes. The results from the simulation study show that the model performs very well, giving accurate estimates for both the topology and the branch lengths across the conditions studied, and that a sample size of 25 loci appears to be adequate for the method. Further, we apply the method to two empirical cases: a 4-taxon data set for primates and a 7-taxon data set for yeast. In both cases, we find that estimates obtained with ST-ABC agree with previous studies. The method provides efficient estimation of the species tree, and does not require sequence data, but rather the observed distribution of rooted gene topologies without branch lengths. Therefore, this method is a useful alternative to other currently available methods for species tree estimation.
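The propose, compare, and retain loop described above can be sketched for the smallest possible case. This is not the ST-ABC implementation: it is a toy rejection sampler for three taxa, where the rooted gene tree distribution under the coalescent has a closed form (the matching topology has probability 1 - (2/3)e^(-t), each other topology (1/3)e^(-t), with t the internal branch length in coalescent units). The uniform prior on t, the L1 distance, the tolerance `epsilon`, and all names are assumptions chosen for illustration.

```python
import math
import random

TOPOLOGIES = ["AB|C", "AC|B", "BC|A"]


def predicted_distribution(topology, t):
    """Rooted gene tree probabilities for a 3-taxon species tree."""
    miss = math.exp(-t)
    return {g: (1.0 - 2.0 * miss / 3.0 if g == topology else miss / 3.0)
            for g in TOPOLOGIES}


def abc_rejection(observed, n_proposals=2000, epsilon=0.1, max_t=5.0, seed=0):
    """Retain proposed (topology, t) pairs whose prediction fits `observed`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_proposals):
        topology = rng.choice(TOPOLOGIES)  # propose a species tree topology
        t = rng.uniform(0.0, max_t)        # propose an internal branch length
        predicted = predicted_distribution(topology, t)
        distance = sum(abs(observed[g] - predicted[g]) for g in TOPOLOGIES)
        if distance < epsilon:             # close enough: keep the proposal
            accepted.append((topology, t))
    return accepted
```

The retained proposals can then be summarized, for example by taking the most frequent topology and the mean of the retained branch lengths.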

6.

The Rooting of the Universal Tree of Life Is Not Reliable (total citations: 19; self-citations: 0; citations by others: 19)

Hervé Philippe Patrick Forterre 《Journal of molecular evolution》1999,49(4):509-523

Several composite universal trees connected by an ancestral gene duplication have been used to root the universal tree of life. In all cases, this root turned out to be in the eubacterial branch. However, the validity of results obtained from comparative sequence analysis has recently been questioned, in particular, in the case of ancient phylogenies. For example, it has been shown that several eukaryotic groups are misplaced in ribosomal RNA or elongation factor trees because of unequal rates of evolution and mutational saturation. Furthermore, the addition of new sequences to data sets has often turned apparently reasonable phylogenies into confused ones. We have thus revisited all composite protein trees that have been used to root the universal tree of life up to now (elongation factors, ATPases, tRNA synthetases, carbamoyl phosphate synthetases, signal recognition particle proteins) with updated data sets. In general, the two prokaryotic domains were not monophyletic with several aberrant groupings at different levels of the tree. Furthermore, the respective phylogenies contradicted each other, so that various ad hoc scenarios (paralogy or lateral gene transfer) must be proposed in order to obtain the traditional Archaebacteria–Eukaryota sisterhood. More importantly, all of the markers are heavily saturated with respect to amino acid substitutions. As phylogenies inferred from saturated data sets are extremely sensitive to differences in evolutionary rates, present phylogenies used to root the universal tree of life could be biased by the phenomenon of long branch attraction. Since the eubacterial branch was always the longest one, the eubacterial rooting could be explained by an attraction between this branch and the long branch of the outgroup. Finally, we suggested that a eukaryotic rooting could be a more fruitful working hypothesis, as it provides, for example, a simple explanation for the high genetic similarity of Archaebacteria and Eubacteria inferred from complete genome analysis.

7.

Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci (total citations: 11; self-citations: 0; citations by others: 11)

Rannala B Yang Z 《Genetics》2003,164(4):1645-1656

The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates must await more data and more realistic models.

8.

A mixed branch length model of heterotachy improves phylogenetic accuracy

Kolaczkowski B Thornton JW 《Molecular biology and evolution》2008,25(6):1054-1066

Evolutionary relationships are typically inferred from molecular sequence data using a statistical model of the evolutionary process. When the model accurately reflects the underlying process, probabilistic phylogenetic methods recover the correct relationships with high accuracy. There is ample evidence, however, that models commonly used today do not adequately reflect real-world evolutionary dynamics. Virtually all contemporary models assume that relatively fast-evolving sites are fast across the entire tree, whereas slower sites always evolve at relatively slower rates. Many molecular sequences, however, exhibit site-specific changes in evolutionary rates, called "heterotachy." Here we examine the accuracy of 2 phylogenetic methods for incorporating heterotachy, the mixed branch length model--which incorporates site-specific rate changes by summing likelihoods over multiple sets of branch lengths on the same tree--and the covarion model, which uses a hidden Markov process to allow sites to switch between variable and invariable as they evolve. Under a variety of simple heterogeneous simulation conditions, the mixed model was dramatically more accurate than homotachous models, which were subject to topological biases as well as biases in branch length estimates. When data were simulated with strong versions of the types of heterotachy observed in real molecular sequences, the mixed branch length model was more accurate than homotachous techniques. Analyses of empirical data sets confirmed that the mixed branch length model can improve phylogenetic accuracy under conditions that cause homotachous models to fail. In contrast, the covarion model did not improve phylogenetic accuracy compared with homotachous models and was sometimes substantially less accurate. We conclude that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.

9.

The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees

Soria-Carrasco V Talavera G Igea J Castresana J 《Bioinformatics (Oxford, England)》2007,23(21):2954-2956

SUMMARY: We introduce a new phylogenetic comparison method that measures overall differences in the relative branch length and topology of two phylogenetic trees. To do this, the algorithm first scales one of the trees to have a global divergence as similar as possible to the other tree. Then, the branch length distance, which takes differences in topology and branch lengths into account, is applied to the two trees. We thus obtain the minimum branch length distance or K tree score. Two trees with very different relative branch lengths get a high K score whereas two trees that follow a similar among-lineage rate variation get a low score, regardless of the overall rates in both trees. There are several applications of the K tree score, two of which are explained here in more detail. First, this score allows the evaluation of the performance of phylogenetic algorithms, not only with respect to their topological accuracy, but also with respect to the reproduction of a given branch length variation. In a second example, we show how the K score allows the selection of orthologous genes by choosing those that better follow the overall shape of a given reference tree. AVAILABILITY: http://molevol.ibmb.csic.es/Ktreedist.html
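The two steps (scale, then measure) can be sketched as follows. This is a minimal illustration rather than the Ktreedist program: it assumes each tree is given as a mapping from a split (bipartition) label to a branch length, that a split missing from one tree counts as length 0, and that the scale factor is chosen by least squares, which minimizes the branch length distance in closed form. The function name and the split labels are hypothetical.

```python
import math


def k_tree_score(reference, comparison):
    """Return (scale_factor, score) for two trees given as {split: length}.

    The comparison tree is rescaled by the factor that minimizes the
    sum of squared branch length differences to the reference tree.
    """
    splits = sorted(set(reference) | set(comparison))
    a = [reference.get(s, 0.0) for s in splits]   # reference lengths
    b = [comparison.get(s, 0.0) for s in splits]  # comparison lengths
    # Least-squares minimizer of sum((a_i - k * b_i)^2) over k.
    k = sum(x * y for x, y in zip(a, b)) / sum(y * y for y in b)
    score = math.sqrt(sum((x - k * y) ** 2 for x, y in zip(a, b)))
    return k, score
```

A comparison tree that is an exact multiple of the reference scores 0 whatever its overall rate, while topological differences or different relative branch lengths raise the score.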

10.

iGLASS: an improvement to the GLASS method for estimating species trees from gene trees

Jewett EM Rosenberg NA 《Journal of computational biology》2012,19(3):293-315

Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree.
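The upward bias of a minimum-based estimate, and the effect of subtracting a mean waiting time, can be illustrated in the simplest setting. The sketch below is not the iGLASS estimator from the paper: it assumes one lineage per taxon, times measured in coalescent units in which two lineages coalesce at rate 1, and a constant ancestral population size. Under those assumptions the interspecific coalescence time at each locus is the divergence time plus an Exp(1) waiting time, so the minimum over L loci overshoots by 1/L on average. All function names are hypothetical.

```python
import random


def glass_estimate(coalescence_times):
    """Minimum interspecific coalescence time across loci."""
    return min(coalescence_times)


def corrected_estimate(coalescence_times):
    """Subtract the mean of the minimum of L Exp(1) waiting times, 1/L."""
    return min(coalescence_times) - 1.0 / len(coalescence_times)


def simulate_locus_times(tau, n_loci, rng):
    """One coalescence time per locus: divergence time plus Exp(1) wait."""
    return [tau + rng.expovariate(1.0) for _ in range(n_loci)]
```

Averaging both estimators over many simulated data sets shows the minimum sitting above the true divergence time and the corrected value centered on it.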

11.

Nucleotide substitution models and estimation of phylogeny

Håstad O Björklund M 《Molecular biology and evolution》1998,15(11):1381-1389

The nucleotide substitution matrix inferred from avian data sets using cytochrome b differs considerably from the models commonly used in phylogenetic analyses. To analyze the possible effects of this particular pattern of change in phylogeny estimation we performed a computer simulation in which we started with a real sequence and used the inferred model of change to produce a tree of 10 species. Maximum parsimony (MP), maximum likelihood (ML), and various distance methods were then used to recover the topology and the branch lengths. We used two kinds of data with varying levels of variation. In addition, we tested with the removal of third positions and different weighting schemes. At low levels of variation, MP was outstanding in recovering the topology (90% correct), while unweighted pair-group method, arithmetic average (UPGMA), regardless of distances used, was poor (40%). At the higher level, most methods had a chance of around 40%-58% of finding the true tree. However, in most cases, the trees found were only slightly wrong, with only one or a few branches misplaced. On the other hand, the use of a "wrong" model had serious effects on the estimation of branch lengths (distances). Although precision was high, accuracy was poor with most methods, giving branch lengths that were biased downward. When seeded with the true distance matrix, Fitch and NJ always found the true tree, while UPGMA frequently failed to do so. The effect of removing third positions was dramatic at low levels of variation, because only one MP program was able to find a true tree at all, albeit rarely, while none of the others ever did so. At higher levels, the situation was better, but still much worse than with the whole data set.

12.

The devil in the details: interactions between the branch-length prior and likelihood model affect node support and branch lengths in the phylogeny of the Psoraceae

Ekman S Blaalid R 《Systematic biology》2011,60(4):541-561

In popular use of Bayesian phylogenetics, a default branch-length prior is almost universally applied without knowing how a different prior would have affected the outcome. We performed Bayesian and maximum likelihood (ML) inference of phylogeny based on empirical nucleotide sequence data from a family of lichenized ascomycetes, the Psoraceae, the morphological delimitation of which has been controversial. We specifically assessed the influence of the combination of Bayesian branch-length prior and likelihood model on the properties of the Markov chain Monte Carlo tree sample, including node support, branch lengths, and taxon stability. Data included two regions of the mitochondrial ribosomal RNA gene, the internal transcribed spacer region of the nuclear ribosomal RNA gene, and the protein-coding largest subunit of RNA polymerase II. Data partitioning was performed using Bayes' factors, whereas the best-fitting model of each partition was selected using the Bayesian information criterion (BIC). Given the data and model, short Bayesian branch-length priors generate higher numbers of strongly supported nodes as well as short and topologically similar trees sampled from parts of tree space that are largely unexplored by the ML bootstrap. Long branch-length priors generate fewer strongly supported nodes and longer and more dissimilar trees that are sampled mostly from inside the range of tree space sampled by the ML bootstrap. Priors near the ML distribution of branch lengths generate the best marginal likelihood and the highest frequency of "rogue" (unstable) taxa. The branch-length prior was shown to interact with the likelihood model. Trees inferred under complex partitioned models are more affected by the stretching effect of the branch-length prior. Fewer nodes are strongly supported under a complex model given the same branch-length prior. 
Irrespective of model, internal branches make up a larger proportion of total tree length under the shortest branch-length priors compared with longer priors. Relative effects on branch lengths caused by the branch-length prior can be problematic to downstream phylogenetic comparative methods making use of the branch lengths. Furthermore, given the same branch-length prior, trees are on average more dissimilar under a simple unpartitioned model compared with a more complex partitioned model. The distribution of ML branch lengths was shown to better fit a gamma or Pareto distribution than an exponential one. Model adequacy tests indicate that the best-fitting model selected by the BIC is insufficient for describing data patterns in 5 of 8 partitions. More general substitution models are required to explain the data in three of these partitions, one of which also requires nonstationarity. The two mitochondrial ribosomal RNA gene partitions need heterotachous models. We found no significant correlations between, on the one hand, the amount of ambiguous data or the smallest branch-length distance to another taxon and, on the other hand, the topological stability of individual taxa. Integrating over several exponentially distributed means under the best-fitting model, node support for the family Psoraceae, including Psora, Protoblastenia, and the Micarea sylvicola group, is approximately 0.96. Support for the genus Psora is distinctly lower, but we found no evidence to contradict the current classification.

13.

A basic limitation on inferring phylogenies by pairwise sequence comparisons

Mike Steel 《Journal of theoretical biology》2009,256(3):467-589

Distance-based approaches in phylogenetics such as Neighbor-Joining are a fast and popular approach for building trees. These methods take pairs of sequences, and from them construct a value that, in expectation, is additive under a stochastic model of site substitution. Most models assume a distribution of rates across sites, often based on a gamma distribution. Provided the (shape) parameter of this distribution is known, the method can correctly reconstruct the tree. However, if the shape parameter is not known then we show that topologically different trees, with different shape parameters and associated positive branch lengths, can lead to exactly matching distributions on pairwise site patterns between all pairs of taxa. Thus, one could not distinguish between the two trees using pairs of sequences without some prior knowledge of the shape parameter. More surprisingly, this can happen for any choice of distinct shape parameters on the two trees, and thus the result is not peculiar to a particular or contrived selection of the shape parameters. On a positive note, we point out known conditions where identifiability can be restored (namely, when the branch lengths are clocklike, or if methods such as maximum likelihood are used).
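The dependence of a corrected pairwise distance on the shape parameter can be made concrete with one familiar model. The abstract does not name a substitution model, so the sketch below assumes the Jukes-Cantor model with gamma-distributed rates, for which the distance from an observed proportion p of differing sites is d = (3/4) * alpha * ((1 - 4p/3)^(-1/alpha) - 1). It only shows that the same pair of sequences yields different distances under different shape parameters; it does not reproduce the paper's non-identifiability construction. The function names are hypothetical.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance with equal rates across sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)
```

For a fixed p, smaller values of alpha (stronger rate variation) give larger distances, so a wrong shape parameter distorts every pairwise distance by a different amount and the resulting matrix need not be additive on the true tree.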

14.

Modelling heterotachy in phylogenetic inference by reversible-jump Markov chain Monte Carlo

Pagel M Meade A 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2008,363(1512):3955-3964

The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon--known as heterotachy--can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and that it serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.

15.

A bias in ML estimates of branch lengths in the presence of multiple signals

Penny D White WT Hendy MD Phillips MJ 《Molecular biology and evolution》2008,25(2):239-242

Sequence data often have competing signals that are detected by network programs or Lento plots. Such data can be formed by generating sequences on more than one tree, and combining the results, a mixture model. We report that with such mixture models, the estimates of edge (branch) lengths from maximum likelihood (ML) methods that assume a single tree are biased. Based on the observed number of competing signals in real data, such a bias of ML is expected to occur frequently. Because network methods can recover competing signals more accurately, there is a need for ML methods allowing a network. A fundamental problem is that mixture models can have more parameters than can be recovered from the data, so that some mixtures are not, in principle, identifiable. We recommend that network programs be incorporated into best practice analysis, along with ML and Bayesian trees.

16.

Distance data revisited (total citations: 1; self-citations: 0; citations by others: 1)

James S. Farris 《Cladistics : the international journal of the Willi Hennig Society》1985,1(1):67-86

Abstract: Objections to my earlier demonstration, that the branch lengths of trees fitted to distance matrices have no physical interpretation, are shown to be ill-founded. In particular the contention of Felsenstein, that fitted lengths estimate expectations of amounts of change, is shown to lead to a paradox. A method is introduced for constructing multiple trees of optimal or near-optimal fit to distance data, and this is found to give better performance than previous methods. Most published trees based on distances have been poorly chosen. Consensus trees of several trees with near-optimal fit are found to be quite poorly resolved, and it appears that molecular distances seldom provide much useful information on phylogenetic relationships.

17.

Quartet-mapping, a generalization of the likelihood-mapping procedure (total citations: 5; self-citations: 0; citations by others: 5)

K Nieselt-Struwe A von Haeseler 《Molecular biology and evolution》2001,18(7):1204-1219

Likelihood-mapping (LM) was suggested as a method of displaying the phylogenetic content of an alignment. However, statistical properties of the method have not been studied. Here we analyze the special case of a four-species tree generated under a range of evolution models and compare the results with those of a natural extension of the likelihood-mapping approach, geometry-mapping (GM), which is based on the method of statistical geometry in sequence space. The methods are compared in their abilities to indicate the correct topology. The performance of both methods in detecting the star topology is especially explored. Our results show that LM tends to reject a star tree more often than GM. When assumptions about the evolutionary model of the maximum-likelihood reconstruction are not matched by the true process of evolution, then LM shows a tendency to favor one tree, whereas GM correctly detects the star tree except for very short outer branch lengths with a statistical significance of >0.95 for all models. LM, on the other hand, reconstructs the correct bifurcating tree with a probability of >0.95 for most branch length combinations even under models with varying substitution rates. The parameter domain for which GM recovers the true tree is much smaller. When the exterior branch lengths are larger than an (analytically derived) threshold value depending on the tree shape (rather than the evolutionary model), GM reconstructs a star tree rather than the true tree. We suggest a combined approach of LM and GM for the evaluation of starlike trees. This approach offers the possibility of testing for significant positive interior branch lengths without extensive statistical and computational efforts.

18.

Accuracy of estimated phylogenetic trees from molecular data. Total citations: 27 (self-citations: 0, citations by others: 27)

Masatoshi Nei Fumio Tajima Yoshio Tateno 《Journal of molecular evolution》1983,19(2):153-170

The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f_θ, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.
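
The abstract names Nei's standard distance but does not write it out. As a rough illustration, the sketch below computes it from allele frequencies using the commonly cited definition D = -ln(J_XY / sqrt(J_X * J_Y)), where each J term is averaged over loci. The function name, the input layout (one list of allele frequencies per locus, with alleles in the same order in both populations), and the example numbers are assumptions made here for illustration, not details taken from the paper.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists of loci; each locus is a list of allele
    frequencies, with alleles listed in the same order in both
    populations (hypothetical input layout).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must share the same non-empty set of loci")
    n_loci = len(pop_x)
    # Average homozygosity-like sums over all loci, monomorphic ones included.
    j_x = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    j_xy = sum(
        sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    if j_xy == 0:
        # No shared alleles at any locus: identity is 0, distance is unbounded.
        return math.inf
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


# One locus, two alleles: population X is fixed for allele 1,
# population Y carries both alleles at equal frequency.
d = nei_standard_distance([[1.0, 0.0]], [[0.5, 0.5]])
```

Two populations with identical allele frequencies give a distance of zero under this definition, and the distance grows as shared alleles become rarer.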

19.

Detectability of the emerald ash borer (Coleoptera: Buprestidae) in asymptomatic urban trees by using branch samples

Ryall KL Fidgen JG Turgeon JJ 《Environmental entomology》2011,40(3):679-688

The emerald ash borer, Agrilus planipennis Fairmaire, is an exotic invasive insect causing extensive mortality to ash trees, Fraxinus spp., in Canada and the United States. Detection of incipient populations of this pest is difficult because of its cryptic life stages and a multiyear time lag between initial attack and the appearance of signs or symptoms of infestation. We sampled branches from open-grown urban ash trees to develop a sample unit suitable for detecting low density A. planipennis infestation before any signs or symptoms are evident. The sample unit that maximized detection rates consisted of one 50-cm-long piece from the base of a branch ≥6 cm diameter in the midcrown. The optimal sample size was two such branches per tree. This sampling method detected ≈75% of asymptomatic trees known to be infested by using more intensive sampling and ≈3 times more trees than sampling one-fourth of the circumference of the trunk at breast height. The method is less conspicuous and esthetically damaging to a tree than the removal of bark from the main stem or the use of trap trees, and could be incorporated into routine sanitation or maintenance of city-owned trees to identify and delineate infested areas. This research indicates that branch sampling greatly reduces false negatives associated with visual surveys and window sampling at breast height. Detection of A. planipennis-infested asymptomatic trees through branch sampling in urban centers would provide landowners and urban foresters with more time to develop and implement management tactics.

20.

A conditional probability of reconstruction measure for internal cladogram branches

Zander RH 《Systematic biology》2001,50(3):425-437

The conditional probability of reconstruction is a measure of the robustness of cladogram internodes and, unlike Bremer support and bootstrapping values, directly gauges probability. The new method compares the three putative branch lengths (the optimal and two alternatives) obtained through branch recalculation after nearest neighbor interchange and recalculation under constraint. With rooted trees, one switches the two free lineages attached at the distal end of an internal branch with the basal lineage. Probabilistic reconstruction of a branch for small data sets (e.g., morphological) is defined as having no contrary support for the two alternative branches and, when sufficient data are available (e.g., molecular studies), as meeting a selected confidence limit in chi-squared analysis. The exact probability that the internal branch is reconstructed is the same as the preselected confidence level met with chi-squared analysis; alternatively, it is a simple calculation of the length of the optimal branch divided by the sum of the lengths of all three putative branches. This new measure of robustness allows calculation of summary probabilities of subclade and tree reconstruction. The measure is conditional on a particular data set and optimization method but may also compare support from conflicting gene trees. Examples are provided by a morphological data set (the bryophyte Didymodon) and a molecular data set (primates).
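
The "simple calculation" mentioned in this abstract, the optimal branch length divided by the sum of all three putative branch lengths, can be sketched as follows. The function name, the argument names, and the example lengths are hypothetical, and the input checks are an assumption added here rather than part of the published method.

```python
def reconstruction_probability(optimal, alternative_1, alternative_2):
    """Ratio of the optimal branch length to the sum of the three
    putative branch lengths (the optimal one and the two obtained
    after nearest neighbor interchange)."""
    lengths = (optimal, alternative_1, alternative_2)
    if any(length < 0 for length in lengths):
        raise ValueError("branch lengths must be non-negative")
    total = sum(lengths)
    if total == 0:
        raise ValueError("at least one branch length must be positive")
    return optimal / total


# Hypothetical lengths (e.g. in steps): 6 for the optimal branch,
# 1 for each of the two alternative arrangements.
p = reconstruction_probability(6, 1, 1)
```

With these illustrative lengths the ratio is 6 / 8 = 0.75. When neither alternative has any support (both lengths are zero), the ratio is 1.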