Similar articles
1.
Greater phylogenetic signal is often found in parsimony-based analyses of third codon positions of protein-coding genes relative to their corresponding first and second codon positions, even for early-derived ("basal") clades. We used the Soltis et al. (2000; Bot. J. Linn. Soc. 133:381-461) data matrix of atpB and rbcL from 567 seed plants to quantify how each of six factors (observed character-state space, frequencies of observed character states, substitution probabilities among nucleotides, rate heterogeneity among sites, overall rate of evolution, and number of parsimony-informative characters) contributed to this phenomenon. Each of these six factors was estimated from the original data matrix for parsimony-informative third codon positions considered separately from first and second codon positions combined. One of the most parsimonious trees found was used as the constraint topology; branch lengths were estimated using likelihood-based distances, and characters were simulated on this tree. Differential frequencies of observed character states were found to be the most limiting of the factors simulated for all three codon positions. Differential frequencies of observed character states and differential substitution probabilities among states were relatively advantageous for first and second codon positions. In contrast, differential numbers of observed character states, differential rate heterogeneity among sites, the greater number of parsimony-informative characters, and the higher overall rate of evolution were relatively advantageous for third codon positions. The amount of possible synapomorphy was predictive of the overall success of resolution.
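
The abstract above counts parsimony-informative characters separately by codon position. As a minimal sketch of that bookkeeping, assuming the standard definition (a site is parsimony-informative when at least two character states each occur in at least two taxa) and an in-frame alignment held as a list of equal-length strings, the function names and the toy alignment below are illustrative and not taken from the study:

```python
from collections import Counter

def is_parsimony_informative(column, ignore="-?N"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(s for s in column.upper() if s not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def informative_by_codon_position(alignment):
    """Count informative sites per codon position (1, 2, 3).

    Assumes the alignment starts at codon position 1 and stays in frame.
    """
    totals = {1: 0, 2: 0, 3: 0}
    for i in range(len(alignment[0])):
        column = "".join(seq[i] for seq in alignment)
        if is_parsimony_informative(column):
            totals[i % 3 + 1] += 1
    return totals

# Hypothetical four-taxon, two-codon alignment used only for illustration.
alignment = [
    "ATGGCA",
    "ATGGCA",
    "ATGGCT",
    "ATAGCT",
]
counts = informative_by_codon_position(alignment)
```

In this toy alignment only the last column (a third codon position) is informative; the third column has a variant state in a single taxon and so is variable but uninformative.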

2.
Likelihood-based phylogenetic inference posits a probabilistic model of character state change along branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This is a restrictive assumption that facilitates computational tractability, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that as alignment length increases, tree reconstruction accuracy also increases. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.

3.
Among-site rate variation (alpha) and transition bias (kappa) have been shown, most often as independent parameters, to be important dynamics in DNA evolution. Accounting for these dynamics should result in better estimates of phylogenetic relationships. To test this idea, we simultaneously estimated overall (averaged over all codon positions) and codon-specific values of alpha and kappa, using maximum likelihood analyses of cytochrome b data from all genera of pipits and wagtails (Aves: Motacillidae), and six outgroup species, using initial trees generated with default values. Estimates of alpha and kappa were robust to initial tree topology and suggested substantial among-site rate variation even within codon classes; alpha was lowest (large among-site rate variation) at second-codon and highest (low among-site rate variation) at third-codon positions. When overall values were applied, there were shifts in tree topology and dramatic and statistically significant improvements in log-likelihood scores of trees compared with the scores from application of default values. Applying codon-specific values resulted in yet another highly significant increase in likelihood. However, although incorporating substitution dynamics into maximum likelihood, maximum parsimony, and neighbor-joining analyses resulted in increases in congruence among trees, there were only minor improvements in phylogenetic signal, and none of the successive approximations tree topologies were statistically distinguishable from one another by the data. We suggest that the bushlike nature of many higher-level phylogenies in birds makes estimating the dynamics of DNA evolution less sensitive to tree topology but also less susceptible to improvement via weighting.

4.
Synapomorphies are fundamental to phylogenetic systematics as they offer empirical evidence of monophyletic groups. However, no method exists to directly measure synapomorphy. Here, we propose a method that quantifies synapomorphy using the pattern of character state distribution over a cladogram separately for each character and for each clade. We define a fully synapomorphic character state as one shared by all of a clade’s terminal taxa and at the same time completely absent from all terminal taxa outside that clade. The extent to which this condition is met corresponds to the support for the character state being synapomorphic or, in short, support for synapomorphy. It is calculated as the probability of randomly selecting, by multi‐stage sampling following the topology of the tree, two terminals from inside a clade sharing the same character state and one terminal from outside the clade bearing a different character state. The method is independent of tree inference and free of transformational assumptions, and so can be applied to any tree and used for any type of discrete character. By measuring synapomorphy, the method offers a potential tool for determining diagnostic character states for taxa on different hierarchical levels, for evaluating alternative systems of character coding, and for evaluating clade support. We show how the method differs from ancestral character state reconstruction methods and goodness‐of‐fit indices. We demonstrate the behaviour of our method with several hypothetical scenarios and its potential use with two real‐life examples.

5.
We examined how alignment of internal transcribed spacers of rDNA in fungi and plants changes with increasing genetic distance by successive removal of sequences from each data set followed by realignment and phylogenetic analysis. Increasing genetic distance can negatively affect phylogenetic reconstruction in two ways. First, it may cause errors in the alignment and therefore the homology hypotheses of the sequence characters. Second, it may cause errors in the homology assessments of character states because of multiple hits on individual branches. These two causes of error in phylogenetic inference were distinguished from one another in our analysis. The errors in alignment caused by increasing genetic distance were primarily due to inserting too few gaps and inserting gaps at the wrong positions. Errors in tree resolution, topology, and/or branch-support values were more often caused by multiple hits than by misaligned positions. This suggests that increasing genetic distance negatively affects our primary homology assessments of character states more severely than our primary homology assessments of characters. We suggest that increasing taxon sampling with the aim of subdividing long branches is a strategy for obtaining reliable alignments.

6.
A new method for phylogenetic inference, Strongest Evidence (SE), is described. In this method, a character's support for a phylogenetic hypothesis, its apparent phylogenetic signal, is greatest when the amount of implied homoplasy is most remarkably small given background knowledge alone. Because evolutionary rates are not assumed to be slow, background expectations for character length can be derived through modeling complete dissociation between branching pattern and character state assignments. As in unweighted parsimony, SE holds that fewer required evolutionary steps in a character indicate stronger support for a tree. However, in SE, the relationship between steps and support differs by unlabeled tree topology and character state distribution. Strongest evidence is contrasted in detail with both unweighted parsimony and Goloboff's method of implied weights. An iterative process is suggested for incrementally resolving a phylogenetic hypothesis while conducting cladistic analyses at increasingly local levels.

7.
Quantification of the success of phylogenetic inference in simulations (total citations: 1; self-citations: 0; citations by others: 1)
For phylogenetic simulation studies, the accuracy of topological reconstruction obtained from different data matrices or different methods of phylogenetic inference generally needs to be quantified. Two components of performance within this context are: (1) how the inferred tree topology matches or conflicts with the correct tree topology, and (2) the branch support assigned to both correctly and incorrectly resolved clades. We present a method (averaged overall success of resolution) that incorporates both of these components. Branch support is incorporated in the averaged overall success of resolution by linearly scaling the observed support relative to that conferred by uncontradicted synapomorphies. We believe that this method represents an improvement relative to the commonly used approaches of quantifying the percentage of clades that are correctly resolved in the inferred trees or presenting the Robinson–Foulds distance between the inferred trees and the correct tree. In contrast to Bremer support, the averaged overall success of resolution may be applied equally well to distance, likelihood and parsimony analyses. © The Willi Hennig Society 2006.
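
The two baseline measures named in the abstract above (percentage of correctly resolved clades and the Robinson–Foulds distance) can be sketched as follows. This is not the averaged overall success of resolution itself, whose details are not given here. The sketch assumes rooted trees written as nested tuples of taxon names and uses the rooted, clade-based form of the Robinson–Foulds distance (the size of the symmetric difference of the two clade sets); the function names and example trees are illustrative:

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree given as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a terminal taxon
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    all_taxa = walk(tree)
    found.discard(all_taxa)                # the root clade is shared by every tree
    return found

def compare(inferred, correct):
    """Return (rooted Robinson-Foulds distance, % of correct clades recovered)."""
    a, b = clades(inferred), clades(correct)
    rf = len(a ^ b)                        # clades present in exactly one tree
    pct = 100.0 * len(a & b) / len(b) if b else 100.0
    return rf, pct

# Hypothetical five-taxon example.
correct = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
rf, pct = compare(inferred, correct)
```

Neither number uses branch support, which is the gap the abstract's method is meant to address.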

8.
The bootstrap is an important tool for estimating the confidence interval of monophyletic groups within phylogenies. Although bootstrap analyses are used in most evolutionary studies, there is no clear consensus as to how best to interpret bootstrap probability values. To study the bootstrap method further, nine small subunit ribosomal DNA (SSU rDNA) data sets were submitted to bootstrapped maximum parsimony (MP) analyses using unweighted and weighted sequence positions. Analyses of the lengths (i.e., parsimony steps) of the bootstrap trees show that the shape and mean of the bootstrap tree distribution may provide important insights into the evolutionary signal within the sequence data. With complex phylogenies containing nodes defined by short internal branches (multifurcations), the mean of the bootstrap tree distribution may differ by 2 standard deviations from the length of the best tree found from the original data set. Weighting sequence positions significantly increases the bootstrap values at internal nodes. There may, however, be strong bootstrap support for conflicting species groupings among different data sets. This phenomenon appears to result from a correlation between the topology of the tree used to create the weights and the topology of the bootstrap consensus tree inferred from the MP analysis of these weighted data. The analyses also show that characteristics of the bootstrap tree distribution (e.g., skewness) may be used to choose between alternative weighting schemes for phylogenetic analyses.
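
The character resampling behind a bootstrap analysis can be sketched as below: alignment columns are drawn with replacement to build each pseudoreplicate, and support for a group is the percentage of replicates in which it is recovered. The study ran maximum parsimony on each replicate; here a simple Hamming-distance criterion stands in for the tree search, and every name (`bootstrap_alignment`, `bootstrap_support`, `groups_first_two`) and the toy sequences are hypothetical:

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def bootstrap_support(alignment, has_group, replicates=100, seed=1):
    """Percentage of pseudoreplicates in which has_group(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(bool(has_group(bootstrap_alignment(alignment, rng)))
               for _ in range(replicates))
    return 100.0 * hits / replicates

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def groups_first_two(aln):
    """Toy stand-in for a tree search: are taxa 0 and 1 closest to each other?"""
    d01 = hamming(aln[0], aln[1])
    return d01 < min(hamming(aln[0], aln[2]), hamming(aln[1], aln[2]))

# Hypothetical three-taxon alignment.
toy = ["ACGTACGT", "ACGTACGA", "TCGAACGT"]
support = bootstrap_support(toy, groups_first_two, replicates=200, seed=42)
```

Fixing the seed makes the support value reproducible; a real analysis would replace `groups_first_two` with a parsimony search and a check for the clade of interest.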

9.
Considerable confusion remains among theoreticians and practitioners of phylogenetic science on the use of outgroup taxa. Here, we show that, despite claims to the contrary, details of the optimal ingroup topology can be changed by switching outgroup taxa. This has serious implications for phylogenetic accuracy. We delineate between the process of outgroup selection and the various possible processes involved in using an outgroup taxon after one has been selected. Criteria are needed for the determination that particular outgroup taxa do not reduce the accuracy of evolutionary tree topologies and inferred character state transformations. We compare previous results from a sensitivity bootstrap analysis of the mitochondrial cytochrome b phylogenetic relationships among whales to the results of a Bremer support sensitivity analysis and of a recently developed application of RASA theory to the question of putative outgroup taxon plesiomorphy content.

10.
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among‐site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among‐site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among‐site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among‐site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis.

11.
Near-full-length 18S and 28S rRNA gene sequences were obtained for 33 nematode species. Datasets were constructed based on secondary structure and progressive multiple alignments, and clades were compared for phylogenies inferred by Bayesian and maximum likelihood methods. Clade comparisons were also made following removal of ambiguously aligned sites as determined using the program ProAlign. Different alignments of these data produced tree topologies that differed, sometimes markedly, when analyzed by the same inference method. With one exception, the same alignment produced an identical tree topology when analyzed by different methods. Removal of ambiguously aligned sites altered the tree topology and also reduced resolution. Nematode clades were sensitive to differences in multiple alignments, and more than doubling the amount of sequence data by addition of 28S rRNA did not fully mitigate this result. Although some individual clades showed substantially higher support when 28S data were combined with 18S data, the combined analysis yielded no statistically significant increases in the number of clades receiving higher support when compared to the 18S data alone. Secondary structure alignment increased accuracy in positional homology assignment and, when used in combination with paired-site substitution models, these structural hypotheses of characters and improved models of character state change yielded high levels of phylogenetic resolution. Phylogenetic results included strong support for inclusion of Daubaylia potomaca within Cephalobidae, whereas the position of Fescia grossa within Tylenchina varied depending on the alignment, and the relationships among Rhabditidae, Diplogastridae, and Bunonematidae were not resolved.

12.
Recent years have seen an increasing effort to incorporate phylogenetic hypotheses into the study of community assembly processes. The incorporation of such evolutionary information has been eased by the emergence of specialized software for the automatic estimation of partially resolved supertrees based on published phylogenies. Despite this growing interest in the use of phylogenies in ecological research, very few studies have attempted to quantify the potential biases related to the use of partially resolved phylogenies and to branch length accuracy, and no work has examined how tree shape may affect inference of community phylogenetic metrics. In this study, we tested the influence of phylogenetic resolution and branch length information on the quantification of phylogenetic structure, and also explored the impact of tree shape (stemminess) on the loss of accuracy in phylogenetic structure quantification due to phylogenetic resolution. For this purpose, we used 9 sets of phylogenetic hypotheses of varying resolution and branch lengths to calculate three indices of phylogenetic structure: the mean phylogenetic distance (NRI), the mean nearest taxon distance (NTI) and phylogenetic diversity (stdPD) metrics. The NRI metric was the least sensitive to phylogenetic resolution, stdPD showed an intermediate sensitivity, and NTI was the most sensitive one; NRI was also less sensitive to branch length accuracy than NTI and stdPD, the degree of sensitivity being strongly dependent on the dating method and the sample size. Directional biases were generally towards type II errors. Interestingly, we detected that tree shape influenced the accuracy loss derived from the lack of phylogenetic resolution, particularly for NRI and stdPD. We conclude that well‐resolved molecular phylogenies with accurate branch length information are needed to identify the underlying phylogenetic structure of communities, and also that sensitivity of phylogenetic structure measures to low phylogenetic resolution can strongly vary depending on phylogenetic tree shape.

13.
SUMMARY: We introduce a new phylogenetic comparison method that measures overall differences in the relative branch length and topology of two phylogenetic trees. To do this, the algorithm first scales one of the trees to have a global divergence as similar as possible to the other tree. Then, the branch length distance, which takes differences in topology and branch lengths into account, is applied to the two trees. We thus obtain the minimum branch length distance or K tree score. Two trees with very different relative branch lengths get a high K score whereas two trees that follow a similar among-lineage rate variation get a low score, regardless of the overall rates in both trees. There are several applications of the K tree score, two of which are explained here in more detail. First, this score allows the evaluation of the performance of phylogenetic algorithms, not only with respect to their topological accuracy, but also with respect to the reproduction of a given branch length variation. In a second example, we show how the K score allows the selection of orthologous genes by choosing those that better follow the overall shape of a given reference tree. AVAILABILITY: http://molevol.ibmb.csic.es/Ktreedist.html
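
The scale-then-compare idea in the summary above can be sketched under stated assumptions. Each tree is taken to be a dictionary mapping a split (a canonical frozenset of the taxa on one side of a branch, terminal branches included) to its branch length, and the branch length distance is taken to be the square root of the summed squared differences over all splits, with a length of 0 for a split absent from a tree. Minimizing that distance over the scaling factor K by least squares gives K = Σ(b_ref · b_cmp) / Σ(b_cmp²). The function name and data layout are illustrative and are not the interface of the published program:

```python
import math

def k_tree_score(reference, comparison):
    """Scale `comparison` to best match `reference`; return (K, minimum distance).

    Both arguments map a split (frozenset of taxa) to a branch length.
    A split missing from one tree counts as a branch of length 0 there.
    """
    splits = set(reference) | set(comparison)
    ref = [reference.get(s, 0.0) for s in splits]
    cmp_ = [comparison.get(s, 0.0) for s in splits]
    denom = sum(b * b for b in cmp_)
    if denom == 0:
        raise ValueError("comparison tree has no branch length to scale")
    k = sum(a * b for a, b in zip(ref, cmp_)) / denom      # least-squares factor
    score = math.sqrt(sum((a - k * b) ** 2 for a, b in zip(ref, cmp_)))
    return k, score

fs = frozenset
# Hypothetical four-taxon trees; splits are keyed by the side lacking taxon D.
reference = {fs("A"): 1.0, fs("B"): 2.0, fs("C"): 1.5, fs("ABC"): 0.5, fs("AB"): 0.25}
doubled = {s: 2.0 * b for s, b in reference.items()}        # same shape, twice the rate
other = {fs("A"): 1.0, fs("B"): 2.0, fs("C"): 1.5, fs("ABC"): 0.5, fs("AC"): 0.25}

k_same, score_same = k_tree_score(reference, doubled)
k_diff, score_diff = k_tree_score(reference, other)
```

A tree that differs from the reference only in overall rate scores essentially zero after scaling, while a topological difference leaves a positive residual, which matches the behaviour the summary describes.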

14.
Interspecific and intergeneric relationships of Prunus s.l. are still unclear due to low levels of genetic variation among species, and resulting partially unresolved phylogenetic inferences. Here we sequenced and compared six complete plastomes from two subgenera of Prunus in order to choose molecular markers to increase the amount of genetic variation suitable for inference of Prunus phylogeny. The plastomes range between 157 817 and 158 995 bp in length, and we found different levels of inverted repeat (IR) contraction among the three sampled subgenera of Prunus s.l. Most regions in Prunus plastomes considered individually provide low phylogenetic resolution at the subgenus or species level compared to a tree constructed using all 78 coding regions combined. We compared levels of variation among 206 coding regions and noncoding (intergenic and intron) plastid regions and inferred phylogenies from each region considered individually. We then chose two regions to use together in future studies of relationships in Prunus, ycf1 and trnT-L, that display high to moderate levels of variation among coding and intergenic regions, respectively, and that individually permit inference of resolved species-level trees in Prunus with moderate to strong branch support. Considered together, these two regions allow inference of the same topology of Prunus inferred using all coding plastid regions combined, with comparable levels of tree support to the full plastome set. These two loci should therefore be useful as a plastid phylogenetic marker set for further inference of relationships within Prunus s.l.

15.
Under a coalescent model for within-species evolution, gene trees may differ from species trees to such an extent that the gene tree topology most likely to evolve along the branches of a species tree can disagree with the species tree topology. Gene tree topologies that are more likely to be produced than the topology that matches that of the species tree are termed anomalous, and the region of branch-length space that gives rise to anomalous gene trees (AGTs) is the anomaly zone. We examine the occurrence of anomalous gene trees for the case of five taxa, the smallest number of taxa for which every species tree topology has a nonempty anomaly zone. Considering all sets of branch lengths that give rise to anomalous gene trees, the largest value possible for the smallest branch length in the species tree is greater in the five-taxon case (0.1934 coalescent time units) than in the previously studied case of four taxa (0.1568). The five-taxon case demonstrates the existence of three phenomena that do not occur in the four-taxon case. First, anomalous gene trees can have the same unlabeled topology as the species tree. Second, the anomaly zone does not necessarily enclose a ball centered at the origin in branch-length space, in which all branches are short. Third, as a branch length increases, it is possible for the number of AGTs to increase rather than decrease or remain constant. These results, which help to describe how the properties of anomalous gene trees increase in complexity as the number of taxa increases, will be useful in formulating strategies for evading the problem of anomalous gene trees during species tree inference from multilocus data.

16.
Species-level phylogenetic studies require fast-evolving nucleotide positions to resolve relationships among close relatives, but these sites may be highly homoplastic and perhaps uninformative or even misleading deeper in the tree. Here we describe a species-level analysis of tiger beetles in the genus Cicindela (Coleoptera: Cicindelidae) for 132 terminal taxa and 1897 nucleotide positions from three regions of mtDNA, comprising 75% coverage of species occurring in North America. Evenly weighted parsimony analysis recovered four major clades representing radiations confined to North and Central America. Relationships near the tips were well supported but signal was contradictory at deeper nodes. Two major categories (3rd positions and all others) can be distinguished in likelihood analysis of character variation, of which only the fast-changing 3rd position characters were affected by saturation. However, their downweighting under a variety of criteria did not improve the tree topology at basal nodes. There was weak conflict between 3rd and non-3rd position characters deep in the tree, but support levels declined towards the root for all categories, even on trees that were reconstructed from 3rd and non-3rd positions separately. Statistical analysis of parsimony-based character transitions along branches showed a largely homogeneous distribution of change along the root-to-tip axis. The comparison of character transitions among the four major portions of the tree revealed deviations from stochastic distribution for the non-3rd positions, but not for 3rd positions. Hence, variability of functionally constrained non-3rd positions differs between clades and may be dependent on the character states at other sites, consistent with the covarion model of molecular evolution. The results suggest that some properties of 3rd positions are less problematic for phylogenetic reconstruction than other categories despite their high total homoplasy. In densely sampled data sets of closely related species, the disadvantages of weighting schemes according to homoplasy levels outweigh the benefits, showing the difficulty of devising meaningful weighting schemes that are applicable universally throughout the tree.

17.
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes, and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or the population size is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing the population size also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units ≤ 4N(e)), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the correlation between the phylogenetic accuracy and the posterior probability values obtained from BEST is high, although posterior probabilities are overestimated when the prior distribution for the population size is misspecified.

18.
Interest in methods that estimate speciation and extinction rates from molecular phylogenies has increased over the last decade. The application of such methods requires reliable estimates of tree topology and node ages, which are frequently obtained using standard phylogenetic inference combining concatenated loci and molecular dating. However, this practice disregards population‐level processes that generate gene tree/species tree discordance. We evaluated the impact of employing concatenation and coalescent‐based phylogeny inference in recovering the correct macroevolutionary regime using simulated data based on the well‐established diversification rate shift of delphinids in Cetacea. We found that under scenarios of strong incomplete lineage sorting, macroevolutionary analysis of phylogenies inferred by concatenating loci failed to recover the delphinid diversification shift, while the coalescent‐based tree consistently retrieved the correct rate regime. We suggest that ignoring microevolutionary processes reduces the power of methods that estimate macroevolutionary regimes from molecular data.

19.
Different genes often have different phylogenetic histories. Even within regions having the same phylogenetic history, the mutation rates often vary. We investigate the prospects of phylogenetic reconstruction when all the characters are generated from the same tree topology, but the branch lengths vary (with possibly different tree shapes). Furthering work of Kolaczkowski and Thornton (2004, Nature 431: 980-984) and Chang (1996, Math. Biosci. 134: 189-216), we show examples where maximum likelihood (under a homogeneous model) is an inconsistent estimator of the tree. We then explore the prospects of phylogenetic inference under a heterogeneous model. In some models, there are examples where phylogenetic inference under any method is impossible - despite the fact that there is a common tree topology. In particular, there are nonidentifiable mixture distributions, i.e., multiple topologies generate identical mixture distributions. We address which evolutionary models have nonidentifiable mixture distributions and prove that the following duality theorem holds for most DNA substitution models. The model has either: (i) nonidentifiability - two different tree topologies can produce identical mixture distributions, and hence distinguishing between the two topologies is impossible; or (ii) linear tests - there exist linear tests which identify the common tree topology for character data generated by a mixture distribution. The theorem holds for models whose transition matrices can be parameterized by open sets, which includes most of the popular models, such as Tamura-Nei and Kimura's 2-parameter model. The duality theorem relies on our notion of linear tests, which are related to Lake's linear invariants.

20.
In this study we use sensitivity analysis sensu Wheeler (1995) for a matrix entirely composed of DNA sequences. We propose that not only congruence but also phylogenetic structure, as measured by character resampling, should be used to choose among competing weighting regimes. An extensive analysis of a five‐gene data set for Themira (Sepsidae: Diptera) reveals that even with different ways of partitioning the data, measures of topological congruence, character incongruence, and phylogenetic structure favor similar weighting regimes involving the down‐weighting of transitions. We furthermore use sensitivity analysis for obtaining empirical evidence that allows us to select weights for third positions, deciding between treating indels as fifth character states or missing values, and choosing between manual and computational alignments. For our data, sensitivity analysis favors manual alignment over a Clustal‐generated numerical alignment, the treatment of indels as fifth character states over considering them missing values, and equal weights for all positions in protein‐encoding genes over the down‐weighting of third positions. Among the topological congruence measures compared, symmetric tree distance performed best. Partitioned Bremer Support analysis reveals that COI contributes the largest amount of support for our phylogenetic tree for Themira. © The Willi Hennig Society 2005.
