Similar articles
 Found 20 similar articles; search time: 500 ms
1.
2.
A common pattern found in phylogeny-based empirical studies of diversification is a decrease in the rate of lineage accumulation toward the present. This early-burst pattern of cladogenesis is often interpreted as a signal of adaptive radiation or density-dependent processes of diversification. However, incomplete taxonomic sampling is also known to artifactually produce patterns of rapid initial diversification. The Monte Carlo constant rates (MCCR) test, based upon Pybus and Harvey's gamma (γ)-statistic, is commonly used to accommodate incomplete sampling, but this test assumes that missing taxa have been randomly pruned from the phylogeny. Here we use simulations to show that preferentially sampling disparate lineages within a clade can produce severely inflated type-I error rates of the MCCR test, especially when taxon sampling drops below 75%. We first propose two corrections for the standard MCCR test, the proportionally deeper splits MCCR, which assumes that missing taxa are more likely to be recently diverged, and the deepest splits only MCCR, which assumes that all missing taxa are the youngest lineages in the clade, and assess their statistical properties. We then extend these two tests into a generalized form that allows the degree of nonrandom sampling (NRS) to be controlled by a scaling parameter, α. This generalized test is then applied to two recent studies. This new test allows systematists to account for nonrandom taxonomic sampling when assessing temporal patterns of lineage diversification in empirical trees. Given the dramatic effect NRS can have on the behavior of the MCCR test, we argue that evaluating the sensitivity of this test to NRS should become the norm when investigating patterns of cladogenesis in incompletely sampled phylogenies.
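
As a rough illustration of the γ-statistic that the MCCR test builds on, the Python sketch below computes γ from the internode intervals of an ultrametric tree using the standard Pybus and Harvey formula. The function name and the input convention (a list of intervals ordered from the root toward the tips) are assumptions made for this example. It is not the code used in the study, and it implements neither the MCCR simulation nor the NRS corrections.

```python
import math


def gamma_statistic(internode_intervals):
    """Pybus and Harvey's gamma for an ultrametric tree (illustrative sketch).

    internode_intervals[i] is g_k for k = i + 2, i.e. the length of time
    during which exactly k lineages exist, ordered from the root (k = 2)
    to the tips (k = n). This input convention is an assumption of the sketch.
    """
    g = list(internode_intervals)
    n = len(g) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma needs at least three tips")

    # k * g_k for k = 2 .. n
    weighted = [k * g_k for k, g_k in zip(range(2, n + 1), g)]
    total = sum(weighted)  # T in the usual notation

    # Sum over i = 2 .. n-1 of the partial sums sum_{k=2}^{i} k * g_k
    running = 0.0
    sum_of_partial_sums = 0.0
    for w in weighted[:-1]:
        running += w
        sum_of_partial_sums += running

    mean_partial = sum_of_partial_sums / (n - 2)
    return (mean_partial - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))
```

Negative values indicate that branching events sit closer to the root than expected under a constant-rate pure-birth process, which is the early-burst signal discussed in the abstract.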

3.
Noise     
The proliferation of DNA sequence data has generated a concern about the effects of "noise" on phylogeny reconstruction. This concern has led to various recommendations for weighting schemes and for separating data types prior to analysis. A new technique is explored to examine directly how noise influences the stability of parsimony reconstruction. By appending purely random characters onto a matrix of pure signal, or by replacing characters in a matrix of signal by random states, one can measure the degree to which a matrix is robust against noise. Reconstructions were sensitive to tree topology and clade size when noise was added, but were less so when character states were replaced with noise. When a signal matrix is complemented with a noise matrix of equal size, parsimony will trace the original signal about half the time when there is only one synapomorphy per node, and about 90% of the time when there are three synapomorphies per node. Similar results obtain when 20% of a matrix is replaced by noise. Successive weighting does not improve performance. Adding noise to only some taxa is more damaging, but replacing characters in only some taxa is less so. The bootstrap and g1 (tree skewness) statistics are shown to be uninterpretable measures of noise or departures from randomness. Empirical data sets illustrate that commonly recommended schemes of differential weighting (e.g. downweighting third positions) are not well supported from the point of view of reducing the influence of noise nor are more noisy data sets likely to degrade signal found in less noisy data sets.
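
The two manipulations described above, appending random characters and replacing existing characters with random states, can be sketched in Python as follows. The data layout (a dict mapping taxon names to state strings), the four-state alphabet and the function names are assumptions made for illustration. The abstract does not specify how the original experiments were implemented.

```python
import random

STATES = "ACGT"  # assumed alphabet for this sketch


def append_noise(matrix, n_random, rng):
    """Return a copy of the matrix with n_random random characters appended."""
    return {
        taxon: seq + "".join(rng.choice(STATES) for _ in range(n_random))
        for taxon, seq in matrix.items()
    }


def replace_with_noise(matrix, fraction, rng):
    """Return a copy with a fraction of the columns overwritten by random states."""
    n_chars = len(next(iter(matrix.values())))
    n_replaced = round(fraction * n_chars)
    columns = set(rng.sample(range(n_chars), n_replaced))
    return {
        taxon: "".join(
            rng.choice(STATES) if i in columns else state
            for i, state in enumerate(seq)
        )
        for taxon, seq in matrix.items()
    }
```

Passing a seeded `random.Random` instance keeps the perturbation reproducible, so the same noisy matrix can be reanalysed under different weighting schemes.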

4.
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.

5.
Combining data sets with different phylogenetic histories   Total citations: 1, self-citations: 0, citations by others: 1
The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I propose a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories) until a majority of unlinked data sets support one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis for recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters, high homoplasy, or both) and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but gives an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, the separate, consensus, and combined analyses may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. 
Excision of conflicting taxa is also problematic, in that doing so may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.

6.
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.

7.
A central question concerning data collection strategy for molecular phylogenies has been, is it better to increase the number of characters or the number of taxa sampled to improve the robustness of a phylogeny estimate? A recent simulation study concluded that increasing the number of taxa sampled is preferable to increasing the number of nucleotide characters, if taxa are chosen specifically to break up long branches. We explore this hypothesis by using empirical data from noctuoid moths, one of the largest superfamilies of insects. Separate studies of two nuclear genes, elongation factor-1 alpha (EF-1 alpha) and dopa decarboxylase (DDC), have yielded similar gene trees and high concordance with morphological groupings for 49 exemplar species. However, support levels were quite low for nodes deeper than the subfamily level. We tested the effects on phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to 77 species, and (2) combining data from the two genes in a single analysis. Surprisingly, the increased taxon sampling, although designed to break up long branches, generated greater disagreement between the two gene data sets and decreased support levels for deeper nodes. We appear to have inadvertently introduced new long branches, and breaking these up may require a yet larger taxon sample. Sampling additional characters (combining data) greatly increased the phylogenetic signal. To contrast the potential effect of combining data from independent genes with collection of the same total number of characters from a single gene, we simulated the latter by bootstrap augmentation of the single-gene data sets. Support levels for combined data were at least as high as those for the bootstrap-augmented data set for DDC and were much higher than those for the augmented EF-1 alpha data set. This supports the view that in obtaining additional sequence data to solve a refractory systematic problem, it is prudent to take them from an independent gene.  
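
One plausible reading of "bootstrap augmentation" is to resample alignment columns with replacement until the single-gene data set reaches the length of the combined data set. The Python sketch below follows that reading. The function name, the dict-of-strings alignment layout and the resampling details are assumptions, and the authors' actual procedure may differ.

```python
import random


def bootstrap_augment(alignment, target_length, rng):
    """Resample columns with replacement up to target_length characters.

    alignment maps taxon name -> aligned sequence (all of equal length).
    The same column indices are used for every taxon, so each resampled
    column is an intact column of the original alignment.
    """
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(target_length)]
    return {
        taxon: "".join(seq[i] for i in picks)
        for taxon, seq in alignment.items()
    }
```

Because augmentation only reuses existing columns, it adds characters without adding independent evidence, which is the contrast the abstract draws against sequencing a second gene.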

8.
The maximum parsimony (MP) method for inferring phylogenies is widely used, but little is known about its limitations in non-asymptotic situations. This study employs large-scale computations with simulated phylogenetic data to estimate the probability that MP succeeds in finding the true phylogeny for up to twelve taxa and 256 characters. The set of candidate phylogenies is taken to be unrooted binary trees; for each simulated data set, the tree lengths of all (2n − 5)!! candidates are computed to evaluate quantities related to the performance of MP, such as the probability of finding the true phylogeny, the probability that the tree with the shortest length is unique, the probability that the true phylogeny has the shortest tree length, and the expected inverse of the number of trees sharing the shortest length. The tree length distributions are also used to evaluate and extend the skewness test of Hillis for distinguishing between random and phylogenetic data. The results indicate, for example, that the critical point after which MP achieves a success probability of at least 0.9 is roughly around 128 characters. The skewness test is found to perform well on simulated data and the study extends its scope to up to twelve taxa.
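
To give a sense of the size of the exhaustive search mentioned above, the short Python sketch below evaluates (2n − 5)!!, the number of unrooted binary trees on n labelled taxa. The function name is chosen for this example only.

```python
def count_unrooted_binary_trees(n_taxa):
    """Number of unrooted binary trees on n_taxa labelled tips: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("defined here for three or more taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


for n in (4, 8, 12):
    print(n, count_unrooted_binary_trees(n))
```

For twelve taxa the count is 654,729,075, which helps explain why exhaustive evaluation of every candidate tree is only practical for small numbers of taxa.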

9.
Cocoon spinning was analysed, using video recording and playback, in eighteen Nearctic black fly species, comparing nine Simulium species, six Eusimulium species, Stegopterna mutata, Cnephia dacotensis and Prosimulium mixtum. Fourteen behavioural characters were revealed that produced twenty-two equally parsimonious trees (CI = 0.93, RI = 0.96). Another tree was constructed on the basis of five characters relating to the cocoon structure (end-product characters). The goal of the study was to determine whether characters relating to behavioural components of black fly cocoon spinning or those based on end-products of the behaviour are superior for revealing phylogenetic relationships. This was accomplished by comparing both data sets to a phylogeny constructed with the use of cytological and morphological characters. If taxa are grouped according to end-products (the cocoons) there are some spurious groupings. The behavioural analysis only required one extra step to duplicate the morphological and cytological tree. In the case of black flies, it is more informative to use characters resulting from the analysis of the cocoon spinning behaviour than cocoon morphology.
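
The CI and RI values quoted above are the ensemble consistency and retention indices. The Python sketch below shows the standard definitions, CI = M/S and RI = (G − S)/(G − M), computed from per-character minimum, observed and maximum step counts. The function name and the step counts in the usage lines are hypothetical and are not data from the study.

```python
def ensemble_ci_ri(characters):
    """Ensemble consistency and retention indices.

    characters is an iterable of (min_steps, observed_steps, max_steps)
    tuples, one per character, for a given tree.
    """
    chars = list(characters)
    m = sum(c[0] for c in chars)  # M: minimum conceivable steps
    s = sum(c[1] for c in chars)  # S: steps observed on the tree
    g = sum(c[2] for c in chars)  # G: maximum conceivable steps
    if s == 0 or g == m:
        raise ValueError("indices undefined for these step counts")
    return m / s, (g - s) / (g - m)


# Hypothetical step counts for two binary characters
ci, ri = ensemble_ci_ri([(1, 1, 3), (1, 2, 3)])
print(round(ci, 3), round(ri, 3))
```

Values near 1 for both indices, as reported in the abstract, indicate little homoplasy in the character set on the trees found.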

10.
11.
One of the major issues in phylogenetic analysis is that gene genealogies from different gene regions may not reflect the true species tree or history of speciation. This has led to considerable debate about whether concatenation of loci is the best approach for phylogenetic analysis. The application of next‐generation sequencing techniques such as RAD‐seq generates thousands of relatively short sequence reads from across the genomes of the sampled taxa. These data sets are typically concatenated for phylogenetic analysis, leading to data sets that contain millions of base pairs per taxon. The influence of gene region conflict among so many loci in determining the phylogenetic relationships among taxa is unclear. We simulated RAD‐seq data by sampling 100 and 500 base pairs from alignments of over 6000 coding regions that each produce one of three highly supported alternative phylogenies of seven species of Drosophila. We conducted phylogenetic analyses on different sets of these regions to vary the sampling of loci with alternative gene trees to examine the effect on detecting the species tree. Irrespective of sequence length sampled per region and which subset of regions was used, phylogenetic analyses of the concatenated data always recovered the species tree. The results suggest that concatenated alignments of next‐generation data that consist of many short sequences are robust to gene tree/species tree conflict when the goal is to determine the phylogenetic relationships among taxa.

12.
We present the first multi-locus chloroplast phylogeny of Arthrostylidiinae, a subtribe of neotropical woody bamboos. The morphological diversity of Arthrostylidiinae makes its taxonomy difficult and prior molecular analyses of bamboos have lacked breadth of sampling within the subtribe, leaving internal relationships uncertain. We sampled 51 taxa, chosen to span the range of taxonomic diversity and morphology, and analyzed a combined chloroplast DNA dataset with six chloroplast regions: ndhF, trnD-trnT, trnC-rpoB, rps16-trnQ, trnT-trnL, and rpl16. A consensus of maximum parsimony and Bayesian inference analyses reveals monophyly of the Arthrostylidiinae and four moderately supported lineages within it. Six previously recognized genera were monophyletic, three polyphyletic, and two monotypic; Rhipidocladum sect. Didymogonyx is here raised to generic status. When mapped onto our topology, many of the morphological characters show homoplasy.

13.
Exploring taxon sampling and the accuracy of phylogenetic analyses   Total citations: 6, self-citations: 2, citations by others: 4
Appropriate and extensive taxon sampling is one of the most important determinants of accurate phylogenetic estimation. In addition, accuracy of inferences about evolutionary processes obtained from phylogenetic analyses is improved significantly by thorough taxon sampling efforts. Many recent efforts to improve phylogenetic estimates have focused instead on increasing sequence length or the number of overall characters in the analysis, and this often does have a beneficial effect on the accuracy of phylogenetic analyses. However, phylogenetic analyses of few taxa (but each represented by many characters) can be subject to strong systematic biases, which in turn produce high measures of repeatability (such as bootstrap proportions) in support of incorrect or misleading phylogenetic results. Thus, it is important for phylogeneticists to consider both the sampling of taxa and the sampling of characters in designing phylogenetic studies. Taxon sampling also improves estimates of evolutionary parameters derived from phylogenetic trees, and is thus important for improved applications of phylogenetic analyses. Analysis of sensitivity to taxon inclusion, the possible effects of long-branch attraction, and sensitivity of parameter estimation for model-based methods should be a part of any careful and thorough phylogenetic analysis. Furthermore, recent improvements in phylogenetic algorithms and in computational power have removed many constraints on analyzing large, thoroughly sampled data sets. Thorough taxon sampling is thus one of the most practical ways to improve the accuracy of phylogenetic estimates, as well as the accuracy of biological inferences that are based on these phylogenetic trees.

14.
15.
Phylogenetic analyses that incorporate the most character information also provide the most explanatory power. Here I demonstrate the value of such an approach through a direct optimization sensitivity analysis of apid bee phylogeny. Whereas prior studies have relied solely on one class of data or the other, this analysis combines previously published molecular, morphological, and behavioural characters into a single supermatrix. The final dataset includes 191 ingroup and 30 outgroup taxa, and includes data from seven unaligned gene sequences (18S, 28S, wingless, EF1‐α, polII, Nak, LW rhodopsin), 209 adult and larval morphological characters, and two behavioural characters. Nine different sets of transformation cost parameters are evaluated, along with their relative degrees of character incongruence. The preferred parameter set returns a strict consensus tree somewhat similar to, but more resolved than, a previous parsimony tree based on molecules alone. I also describe the effects of including EF1‐α and LW rhodopsin intron sequences on the outcome of the direct optimization analysis. By accounting for more evidence, this study provides the most comprehensive treatment yet of apid phylogenetic relationships.

16.
Estimating species phylogeny from a single gene tree can be especially problematic for studies of species flocks in which diversification has been rapid. Here we compare a phylogenetic hypothesis derived from cytochrome b (cyt b) sequences with another based on amplified fragment length polymorphisms (AFLP) for 60 specimens of a monophyletic riverine species flock of mormyrid electric fishes collected in Gabon, west-central Africa. We analyze the aligned cyt b sequences by Wagner parsimony and AFLP data generated from 10 primer combinations using neighbor-joining from a Nei-Li distance matrix, Wagner parsimony, and Dollo parsimony. The different analysis methods yield AFLP tree topologies with few conflicting nodes. Recovered basal relationships in the group are similar between cyt b and AFLP analyses, but differ substantially at many of the more derived nodes. More of the clades recovered with the AFLP characters are consistent with the morphological characters used to designate operational taxonomic units in this group. These results support our hypothesis that the mitochondrial gene tree differs from the overall species phylogeny due at least in part to mitochondrial introgression among lineages. Mapping the two forms of electric organ found in this group onto the AFLP tree suggests that posteriorly innervated electrocytes with nonpenetrating stalks have independently evolved from anteriorly innervated, penetrating-stalk electrocytes at least three times.

17.
A comprehensive phylogeny of papilionoid legumes was inferred from sequences of 2228 taxa in GenBank release 147. A semiautomated analysis pipeline was constructed to download, parse, assemble, align, combine, and build trees from a pool of 11,881 sequences. Initial steps included all-against-all BLAST similarity searches coupled with assembly, using a novel strategy for building length-homogeneous primary sequence clusters. This was followed by a combination of global and local alignment protocols to build larger secondary clusters of locally aligned sequences, thus taking into account the dramatic differences in length of the heterogeneous coding and noncoding sequence data present in GenBank. Next, clusters were checked for the presence of duplicate genes and other potentially misleading sequences and examined for combinability with other clusters on the basis of taxon overlap. Finally, two supermatrices were constructed: a "sparse" matrix based on the primary clusters alone (1794 taxa x 53,977 characters), and a somewhat more "dense" matrix based on the secondary clusters (2228 taxa x 33,168 characters). Both matrices were very sparse, with 95% of their cells containing gaps or question marks. These were subjected to extensive heuristic parsimony analyses using deterministic and stochastic heuristics, including bootstrap analyses. A "reduced consensus" bootstrap analysis was also performed to detect cryptic signal in a subtree of the data set corresponding to a "backbone" phylogeny proposed in previous studies. Overall, the dense supermatrix appeared to provide much more satisfying results, indicated by better resolution of the bootstrap tree, excellent agreement with the backbone papilionoid tree in the reduced bootstrap consensus analysis, few problematic large polytomies in the strict consensus, and less fragmentation of conventionally recognized genera. Nevertheless, at lower taxonomic levels several problems were identified and diagnosed. 
A large number of methodological issues in supermatrix construction at this scale are discussed, including detection of annotation errors in GenBank sequences; the shortage of effective algorithms and software for local multiple sequence alignment; the difficulty of overcoming effects of fragmentation of data into nearly disjoint blocks in sparse supermatrices; and the lack of informative tools to assess confidence limits in very large trees.

18.
Reconstructing the evolution of complex bird song in the oropendolas   Total citations: 1, self-citations: 0, citations by others: 1
The elaborate songs of songbirds are frequent models for investigating the evolution of animal signals. However, few previous studies have attempted to reconstruct historical changes in song evolution using a phylogenetic comparative approach. In particular, no comparative studies of bird song have used a large number of vocal characters and a well-supported, independently derived phylogeny. We identified 32 features in the complex vocal displays of male oropendolas (genera Psarocolius, Gymnostinops, and Ocyalus) that are relatively invariant within taxa and mapped these characters onto a robust molecular phylogeny of the group. Our analysis revealed that many aspects of oropendola song are surprisingly evolutionarily conservative and thus are potentially useful characters for reconstructing historical patterns. Of the characters that varied among taxa, nearly two thirds (19 of 29) showed no evidence of evolutionary convergence or reversal when mapped onto the tree, which was reflected in a high overall consistency index (CI = 0.78) and retention index (RI = 0.88). Some reconstructed patterns provided evidence of selection on these signals. For example, rapid divergence of the songs of the Montezuma oropendola, Gymnostinops montezuma, from those of closely related taxa suggests the recent influence of strong sexual selection. In general, our results provide insights into the mode of vocal evolution in songbirds and suggest that complex vocalizations can provide information about phylogeny. Based on this evidence, we use song characters to estimate the phylogenetic affinities of three oropendola taxa for which molecular data are not yet available.

19.
We present a cladistic analysis of the Cirripedia Thoracica using morphological characters and the Acrothoracica and Ascothoracida as outgroups. The list of characters comprised 32 shell and soft body features. The operational taxonomic units (OTUs) comprised 26 well-studied fossil and extant taxa, principally genera, since uncertainty about monophyly exists for most higher ranking taxonomic units. Parsimony analyses using PAUP 3.1.1 and Hennig86 produced 189 trees of assured minimal length. We also examined character evolution in the consensus trees using MacClade and Clados. The monophyly of the Balanomorpha and the Verrucomorpha sensu stricto is confirmed, and all trees featured a sister group relationship between the ‘living fossil’ Neoverruca and the Brachylepadomorpha. In the consensus trees the sequential progression of ‘pedunculate’ sister groups up to a node containing Neolepas also conforms to current views, but certain well-established taxa based solely on plesiomorphies stand out as paraphyletic, such as Pedunculata (= Lepadomorpha), Eolepadinae, Scalpellomorpha and Chthamaloidea. The 189 trees differed principally in the position of shell-less pedunculates, Neoverruca, the scalpelloid Capitulum, and the interrelationships within the Balanomorpha, although the 50% majority rule consensus tree almost fully resolved the latter. A monophyletic Sessilia comprising both Verrucomorpha and Balanomorpha appeared among the shortest trees, but not in the consensus. A tree with a monophyletic Verrucomorpha including Neoverruca had a tree length two steps longer than the consensus trees. Deletion of all extinct OTUs produced a radically different tree, which highlights the importance of fossils in estimating cirripede phylogeny. Mapping of our character set onto a manually constructed cladogram reflecting the most recent scenario of cirripede evolution resulted in a tree length five steps longer than any of our shortest trees. 
Our analysis reveals that several key questions in cirripede phylogeny remain unsolved, notably the position of shell-less forms and the transition from ‘pedunculate’ to ‘sessile’ barnacles. The inclusion of more fossil species at this point in our understanding of cirripede phylogeny will only result in even greater levels of uncertainty. When constructing the character list we also identified numerous uncertainties in the homology of traits commonly used in discussing cirripede evolution. Our study highlights larval ultrastructure, detailed studies of early ontogeny, and molecular data as the most promising areas for future research.

20.
We present a phylogenetic hypothesis and novel, rank-free classification for all extant species of softshell turtles (Testudines: Trionychidae). Our data set included DNA sequence data from two mitochondrial protein-coding genes and an approximately 1-kb nuclear intron for 23 of 26 recognized species, and 59 previously published morphological characters for a complementary set of 24 species. The combined data set provided complete taxonomic coverage for this globally distributed clade of turtles, with incomplete data for a few taxa. Although our taxonomic sampling is complete, most of the modern taxa are representatives of old and very divergent lineages. Thus, due to biological realities, our sampling consists of one or a few representatives of several ancient lineages across a relatively deep phylogenetic tree. Our analyses of the combined data set converge on a set of well-supported relationships, which is in accord with many aspects of traditional softshell systematics including the monophyly of the Cyclanorbinae and Trionychinae. However, our results conflict with other aspects of current taxonomy and indicate that most of the currently recognized tribes are not monophyletic. We use this strong estimate of the phylogeny of softshell turtles for two purposes: (1) as the basis for a novel rank-free classification, and (2) to retrospectively examine strategies for analyzing highly homoplasious mtDNA data in deep phylogenetic problems where increased taxon sampling is not an option. Weeded and weighted parsimony, and model-based techniques, generally improved the phylogenetic performance of highly homoplasious mtDNA sequences, but no single strategy completely mitigated the problems associated with these highly homoplasious data. Many deep nodes in the softshell turtle phylogeny were confidently recovered only after the addition of largely nonhomoplasious data from the nuclear intron.

