Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
The application of phylogenetic inference methods to data for a set of independent genes sampled randomly throughout the genome often results in substantial incongruence in the single-gene phylogenetic estimates. Among the processes known to produce discord between single-gene phylogenies, two of the best studied in a phylogenetic context are hybridization and incomplete lineage sorting. Much recent attention has focused on the development of methods for estimating species phylogenies in the presence of incomplete lineage sorting, but phylogenetic models that allow for hybridization have been more limited. Here we propose a model that allows incongruence in single-gene phylogenies to be due to both hybridization and incomplete lineage sorting, with the goal of determining the contribution of hybridization to observed gene tree incongruence in the presence of incomplete lineage sorting. Using our model, we propose methods for estimating the extent of the role of hybridization in both a likelihood and a Bayesian framework. The performance of our methods is examined using both simulated and empirical data.

2.
Phylogenies are fundamental to comparative biology as they help to identify independent events on which statistical tests rely. Two groups of phylogenetic comparative methods (PCMs) can be distinguished: those that take phylogenies into account by introducing explicit models of evolution and those that only consider phylogenies as a statistical constraint and aim at partitioning trait values into a phylogenetic component (phylogenetic inertia) and one or multiple specific components related to adaptive evolution. The way phylogenetic information is incorporated into the PCMs depends on the method used. For the first group of methods, phylogenies are converted into variance-covariance matrices of traits following a given model of evolution such as Brownian motion (BM). For the second group of methods, phylogenies are converted into distance matrices that are subsequently transformed into Euclidean distances to perform principal coordinate analyses. Here, we show that simply taking the elementwise square root of a distance matrix extracted from a phylogenetic tree ensures having a Euclidean distance matrix. This is true for any type of distances between species (patristic or nodal) and also for trees harboring multifurcating nodes. Moreover, we illustrate that this simple transformation using the square root imposes less geometric distortion than more complex transformations classically used in the literature such as the Cailliez method. Given the Euclidean nature of the elementwise square root of phylogenetic distance matrices, the positive semidefiniteness of the phylogenetic variance-covariance matrix of a trait following a BM model, or related models of trait evolution, can be established. In that way, we build a bridge between the two groups of statistical methods widely used in comparative analysis. These results should be of great interest for ecologists and evolutionary biologists performing statistical analyses incorporating phylogenies.
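The square-root claim in this abstract can be checked on a toy example. The sketch below is an illustration only, not the authors' proof: it assumes a small hypothetical rooted tree (node names and branch lengths are made up) and builds an explicit embedding with one coordinate per branch, so that the Euclidean distance between two tips equals the square root of their patristic distance.

```python
import math
from itertools import combinations

# Hypothetical rooted tree: child -> (parent, branch length).
# Names and lengths are invented for illustration; n2 is multifurcating.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("n2", 1.5),
    "D": ("n2", 1.5),
    "E": ("n2", 3.0),
    "n2": ("root", 1.0),
}
TIPS = ["A", "B", "C", "D", "E"]


def edges_to_root(node):
    """Set of branches (named by their child node) between `node` and the root."""
    edges = set()
    while node in TREE:
        edges.add(node)
        node = TREE[node][0]
    return edges


def patristic(a, b):
    """Sum of branch lengths on the path joining tips a and b."""
    # Branches above the common ancestor are shared and cancel out.
    path = edges_to_root(a) ^ edges_to_root(b)
    return sum(TREE[e][1] for e in path)


def embed(tip):
    """One coordinate per branch: sqrt(length) if the branch lies above the tip, else 0."""
    above = edges_to_root(tip)
    return [math.sqrt(TREE[e][1]) if e in above else 0.0 for e in sorted(TREE)]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def max_embedding_error():
    """Largest gap between the embedded distance and sqrt(patristic distance)."""
    return max(
        abs(euclidean(embed(a), embed(b)) - math.sqrt(patristic(a, b)))
        for a, b in combinations(TIPS, 2)
    )


if __name__ == "__main__":
    print(max_embedding_error())
```

Because each branch contributes its length exactly once to the squared coordinate difference of any two tips it separates, the embedding reproduces the square-rooted distances up to floating-point error. The construction makes no use of bifurcation, so it also works with the multifurcating node.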

3.
Most methods for phylogenetic tree reconstruction are based on sequence alignments; they infer phylogenies from substitutions that may have occurred at the aligned sequence positions. Gaps in alignments are usually not employed as phylogenetic signal. In this paper, we explore an alignment-free approach that uses insertions and deletions (indels) as an additional source of information for phylogeny inference. For a set of four or more input sequences, we generate so-called quartet blocks of four putative homologous segments each. For pairs of such quartet blocks involving the same four sequences, we compare the distances between the two blocks in these sequences, to obtain hints about indels that may have happened between the blocks since the respective four sequences have evolved from their last common ancestor. A prototype implementation that we call Gap-SpaM is presented to infer phylogenetic trees from these data, using a quartet-tree approach or, alternatively, under the maximum-parsimony paradigm. This approach should not be regarded as an alternative to established methods, but rather as a complementary source of phylogenetic information. Interestingly, however, our software is able to produce phylogenetic trees from putative indels alone that are comparable to trees obtained with existing alignment-free methods.

4.
5.
Restriction site-associated DNA sequencing (RADseq) has emerged as a useful tool in systematics and population genomics. A common feature of RADseq data sets is that they contain missing data that arise from multiple sources including genealogical sampling bias, assembly methodology and sequencing error. Many RADseq studies have demonstrated that allowing sites (single nucleotide polymorphisms, SNPs) with missing data can increase support for phylogenetic hypotheses. Two non-mutually exclusive explanations for this observation are that (a) larger data sets contain more phylogenetic information; and (b) excluding missing data disproportionately removes sites with the highest mutation rates, causing the exclusion of characters that are likely variable and informative. Using a RADseq data set derived from the East African banana frog, Afrixalus fornasini (up to 1.1 million SNPs), we found that missing data thresholds were positively correlated with the proportion of parsimony-informative sites and mean branch support. Using three proxies for estimating site-specific rate, we found that the most conservative missing data strategies excluded rapidly evolving sites, with four-state sites present only when allowing ≥60% missing data per SNP. Topological similarity among estimated phylogenies was highest for the data sets with ≥60% missing data per SNP. Our results suggest that several desirable phylogenetic qualities were observed when allowing ≥60% missing data per SNP. However, at the highest missing data thresholds (80% and 90% missing data per SNP), we observed differences in performance between high- and mixed-weight DNA extraction samples, which may indicate there are trade-offs to consider when using degraded genomic template with RADseq protocols.
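A per-SNP missing-data threshold of the kind discussed in this abstract can be sketched as a simple column filter. This is a minimal illustration under stated assumptions, not the study's pipeline: the alignment, the sample names, and the set of characters treated as missing (`N`, `?`, `-`) are all hypothetical.

```python
def missing_fraction(column, missing="N?-"):
    """Fraction of samples with a missing call at one site."""
    return sum(ch in missing for ch in column) / len(column)


def filter_sites(alignment, max_missing):
    """Keep only the sites whose missing fraction is at most `max_missing`.

    `alignment` maps sample name -> sequence string; all sequences
    are assumed to have the same length.
    """
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    keep = [i for i, col in enumerate(columns) if missing_fraction(col) <= max_missing]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical five-sample alignment with per-site missing fractions 0.0, 0.8, 0.6, 0.2.
EXAMPLE = {
    "s1": "ACGT",
    "s2": "ANGT",
    "s3": "ANNT",
    "s4": "ANNC",
    "s5": "ANNN",
}

if __name__ == "__main__":
    for threshold in (0.2, 0.6, 0.9):
        print(threshold, filter_sites(EXAMPLE, threshold))
```

Raising the threshold retains more sites at the cost of sparser columns, which is the trade-off the abstract evaluates.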

6.
MOTIVATION: The problem of phylogenetic inference from datasets including incomplete or uncertain entries is among the most relevant issues in systematic biology. In this paper, we propose a new method for reconstructing phylogenetic trees from partial distance matrices. The new method combines the usage of the four-point condition and the ultrametric inequality with a weighted least-squares approximation to solve the problem of missing entries. It can be applied to infer phylogenies from evolutionary data including some missing or uncertain information, for instance, when observed nucleotide or protein sequences contain gaps or missing entries. RESULTS: In a number of simulations involving incomplete datasets, the proposed method outperformed the well-known Ultrametric and Additive procedures. Generally, the new method also outperformed all the other competing approaches, including Triangle and Fitch, the latter being the most popular least-squares method for reconstructing phylogenies. We illustrate the usefulness of the introduced method by analyzing two well-known phylogenies derived from complete mammalian mtDNA sequences. Some interesting theoretical results concerning the NP-hardness of the ordinary and weighted least-squares fitting of a phylogenetic tree to a partial distance matrix are also established. AVAILABILITY: The T-Rex package including this method is freely available for download at http://www.info.uqam.ca/~makarenv/trex.html
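The two conditions named in this abstract can be stated compactly in code. The sketch below only tests whether a complete distance matrix satisfies them; it does not reproduce the paper's procedure for filling in missing entries or its weighted least-squares step. Function names and the example matrix are assumptions made for illustration.

```python
from itertools import combinations


def satisfies_ultrametric(d, i, j, k, tol=1e-9):
    """Ultrametric inequality for one triple: the two largest distances are equal."""
    ordered = sorted([d[i][j], d[i][k], d[j][k]])
    return abs(ordered[2] - ordered[1]) <= tol


def satisfies_four_point(d, i, j, k, l, tol=1e-9):
    """Four-point condition for one quartet: the two largest pairwise sums are equal."""
    sums = sorted([
        d[i][j] + d[k][l],
        d[i][k] + d[j][l],
        d[i][l] + d[j][k],
    ])
    return abs(sums[2] - sums[1]) <= tol


def is_ultrametric(d, tol=1e-9):
    """True if every triple of taxa satisfies the ultrametric inequality."""
    return all(
        satisfies_ultrametric(d, *t, tol=tol)
        for t in combinations(range(len(d)), 3)
    )


def is_additive(d, tol=1e-9):
    """True if every quartet of taxa satisfies the four-point condition."""
    return all(
        satisfies_four_point(d, *q, tol=tol)
        for q in combinations(range(len(d)), 4)
    )


# Hypothetical tree distances for taxa A, B, C, D with topology AB|CD:
# pendant branches 1, 2, 1, 3 and an internal branch of length 1.
ADDITIVE = [
    [0, 3, 3, 5],
    [3, 0, 4, 6],
    [3, 4, 0, 4],
    [5, 6, 4, 0],
]

if __name__ == "__main__":
    print(is_additive(ADDITIVE), is_ultrametric(ADDITIVE))
```

The example matrix fits a tree exactly, so it passes the four-point check, but its branches are not clock-like, so it fails the ultrametric check.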

7.
JJ Wiens, J Tiu. PLoS ONE, 2012, 7(8): e42925

Background

Phylogenies are essential to many areas of biology, but phylogenetic methods may give incorrect estimates under some conditions. A potentially common scenario of this type is when few taxa are sampled and terminal branches for the sampled taxa are relatively long. However, the best solution in such cases (i.e., sampling more taxa versus more characters) has been highly controversial. A widespread assumption in this debate is that added taxa must be complete (no missing data) in order to save analyses from the negative impacts of limited taxon sampling. Here, we evaluate whether incomplete taxa can also rescue analyses under these conditions (empirically testing predictions from an earlier simulation study).

Methodology/Principal Findings

We utilize DNA sequence data from 16 vertebrate species with well-established phylogenetic relationships. In each replicate, we randomly sample 4 species, estimate their phylogeny (using Bayesian, likelihood, and parsimony methods), and then evaluate whether adding in the remaining 12 species (which have 50, 75, or 90% of their data replaced with missing data cells) can improve phylogenetic accuracy relative to analyzing the 4 complete taxa alone. We find that in those cases where sampling few taxa yields an incorrect estimate, adding taxa with 50% or 75% missing data can frequently (>75% of relevant replicates) rescue Bayesian and likelihood analyses, recovering accurate phylogenies for the original 4 taxa. Even taxa with 90% missing data can sometimes be beneficial.

Conclusions

We show that adding taxa that are highly incomplete can improve phylogenetic accuracy in cases where analyses are misled by limited taxon sampling. These surprising empirical results confirm those from simulations, and show that the benefits of adding taxa may be obtained with unexpectedly small amounts of data. These findings have important implications for the debate on sampling taxa versus characters, and for studies attempting to resolve difficult phylogenetic problems.

8.
Species trees have traditionally been inferred from a few selected markers, and genome-wide investigations remain largely restricted to model organisms or small groups of species for which sampling of fresh material is available, leaving out most of the existing and historical species diversity. The genomes of an increasing number of species, including specimens extracted from natural history collections, are being sequenced at low depth. While these data sets are widely used to analyse organelle genomes, the nuclear fraction is generally ignored. Here we evaluate different reference-based methods to infer phylogenies of large taxonomic groups from such data sets. Using the example of the Oleeae tribe, a worldwide-distributed group, we build phylogenies based on single nucleotide polymorphisms (SNPs) obtained using two reference genomes (the olive and ash trees). The inferred phylogenies are overall congruent, yet present differences that might reflect the effect of distance to the reference on the amount of missing data. To limit this issue, genome complexity was reduced by using pairs of orthologous coding sequences as the reference, thus allowing us to combine SNPs obtained using two distinct references. Concatenated and coalescence trees based on these combined SNPs suggest events of incomplete lineage sorting and/or hybridization during the diversification of this large phylogenetic group. Our results show that genome-wide phylogenetic trees can be inferred from low-depth sequence data sets for eukaryote groups with complex genomes, and histories of reticulate evolution. This opens new avenues for large-scale phylogenomics and biogeographical analyses covering both the extant and the historical diversity stored in museum collections.

9.
The use of phylogenetic comparative methods in ecological research has advanced during the last twenty years, mainly due to accurate phylogenetic reconstructions based on molecular data and computational and statistical advances. We used phylogenetic correlograms and phylogenetic eigenvector regression (PVR) to model body size evolution in 35 worldwide Felidae (Mammalia, Carnivora) species using two alternative phylogenies and published body size data. The purpose was not to contrast the phylogenetic hypotheses but to evaluate how analyses of body size evolution patterns can be affected by the phylogeny used for comparative analyses (CA). Both phylogenies produced a strong phylogenetic pattern, with closely related species having similar body sizes and the similarity decreasing with increasing distances in time. The PVR explained 65% to 67% of body size variation and all Moran's I values for the PVR residuals were non-significant, indicating that both these models explained phylogenetic structures in trait variation. Even though our results did not suggest that any phylogeny can be used for CA with the same power, or that "good" phylogenies are unnecessary for the correct interpretation of the evolutionary dynamics of ecological, biogeographical, physiological or behavioral patterns, they do suggest that developments in CA can, and indeed should, proceed without waiting for perfect and fully resolved phylogenies.

10.
Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standard, well-behaved Markov models for estimating morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent in morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses.
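The conditioning step described in this abstract amounts to dividing each character's probability by the probability that a character is variable. The sketch below illustrates the idea under simplifying assumptions that are not taken from the paper: a symmetric k-state Markov model with equal state frequencies, a star tree whose tips hang directly from the root, and made-up branch lengths.

```python
import math
from itertools import product


def mk_transition(k, t):
    """Symmetric k-state model: (P[stay in state], P[move to one given other state])."""
    decay = math.exp(-k * t / (k - 1))
    same = 1.0 / k + (k - 1) / k * decay
    diff = 1.0 / k - decay / k
    return same, diff


def pattern_prob(pattern, branch_lengths, k):
    """Probability of the tip states on a star tree, summed over equiprobable root states."""
    total = 0.0
    for root in range(k):
        p = 1.0 / k
        for state, t in zip(pattern, branch_lengths):
            same, diff = mk_transition(k, t)
            p *= same if state == root else diff
        total += p
    return total


def conditional_prob(pattern, branch_lengths, k):
    """Probability of a pattern given that the character is variable."""
    n = len(branch_lengths)
    p_constant = sum(pattern_prob((s,) * n, branch_lengths, k) for s in range(k))
    return pattern_prob(pattern, branch_lengths, k) / (1.0 - p_constant)


def variable_patterns(n_tips, k):
    """All tip-state patterns in which at least two tips differ."""
    return [p for p in product(range(k), repeat=n_tips) if len(set(p)) > 1]


if __name__ == "__main__":
    lengths = [0.1, 0.2, 0.3]  # hypothetical branch lengths
    total = sum(conditional_prob(p, lengths, 2) for p in variable_patterns(3, 2))
    print(total)
```

After conditioning, the probabilities of the variable patterns sum to one, so the likelihood is no longer penalized for constant characters that a morphological matrix could never contain.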

11.
Recent reviews of the construction of large phylogenies have focused on supertree methods that involve separate analyses of data sets and subsequent integration of the resulting trees. Here, we consider the alternative method of analyzing all character data simultaneously. Such 'supermatrix' analyses use information from each character directly and enable straightforward incorporation of diverse kinds of data, including characters from fossils. The approach has been extended by the development of new methods, including model-based techniques for analyzing heterogeneous data and hierarchical methods for constructing extremely large trees. Recent work also suggests that the problem of missing data in supermatrix analyses has been overstated. Although the supermatrix approach is not suited for all cases, we suggest that its inherent strengths will ensure that it will continue to have a central role in inferring large phylogenetic trees from diverse data.

12.
The statistical estimation of phylogenies is always associated with uncertainty, and accommodating this uncertainty is an important component of modern phylogenetic comparative analysis. The birth–death polytomy resolver is a method of accounting for phylogenetic uncertainty that places missing (unsampled) taxa onto phylogenetic trees, using taxonomic information alone. Recent studies of birds and mammals have used this approach to generate pseudoposterior distributions of phylogenetic trees that are complete at the species level, even in the absence of genetic data for many species. Many researchers have used these distributions of phylogenies for downstream evolutionary analyses that involve inferences on phenotypic evolution, geography, and community assembly. I demonstrate that the use of phylogenies constructed in this fashion is inappropriate for many questions involving traits. Because species are placed on trees at random with respect to trait values, the birth–death polytomy resolver breaks down natural patterns of trait phylogenetic structure. Inferences based on these trees are predictably and often drastically biased in a direction that depends on the underlying (true) pattern of phylogenetic structure in traits. I illustrate the severity of the phenomenon for both continuous and discrete traits using examples from a global bird phylogeny.

13.
We report on new techniques we have developed for reconstructing phylogenies on whole genomes. Our mathematical techniques include new polynomial-time methods for bounding the inversion length of a candidate tree and new polynomial-time methods for estimating genomic distances which greatly improve the accuracy of neighbor-joining analyses. We demonstrate the power of these techniques through an extensive performance study based on simulating genome evolution under a wide range of model conditions. Combining these new tools with standard approaches (fast reconstruction with neighbor-joining, exploration of all possible refinements of strict consensus trees, etc.) has allowed us to analyze datasets that were previously considered computationally impractical. In particular, we have conducted a complete phylogenetic analysis of a subset of the Campanulaceae family, confirming various conjectures about the relationships among members of the subset and about the principal mechanism of evolution for their chloroplast genome. We give representative results of the extensive experimentation we conducted on both real and simulated datasets in order to validate and characterize our approaches. We find that our techniques provide very accurate reconstructions of the true tree topology even when the data are generated by processes that include a significant fraction of transpositions and when the data are close to saturation.

14.
A recurring methodological problem in the evaluation of the predictive validity of selection methods is that the values of the criterion variable are available for selected applicants only. This so-called range restriction problem causes biased population estimates. Correction methods for direct and indirect range restriction scenarios have been widely studied for continuous criterion variables but not for dichotomous ones. The few existing approaches are inapplicable because they do not consider the unknown base rate of success. Hence, there is a lack of scientific research on suitable correction methods and the systematic analysis of their accuracies in the cases of a naturally or artificially dichotomous criterion. We aim to overcome this deficiency by viewing the range restriction problem as a missing data mechanism. We used multiple imputation by chained equations to generate complete criterion data before estimating the predictive validity and the base rate of success. Monte Carlo simulations were conducted to investigate the accuracy of the proposed correction as a function of selection ratio, predictive validity, and base rate of success in an experimental design. In addition, we compared our proposed missing data approach with Thorndike’s well-known correction formulas that have only been used in the case of continuous criterion variables so far. The results show that the missing data approach is more accurate in estimating the predictive validity than Thorndike’s correction formulas. The accuracy of our proposed correction increases as the selection ratio and the correlation between predictor and criterion increase. Furthermore, the missing data approach provides a valid estimate of the unknown base rate of success. On the basis of our findings, we argue for the use of multiple imputation by chained equations in the evaluation of the predictive validity of selection methods when the criterion is dichotomous.
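For orientation, the classical baseline this abstract compares against can be written in a few lines. The sketch below shows the standard Thorndike Case II correction for direct range restriction on the predictor; it is not the multiple-imputation approach the authors propose, and the function and parameter names are assumptions chosen for readability.

```python
import math


def thorndike_case2(r_restricted, sd_unrestricted, sd_restricted):
    """Correct a validity coefficient for direct range restriction (Thorndike Case II).

    r_restricted    -- predictor-criterion correlation in the selected group
    sd_unrestricted -- predictor standard deviation among all applicants
    sd_restricted   -- predictor standard deviation in the selected group
    """
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return r * u / math.sqrt(1.0 - r ** 2 + (r * u) ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: the selected group has half the applicant-pool spread.
    print(thorndike_case2(0.3, 2.0, 1.0))
```

When the two standard deviations are equal the formula returns the observed correlation unchanged, and the correction grows as selection narrows the predictor's spread.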

15.
Standard methods of phylogenetic reconstruction are based on models that assume homogeneity of nucleotide composition among taxa. However, this assumption is often violated in biological data sets. In this study, we examine possible effects of nucleotide heterogeneity among lineages on the phylogenetic reconstruction of a bacterial group that spans a wide range of genomic nucleotide contents: obligately endosymbiotic bacteria and free-living or commensal species in the gamma-Proteobacteria. We focus on AT-rich primary endosymbionts to better understand the origins of obligately intracellular lifestyles. Previous phylogenetic analyses of this bacterial group point to the importance of accounting for base compositional variation in estimating relationships, particularly between endosymbiotic and free-living taxa. Here, we develop an approach to compare susceptibility of various phylogenetic reconstruction methods to the effects of nucleotide heterogeneity. First, we identify candidate trees of gamma-Proteobacteria groEL and 16S rRNA using approaches that assume homogeneous and stationary base composition, including Bayesian, maximum likelihood, parsimony, and distance methods. We then create permutations of the resulting candidate trees by varying the placement of the AT-rich endosymbiont Buchnera. These permutations are evaluated under the nonhomogeneous and nonstationary maximum likelihood model of Galtier and Gouy, which allows equilibrium base content to vary among examined lineages. Our results show that commonly used phylogenetic methods produce incongruent trees of the Enterobacteriales, and that the placement of Buchnera is especially unstable. However, under a nonhomogeneous model, various groEL and 16S rRNA phylogenies that separate Buchnera from other AT-rich endosymbionts (Blochmannia and Wigglesworthia) have consistently and significantly higher likelihood scores. Blochmannia and Wigglesworthia appear to have evolved from secondary endosymbionts, and represent an origin of primary endosymbiosis that is independent from Buchnera. This application of a nonhomogeneous model offers a computationally feasible way to test specific phylogenetic hypotheses for taxa with heterogeneous and nonstationary base composition.

16.
phangorn: phylogenetic analysis in R
SUMMARY: phangorn is a package for phylogenetic reconstruction and analysis in the R language. Previously it was only possible to estimate phylogenetic trees with distance methods in R. phangorn now offers the possibility of reconstructing phylogenies with distance-based methods, maximum parsimony or maximum likelihood (ML) and performing Hadamard conjugation. Extending the general ML framework, this package provides the possibility of estimating mixture and partition models. Furthermore, phangorn offers several functions for comparing trees, phylogenetic models or splits, simulating character data and performing congruence analyses. AVAILABILITY: phangorn can be obtained through the CRAN homepage http://cran.r-project.org/web/packages/phangorn/index.html. phangorn is licensed under GPL 2.

17.
A phylogeny of the Platyhelminthes: towards a total-evidence solution
Littlewood D. T. J., Bray R. A., Clough K. A. Hydrobiologia, 1998, 383(1-3): 155-160
We advocate a total-evidence approach for the reconstruction of working phylogenies for the Turbellaria and the phylum Platyhelminthes. Few morphology-based character matrices are available in the systematic literature concerning flatworms, and molecular-based phylogenies are rapidly providing the only means by which we can estimate phylogenies cladistically. Character matrices based on gross morphology and ultrastructure are required and should be internally consistent, i.e. character coding should follow a set of a priori guidelines, and character duplication and contradiction should be avoided. In order to test our molecular phylogenies we need complementary data sets from morphology. To understand morphological homology we need phylogenetic evidence from independent (e.g. molecular) data. Fully complementary morphological and molecular data sets enable us to validate phylogenetic hypotheses and the combination of these sets in phylogenetic reconstruction utilises all statements of homology. Working phylogenies which include all phylogenetic information not only shed light on individual character evolution, but form a strong basis for comparative studies investigating the origin and evolutionary radiation of the taxonomic group under scrutiny. This revised version was published online in July 2006 with corrections to the Cover Date.

18.
We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences. In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.

19.
Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and generally cannot accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters, while the uncertainty in genotype calling that arises as a consequence of the sequencing technology is ignored. In order to address this problem, we describe two new probabilistic approaches for estimating genetic distances: distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behaviors than other currently available methods for estimating genetic distances, even for very low depth data with high error rates.

20.
Species-specific obligate pollination mutualism between Glochidion trees (Euphorbiaceae) and Epicephala moths (Gracillariidae) involves a large number of interacting species and resembles the classically known fig-fig wasp and yucca-yucca moth associations. To assess the extent of parallel cladogenesis in Glochidion-Epicephala association, we reconstruct phylogenetic relationships of 18 species of Glochidion using nuclear ribosomal DNA sequences (internal and external transcribed spacers) and those of the corresponding 18 Epicephala species using mitochondrial (the cytochrome oxidase subunit I gene) and nuclear DNA sequences (the arginine kinase and elongation factor-1alpha genes). Based on the obtained phylogenies, we determine whether Glochidion and Epicephala have undergone parallel diversification using several different methods for investigating the level of cospeciation between phylogenies. These tests indicate that there is generally a greater degree of correlation between Glochidion and Epicephala phylogenies than expected in a random association, but the results are sensitive to selection of different phylogenetic hypotheses and analytical methods for evaluating cospeciation. Perfect congruence between phylogenies is not found in this association, which likely resulted from host shift by the moths. The observed significant discrepancy between Glochidion and Epicephala phylogenies implies that the one-to-one specificity between the plants and moths has been maintained through a complex speciation process or that there is an underestimated diversity of association between Glochidion trees and Epicephala moths.
