Similar literature
 20 similar articles found (search time: 15 ms)
1.
The effect of missing data on phylogenetic methods is a potentially important issue in our attempts to reconstruct the Tree of Life. If missing data are truly problematic, then it may be unwise to include species in an analysis that lack data for some characters (incomplete taxa) or to include characters that lack data for some species. Given the difficulty of obtaining data from all characters for all taxa (e.g., fossils), missing data might seriously impede efforts to reconstruct a comprehensive phylogeny that includes all species. Fortunately, recent simulations and empirical analyses suggest that missing data cells are not themselves problematic, and that incomplete taxa can be accurately placed as long as the overall number of characters in the analysis is large. However, these studies have so far only been conducted on parsimony, likelihood, and neighbor-joining methods. Although Bayesian phylogenetic methods have become widely used in recent years, the effects of missing data on Bayesian analysis have not been adequately studied. Here, we conduct simulations to test whether Bayesian analyses can accurately place incomplete taxa despite extensive missing data. In agreement with previous studies of other methods, we find that Bayesian analyses can accurately reconstruct the position of highly incomplete taxa (i.e., 95% missing data), as long as the overall number of characters in the analysis is large. These results suggest that highly incomplete taxa can be safely included in many Bayesian phylogenetic analyses.

2.
Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/.

3.
The determination of whether the pattern of trait evolution observed in a comparative analysis of species data is due to adaptation to current environments, to phylogenetic inertia, or to both of these forces requires that one control for the effects of either force when making an assessment of the evolutionary role of the other. Orzack and Sober (2001) developed the method of controlled comparisons to make such assessments; their implementation of the method focussed on a discretely varying trait. Here, we show that the method of controlled comparisons can be viewed as a meta-method, which can be implemented in many ways. We discuss which recent methods for the comparative analysis of continuously distributed traits can generate controlled comparisons and can thereby be used to properly assess whether current adaptation and/or phylogenetic inertia have influenced a trait's evolution. The implementation of controlled comparisons is illustrated by an analysis of sex-ratio data for fig wasps. This analysis suggests that current adaptation and phylogenetic inertia influence this trait.

4.
Phylogenetic methods for the analysis of species data are widely used in evolutionary studies. However, preliminary data transformations and data reduction procedures (such as a size‐correction and principal components analysis, PCA) are often performed without first correcting for nonindependence among the observations for species. In the present short comment and attached R and MATLAB code, I provide an overview of statistically correct procedures for phylogenetic size‐correction and PCA. I also show that ignoring phylogeny in preliminary transformations can result in significantly elevated variance and type I error in our statistical estimators, even if subsequent analysis of the transformed data is performed using phylogenetic methods. This means that ignoring phylogeny during preliminary data transformations can possibly lead to spurious results in phylogenetic statistical analyses of species data.

5.
Taxon sampling and the accuracy of phylogenetic analyses   Cited by: 6 (self-citations: 2, other citations: 4)
Appropriate and extensive taxon sampling is one of the most important determinants of accurate phylogenetic estimation. In addition, accuracy of inferences about evolutionary processes obtained from phylogenetic analyses is improved significantly by thorough taxon sampling efforts. Many recent efforts to improve phylogenetic estimates have focused instead on increasing sequence length or the number of overall characters in the analysis, and this often does have a beneficial effect on the accuracy of phylogenetic analyses. However, phylogenetic analyses of few taxa (but each represented by many characters) can be subject to strong systematic biases, which in turn produce high measures of repeatability (such as bootstrap proportions) in support of incorrect or misleading phylogenetic results. Thus, it is important for phylogeneticists to consider both the sampling of taxa and the sampling of characters in designing phylogenetic studies. Taxon sampling also improves estimates of evolutionary parameters derived from phylogenetic trees, and is thus important for improved applications of phylogenetic analyses. Analysis of sensitivity to taxon inclusion, the possible effects of long-branch attraction, and sensitivity of parameter estimation for model-based methods should be a part of any careful and thorough phylogenetic analysis. Furthermore, recent improvements in phylogenetic algorithms and in computational power have removed many constraints on analyzing large, thoroughly sampled data sets. Thorough taxon sampling is thus one of the most practical ways to improve the accuracy of phylogenetic estimates, as well as the accuracy of biological inferences that are based on these phylogenetic trees.

6.
The ability to generate large molecular datasets for phylogenetic studies benefits biologists, but such data expansion introduces numerous analytical problems. A typical molecular phylogenetic study implicitly assumes that sequences evolve under stationary, reversible and homogeneous conditions, but this assumption is often violated in real datasets. When an analysis of large molecular datasets results in unexpected relationships, it often reflects violation of phylogenetic assumptions, rather than a correct phylogeny. Molecular evolutionary phenomena such as base compositional heterogeneity and among‐site rate variation are known to affect phylogenetic inference, resulting in incorrect phylogenetic relationships. The ability of methods to overcome such bias has not been measured on real and complex datasets. We investigated how base compositional heterogeneity and among‐site rate variation affect phylogenetic inference in the context of a mitochondrial genome phylogeny of the insect order Coleoptera. We show statistically that our dataset is affected by base compositional heterogeneity regardless of how the data are partitioned or recoded. Among‐site rate variation is shown by comparing topologies generated using models of evolution with and without a rate variation parameter in a Bayesian framework. When compared for their effectiveness in dealing with systematic bias, standard phylogenetic methods tend to perform poorly, and parsimony without any data transformation performs worst. Two methods designed specifically to overcome systematic bias, LogDet and a Bayesian method implementing variable composition vectors, can overcome some level of base compositional heterogeneity, but are still affected by among‐site rate variation. A large degree of variation in both noise and phylogenetic signal among all three codon positions is observed. We caution and argue that more data exploration is imperative, especially when many genes are included in an analysis.

7.
This study is concerned with statistical methods used for the analysis of comparative data (in which observations are not expected to be independent because they are sampled across phylogenetically related species). The phylogenetically independent contrasts (PIC), phylogenetic generalized least‐squares (PGLS), and phylogenetic autocorrelation (PA) methods are compared. Although the independent contrasts are not orthogonal, they are independent if the data conform to the Brownian motion model of evolution on which they are based. It is shown that uncentered correlations and regressions through the origin using the PIC method are identical to those obtained using PGLS with an intercept included in the model. The PIC method is a special case of PGLS. Corrected standard errors are given for estimates of the ancestral states based on the PGLS approach. The treatment of trees with hard polytomies is discussed and is shown to be an algorithmic rather than a statistical problem. Some of the relationships among the methods are shown graphically using the multivariate space in which variables are represented as vectors with respect to OTUs used as coordinate axes. The maximum‐likelihood estimate of the autoregressive parameter, ρ, has not been computed correctly in previous studies (an appendix with MATLAB code provides a corrected algorithm). The importance of the eigenvalues and eigenvectors of the connection matrix, W, for the distribution of ρ is discussed. The PA method is shown to have several problems that limit its usefulness in comparative studies. Although the PA method is a generalized least‐squares procedure, it cannot be made equivalent to the PGLS method using a phylogenetic model.

8.
Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze these data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements. J. Exp. Zool. (Mol. Dev. Evol.) 285:128-139, 1999.

9.
On gaps.   Cited by: 4 (self-citations: 0, other citations: 4)
Gaps result from the alignment of sequences of unequal length during primary homology assessment. Viewed as character states originating from particular biological events (mutation), gaps contain historical information suitable for phylogenetic analysis. The effect of gaps as a source of phylogenetic data is explored via sensitivity analysis and character congruence among different data partitions. Example data sets are provided to show that gaps contain important phylogenetic information not recovered by those methods that omit gaps in their calculations. However, gap cost schemes are arbitrary (although they must be explicit) and thus data exploration is a necessity of molecular analyses, while character congruence is necessary as an external criterion for hypothesis decision.

10.
Random amplified polymorphic DNA (RAPD) analysis was used to study the phylogenetic relationships between species in Allium section Schoenoprasum and to investigate the intraspecific differentiation of A. schoenoprasum. RAPD analysis of 39 samples representing eight species of sect. Schoenoprasum and one sample of A. atrosanguineum (sect. Annuloprasum) resulted in 233 interpretable RAPD bands. The analysis clearly distinguishes the species of section Schoenoprasum. The arrangement of the accessions of A. schoenoprasum in all dendrograms mirrors the geographical distribution, with a clear differentiation between an Asian and a European subgroup. Within the European group, Scandinavian material is clearly distinct from S and E European material. Informally described morphological types of A. schoenoprasum could not be confirmed by RAPD analysis but represent recurrent ecological adaptations. A combination of phenetic (UPGMA, neighbour-joining analysis), cladistic (parsimony analysis), and statistical (PCA) methods of data analysis resulted in clearer phylogenetic interpretations than any of the methods provides when used separately.

11.
The adequacy and utility of behavioural characters in phylogenetics is widely acknowledged, especially for stereotyped behaviours. However, the most common behaviours are not stereotyped, and these are usually seen as inappropriate or more difficult to analyze in a phylogenetic context. A few methods have been proposed to deal with such data, although they have never been tested on samples larger than six species, which limits their evolutionary interest. In the present study, we perform behavioural observations on 13 cockroach species and derive behavioural phylogenetic characters with the successive event‐pairing method. We combine these characters with morphological and molecular data (approximately 6800 bp) in a phylogenetic study of 41 species. We then reconstruct ancestral states of the behavioural data to study evolution of social behaviour in these insects with regard to their social systems (i.e. solitary, gregarious, and subsocial) and diversity of habitat choice. We report for the first time that nonstereotyped behavioural data are adequate for phylogenetic analyses: they are no more homoplastic than traditional data, and support several phylogenetic relationships that we discuss. From an evolutionary perspective, we show that the solitary species Thanatophyllum akinetum does not display original behavioural interactions, suggesting phylogenetic inertia of interactive behaviours despite a radical change in social structure. Conversely, the subsocial species Parasphaeria boleiriana shows original behavioural interactions, which could result from its peculiar social system or habitat. We conclude that phylogenetic approaches in studies of behaviour are useful for deciphering evolution of behaviour and discriminating between its different modalities, even for nonstereotyped characters. © 2013 The Linnean Society of London, Biological Journal of the Linnean Society, 2014, 111, 58–77.

12.
A method is proposed to conduct phylogenetic analyses of comparative or interspecific data when the true phylogeny is not known. Standard models of speciation and/or extinction or other methods are used to generate a sample from the set of all possible phylogenies for the measured species. The comparative data are then analyzed on each of the possible trees to obtain a distribution of possible evolutionary statistics for these data. The mean of this distribution is proposed as a reasonable estimate of the true evolutionary statistic of interest. Ways of obtaining confidence intervals and of developing hypothesis tests for this mean statistic are also proposed. The method can be used with any comparative method or phylogenetic analysis technique when phylogenetic relationships among species are not known or when branch lengths for a phylogeny in units of expected character change (as required by most methods) are not available. Computer programs to conduct the analyses are available on request.
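
The procedure described in this abstract (sample many candidate trees, compute a statistic on each, report the mean) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' programs: the random joining of lineages stands in for the speciation models the abstract mentions, unit branch lengths are an assumption, the statistic is a correlation of independent contrasts computed through the origin, and the percentile range is a simple summary rather than the confidence-interval procedure the abstract refers to. All function names are hypothetical.

```python
import math
import random


def random_tree_contrasts(x, y, rng):
    """Join lineages at random into a bifurcating tree and return the
    standardized independent contrasts for traits x and y on that tree."""
    # Each lineage is (x value, y value, branch length).
    # Unit branch lengths are an assumption made for this sketch.
    lineages = [(xi, yi, 1.0) for xi, yi in zip(x, y)]
    cx, cy = [], []
    while len(lineages) > 1:
        i, j = rng.sample(range(len(lineages)), 2)
        (x1, y1, v1), (x2, y2, v2) = lineages[i], lineages[j]
        scale = math.sqrt(v1 + v2)
        cx.append((x1 - x2) / scale)
        cy.append((y1 - y2) / scale)
        # Ancestral value: average weighted by inverse branch length.
        w1, w2 = 1.0 / v1, 1.0 / v2
        node = (
            (w1 * x1 + w2 * x2) / (w1 + w2),
            (w1 * y1 + w2 * y2) / (w1 + w2),
            1.0 + v1 * v2 / (v1 + v2),  # own branch lengthened for uncertainty
        )
        lineages = [l for k, l in enumerate(lineages) if k not in (i, j)]
        lineages.append(node)
    return cx, cy


def contrast_correlation(cx, cy):
    """Correlation through the origin between two sets of contrasts."""
    num = sum(a * b for a, b in zip(cx, cy))
    den = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return num / den


def mean_statistic_over_trees(x, y, n_trees=200, seed=0):
    """Mean and central 95% percentile range of the statistic over random trees."""
    rng = random.Random(seed)
    stats = sorted(
        contrast_correlation(*random_tree_contrasts(x, y, rng))
        for _ in range(n_trees)
    )
    mean = sum(stats) / n_trees
    lo = stats[int(0.025 * (n_trees - 1))]
    hi = stats[int(0.975 * (n_trees - 1))]
    return mean, (lo, hi)
```

Calling `mean_statistic_over_trees(x, y)` with two equal-length lists of species trait values returns the averaged correlation and its percentile range; any other comparative statistic could be substituted for `contrast_correlation`.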

13.
Putative apomorphic character states are the only relevant phylogenetic signal contained in sets of sequence data. Using the sequence position as a character, a way to identify putative apomorphies prior to phylogenetic analysis is proposed. It is shown that distance-matrix methods use trivial characters. The concept of the asymmetrical split is presented for determination of character polarity. It is furthermore argued that groundpatterns (node sequences) should be reconstructed prior to the study of relationships between taxa of high phylogenetic age. The 'evolutionary noise' contained in groundpatterns can be illustrated with a network of distances using a split-decomposition analysis.

14.
Systematists have access to multiple sources of character information in phylogenetic analysis. For example, it is not unusual to have nucleotide sequences from several different genes, or to have molecular and morphological data. How should diverse data be analyzed in phylogenetic analysis? Several methods have been proposed for the treatment of partitioned data: the total evidence, separate analysis, and conditional combination approaches. Here, we review some of the advantages and disadvantages of the different approaches, with special concentration on which methods help us to discern the evolutionary process and provide the most accurate estimates of phylogeny.

15.
It has been argued that continuous characteristics should be excluded from cladistic analysis for two reasons: because the data are considered inappropriate; and because the methods for the conversion of these data into codes are considered arbitrary. Metric data, however, fulfill the sole criterion for inclusion in phylogenetic analysis, the presence of homologous character states, and thus cannot be excluded as a class of data. The second line of reasoning, that coding methods are arbitrary, applies to gap and segment coding, but quantitative data can be coded in a nonarbitrary manner by means of tests of statistical significance. These procedures, which are both objective and repeatable, determine the probability that two taxa possess a homologous character state; that is, whether they have inherited a particular central tendency and distribution of individual variates unchanged from a common ancestor. Thus, the application of statistical tests to quantitative data empirically detects the presence of evolutionary change, the raw material of phylogenetic reconstruction.

16.
Allozyme data are widely used to infer the phylogenies of populations and closely-related species. Numerous parsimony, distance, and likelihood methods have been proposed for phylogenetic analysis of these data; the relative merits of these methods have been debated vigorously, but their accuracy has not been well explored. In this study, I compare the performance of 13 phylogenetic methods (six parsimony, six distance, and continuous maximum likelihood) by applying a congruence approach to eight allozyme data sets from the literature. Clades are identified that are supported by multiple data sets other than allozymes (e.g. morphology, DNA sequences), and the ability of different methods to recover these 'known' clades is compared. The results suggest that (1) distance and likelihood methods generally outperform parsimony methods, (2) methods that utilize frequency data tend to perform well, and (3) continuous maximum likelihood is among the most accurate methods, and appears to be robust to violations of its assumptions. These results are in agreement with those from recent simulation studies, and help provide a basis for empirical workers to choose among the many methods available for analysing allozyme characters.

17.
Phylogenetic comparative methods have become a standard statistical approach for analysing interspecific data, under the assumption that traits of species are more similar than expected by chance (i.e. phylogenetic signal is present). Here I test for phylogenetic signal in intraspecific body size datasets to evaluate whether intraspecific datasets may require phylogenetic analysis. I also compare amounts of phylogenetic signal in intraspecific and interspecific body size datasets. Some intraspecific body size datasets contain significant phylogenetic signal. Detection of significant phylogenetic signal was dependent upon the number of populations (n) and the amount of phylogenetic signal (K) for a given dataset. Amounts of phylogenetic signal do not differ between intraspecific and interspecific datasets. Further, relationships between significance of phylogenetic signal and sample size and amount of phylogenetic signal are similar for intraspecific and interspecific datasets. Thus, intraspecific body size datasets are similar to interspecific body size datasets with respect to phylogenetic signal. Whether these results are general for all characters requires further study.

18.
Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with the increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present scrapp, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. scrapp employs a novel approach to cluster phylogenetic placements, called placement space clustering, to efficiently perform dimensionality reduction, so as to scale on large data volumes. Furthermore, it uses the phylogeny‐aware molecular species delimitation method mPTP to quantify diversity. We evaluated scrapp using both simulated and empirical data sets. We use simulated data to verify our approach. Tests on an empirical data set show that scrapp‐derived metrics can classify samples by their diversity‐correlated features equally well or better than existing, commonly used approaches. scrapp is available at https://github.com/pbdas/scrapp.

19.
We propose a new method to estimate and correct for phylogenetic inertia in comparative data analysis. The method, called phylogenetic eigenvector regression (PVR), starts by performing a principal coordinate analysis on a pairwise phylogenetic distance matrix between species. Traits under analysis are regressed on eigenvectors retained by a broken-stick model in such a way that estimated values express phylogenetic trends in data and residuals express independent evolution of each species. This partitioning is similar to that realized by the spatial autoregressive method, but the method proposed here overcomes the problem of low statistical performance that occurs with the autoregressive method when phylogenetic correlation is low or when sample size is too small to detect it. Also, PVR is easier to perform with large samples because it is based on well-known techniques of multivariate and regression analyses. We evaluated the performance of PVR and compared it with the autoregressive method using real datasets and simulations. A detailed worked example using body size evolution of Carnivora mammals indicated that phylogenetic inertia in this trait is elevated and similarly estimated by both methods. In this example, Type I error at α = 0.05 of PVR was equal to 0.048, but an increase in the number of eigenvectors used in the regression increases the error. Also, similarity between PVR and the autoregressive method, defined by correlation between their residuals, decreased by overestimating the number of eigenvalues necessary to express the phylogenetic distance matrix. To evaluate the influence of cladogram topology on the distribution of eigenvalues extracted from the double-centered phylogenetic distance matrix, we analyzed 100 randomly generated cladograms (up to 100 species). Multiple linear regression of log transformed variables indicated that the number of eigenvalues extracted by the broken-stick model can be fully explained by cladogram topology. Therefore, the broken-stick model is an adequate criterion for determining the correct number of eigenvectors to be used by PVR. We also simulated distinct levels of phylogenetic inertia by producing a trend across 10, 25, and 50 species arranged in “comblike” cladograms and then adding random vectors with increased residual variances around this trend. In doing so, we provide an evaluation of the performance of both methods with data generated under different evolutionary models than tested previously. The results showed that both PVR and the autoregressive method are efficient in detecting inertia in data when sample size is relatively high (more than 25 species) and when phylogenetic inertia is high. However, PVR is more efficient at smaller sample sizes and when the level of phylogenetic inertia is low. These conclusions were also supported by the analysis of 10 real datasets regarding body size evolution in different animal clades. We concluded that PVR can be a useful alternative to an autoregressive method in comparative data analysis.
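
The broken-stick criterion that this abstract uses to decide how many eigenvectors to retain can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' implementation: it takes eigenvalues as given (the principal coordinate analysis itself is not shown), discards non-positive eigenvalues, and stops at the first eigenvalue that fails to exceed its expectation, which is one common convention. The function names are hypothetical.

```python
def broken_stick_expectations(p):
    """Expected share of total variance for each of p components when a
    unit-length stick is broken at random into p pieces (largest first)."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def retain_by_broken_stick(eigenvalues):
    """Count the leading eigenvalues whose observed share of variance
    exceeds the corresponding broken-stick expectation."""
    # Assumption: non-positive eigenvalues carry no usable variance and are dropped.
    positive = sorted((ev for ev in eigenvalues if ev > 0), reverse=True)
    total = sum(positive)
    expected = broken_stick_expectations(len(positive))
    kept = 0
    for ev, expected_share in zip(positive, expected):
        if ev / total <= expected_share:
            break  # stop at the first component that does not beat its expectation
        kept += 1
    return kept
```

With eigenvalues `[6.0, 2.0, 1.0, 1.0]`, the first component explains 60% of the variance against an expectation of about 52%, while the second explains 20% against about 27%, so only one eigenvector would be retained for the regression step.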

20.
The merging of community ecology and phylogenetic biology   Cited by: 2 (self-citations: 0, other citations: 2)
The increasing availability of phylogenetic data, computing power and informatics tools has facilitated a rapid expansion of studies that apply phylogenetic data and methods to community ecology. Several key areas are reviewed in which phylogenetic information helps to resolve long-standing controversies in community ecology, challenges previous assumptions, and opens new areas of investigation. In particular, studies in phylogenetic community ecology have helped to reveal the multitude of processes driving community assembly and have demonstrated the importance of evolution in the assembly process. Phylogenetic approaches have also increased understanding of the consequences of community interactions for speciation, adaptation and extinction. Finally, phylogenetic community structure and composition holds promise for predicting ecosystem processes and impacts of global change. Major challenges to advancing these areas remain. In particular, determining the extent to which ecologically relevant traits are phylogenetically conserved or convergent, and over what temporal scale, is critical to understanding the causes of community phylogenetic structure and its evolutionary and ecosystem consequences. Harnessing phylogenetic information to understand and forecast changes in diversity and dynamics of communities is a critical step in managing and restoring the Earth's biota in a time of rapid global change.
