Similar literature (20 articles found)
1.
Liu L, Yu L. Systematic Biology, 2011, 60(5): 661-667
In this study, we develop a distance method for inferring unrooted species trees from a collection of unrooted gene trees. The species tree is estimated by the neighbor joining (NJ) tree built from a distance matrix in which the distance between two species is defined as the average number of internodes between two species across gene trees, that is, average gene-tree internode distance. The distance method is named NJ(st) to distinguish it from the original NJ method. Under the coalescent model, we show that if gene trees are known or estimated correctly, the NJ(st) method is statistically consistent in estimating unrooted species trees. The simulation results suggest that NJ(st) and STAR (another coalescence-based method for inferring species trees) perform almost equally well in estimating topologies of species trees, whereas the Bayesian coalescence-based method, BEST, outperforms both NJ(st) and STAR. Unlike BEST and STAR, the NJ(st) method can take unrooted gene trees to infer species trees without using an outgroup. In addition, the NJ(st) method can handle missing data and is thus useful in phylogenomic studies in which data sets often contain missing loci for some individuals.
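The distance matrix described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that an "internode" is an internal node on the path between two leaves, that each gene tree carries one sequence per species, and that a pair is averaged only over the gene trees that contain both species. The helper names (`build_tree`, `internode_counts`, `average_internode_distance`) and the two quartet trees are hypothetical.

```python
from collections import deque


def build_tree(edges):
    """Build an undirected adjacency map from a list of (node, node) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def internode_counts(adj):
    """Count internal nodes on the path between every pair of leaves."""
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    counts = {}
    for src in leaves:
        # Breadth-first search gives the number of edges from src to every node.
        edges_from_src = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in edges_from_src:
                    edges_from_src[v] = edges_from_src[u] + 1
                    queue.append(v)
        for dst in leaves:
            if src < dst:
                # A path with k edges passes through k - 1 internal nodes.
                counts[(src, dst)] = edges_from_src[dst] - 1
    return counts


def average_internode_distance(gene_trees):
    """Average internode counts over the gene trees that contain each pair."""
    totals, seen = {}, {}
    for adj in gene_trees:
        for pair, d in internode_counts(adj).items():
            totals[pair] = totals.get(pair, 0) + d
            seen[pair] = seen.get(pair, 0) + 1
    return {pair: totals[pair] / seen[pair] for pair in totals}


# Two hypothetical unrooted quartet gene trees; "x" and "y" are internal nodes.
tree_ab_cd = build_tree([("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")])
tree_ac_bd = build_tree([("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("D", "y")])
distances = average_internode_distance([tree_ab_cd, tree_ac_bd])
```

The resulting `distances` dictionary could then be passed to any neighbor-joining routine to obtain the species tree estimate.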

2.
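Forecasting the flowering dates of trees has considerable practical value for fruit growing, beekeeping, landscaping, and tourism. Taking Sargent's cherry (Prunus sargentii) as an example, this paper explores a new method of forecasting flowering dates from morphological measurements of flower buds. Two forecast models, one linear and one exponential, were built from data collected and processed for Prunus sargentii trees in Yuyuantan Park, Beijing, from 1998 to 2000. A trial forecast in 2002 showed that, when observations from three trees were used and smoothed with a 3-day moving average, more than 80% of forecasts had errors within 3 d. In continuous forecasting in 2003, the mean error was 1.6 d for model 1 and 2.1 d for model 2. This new phenological method of forecasting tree flowering dates is simple to apply, needs only a short model-building period, and is accurate; forecasts can be issued daily from bud swelling in spring until just before the petal-emergence stage.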
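The smoothing and linear-model steps mentioned in this abstract can be sketched as follows. The abstract does not give the model equations or the measured variables, so the choice of predictor (a bud measurement) and response (days remaining until bloom), the function names, and all numbers below are assumptions made purely for illustration.

```python
def moving_average(values, window=3):
    """Smooth a daily series with a simple moving average of the given window."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up daily bud measurements (mm) and made-up days remaining until bloom.
bud_length = [4.0, 4.4, 4.5, 5.1, 5.6, 6.0, 6.7]
smoothed = moving_average(bud_length)            # 5 smoothed values
days_to_bloom = [10, 9, 8, 7, 6]                 # aligned with the smoothed series
slope, intercept = fit_line(smoothed, days_to_bloom)
forecast = slope * smoothed[-1] + intercept      # forecast for the latest day
```

An exponential variant could be fitted the same way after taking logarithms of the response, again only as an assumption about the model form.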

3.
Estimating species trees using multiple-allele DNA sequence data (total citations: 3; self-citations: 0; citations by others: 3)
Several techniques, such as concatenation and consensus methods, are available for combining data from multiple loci to produce a single statement of phylogenetic relationships. However, when multiple alleles are sampled from individual species, it becomes more challenging to estimate relationships at the level of species, either because concatenation becomes inappropriate due to conflicts among individual gene trees, or because the species from which multiple alleles have been sampled may not form monophyletic groups in the estimated tree. We propose a Bayesian hierarchical model to reconstruct species trees from multiple-allele, multilocus sequence data, building on a recently proposed method for estimating species trees from single allele multilocus data. A two-step Markov Chain Monte Carlo (MCMC) algorithm is adopted to estimate the posterior distribution of the species tree. The model is applied to estimate the posterior distribution of species trees for two multiple-allele datasets: yeast (Saccharomyces) and birds (Manacus-manakins). The estimates of the species trees using our method are consistent with those inferred from other methods and genetic markers, but in contrast to other species tree methods, it provides credible regions for the species tree. The Bayesian approach described here provides a powerful framework for statistical testing and integration of population genetics and phylogenetics.

4.
What does the posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity of the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimates of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are likely to be similar, the assessment of the uncertainty of inferred trees via either bootstrapping (for maximum likelihood estimates) or posterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently available, as this should reduce the chance that the method will concentrate too much probability on too few trees.

5.
Maximum likelihood supertrees (total citations: 2; self-citations: 0; citations by others: 2)

6.
Human microbiome research characterizes the microbial content of samples from human habitats to learn how interactions between bacteria and their host might impact human health. In this work a novel parametric statistical inference method based on object-oriented data analysis (OODA) for analyzing HMP data is proposed. OODA is an emerging area of statistical inference where the goal is to apply statistical methods to objects such as functions, images, and graphs or trees. The data objects that pertain to this work are taxonomic trees of bacteria built from analysis of 16S rRNA gene sequences (e.g. using RDP); there is one such object for each biological sample analyzed. Our goal is to model and formally compare a set of trees. The contribution of our work is threefold: first, a weighted tree structure to analyze RDP data is introduced; second, using a probability measure to model a set of taxonomic trees, we introduce an approximate MLE procedure for estimating model parameters and we derive LRT statistics for comparing the distributions of two metagenomic populations; and third, the Jumpstart HMP data is analyzed using the proposed model, providing novel insights and future directions of analysis.

7.
This paper describes a model for the topological mapping of trifurcating botanical trees. The model was based on a system of modular units that represented the interconnectivity of shoot meristems (terminal segments) and internodes (internal segments) within whole plant canopies, organized with increasing centrifugal ordering. The model was capable of describing the dynamics of plant growth as expressed by changes in topological parameters over time. Preliminary calculations for experimental trees indicated that the model represents growth in a biologically sound manner. Methods are described for the calculation of the architecture parameters size, size-complexity, structural complexity, and tree asymmetry index (TAI). Parameter calculations were based on the mathematical principles developed for the classification of bifurcating dendrite trees, and were designed to both extract structural information, and to enable statistical comparison between trees of different size. Parameters were mathematically adjusted for trifurcation, and appeared to be able to represent quantitatively the architectural properties of tree structures. In addition to the calculation of the TAI for trifurcating trees, new methods were developed to enable comparisons to be made of the architectural complexity of trifurcating trees of differing size. These were based on the principle of the pair-wise comparison of the mean centrifugal order number (MCON) with respect to segments against highest order number. We argue and illustrate that this principle can be more informative than that of pair-wise comparison of the MCON against tree degree (topological size). Further improvements to this method were made by examining branching points (vertices) rather than segments (links) to calculate the MCON.

8.
An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.

9.
Liu L, Pearl DK. Systematic Biology, 2007, 56(3): 504-514
The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

10.
A working guide to boosted regression trees (total citations: 33; self-citations: 0; citations by others: 33)
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. We provide code and a tutorial to enable the wider use of BRT by ecologists.
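The "additive model of simple trees, fitted in a forward, stagewise fashion" can be made concrete with a toy sketch. This is not the code the authors provide: it assumes a single numeric predictor, squared-error loss, one-split trees (stumps), and a fixed learning rate, and the function names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees=200, learning_rate=0.1):
    """Forward stagewise fitting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Shrink each stump's contribution by the learning rate.
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    """Sum the shrunken stump contributions on top of the base prediction."""
    total = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if x <= threshold else right_mean)
    return total


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up nonlinear response.
xs = list(range(10))
ys = [x ** 2 for x in xs]
model = boost(xs, ys)
baseline_mse = mse(ys, [model["base"]] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
```

Real BRT fitting also involves deeper trees, random subsampling, and cross-validated choice of the number of trees, none of which this sketch attempts.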

11.
Signal peptide identification is of immense importance in drug design. Accurate identification of signal peptides is the first critical step to be able to change the direction of the targeting proteins and use the designed drug to target a specific organelle to correct a defect. Because experimental identification is the most accurate method, but is expensive and time-consuming, an efficient and affordable automated system is of great interest. In this article, we propose using an adapted neural network, called a bio-basis function neural network, and decision trees for predicting signal peptides. The bio-basis function neural network model and decision trees achieved 97.16% and 97.63% accuracy respectively, demonstrating that the methods work well for the prediction of signal peptides. Moreover, decision trees revealed that position P(1'), which is important in forming signal peptides, most commonly comprises either leucine or alanine. This concurs with the (P(3)-P(1)-P(1')) coupling model.

12.
Katoh K, Miyata T. FEBS Letters, 1999, 463(1-2): 129-132
Applying the tree bisection and reconnection (TBR) algorithm, we have developed a heuristic method (maximum likelihood (ML)-TBR) for inferring the ML tree based on tree topology search. For initial trees from which iterative processes start in ML-TBR, two cases were considered: one is 100 neighbor-joining (NJ) trees based on the bootstrap resampling and the other is 100 randomly generated trees. The same ML tree was obtained in both cases. All different iterative processes started from 100 independent initial trees ultimately converged on one optimum tree with the largest log-likelihood value, suggesting that a limited number of initial trees will be quite enough in ML-TBR. This also suggests that the optimum tree corresponds to the global optimum in tree topology space and thus probably coincides with the ML tree inferred by intact ML analysis. This method has been applied to the inference of the phylogenetic tree of the SOX family members. The mammalian testis-determining gene SRY is believed to have evolved from SOX-3, a member of the SOX family, based on several lines of evidence, including their sequence similarity, the location of SOX-3 on the X chromosome and some aspects of their expression. This model should be supported directly from the phylogenetic tree of the SOX family, but no evidence has been provided to date. A recently published NJ tree shows an implausibly remote origin of SRY, suggesting that a more sophisticated method is required for understanding this problem. The ML tree inferred by the present method showed that the SRYs of marsupial and placental mammals form a monophyletic cluster which had diverged from the mammalian SOX-3 in the early evolution of mammals.

13.
The multispecies coalescent provides an elegant theoretical framework for estimating species trees and species demographics from genetic markers. However, practical applications of the multispecies coalescent model are limited by the need to integrate or sample over all gene trees possible for each genetic marker. Here we describe a polynomial-time algorithm that computes the likelihood of a species tree directly from the markers under a finite-sites model of mutation effectively integrating over all possible gene trees. The method applies to independent (unlinked) biallelic markers such as well-spaced single nucleotide polymorphisms, and we have implemented it in SNAPP, a Markov chain Monte Carlo sampler for inferring species trees, divergence dates, and population sizes. We report results from simulation experiments and from an analysis of 1997 amplified fragment length polymorphism loci in 69 individuals sampled from six species of Ourisia (New Zealand native foxglove).

14.
TOPD/FMTS: a new software to compare phylogenetic trees (total citations: 1; self-citations: 0; citations by others: 1)
SUMMARY: TOPD/FMTS has been developed to evaluate similarities and differences between phylogenetic trees. The software implements several new algorithms (including the Disagree method, which returns the taxa that disagree between two trees, and the Nodal method, which compares two trees using nodal information) and several previously described methods (such as the Partition method, Triplets or Quartets) to compare phylogenetic trees. One of the novelties of this software is that the FMTS (From Multiple to Single) program allows the comparison of trees that contain both orthologs and paralogs. Each option is also complemented with a randomization analysis to test the null hypothesis that the similarity between two trees is not better than chance expectation. AVAILABILITY: The Perl source code of TOPD/FMTS is available at http://genomes.urv.es/topd.
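A partition-style comparison of two trees can be sketched by counting the bipartitions (splits) found in one tree but not the other, in the spirit of the Robinson-Foulds distance. This is an illustrative Python sketch only; it is not taken from the TOPD/FMTS Perl source, and whether it matches that program's Partition method in every detail is an assumption. Trees are written as nested tuples, and the function names are hypothetical.

```python
def leaves(tree):
    """Return the list of leaf names in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def bipartitions(tree):
    """Collect the non-trivial splits of the taxon set induced by internal edges."""
    taxa = frozenset(leaves(tree))
    splits = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        # Keep only splits with at least two taxa on each side.
        if 1 < len(clade) < len(taxa) - 1:
            # Store one canonical side so that rooting does not matter.
            side = min(clade, taxa - clade, key=lambda s: (len(s), sorted(s)))
            splits.add(side)
        return clade

    walk(tree)
    return splits


def split_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Two hypothetical five-taxon trees that differ in the placement of B and C.
tree_1 = (("A", "B"), ("C", ("D", "E")))
tree_2 = (("A", "C"), ("B", ("D", "E")))
distance = split_distance(tree_1, tree_2)
```

Both trees must cover the same taxon set for the count to be meaningful; handling trees with differing taxa would need pruning first.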

15.
Comparisons are made of the accuracy of the restricted maximum-likelihood, Wagner parsimony, and UPGMA (unweighted pair-group method using arithmetic averages) clustering methods to estimate phylogenetic trees. Data matrices were generated by constructing simulated stochastic evolution in a multidimensional gene-frequency space using a simple genetic-drift model (Brownian-motion, random-walk) with constant rates of divergence in all lineages. Ten different phylogenetic tree topologies of 20 operational taxonomic units (OTU's), representing a range of tree shapes, were used. Felsenstein's restricted maximum-likelihood method, Wagner parsimony, and UPGMA clustering were used to construct trees from the resulting data matrices. The computations for the restricted maximum-likelihood method were performed on a Cray-1 supercomputer since the required calculations (especially when optimized for the vector hardware) are performed substantially faster than on more conventional computing systems. The overall level of accuracy of tree reconstruction depends on the topology of the true phylogenetic tree. The UPGMA clustering method, especially when genetic-distance coefficients are used, gives the most accurate estimates of the true phylogeny (for our model with constant evolutionary rates). For large numbers of loci, all methods give similar results, but trends in the results imply that the restricted maximum-likelihood method would produce the most accurate trees if sample sizes were large enough.
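For readers unfamiliar with UPGMA, the clustering step can be sketched in a few lines. This is a generic textbook version, not the code used in the study; the distance values are made up, the output is a Newick-like topology without branch lengths, and the function name is hypothetical.

```python
from itertools import combinations


def upgma(dist):
    """Cluster taxa by UPGMA and return a Newick-like topology string.

    dist maps pairs of taxon names to pairwise distances.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for pair in dist for name in pair}
    while len(size) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Size-weighted mean keeps every original taxon equally weighted.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Made-up distances among four taxa.
example = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
    ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
topology = upgma(example)
```

UPGMA assumes a constant rate of divergence in all lineages, which is the condition under which the abstract reports it performing best.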

16.
The problem of reconstructing the duplication history of a set of tandemly repeated sequences was first introduced by Fitch (1977). Many recent studies deal with this problem, showing the validity of the unequal recombination model proposed by Fitch, describing numerous inference algorithms, and exploring the combinatorial properties of these new mathematical objects, which are duplication trees. In this paper, we deal with the topological rearrangement of these trees. Classical rearrangements used in phylogeny (NNI, SPR, TBR, ...) cannot be applied directly on duplication trees. We show that restricting the neighborhood defined by the SPR (Subtree Pruning and Regrafting) rearrangement to valid duplication trees allows exploring the whole duplication tree space. We use these restricted rearrangements in a local search method which improves an initial tree via successive rearrangements. This method is applied to the optimization of parsimony and minimum evolution criteria. We show through simulations that this method improves all existing programs for both reconstructing the topology of the true tree and recovering its duplication events. We apply this approach to tandemly repeated human Zinc finger genes and observe that a much better duplication tree is obtained by our method than using any other program.

17.
Species complexes undergoing rapid radiation present a challenge in molecular systematics because of the possibility that ancestral polymorphism is retained in component gene trees. Coalescent theory has demonstrated that gene trees often fail to match lineage trees when taxon divergence times are less than the ancestral effective population sizes. Suggestions to increase the number of loci and the number of individuals per taxon have been proposed; however, phylogenetic methods to adequately analyze these data in a coalescent framework are scarce. We compare two approaches to estimating lineage (species) trees using multiple individuals and multiple loci: the commonly used partitioned Bayesian analysis of concatenated sequences and a modification of a newly developed hierarchical Bayesian method (BEST) that simultaneously estimates gene trees and species trees from multilocus data. We test these approaches on a phylogeny of rapidly radiating species wherein divergence times are likely to be smaller than effective population sizes, and incomplete lineage sorting is known, in the rodent genus, Thomomys. We use seven independent noncoding nuclear sequence loci (total approximately 4300 bp) and between 1 and 12 individuals per taxon to construct a phylogenetic hypothesis for eight Thomomys species. The majority-rule consensus tree from the partitioned concatenated analysis included 14 strongly supported bipartitions, corroborating monophyletic species status of five of the eight named species. The BEST tree strongly supported only the split between the two subgenera and showed very low support for any other clade. Comparison of both lineage trees to individual gene trees revealed that the concatenation method appears to ignore conflicting signals among gene trees, whereas the BEST tree considers conflicting signals and downweights support for those nodes. Bayes factor analysis of posterior tree distributions from both analyses strongly favors the model underlying the BEST analysis. This comparison underscores the risks of overreliance on results from concatenation, and ignoring the properties of coalescence, especially in cases of recent, rapid radiations.

18.
Gene tree distributions under the coalescent process (total citations: 10; self-citations: 0; citations by others: 10)
Under the coalescent model for population divergence, lineage sorting can cause considerable variability in gene trees generated from any given species tree. In this paper, we derive a method for computing the distribution of gene tree topologies given a bifurcating species tree for trees with an arbitrary number of taxa in the case that there is one gene sampled per species. Applications for gene tree distributions include determining exact probabilities of topological equivalence between gene trees and species trees and inferring species trees from multiple datasets. In addition, we examine the shapes of gene tree distributions and their sensitivity to changes in branch lengths, species tree shape, and tree size. The method for computing gene tree distributions is implemented in the computer program COAL.
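The smallest case of such a distribution has a well-known closed form and gives a feel for the sensitivity to branch lengths. The sketch below covers only the three-taxon case with one gene per species; it is not the general algorithm of the paper or of COAL. It assumes a rooted species tree ((A,B),C) whose internal branch has length t in coalescent units, and the function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for species tree ((A,B),C).

    t is the internal branch length in coalescent units. The A and B lineages
    fail to coalesce within that branch with probability exp(-t); if they fail,
    each of the three topologies is equally likely in the ancestral population.
    """
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


short_branch = three_taxon_gene_tree_probs(0.1)
long_branch = three_taxon_gene_tree_probs(5.0)
```

With a short internal branch the three topologies are nearly equally probable, while a long branch makes the matching topology almost certain, which is the lineage-sorting effect the abstract describes.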

19.
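Pine wilt disease is highly destructive, spreads rapidly, and is difficult to control, and it seriously threatens China's pine forest resources. Promptly detecting, locating, and removing diseased and dead pines is an effective means of controlling its spread. In this study, small unmanned aerial vehicles were used to acquire visible-light and multispectral aerial images of pine wilt disease outbreak sites. Based on changes in needle color, pines infected by the pine wood nematode Bursaphelenchus xylophilus were divided into two classes, diseased trees and dead trees...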

20.
Ecosystem carbon (C) balance is hypothesised to be sensitive to the mycorrhizal strategies that plants use to acquire nutrients. To test this idea, we coupled an optimality-based plant nitrogen (N) acquisition model with a microbe-focused soil organic matter (SOM) model. The model accurately predicted rhizosphere processes and C–N dynamics across a gradient of stands varying in their relative abundance of arbuscular mycorrhizal (AM) and ectomycorrhizal (ECM) trees. When mycorrhizal dominance was switched – ECM trees dominating plots previously occupied by AM trees, and vice versa – legacy effects were apparent, with consequences for both C and N stocks in soil. Under elevated productivity, ECM trees enhanced decomposition more than AM trees via microbial priming of unprotected SOM. Collectively, our results show that ecosystem responses to global change may hinge on the balance between rhizosphere priming and SOM protection, and highlight the importance of dynamically linking plants and microbes in terrestrial biosphere models.
