首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Phylogenetic analyses of 110 serpin protein sequences revealed clades consistent with independent phylogenetic analyses based on exon-intron structure and diagnostic amino acid sites. Trees were estimated by maximum likelihood, neighbor joining, and partial split decomposition using both the BLOSUM 62 and Jones-Taylor-Thornton substitution matrices. Neighbor-joining trees gave results closest to those based on independent analyses using genomic and chromosomal data. The maximum-likelihood trees derived using the quartet puzzling algorithm were very conservative, producing many small clades that separated groups of proteins that other results suggest were related. Independent analyses based on exon-intron structure suggested that a neighbor-joining tree was more accurate than maximum-likelihood trees obtained using the quartet puzzling algorithm.  相似文献   

2.
Many research groups are estimating trees containing anywhere from a few thousands to hundreds of thousands of species, toward the eventual goal of the estimation of a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets in the low end of this range. One approach to estimate a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure in order to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the most well-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.  相似文献   

3.
We would like to use maximum likelihood to estimate parameters such as the effective population size N(e) or, if we do not know mutation rates, the product 4N(e) mu of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, N(e) or of 4N(e) mu. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's work stations. The method does not at present have any way of dealing with recombination.  相似文献   

4.
We examined the effect of increasing the number of sampled amplified fragment length polymorphism (AFLP) bands to reconstruct an accurate and well-supported AFLP-based phylogeny. In silico AFLP was performed using simulated DNA sequences evolving along balanced and unbalanced model trees with recent, uniform and ancient radiations and average branch lengths (from the most internal node to the tip) ranging from 0.02 to 0.05 substitutions per site. Trees were estimated by minimum evolution (ME) and maximum parsimony (MP) methods from both DNA sequences and virtual AFLP fingerprints. The comparison of the true tree with the estimated AFLP trees suggests that moderate numbers of AFLP bands are necessary to recover the correct topology with high bootstrap support values (i.e. >70%). Fewer numbers of bands are necessary for shorter tree lengths and for balanced than for unbalanced tree topologies. However, branch length estimation was rather unreliable and did not improve substantially after a certain number of bands were sampled. These results hold for different levels of genome coverage and number of taxa analysed. In silico AFLP using bacterial genomic DNA sequences recovered a well-supported tree topology that mirrored an empirical phylogeny based on a set of 31 orthologous gene sequences when as few as 263 AFLP bands were scored. These results suggest that AFLPs may be an efficient alternative to traditional DNA sequencing for accurate topology reconstruction of shallow trees when not very short ancestral branches exist.  相似文献   

5.
Efficient likelihood computations with nonreversible models of evolution   总被引:4,自引:0,他引:4  
Recent advances in heuristics have made maximum likelihood phylogenetic tree estimation tractable for hundreds of sequences. Noticeably, these algorithms are currently limited to reversible models of evolution, in which Felsenstein's pulley principle applies. In this paper we show that by reorganizing the way likelihood is computed, one can efficiently compute the likelihood of a tree from any of its nodes with a nonreversible model of DNA sequence evolution, and hence benefit from cutting-edge heuristics. This computational trick can be used with reversible models of evolution without any extra cost. We then introduce nhPhyML, the adaptation of the nonhomogeneous nonstationary model of Galtier and Gouy (1998; Mol. Biol. Evol. 15:871-879) to the structure of PhyML, as well as an approximation of the model in which the set of equilibrium frequencies is limited. This new version shows good results both in terms of exploration of the space of tree topologies and ancestral G+C content estimation. We eventually apply it to rRNA sequences slowly evolving sites and conclude that the model and a wider taxonomic sampling still do not plead for a hyperthermophilic last universal common ancestor.  相似文献   

6.
MOTIVATION: A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution and may allow general statements about the relative effect of different molecular processes on evolution. RESULTS: The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database of amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes an alignment of amino acid sequences that matches the corresponding Pfam family seed alignment, an alignment of DNA sequences that contain the coding sequence of the Pfam alignment when they can be recovered (overall, 82.9% of sequences taken from Pfam) and the alignment of amino acid sequences restricted to only those sequences for which a DNA sequence could be recovered. Each of the alignments has an estimate of the phylogenetic tree associated with it. The tree topologies were obtained using the neighbor joining method based on maximum likelihood estimates of the evolutionary distances, with branch lengths then calculated using a standard maximum likelihood approach.  相似文献   

7.
MOTIVATION: In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However for a large number of taxa, it is not feasible to construct large and accurate trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood. We express a number of concerns about the current set of parallel phylogenetic programs which are currently severely limiting the widespread availability and use of parallel computing in maximum likelihood-based phylogenetic analysis. RESULTS: We have identified the suitability of phylogenetic analysis to large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree building program called distributed phylogeny reconstruction by maximum likelihood. It uses an already proven maximum likelihood-based tree building algorithm and a popular phylogenetic analysis library for all its likelihood calculations. It offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree building program that can achieve near-linear speedup while only using the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a 'free' ML supercomputer.  相似文献   

8.
Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).    相似文献   

9.
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was applied to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the number of simulations being limited. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is prerequisite.  相似文献   

10.
Summary A method for molecular phylogeny construction is newly developed. The method, called the stepwise ancestral sequence method, estimates molecular phylogenetic trees and ancestral sequences simultaneously on the basis of parsimony and sequence homology. For simplicity the emphasis is placed more on parsiomony than on sequence homology in the present study, though both are certainly important. Because parsimony alone will sometimes generate plural candidate trees, the method retains not one but five candidates from which one can then single out the final tree taking other criteria into account.The properties and performance of the method are then examined by simulating an evolving gene along a model phylogenetic tree. The estimated trees are found to lie in a narrow range of the parsimony criteria used in the present study. Thus, other criteria such as biological evidence and likelihood are necessary to single out the correct tree among them, with biological evidence taking precedence over any other criterion. The computer simulation also reveals that the method satisfactorily estimates both tree topology and ancestral sequences, at least for the evolutionary model used in the present study.  相似文献   

11.
The Charadriiformes is a large and diverse order of shorebirds currently classified into 19 families, including morphologically aberrant forms that are of uncertain phylogenetic placement within non-passerine birds in general. Recent attempts using morphological characters have failed to recover a well-supported phylogeny depicting higher level relationships within Charadriiformes and the limits to the order, primarily because of inconsistency and homoplasy in these data. Moreover, these trees are incongruent with the relationships presented in the DNA hybridization tapestry of, including the location of the root and the branching order of major clades within the shorebirds. To help clarify this systematic confusion we therefore sequenced the large RAG-1 nuclear exon (2850 bp) from 36 species representing 17 families of shorebirds for which DNA was available. Trees built with maximum parsimony, maximum likelihood or Bayesian methods are topologically identical and fully resolved, with high support at basal nodes. This further attests to the phylogenetic utility of the RAG-1 sequences at higher taxonomic levels within birds. The RAG-1 tree is topologically similar to the DNA hybridization tree in depicting three major subordinal clades of shorebirds, the Charadrii (thick-knees, sheathbills, plovers, oystercatchers, and allies), Scolopaci (sandpipers and jacanas) and the Lari (coursers, pratincoles, gulls, terns, skimmers, and skuas). However, the basal split in the RAG-1 tree is between Charadrii and (Scolopaci+Lari), whereas in the DNA hybridization tree Scolopaci is the sister group to the (Charadrii+Lari). Thus in both of these DNA-based trees the Alcidae (auks, murres, and allies) are not basal among shorebirds as hypothesized in morphological trees, but instead are placed as a tip clade within Lari. The enigmatic buttonquails (Turnicidae), variously hypothesized as being allied to either the Galliformes, Gruiformes, or Charadriiformes, are shown to be a basal lineage in the more conventional Lari clade. Divergence times estimated with rate-smoothing methods and minimum time constraints imposed at nodes with key fossils suggest that Charadriiformes originated in Gondwanaland.  相似文献   

12.
The ant subfamily Formicinae is a large assemblage (2458 species (J. Nat. Hist. 29 (1995) 1037), including species that weave leaf nests together with larval silk and in which the metapleural gland-the ancestrally defining ant character-has been secondarily lost. We used sequences from two mitochondrial genes (cytochrome b and cytochrome oxidase 2) from 18 formicine and 4 outgroup taxa to derive a robust phylogeny, employing a search for tree islands using 10000 randomly constructed trees as starting points and deriving a maximum likelihood consensus tree from the ML tree and those not significantly different from it. Non-parametric bootstrapping showed that the ML consensus tree fit the data significantly better than three scenarios based on morphology, with that of Bolton (Identification Guide to the Ant Genera of the World, Harvard University Press, Cambridge, MA) being the best among these alternative trees. Trait mapping showed that weaving had arisen at least four times and possibly been lost once. A maximum likelihood analysis showed that loss of the metapleural gland is significantly associated with the weaver life-pattern. The graph of the frequencies with which trees were discovered versus their likelihood indicates that trees with high likelihoods have much larger basins of attraction than those with lower likelihoods. While this result indicates that single searches are more likely to find high- than low-likelihood tree islands, it also indicates that searching only for the single best tree may lose important information.  相似文献   

13.
Liu K  Warnow T 《PloS one》2012,7(3):e33104
The standard approach to phylogeny estimation uses two phases, in which the first phase produces an alignment on a set of homologous sequences, and the second phase estimates a tree on the multiple sequence alignment. POY, a method which seeks a tree/alignment pair minimizing the total treelength, is the most widely used alternative to this two-phase approach. The topological accuracy of trees computed under treelength optimization is, however, controversial. In particular, one study showed that treelength optimization using simple gap penalties produced poor trees and alignments, and suggested the possibility that if POY were used with an affine gap penalty, it might be able to be competitive with the best two-phase methods. In this paper we report on a study addressing this possibility. We present a new heuristic for treelength, called BeeTLe (Better Treelength), that is guaranteed to produce trees at least as short as POY. We then use this heuristic to analyze a large number of simulated and biological datasets, and compare the resultant trees and alignments to those produced using POY and also maximum likelihood (ML) and maximum parsimony (MP) trees computed on a number of alignments. In general, we find that trees produced by BeeTLe are shorter and more topologically accurate than POY trees, but that neither POY nor BeeTLe produces trees as topologically accurate as ML trees produced on standard alignments. These findings, taken as a whole, suggest that treelength optimization is not as good an approach to phylogenetic tree estimation as maximum likelihood based upon good alignment methods.  相似文献   

14.
Steel demonstrated that the maximum-likelihood function for a phylogenetic tree may have multiple local maxima. If this phenomenon were general, it would compromise the applicability of maximum likelihood as an optimality criterion for phylogenetic trees. In several simulation studies reported on in this paper, the true tree, and other trees of very high likelihood, rarely had multiple maxima. Our results thus provide reassurance that the value of maximum likelihood as a tree selection criterion is not compromised by the presence of multiple local maxima--the best estimates of the true tree are not likely to have them. This result holds true even when an incorrect nucleotide substitution model is used for tree selection.  相似文献   

15.
Modeling residue usage in aligned protein sequences via maximum likelihood   总被引:9,自引:6,他引:3  
A computational method is presented for characterizing residue usage, i.e., site-specific residue frequencies, in aligned protein sequences. The method obtains frequency estimates that maximize the likelihood of the sequences in a simple model for sequence evolution, given a tree or a set of candidate trees computed by other methods. These maximum- likelihood frequencies constitute a profile of the sequences, and thus the method offers a rigorous alternative to sequence weighting for constructing such a profile. The ability of this method to discard misleading phylogenetic effects allows the biochemical propensities of different positions in a sequence to be more clearly observed and interpreted.   相似文献   

16.
17.
IQPNNI: moving fast through tree space and stopping in time   总被引:12,自引:0,他引:12  
An efficient tree reconstruction method (IQPNNI) is introduced to reconstruct a phylogenetic tree based on DNA or amino acid sequence data. Our approach combines various fast algorithms to generate a list of potential candidate trees. The key ingredient is the definition of so-called important quartets (IQs), which allow the computation of an intermediate tree in O(n(2)) time for n sequences. The resulting tree is then further optimized by applying the nearest neighbor interchange (NNI) operation. Subsequently a random fraction of the sequences is deleted from the best tree found so far. The deleted sequences are then re-inserted in the smaller tree using the important quartet puzzling (IQP) algorithm. These steps are repeated several times and the best tree, with respect to the likelihood criterion, is considered as the inferred phylogenetic tree. Moreover, we suggest a rule which indicates when to stop the search. Simulations show that IQPNNI gives a slightly better accuracy than other programs tested. Moreover, we applied the approach to 218 small subunit rRNA sequences and 500 rbcL sequences. We found trees with higher likelihood compared to the results by others. A program to reconstruct DNA or amino acid based phylogenetic trees is available online (http://www.bi.uni-duesseldorf.de/software/iqpnni).  相似文献   

18.
MOTIVATION: B cells responding to antigenic stimulation can fine-tune their binding properties through a process of affinity maturation composed of somatic hypermutation, affinity-selection and clonal expansion. The mutation rate of the B cell receptor DNA sequence, and the effect of these mutations on affinity and specificity, are of critical importance for understanding immune and autoimmune processes. Unbiased estimates of these properties are currently lacking due to the short time-scales involved and the small numbers of sequences available. RESULTS: We have developed a bioinformatic method based on a maximum likelihood analysis of phylogenetic lineage trees to estimate the parameters of a B cell clonal expansion model, which includes somatic hypermutation with the possibility of lethal mutations. Lineage trees are created from clonally related B cell receptor DNA sequences. Important links between tree shapes and underlying model parameters are identified using mutual information. Parameters are estimated using a likelihood function based on the joint distribution of several tree shapes, without requiring a priori knowledge of the number of generations in the clone (which is not available for rapidly dividing populations in vivo). A systematic validation on synthetic trees produced by a mutating birth-death process simulation shows that our estimates are precise and robust to several underlying assumptions. These methods are applied to experimental data from autoimmune mice to demonstrate the existence of hypermutating B cells in an unexpected location in the spleen.  相似文献   

19.
20.
Phylogenetic relationships of mushrooms and their relatives within the order Agaricales were addressed by using nuclear large subunit ribosomal DNA sequences. Approximately 900 bases of the 5' end of the nucleus-encoded large subunit RNA gene were sequenced for 154 selected taxa representing most families within the Agaricales. Several phylogenetic methods were used, including weighted and equally weighted parsimony (MP), maximum likelihood (ML), and distance methods (NJ). The starting tree for branch swapping in the ML analyses was the tree with the highest ML score among previously produced MP and NJ trees. A high degree of consensus was observed between phylogenetic estimates obtained through MP and ML. NJ trees differed according to the distance model that was used; however, all NJ trees still supported most of the same terminal groupings as the MP and ML trees did. NJ trees were always significantly suboptimal when evaluated against the best MP and ML trees, by both parsimony and likelihood tests. Our analyses suggest that weighted MP and ML provide the best estimates of Agaricales phylogeny. Similar support was observed between bootstrapping and jackknifing methods for evaluation of tree robustness. Phylogenetic analyses revealed many groups of agaricoid fungi that are supported by moderate to high bootstrap or jackknife values or are consistent with morphology-based classification schemes. Analyses also support separate placement of the boletes and russules, which are basal to the main core group of gilled mushrooms (the Agaricineae of Singer). Examples of monophyletic groups include the families Amanitaceae, Coprinaceae (excluding Coprinus comatus and subfamily Panaeolideae), Agaricaceae (excluding the Cystodermateae), and Strophariaceae pro parte (Stropharia, Pholiota, and Hypholoma); the mycorrhizal species of Tricholoma (including Leucopaxillus, also mycorrhizal); Mycena and Resinomycena; Termitomyces, Podabrella, and Lyophyllum; and Pleurotus with Hohenbuehelia. Several groups revealed by these data to be nonmonophyletic include the families Tricholomataceae, Cortinariaceae, and Hygrophoraceae and the genera Clitocybe, Omphalina, and Marasmius. This study provides a framework for future systematics studies in the Agaricales and suggestions for analyzing large molecular data sets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号