Similar literature
 20 similar articles found (search time: 0 ms)
1.
With growing amounts of genome data and constant improvement of models of molecular evolution, phylogenetic reconstruction has become more reliable. However, our knowledge of the real process of molecular evolution is still limited. When sufficiently large data sets are analyzed, any subtle biases in statistical models can support incorrect topologies significantly because of the high signal-to-noise ratio. We propose a procedure to locate sequences in a multidimensional vector space (MVS), in which the geometry of the space is uniquely determined in such a way that the vectors of sequence evolution are orthogonal among different branches. In this paper, the MVS approach is developed to detect and remove biases in models of molecular evolution caused by unrecognized convergent evolution among lineages or unexpected patterns of substitutions. Biases in the estimated pairwise distances are identified as deviations (outliers) of sequence spatial vectors from the expected orthogonality. Modifications to the estimated distances are made by minimizing an index to quantify the deviations. In this way, it becomes possible to reconstruct the phylogenetic tree, taking account of possible biases in the model of molecular evolution. The efficacy of the modification procedure was verified by simulating evolution on various topologies with rate heterogeneity and convergent change. The phylogeny of placental mammals in previous analyses of large data sets has varied according to the genes being analyzed. Systematic deviations caused by convergent evolution were detected by our procedure in all representative data sets and were found to strongly affect the tree structure. However, the bias correction yielded a consistent topology among data sets. The existence of strong biases was validated by examining the sites of convergent evolution between the hedgehog and other species in the mitochondrial data set. This convergent evolution explains why it has been difficult to determine the phylogenetic placement of the hedgehog in previous studies.

2.

Background  

Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction.

3.
As whole genome sequences continue to expand in number and complexity, effective methods for comparing and categorizing both genes and species represented within extremely large datasets are required. Methods introduced to date have generally utilized incomplete and likely insufficient subsets of the available data. We have developed an accurate and efficient method for producing robust gene and species phylogenies using very large whole genome protein datasets. This method relies on multidimensional protein vector definitions supplied by the singular value decomposition (SVD) of a large sparse data matrix in which each protein is uniquely represented as a vector of overlapping tetrapeptide frequencies. Quantitative pairwise estimates of species similarity were obtained by summing the protein vectors to form species vectors, then determining the cosines of the angles between species vectors. Evolutionary trees produced using this method confirmed many accepted prokaryotic relationships. However, several unconventional relationships were also noted. In addition, we demonstrate that many of the SVD-derived right basis vectors represent particular conserved protein families, while many of the corresponding left basis vectors describe conserved motifs within these families as sets of correlated peptides (copeps). This analysis represents the most detailed simultaneous comparison of prokaryotic genes and species available to date.
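The vector construction described in this abstract can be illustrated with a minimal sketch. It covers only the counting of overlapping tetrapeptides, the summation of protein vectors into a species vector, and the cosine between two species vectors; the SVD step that the abstract applies to the full sparse matrix is deliberately omitted. The function names and the toy sequences used to call them are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def tetrapeptide_counts(protein: str, k: int = 4) -> Counter:
    """Count overlapping k-mers (tetrapeptides when k = 4) in one protein."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def species_vector(proteins: list[str]) -> Counter:
    """Sum the per-protein count vectors to form one species vector."""
    total: Counter = Counter()
    for protein in proteins:
        total.update(tetrapeptide_counts(protein))
    return total


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    # Only k-mers present in both vectors contribute to the dot product.
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector has no defined direction
    return dot / (norm_u * norm_v)
```

Calling `cosine(species_vector(a), species_vector(b))` on two lists of protein strings gives a similarity between 0 and 1. Sparse `Counter` objects are used here because only a small fraction of the 20^4 possible tetrapeptides occurs in any one protein.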

4.
5.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
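The growth in the number of candidate trees mentioned in this abstract can be checked with a short sketch based on the standard double-factorial counts for fully bifurcating labeled trees: (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa. The function names are hypothetical. Under these formulas, the figure of more than seven trillion corresponds to rooted trees on 14 taxa (equivalently, unrooted trees on 15 taxa), while the unrooted count for 14 taxa is about 3.2 x 10^11.

```python
def odd_double_factorial(m: int) -> int:
    """Return 1 * 3 * 5 * ... * m for odd m (1 when m <= 1)."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


def unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating labeled trees: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating labeled trees: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


print(unrooted_trees(14))  # 316234143225
print(rooted_trees(14))    # 7905853580625
```

Either count grows faster than exponentially with the number of taxa, which is the reason exhaustive evaluation is impractical and heuristic searches are used.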

6.
Only recently has Bayesian inference of phylogeny been proposed. The method is now a practical alternative to the other methods; indeed, the method appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency. However, the method should be used cautiously. The results of a Bayesian analysis should be examined with respect to the sensitivity of the results to the priors used and the reliability of the Markov chain Monte Carlo approximation of the probabilities of trees.

7.
Xiao-Guang Yang. Biologia, 2009, 64(4): 811-818
The phylogeny of Cetacea (whales, dolphins, porpoises) has long attracted the interest of biologists and has been investigated by many researchers based on different datasets. However, some phylogenetic relationships within Cetacea still remain controversial. In this study, Bayesian analyses were performed to infer the phylogeny of 25 representative species within Cetacea based on their mitochondrial genomes for the first time. The analyses recovered the clades resolved by the previous studies and strongly supported most of the current cetacean classifications, such as the monophyly of Odontoceti (toothed whales) and Mysticeti (baleen whales). The analyses provided a reliable and comprehensive phylogeny of Cetacea which can provide a foundation for further exploration of cetacean ecology, conservation and biology. The results also showed that: (i) the mitochondrial genomes were very informative for inferring phylogeny of Cetacea; and (ii) the Bayesian analyses outperformed other phylogenetic methods in inferring mitochondrial genome-based phylogeny of Cetacea.

8.
Adipokinetic neuropeptides from the corpora cardiaca of 17 species of Odonata encompassing mainly the families Corduliidae and Libellulidae were isolated and structurally elucidated using liquid chromatography coupled with ion trap electrospray ionization mass spectrometry. It became evident that all species of the family Corduliidae studied express the peptide code-named Libau-AKH (pGlu-Val-Asn-Phe-Thr-Pro-Ser-Trp amide), which is also present in all but one libellulid species, Erythemis simplicicollis, which expresses Erysi-AKH (pGlu-Leu-Asn-Phe-Thr-Pro-Ser-Trp amide). This divergence from all other libellulids is due to a nonsynonymous missense single nucleotide polymorphism (SNP) in the coding sequence (CDS) of prepro-AKH and supports the polyphyletic nature of Sympetrinae and other subfamilies of libellulids. Despite this exception, these findings support the hypothesis that Corduliidae and Libellulidae are closely related, as stated in most phylogenies. The presence of Anaim-AKH (pGlu-Val-Asn-Phe-Ser-Pro-Ser-Trp amide) in Macromiidae likely distinguishes species in this family from Corduliidae. Current molecular genetic phylogenies and our AKH findings suggest that Syncordulia gracilis, which expresses Anaim-AKH, does not belong in Corduliidae. Evolution of AKHs in anisopteran Odonata is likely due to nucleotide substitution involving nonsynonymous missense SNPs in the CDS of prepro-AKH.

9.
Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.
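A small illustration of non-identifiability, under assumptions that go beyond this abstract: two aligned sequences are compared under the Jukes-Cantor (JC69) model, chosen here only for simplicity, with a substitution rate and a divergence time as separate parameters. The likelihood depends on the two parameters only through their product, so different (rate, time) pairs with the same product are indistinguishable from the data, while the product itself is an identifiable function of the parameters. The function name and the numbers in the usage lines are hypothetical.

```python
from math import exp, log


def jc69_log_likelihood(rate: float, time: float,
                        n_sites: int, n_diff: int) -> float:
    """Log-likelihood of observing n_diff differing sites out of n_sites.

    Assumes the JC69 model, where the expected number of substitutions
    per site between the two sequences is d = rate * time (must be > 0).
    The constant binomial coefficient is omitted.
    """
    d = rate * time
    # JC69 probability that a site differs between the two sequences.
    p = 0.75 * (1.0 - exp(-4.0 * d / 3.0))
    return n_diff * log(p) + (n_sites - n_diff) * log(1.0 - p)


# Same product rate * time = 0.1, hence the same likelihood.
fast_and_young = jc69_log_likelihood(1.0, 0.1, n_sites=1000, n_diff=90)
slow_and_old = jc69_log_likelihood(0.1, 1.0, n_sites=1000, n_diff=90)
print(abs(fast_and_young - slow_and_old) < 1e-9)  # True
```

In a Bayesian analysis of such a model, the separate posterior distributions of rate and time are shaped by the priors rather than by the data, which is consistent with the caution expressed in the abstract.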

10.
SUMMARY: BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies. AVAILABILITY: Software is available for download at http://www.biomath.ucla.edu/msuchard/bali-phy.

11.
Success of maximum likelihood phylogeny inference in the four-taxon case   Total citations: 8 (self-citations: 4, citations by others: 8)
We used simulated data to investigate a number of properties of maximum-likelihood (ML) phylogenetic tree estimation for the case of four taxa. Simulated data were generated under a broad range of conditions, including wide variation in branch lengths, differences in the ratio of transition and transversion substitutions, and the absence or presence of gamma-distributed site-to-site rate variation. Data were analyzed in the ML framework with two different substitution models, and we compared the ability of the two models to reconstruct the correct topology. Although both models were inconsistent for some branch-length combinations in the presence of site-to-site variation, the models were efficient predictors of topology under most simulation conditions. We also examined the performance of the likelihood ratio (LR) test for significant positive interior branch length. This test was found to be misleading under many simulation conditions, rejecting too often under some simulation conditions. Under the null hypothesis of a zero-length internal branch, LR statistics are assumed to be asymptotically distributed as χ²(1); with limited data, the distribution of LR statistics under the null hypothesis deviates from χ²(1).
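As a sketch of the likelihood ratio test discussed in this abstract, the statistic is twice the difference in maximized log-likelihoods, and its p-value under the assumed chi-square distribution with one degree of freedom can be computed with the standard library alone, because the chi-square(1) survival function equals erfc(sqrt(x/2)). The function names and the default significance level are hypothetical. The sketch applies the chi-square(1) assumption as stated; the abstract itself reports that this assumption can be misleading with limited data.

```python
from math import erfc, sqrt


def chi2_1_sf(x: float) -> float:
    """Survival function P(X > x) for a chi-square variable with 1 d.f."""
    if x <= 0.0:
        return 1.0
    # If Z is standard normal, Z**2 is chi-square(1), so
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(x / 2.0))


def lr_test(lnl_null: float, lnl_alt: float, alpha: float = 0.05):
    """Return (LR statistic, p-value, reject?) under the chi-square(1) assumption.

    lnl_null: maximized log-likelihood with the interior branch fixed at zero.
    lnl_alt:  maximized log-likelihood with the interior branch free.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    p_value = chi2_1_sf(statistic)
    return statistic, p_value, p_value < alpha
```

For example, `lr_test(-1234.5, -1231.0)` gives a statistic of 7.0 and rejects the zero-length hypothesis at the 5% level, since 7.0 exceeds the chi-square(1) critical value of about 3.84.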

12.
In the construction of large antibody libraries by in vivo recombination, two non-homogeneous loxP sites are required for the exchange of V genes between phagemids to create many new VH-VL combinations. The mutated loxP511 was designed not to recombine with the wild-type loxP (loxPwt) in early studies, and a combination of the two has been used to construct antibody libraries. But recent reports have shown that recombination occurs between loxPwt and loxP511. This suggests that the combined use of loxP511 and loxPwt might lead to the loss of the V gene diversity of antibody libraries. Therefore, it is necessary to find a new combination of loxPs to avoid the excision recombination in the antibody library. In this study, we found that the excision recombination between loxP511 and loxP2272, another mutated loxP sequence, was undetectable within one phagemid, while the excision recombination between loxP511 and loxPwt occurred at a frequency of 40%, higher than that reported previously. Furthermore, the in vivo recombination of different phagemids with loxP511 and loxP2272 showed that the V gene exchange was efficiently mediated to produce new VH-VL combinations. It was concluded that the loxP511 and loxP2272 combination was more favorable for reducing the excision recombination and constructing large phage antibody libraries with high diversity.

13.
Summary: A maximum likelihood method for inferring protein phylogeny was developed. It is based on a Markov model that takes into account the unequal transition probabilities among pairs of amino acids and does not assume constancy of rate among different lineages. Therefore, this method is expected to be powerful in inferring phylogeny among distantly related proteins, either orthologous or paralogous, where the evolutionary rate may deviate from constancy. Not only amino acid substitutions but also insertion/deletion events during evolution were incorporated into the Markov model. A simple method for estimating a bootstrap probability for the maximum likelihood tree among alternatives without performing a maximum likelihood estimation for each resampled data set was developed. These methods were applied to amino acid sequence data of a photosynthetic membrane protein, psbA, from photosystem II, and the phylogeny of this protein was discussed in relation to the origin of chloroplasts.

14.
Vroom JA, Wang CL. BioTechniques, 2008, 44(7): 924-926
We have developed a modular method of plasmid construction that can join multiple DNA components in a single reaction. A nicking enzyme is used to create 5' and 3' overhangs on PCR-generated DNA components. Without the use of ligase or restriction enzymes, components are joined using oligonucleotide linkers that recognize the overhangs. By specifying the sequences of the linkers, desired components can be assembled in any combination and order to generate different plasmid vectors.

15.
We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole genome data using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes composed of 832 total proteins. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity, which can in turn be used to produce comprehensive gene trees. Alternatively, the vector representations of genes from individual species can be summed, allowing species trees to be produced.

16.
Veratrum (Melanthiaceae) comprises ca. 27 species with highly variable morphology. This study aims to construct the molecular phylogeny of this genus to infer its floral evolution and historical biogeography, which have not been examined in detail before. Maximum parsimony, maximum likelihood, and Bayesian analyses were performed on the separate and combined ITS, trnL-F, and atpB-rbcL sequences to reconstruct the phylogenetic tree of the genus. All Veratrum taxa formed a monophyletic group, within which two distinct clades were distinguished: species with white-to-green perianth formed one highly supported clade, and the species with black-purple perianth constituted another highly supported clade. Phylogenetic inference on flower color evolution suggested that white-to-green perianth was a plesiomorphic state and black-purple perianth was apomorphic for Veratrum. When species distribution areas were traced as a multi-state character, parsimonious optimization inferred that Veratrum possibly originated in East Asia. Our study confirmed previous phylogenetic and taxonomic suggestions on this genus and provided a typical example of plant radiation across the Northern Hemisphere.

17.
Molecular phylogeny has been regarded as the ultimate tool for the reconstruction of relationships among eukaryotes, especially the different protist groups, given the difficulty in interpreting morphological data from an evolutionary point of view. In fact, the use of ribosomal RNA as a marker has provided the first well-resolved eukaryotic phylogenies, leading to several important evolutionary hypotheses. The most significant is that several early-emerging, amitochondriate lineages are living relics from the early times of eukaryotic evolution. The use of alternative protein markers and the recognition of several molecular phylogeny reconstruction artefacts, however, have strongly challenged these ideas. The putative early-emerging lineages have been demonstrated to be late-emerging ones, artefactually misplaced to the base of the tree. The present state of eukaryotic evolution is best described by a multifurcation, in agreement with the 'big bang' hypothesis that assumes a rapid diversification of the major eukaryotic phyla. For further resolution, the analysis of genomic data through improved phylogenetic methods will be required.

18.

Background  

The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes.

19.
Brownian motion has been a model widely used for describing phenotypic evolution of continuous characters under random drift. Evolution of traits evolving under weak stabilizing selection, together with drift, can also be modeled by the Ornstein-Uhlenbeck process, in which a population moves at random on an adaptive peak under the influence of drift with selection returning the population towards the optimum. Obviously, reliability of an evolutionary model stands or falls with the extent to which the underlying assumptions are supported or violated. Another potential problem of continuous characters as a source of data for phylogeny inference is the correlation between them. To assess whether the Brownian motion model or the Ornstein-Uhlenbeck model is suitable for modeling the evolution of continuous cranial and dental characters and to what extent these characters are correlated with one another, 11 measurements encompassing various aspects of the mouse skull morphology were collected and subjected to a comparative analysis using the generalized least squares method. It could be shown that only about one-half of the characters evolved according to the Brownian motion model or the Ornstein-Uhlenbeck model. Moreover, about 44% of the correlation coefficients exceeded 0.8, suggesting a need for removing at least phenotypic covariances from the data prior to a phylogenetic analysis. Finally, ancestral states of the characters under study were estimated with the generalized least squares method. There has been a general trend towards enlarging the overall size of the skull and increasing the braincase volume in the species of the genus Mus.
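The two models compared in this abstract can be contrasted with a minimal simulation sketch. It uses an Euler-Maruyama discretization of the Ornstein-Uhlenbeck process, dx = alpha * (theta - x) dt + sigma dW, which reduces to Brownian motion when the selection strength alpha is zero. The function name, parameter names, step size and seed handling are assumptions made for illustration and are not taken from the study.

```python
import random
from math import sqrt


def simulate_trait(x0: float, alpha: float, theta: float, sigma: float,
                   n_steps: int, dt: float = 0.01, seed: int = 0) -> list[float]:
    """Simulate one trait trajectory along a single lineage.

    alpha: strength of stabilizing selection (0 gives Brownian motion)
    theta: optimum trait value (the adaptive peak)
    sigma: intensity of random drift
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    x = x0
    path = [x]
    for _ in range(n_steps):
        pull = alpha * (theta - x) * dt               # selection toward optimum
        drift = sigma * sqrt(dt) * rng.gauss(0.0, 1.0)  # random drift increment
        x += pull + drift
        path.append(x)
    return path


# Brownian motion: no optimum pulls the trait back.
bm_path = simulate_trait(x0=0.0, alpha=0.0, theta=0.0, sigma=1.0, n_steps=1000)
# Ornstein-Uhlenbeck: the trait fluctuates around theta = 1.0.
ou_path = simulate_trait(x0=0.0, alpha=2.0, theta=1.0, sigma=1.0, n_steps=1000)
```

Under Brownian motion the variance of the trait keeps growing with time, whereas under the Ornstein-Uhlenbeck process it levels off around the optimum. That difference is what a comparative analysis can use to ask which model fits a given character better.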

20.
In this paper, a 3D map of protein fold space was produced using Dali structure alignment and nonmetric multidimensional scaling. The fold space comprises four radial clusters, which correspond to the four classes of SCOP. The overall structure of the protein fold space is largely determined by three factors: secondary structure composition, topology of beta sheet, and domain size.
