Similar articles
20 similar articles found
1.

Background  

HIV is known for its ability to exploit numerous genetic and evolutionary mechanisms to ensure its proliferation, among them high replication, mutation and recombination rates. Sliding MinPD, a recently introduced computational method [1], was used to investigate the patterns of evolution of serially-sampled HIV-1 sequence data from eight patients with a special focus on the emergence of X4 strains. Unlike other phylogenetic methods, Sliding MinPD combines distance-based inference with a nonparametric bootstrap procedure and automated recombination detection to reconstruct the evolutionary history of longitudinal sequence data. We present serial evolutionary networks as a longitudinal representation of the mutational pathways of a viral population in a within-host environment. The longitudinal representation of the evolutionary networks was complemented with charts of clinical markers to facilitate correlation analysis between pertinent clinical information and the evolutionary relationships.

2.
A new sequence distance measure for phylogenetic tree construction (cited 5 times: 0 self-citations, 5 by others)
MOTIVATION: Most existing approaches for phylogenetic inference use multiple alignment of sequences and assume some sort of an evolutionary model. The multiple alignment strategy does not work for all types of data, e.g. whole genome phylogeny, and the evolutionary models may not always be correct. We propose a new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity. The distance matrix thus obtained can be used to construct phylogenetic trees. RESULTS: The proposed approach does not require sequence alignment and is totally automatic. The algorithm has successfully constructed consistent phylogenies for real and simulated data sets. AVAILABILITY: Available on request from the authors.
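As a rough illustration of an alignment-free distance of this kind, the sketch below counts Lempel-Ziv (1976) phrases and derives a distance from the extra phrases needed to describe one sequence once the other is known. The abstract does not state the formula, so the normalization used here and the names `lz_complexity` and `lz_distance` are assumptions made for illustration, not necessarily the authors' exact measure.

```python
def lz_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (the copy source is allowed to overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def lz_distance(s: str, q: str) -> float:
    """Assumed normalized distance between two non-empty sequences."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    # Extra phrases needed to describe one sequence after the other.
    extra = max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq)
    return extra / max(cs, cq)
```

Computing `lz_distance` for every pair of unaligned sequences would give a distance matrix that any distance-based tree builder could consume. The quadratic-time substring search is kept for readability and would be too slow for whole genomes.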

3.
Evolutionary relationships are typically inferred from molecular sequence data using a statistical model of the evolutionary process. When the model accurately reflects the underlying process, probabilistic phylogenetic methods recover the correct relationships with high accuracy. There is ample evidence, however, that models commonly used today do not adequately reflect real-world evolutionary dynamics. Virtually all contemporary models assume that relatively fast-evolving sites are fast across the entire tree, whereas slower sites always evolve at relatively slower rates. Many molecular sequences, however, exhibit site-specific changes in evolutionary rates, called "heterotachy." Here we examine the accuracy of 2 phylogenetic methods for incorporating heterotachy, the mixed branch length model--which incorporates site-specific rate changes by summing likelihoods over multiple sets of branch lengths on the same tree--and the covarion model, which uses a hidden Markov process to allow sites to switch between variable and invariable as they evolve. Under a variety of simple heterogeneous simulation conditions, the mixed model was dramatically more accurate than homotachous models, which were subject to topological biases as well as biases in branch length estimates. When data were simulated with strong versions of the types of heterotachy observed in real molecular sequences, the mixed branch length model was more accurate than homotachous techniques. Analyses of empirical data sets confirmed that the mixed branch length model can improve phylogenetic accuracy under conditions that cause homotachous models to fail. In contrast, the covarion model did not improve phylogenetic accuracy compared with homotachous models and was sometimes substantially less accurate. We conclude that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.
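The summation over branch-length sets described in this abstract can be written out as a mixture likelihood. The notation is assumed here rather than taken from the paper: $\tau$ is the tree topology, $D_i$ the data at site $i$ of $N$, $\mathbf{b}_k$ the $k$-th of $K$ branch-length sets, $w_k$ its mixture weight, and $\theta$ the substitution-model parameters.

```latex
L(\tau) = \prod_{i=1}^{N} \sum_{k=1}^{K} w_k \, P(D_i \mid \tau, \mathbf{b}_k, \theta),
\qquad \sum_{k=1}^{K} w_k = 1
```

With $K = 1$ this reduces to the usual homotachous likelihood, in which every site shares a single set of branch lengths.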

4.
Observations from molecular marker studies on recently diverged species indicate that substitution patterns in DNA sequences can often be complex and poorly described by tree-like bifurcating evolutionary models. These observations might result from processes of species diversification and/or processes of sequence evolution that are not tree-like. In these cases, bifurcating tree representations provide poor visualization of phylogenetic signals in sequence data. In this paper, we use median networks to study DNA sequence substitution patterns in plant nuclear and chloroplast markers. We describe how to prune median networks to obtain so called pruned median networks. These simpler networks may help to provide a useful framework for investigating the phylogenetic complexity of recently diverged taxa with hybrid origins.

5.
ABSTRACT: BACKGROUND: Distance-based phylogenetic reconstruction methods use evolutionary distances between species in order to reconstruct the phylogenetic tree spanning them. There are many different methods for estimating distances from sequence data. These methods assume different substitution models and have different statistical properties. Since the true substitution model is typically unknown, it is important to consider the effect of model misspecification on the performance of a distance estimation method. RESULTS: This paper continues the line of research which attempts to adjust to each given set of input sequences a distance function which maximizes the expected topological accuracy of the reconstructed tree. We focus here on the effect of systematic error caused by assuming an inadequate model, but consider also the stochastic error caused by using short sequences. We introduce a theoretical framework for analyzing both sources of error based on the notion of deviation from additivity, which quantifies the contribution of model misspecification to the estimation error. We demonstrate this framework by studying the behavior of the Jukes-Cantor distance function when applied to data generated according to Kimura's two-parameter model with a transition-transversion bias. We provide both a theoretical derivation for this case, and a detailed simulation study on quartet trees. CONCLUSIONS: We demonstrate both analytically and experimentally that by deliberately assuming an oversimplified evolutionary model, it is possible to increase the topological accuracy of reconstruction. Our theoretical framework provides new insights into the mechanisms that enable statistically inconsistent reconstruction methods to outperform consistent methods.
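To make the two distance functions in this abstract concrete, the sketch below implements the standard Jukes-Cantor and Kimura two-parameter distance formulas for a pair of aligned sequences. The function names and the helper `count_differences` are illustrative assumptions. The code does not reproduce the paper's deviation-from-additivity framework.

```python
import math

PURINES = {"A", "G"}


def count_differences(a: str, b: str) -> tuple:
    """Return (P, Q): proportions of transitions and transversions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if (x in PURINES) == (y in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions / len(a), transversions / len(a)


def jc69(p: float) -> float:
    """Jukes-Cantor distance from the total proportion of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p(P: float, Q: float) -> float:
    """Kimura two-parameter distance from transition/transversion proportions."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```

The two estimates coincide when transitions are exactly half as frequent as transversions. With a transition bias, the Jukes-Cantor value falls below the Kimura value for the same total proportion of differences, which is the systematic error the abstract refers to.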

6.
MOTIVATION: Most phylogenetic methods assume that the sequences of nucleotides or amino acids have evolved under stationary, reversible and homogeneous conditions. When these assumptions are violated by the data, there is an increased probability of errors in the phylogenetic estimates. Methods to examine aligned sequences for these violations are available, but they are rarely used, possibly because they are not widely known or because they are poorly understood. RESULTS: We describe and compare the available tests for symmetry of k-dimensional contingency tables from homologous sequences, and develop two new tests to evaluate different aspects of the evolutionary processes. For any pair of sequences, we consider a partition of the test for symmetry into a test for marginal symmetry and a test for internal symmetry. The proposed tests can be used to identify appropriate models for estimation of evolutionary relationships under a Markovian model. Simulations under more or less complex evolutionary conditions were done to display the performance of the tests. Finally, the tests were applied to an alignment of small-subunit ribosomal RNA sequences of five species of bacteria to outline the evolutionary processes under which they evolved. AVAILABILITY: Programs written in R to do the tests on nucleotides are available from http://www.maths.usyd.edu.au/u/johnr/testsym/
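For a single pair of aligned sequences, a test of symmetry operates on the 4 x 4 table of site-pattern counts. The sketch below computes Bowker's classical matched-pairs statistic on such a table. The abstract does not name the tests it compares, so the choice of Bowker's statistic, the function names, and the convention of skipping empty cell pairs are assumptions for illustration. The authors' own programs are written in R, whereas this sketch uses Python.

```python
from itertools import combinations


def divergence_matrix(a: str, b: str, alphabet: str = "ACGT") -> list:
    """Count how often each state in sequence a is paired with each state in b."""
    index = {state: i for i, state in enumerate(alphabet)}
    table = [[0] * len(alphabet) for _ in alphabet]
    for x, y in zip(a, b):
        if x in index and y in index:  # ignore gaps and ambiguity codes
            table[index[x]][index[y]] += 1
    return table


def bowker_statistic(table: list) -> tuple:
    """Return (statistic, degrees of freedom) for the test of symmetry."""
    stat, df = 0.0, 0
    for i, j in combinations(range(len(table)), 2):
        total = table[i][j] + table[j][i]
        if total > 0:  # assumed convention: empty pairs add no degree of freedom
            stat += (table[i][j] - table[j][i]) ** 2 / total
            df += 1
    return stat, df
```

Under the null hypothesis of symmetry the statistic is compared with a chi-squared distribution on `df` degrees of freedom. A large value suggests that the two sequences did not evolve under the same stationary, reversible process.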

7.
Phylogenetic studies based on DNA sequences typically ignore the potential occurrence of recombination, which may produce different alignment regions with different evolutionary histories. Traditional phylogenetic methods assume that a single history underlies the data. If recombination is present, can we expect the inferred phylogeny to represent any of the underlying evolutionary histories? We examined this question by applying traditional phylogenetic reconstruction methods to simulated recombinant sequence alignments. The effect of recombination on phylogeny estimation depended on the relatedness of the sequences involved in the recombinational event and on the extent of the different regions with different phylogenetic histories. Given the topologies examined here, when the recombinational event was ancient, or when recombination occurred between closely related taxa, one of the two phylogenies underlying the data was generally inferred. In this scenario, the evolutionary history corresponding to the majority of the positions in the alignment was generally recovered. Very different results were obtained when recombination occurred recently among divergent taxa. In this case, when the recombinational breakpoint divided the alignment in two regions of similar length, a phylogeny that was different from any of the true phylogenies underlying the data was inferred.

8.
Application of phylogenetic networks in evolutionary studies (cited 42 times: 0 self-citations, 42 by others)
The evolutionary history of a set of taxa is usually represented by a phylogenetic tree, and this model has greatly facilitated the discussion and testing of hypotheses. However, it is well known that more complex evolutionary scenarios are poorly described by such models. Further, even when evolution proceeds in a tree-like manner, analysis of the data may not be best served by using methods that enforce a tree structure but rather by a richer visualization of the data to evaluate its properties, at least as an essential first step. Thus, phylogenetic networks should be employed when reticulate events such as hybridization, horizontal gene transfer, recombination, or gene duplication and loss are believed to be involved, and, even in the absence of such events, phylogenetic networks have a useful role to play. This article reviews the terminology used for phylogenetic networks and covers both split networks and reticulate networks, how they are defined, and how they can be interpreted. Additionally, the article outlines the beginnings of a comprehensive statistical framework for applying split network methods. We show how split networks can represent confidence sets of trees and introduce a conservative statistical test for whether the conflicting signal in a network is treelike. Finally, this article describes a new program, SplitsTree4, an interactive and comprehensive tool for inferring different types of phylogenetic networks from sequences, distances, and trees.

9.
Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partitions and estimation of one evolutionary history, maximizing the explanatory power of the data. Consensus/independence approaches endorse a two-step procedure where partitions are analyzed independently and then a consensus is determined from the multiple results. Mixtures across the model space of a strict combined-data approach and a priori independent parameters are popular methods to integrate these methods. We propose an alternative middle ground by constructing a Bayesian hierarchical phylogenetic model. Our hierarchical framework enables researchers to pool information across data partitions to improve estimate precision in individual partitions while permitting estimation and testing of tendencies in across-partition quantities. Such across-partition quantities include the distribution from which individual topologies relating the sequences within a partition are drawn. We propose standard hierarchical priors on continuous evolutionary parameters across partitions, while the structure on topologies varies depending on the research problem. We illustrate our model with three examples. We first explore the evolutionary history of the guinea pig (Cavia porcellus) using alignments of 13 mitochondrial genes. The hierarchical model returns substantially more precise continuous parameter estimates than an independent parameter approach without losing the salient features of the data. Second, we analyze the frequency of horizontal gene transfer using 50 prokaryotic genes. We assume an unknown species-level topology and allow individual gene topologies to differ from this with a small estimable probability. Simultaneously inferring the species and individual gene topologies returns a transfer frequency of 17%. We also examine HIV sequences longitudinally sampled from HIV+ patients. We ask whether posttreatment development of CCR5 coreceptor virus represents concerted evolution from middisease CXCR4 virus or reemergence of initial infecting CCR5 virus. The hierarchical model pools partitions from multiple unrelated patients by assuming that the topology for each patient is drawn from a multinomial distribution with unknown probabilities. Preliminary results suggest evolution and not reemergence.

10.

Background

Visualising the evolutionary history of a set of sequences is a challenge for molecular phylogenetics. One approach is to use undirected graphs, such as median networks, to visualise phylogenies where reticulate relationships such as recombination or homoplasy are displayed as cycles. Median networks contain binary representations of sequences as nodes, with edges connecting those sequences differing at one character; hypothetical ancestral nodes are invoked to generate a connected network which contains all most parsimonious trees. Quasi-median networks are a generalisation of median networks which are not restricted to binary data, although phylogenetic information contained within the multistate positions can be lost during the preprocessing of data. Where the history of a set of samples contains frequent homoplasies or recombination events, quasi-median networks will have a complex topology. Graph reduction or pruning methods have been used to reduce network complexity but some of these methods are inapplicable to datasets in which recombination has occurred and others are procedurally complex and/or result in disconnected networks.
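The binary median network described above can be sketched in a few lines: hypothetical nodes are generated by taking the character-wise majority of every triple of sequences until no new node appears, and edges join nodes that differ at exactly one character. This is a brute-force illustration for very small inputs, with assumed function names, and it is not the quasi-median construction or the pruning method that the paper itself proposes.

```python
from itertools import combinations


def median(u: tuple, v: tuple, w: tuple) -> tuple:
    """Character-wise majority vote of three binary sequences."""
    return tuple(1 if a + b + c >= 2 else 0 for a, b, c in zip(u, v, w))


def median_closure(sequences) -> set:
    """Add median nodes until the node set is closed under the median operation."""
    nodes = {tuple(seq) for seq in sequences}
    changed = True
    while changed:
        changed = False
        for u, v, w in combinations(sorted(nodes), 3):
            m = median(u, v, w)
            if m not in nodes:
                nodes.add(m)    # hypothetical (unobserved) node
                changed = True
                break           # restart the scan with the enlarged node set
    return nodes


def network_edges(nodes: set) -> list:
    """Connect every pair of nodes that differ at exactly one character."""
    return [(u, v) for u, v in combinations(sorted(nodes), 2)
            if sum(a != b for a, b in zip(u, v)) == 1]
```

For the three observed sequences `110`, `101` and `011`, the closure adds the single hypothetical node `111`, which is linked to each observed sequence by one edge.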

Results

We address the problems inherent in construction and reduction of quasi-median networks. We describe a novel method of generating quasi-median networks that uses all characters, both binary and multistate, without imposing an arbitrary ordering of the multistate partitions. We also describe a pruning mechanism which maintains at least one shortest path between observed sequences, displaying the underlying relations between all pairs of sequences while maintaining a connected graph.

Conclusion

Application of this approach to 5S rDNA sequence data from sea beet produced a pruned network within which genetic isolation between populations by distance was evident, demonstrating the value of this approach for exploration of evolutionary relationships.

11.
Methods to interpret personal genome sequences are increasingly required. Here, we report a novel framework (EvoTol) to identify disease-causing genes using patient sequence data from within protein coding-regions. EvoTol quantifies a gene's intolerance to mutation using evolutionary conservation of protein sequences and can incorporate tissue-specific gene expression data. We apply this framework to the analysis of whole-exome sequence data in epilepsy and congenital heart disease, and demonstrate EvoTol's ability to identify known disease-causing genes is unmatched by competing methods. Application of EvoTol to the human interactome revealed networks enriched for genes intolerant to protein sequence variation, informing novel polygenic contributions to human disease.

12.
Model-based phylogenetic reconstruction methods traditionally assume homogeneity of nucleotide frequencies among sequence sites and lineages. Yet, heterogeneity in base composition is a characteristic shared by most biological sequences. Compositional variation in time, reflected in the compositional biases among contemporary sequences, has already been extensively studied, and its detrimental effects on phylogenetic estimates are known. However, fewer studies have focused on the effects of spatial compositional heterogeneity within genes. We show here that different sites in an alignment do not always share a unique compositional pattern, and we provide examples where nucleotide frequency trends are correlated with the site-specific rate of evolution in RNA genes. Spatial compositional heterogeneity is shown to affect the estimation of evolutionary parameters. With standard phylogenetic methods, estimates of equilibrium frequencies are found to be biased towards the composition observed at fast-evolving sites. Conversely, the ancestral composition estimates of some time-heterogeneous but spatially homogeneous methods are found to be biased towards frequencies observed at invariant and slow-evolving sites. The latter finding challenges the result of a previous study arguing against a hyperthermophilic last universal ancestor from the low apparent G + C content of its rRNA sequences. We propose a new model to account for compositional variation across sites. A Gaussian process prior is used to allow for a smooth change in composition with evolutionary rate. The model has been implemented in the phylogenetic inference software PHASE, and Bayesian methods can be used to obtain the model parameters. The results suggest that this model can accurately capture the observed trends in present-day RNA sequences.

13.
SUMMARY: Serial NetEvolve is a flexible simulation program that generates DNA sequences evolved along a tree or recombinant network. It offers a user-friendly Windows graphical interface and a Windows or Linux simulator with a diverse selection of parameters to control the evolutionary model. Serial NetEvolve is a modification of the Treevolve program with the following additional features: simulation of serially-sampled data, the choice of either a clock-like or a variable rate model of sequence evolution, sampling from the internal nodes and the output of the randomly generated tree or network in our newly proposed NeTwick format. AVAILABILITY: From website http://biorg.cis.fiu.edu/SNE Contacts: giri@cis.fiu.edu SUPPLEMENTARY INFORMATION: Manual and examples available from http://biorg.cis.fiu.edu/SNE.

14.
A graphical method for detecting recombination in phylogenetic data sets (cited 9 times: 3 self-citations, 6 by others)
Current phylogenetic tree reconstruction methods assume that there is a single underlying tree topology for all sites along the sequence. The presence of mosaic sequences due to recombination violates this assumption and will cause phylogenetic methods to give misleading results due to the imposition of a single tree topology on all sites. The detection of mosaic sequences caused by recombination is therefore an important first step in phylogenetic analysis. A graphical method for the detection of recombination, based on the least squares method of phylogenetic estimation, is presented here. This method locates putative recombination breakpoints by moving a window along the sequence. The performance of the method is assessed by simulation and by its application to a real data set.
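The moving-window idea can be illustrated with a much simpler statistic than the one in the paper. The sketch below slides a window along a query sequence and records its proportion of mismatches against two putative parental sequences. It deliberately swaps the authors' least-squares criterion for plain p-distances, and the function names and the two-parent setup are assumptions for illustration.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def window_profile(query: str, parent_a: str, parent_b: str,
                   width: int, step: int = 1) -> list:
    """Return (start, distance to parent_a, distance to parent_b) per window."""
    profile = []
    for start in range(0, len(query) - width + 1, step):
        end = start + width
        profile.append((
            start,
            p_distance(query[start:end], parent_a[start:end]),
            p_distance(query[start:end], parent_b[start:end]),
        ))
    return profile
```

Plotting the two distance columns against the window start gives a graphical profile. A position where the curves cross, so that the closer parent changes, marks a putative breakpoint.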

15.

Background  

Commonly used phylogenetic models assume a homogeneous evolutionary process throughout the tree. It is known that these homogeneous models are often too simplistic, and that with time some properties of the evolutionary process can change (due to selection or drift). In particular, as constraints on sequences evolve, the proportion of variable sites can vary between lineages. This affects the ability of phylogenetic methods to correctly estimate phylogenetic trees, especially for long timescales. To date there is no phylogenetic model that allows for change in the proportion of variable sites, and the degree to which this affects phylogenetic reconstruction is unknown.

16.
Most methods for inferring phylogenies from sequence data assume that patterns of substitution have been stationary over time. Changes in evolutionary constraint can result in nonstationary substitution patterns that are phylogenetically misleading unless modeled appropriately. Here we present a multiple-alignment-based method to identify regions that are likely to contain misleading phylogenetic signals due to changes in evolutionary constraints. The method uses a moving window approach to identify regions with a statistically significant deviation from stationarity in the physicochemical properties of amino acids among taxa. The protocol has been implemented in the software package DRUIDS (Detecting Regions of Unexpected Internal Deviation from Stationarity), available from the first author upon request.

17.
Abstract

Molecular sequence data have become prominent tools for phylogenetic relationship inference, particularly useful in the analysis of highly diverse taxonomic orders. Ribosomal RNA sequences provide markers that can be used in the study of phylogeny, because their function and structure have been conserved to a large extent throughout the evolutionary history of organisms. These sequences are inferred from cloned or enzymatically amplified gene sequences, or determined by direct RNA sequencing. The first step of the phylogenetic interpretation of nucleic acid sequence variations implies proper alignment of corresponding sequences from various organisms. Best alignment based on similarity criteria is greatly reinforced, in the case of ribosomal RNAs, by secondary structure homologies. Distance matrix methods to infer evolutionary trees are based on the assumption that the phylogenetic distance between each pair of organisms is proportional to the number of nucleotide substitution events. Computed tree inference methods usually take into consideration the possibility of unequal mutation rates among lineages. Divergence times can be estimated on the tree, provided that at least one lineage has been dated by fossil records. We have utilized this approach based on ribosomal RNA sequence comparison to investigate the phylogenetic relationship between dinoflagellates and other eukaryote protists, and to refine controversial phylogenies of the class Dinophyceae.

18.
SUMMARY: NetGen is an event-driven simulator that creates phylogenetic networks by extending the birth-death model to include diploid hybridizations. DNA sequences are evolved in conjunction with the topology, enabling hybridization decisions to be based on contemporary evolutionary distances. NetGen supports variable rate lineages, root sequence specification, outgroup generation and many other options. This note describes the NetGen application and proposes an extension of the Newick format to accommodate phylogenetic networks. AVAILABILITY: NetGen is written in C and is available in source form at http://www.phylo.unm.edu/~morin/.

19.
Most phylogenetic methods assume that the sequences evolved under homogeneous, stationary and reversible conditions. Compositional heterogeneity in data intended for studies of phylogeny suggests that the data did not evolve under these conditions. SeqVis, a Java application for analysis of nucleotide content, reads sequence alignments in several formats and plots the nucleotide content in a tetrahedron. Once plotted, outliers can be identified, thus allowing for decisions on the applicability of the data for phylogenetic analysis. AVAILABILITY: http://www.bio.usyd.edu.au/jermiin/programs.htm.
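Plotting nucleotide content in a tetrahedron amounts to treating the four base frequencies as barycentric coordinates. The sketch below shows that mapping under assumed vertex positions and function names. It is written in Python for illustration and does not reflect how SeqVis, a Java application, is implemented.

```python
# Assumed corner positions of a regular tetrahedron, one per nucleotide.
VERTICES = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def composition(seq: str) -> dict:
    """Relative frequency of A, C, G and T, ignoring any other symbols."""
    upper = seq.upper()
    counts = {base: upper.count(base) for base in VERTICES}
    total = sum(counts.values())  # must be non-zero
    return {base: counts[base] / total for base in VERTICES}


def tetrahedron_point(seq: str) -> tuple:
    """Map a sequence's base composition to a 3D point inside the tetrahedron."""
    freqs = composition(seq)
    return tuple(
        sum(freqs[base] * VERTICES[base][axis] for base in VERTICES)
        for axis in range(3)
    )
```

A sequence with equal base frequencies lands at the centre, and a sequence made of a single base lands on that base's corner. Sequences whose points lie far from the main cluster are the compositional outliers the abstract refers to.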

20.
Ren F, Tanaka H, Yang Z. Gene, 2009, 441(1-2): 119-125
Supermatrix and supertree methods are two strategies advocated for phylogenetic analysis of sequence data from multiple gene loci, especially when some species are missing at some loci. The supermatrix method concatenates sequences from multiple genes into a data supermatrix for phylogenetic analysis, and ignores differences in evolutionary dynamics among the genes. The supertree method analyzes each gene separately and assembles the subtrees estimated from individual genes into a supertree for all species. Most algorithms suggested for supertree construction lack statistical justifications and ignore uncertainties in the subtrees. Instead of supermatrix or supertree, we advocate the use of the likelihood function to combine data from multiple genes while accommodating their differences in the evolutionary process. This combines the strengths of the supermatrix and supertree methods while avoiding their drawbacks. We conduct computer simulations to evaluate the performance of the supermatrix, supertree, and maximum likelihood methods applied to two phylogenetic problems: molecular-clock dating of species divergences and reconstruction of species phylogenies. The results confirm the theoretical superiority of the likelihood method. Supertree or separate analyses of data of multiple genes may be useful in revealing the characteristics of the evolutionary process of multiple gene loci, and the information may be used to formulate realistic models for combined analysis of all genes by likelihood.
