Similar literature
20 similar articles found (search time: 31 ms)
1.
Wiuf C 《Genetics》2004,166(1):537-545
In this study compatibility with a tree for unphased genotype data is discussed. If the data are compatible with a tree, the data are consistent with an assumption of no recombination in its evolutionary history. Further, it is said that there is a solution to the perfect phylogeny problem; i.e., for each individual a pair of haplotypes can be defined and the set of all haplotypes can be explained without invoking recombination. A new algorithm to decide whether or not a sample is compatible with a tree is derived. The new algorithm relies on an equivalence relation between sites that mutually determine the phase of each other. (The previous algorithm was based on advanced graph theoretical tools.) The equivalence relation is used to derive the number of solutions to the perfect phylogeny problem. Further, a series of statistics, R(j)(M), j ≥ 2, are defined. These can be used to detect recombination events in the sample's history and to divide the sample into regions that are compatible with a tree. The new statistics are applied to real data from human genes. The results from this application are discussed with reference to recent suggestions that recombination in the human genome is highly heterogeneous.

2.
Wang Zhiwei, Liu Kevin J. 《BMC genomics》2016,17(10):785-174

Background

The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to satisfy this assumption. Few studies have quantified the practical impact of recombination on species tree inference methods, and even fewer have used genomic sequence data for this purpose. One prominent exception is the 2012 study of Lanier and Knowles. A main finding from the study was that species tree inference methods are relatively robust to intra-locus recombination, assuming free recombination between loci. The latter assumption means that the open question regarding the impact of recombination on species tree analysis is not fully resolved.

Results

The goal of this study is to further investigate this open question. Using simulations based upon the multi-species coalescent-with-recombination model as well as empirical datasets, we compared common pipeline-based techniques for inferring species phylogenies. The simulation conditions included a range of dataset sizes and several choices for recombination rate which was either uniform across loci or incorporated recombination hotspots. We found that pipelines which explicitly utilize inferred recombination breakpoints to delineate recombination-free intervals result in greater accuracy compared to widely used alternatives that preprocess sequences based upon linkage disequilibrium decay. Furthermore, the use of a relatively simple approach for recombination breakpoint inference does not degrade the accuracy of downstream species tree inference compared to more accurate alternatives.

Conclusions

Our findings clarify the impact of recombination upon current phylogenomic pipelines for species tree inference. Pipeline-based approaches which utilize inferred recombination breakpoints to densely sample loci across genomic sequences can tolerate intra-locus recombination and violations of the assumption of free recombination between loci.

3.
Since recombination leads to the generation of mosaic genomes that violate the assumption of traditional phylogenetic methods that sequence evolution can be accurately described by a single tree, results and conclusions based on phylogenetic analysis of data sets including recombinant sequences can be severely misleading. Many methods are able to adequately detect recombination between diverse sequences, for example between different HIV-1 subtypes. More problematic is the identification of recombinants among closely related sequences such as a viral population within a host. We describe a simple algorithmic procedure that enables detection of intra-host recombinants based on split-decomposition networks and a robust statistical test for recombination. By applying this algorithm to several published HIV-1 data sets we conclude that intra-host recombination was significantly underestimated in previous studies and that up to one-third of the env sequences longitudinally sampled from a given subject can be of recombinant origin. The results show that our procedure can be a valuable exploratory tool for detection of recombinant sequences before phylogenetic analysis, and also suggest that HIV-1 recombination in vivo is far more frequent and significant than previously thought.

4.
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.

5.
A graphical method for detecting recombination in phylogenetic data sets (total citations: 9; self-citations: 3; citations by others: 6)
Current phylogenetic tree reconstruction methods assume that there is a single underlying tree topology for all sites along the sequence. The presence of mosaic sequences due to recombination violates this assumption and will cause phylogenetic methods to give misleading results due to the imposition of a single tree topology on all sites. The detection of mosaic sequences caused by recombination is therefore an important first step in phylogenetic analysis. A graphical method for the detection of recombination, based on the least squares method of phylogenetic estimation, is presented here. This method locates putative recombination breakpoints by moving a window along the sequence. The performance of the method is assessed by simulation and by its application to a real data set.

6.
Most phylogenetic tree estimation methods assume that there is a single set of hierarchical relationships among sequences in a data set for all sites along an alignment. Mosaic sequences produced by past recombination events will violate this assumption and may lead to misleading results from a phylogenetic analysis due to the imposition of a single tree along the entire alignment. Therefore, the detection of past recombination is an important first step in an analysis. A Bayesian model for the changes in topology caused by recombination events is described here. This model relaxes the assumption of one topology for all sites in an alignment and uses the theory of Hidden Markov models to facilitate calculations, the hidden states being the underlying topologies at each site in the data set. Changes in topology along the multiple sequence alignment are estimated by means of the maximum a posteriori (MAP) estimate. The performance of the MAP estimate is assessed by application of the model to data sets of four sequences, both simulated and real.

7.
MOTIVATION: A promising sliding-window method for the detection of interspecific recombination in DNA sequence alignments is based on the monitoring of changes in the posterior distribution of tree topologies with a probabilistic divergence measure. However, as the number of taxa in the alignment increases or the sliding-window size decreases, the posterior distribution becomes increasingly diffuse. This diffusion blurs the probabilistic divergence signal and adversely affects the detection accuracy. The present study investigates how this shortcoming can be redeemed with a pruning method based on post-processing clustering, using the Robinson-Foulds distance as a metric in tree topology space. RESULTS: An application of the proposed scheme to three synthetic and two real-world DNA sequence alignments illustrates the amount of improvement that can be obtained with the pruning method. The study also includes a comparison with two established recombination detection methods: Recpars and the DSS (difference of sum of squares) method. AVAILABILITY: Software, data and further supplementary material are available at the following website: http://www.bioss.sari.ac.uk/~dirk/Supplements/
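
The Robinson-Foulds distance named in the abstract above can be illustrated with a minimal sketch. It assumes trees are written as nested Python tuples of leaf-name strings and that both trees share the same leaf set; the function names (`leaf_set`, `splits`, `robinson_foulds`) are hypothetical and are not taken from the authors' software. The distance is counted as the number of non-trivial leaf bipartitions found in exactly one of the two trees.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal edges."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # A split is non-trivial when both sides hold at least two leaves.
        if len(below) > 1 and len(other) > 1:
            # Store both sides so the split does not depend on the rooting.
            found.add(frozenset([below, other]))
        return below

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have identical leaf sets")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    t1 = (("A", "B"), ("C", "D"))
    t2 = (("A", "C"), ("B", "D"))
    print(robinson_foulds(t1, t2))
```

Storing each split as an unordered pair of leaf sets makes the comparison independent of where the tuple representation happens to be rooted, which matches the usual treatment of unrooted topologies.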

8.
A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not tree-like. In a seminal paper, Wang et al.(1) studied the problem of constructing a phylogenetic network, allowing recombination between sequences, with the constraint that the resulting cycles must be disjoint. We call such a phylogenetic network a "galled-tree". They gave a polynomial-time algorithm that was intended to determine whether or not a set of sequences could be generated on a galled-tree. Unfortunately, the algorithm by Wang et al.(1) is incomplete and does not constitute a necessary test for the existence of a galled-tree for the data. In this paper, we completely solve the problem. Moreover, we prove that if there is a galled-tree, then the one produced by our algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiple-crossover recombinations. We also prove that when there is a galled-tree for the data, the galled-tree minimizing the number of recombinations is "essentially unique". We also note two additional results: first, any set of sequences that can be derived on a galled-tree can be derived on a true tree (without recombination cycles), where at most one back mutation per site is allowed; second, the site compatibility problem (which is NP-hard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled-tree. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in general phylogenetic networks. This paper greatly extends the conference version that appears in an earlier work.(8) PowerPoint slides of the conference talk can be found at our website.(7)

9.
Phylogenetic mixtures model the inhomogeneous molecular evolution commonly observed in data. The performance of phylogenetic reconstruction methods where the underlying data are generated by a mixture model has stimulated considerable recent debate. Much of the controversy stems from simulations of mixture model data on a given tree topology for which reconstruction algorithms output a tree of a different topology; these findings were held up to show the shortcomings of particular tree reconstruction methods. In so doing, the underlying assumption was that mixture model data on one topology can be distinguished from data evolved on an unmixed tree of another topology given enough data and the "correct" method. Here we show that this assumption can be false. For biologists, our results imply that, for example, the combined data from two genes whose phylogenetic trees differ only in terms of branch lengths can perfectly fit a tree of a different topology.

10.
An accurate estimate of the extent of recombination is important whenever phylogenetic methods are applied to potentially recombining nucleotide sequences. Here, data sets from viruses, bacteria, and mitochondria were examined for deviations from clonality using a new approach for detecting and measuring recombination. The apparent rate heterogeneity (ARH) among sites in a sequence alignment can be inflated as an artifact of recombination. However, the composition of polymorphic sites will differ in a data set with recombination-generated ARH versus a clonal data set that exhibits the equivalent degree of rate heterogeneity. This is because recombinant data sets, encompassing regions of conflicting phylogenetic history, tend to yield "starlike" trees that are superficially similar to those inferred from clonal data sets with weak phylogenetic signal throughout. Specifically, a recombinant data set will be unexpectedly rich in conflicting phylogenetic information compared with clonally generated data sets supporting the same tree shape. Its value of q, defined as the proportion of two-state parsimony-informative sites to all polymorphic sites, will be greater than that expected for nonrecombinant data. The method proposed here, the informative-sites test, compares the value of q against a null distribution of values found using Monte Carlo-simulated data evolved under the null hypothesis of clonality. A significant excess of q indicates that the assumption of clonality is not valid and hence that the ARH in the data is at least partly an artifact of recombination. Investigations of the procedure using simulated sequences indicated that it can successfully detect and measure recombination and that it is unlikely to produce "false positives." Simulations also showed that for recombinant data, naïve use of maximum-likelihood models incorporating rate heterogeneity can lead to overestimation of the time to the most recent common ancestor.
Application of the test to real data revealed for the first time that populations of viruses, like those of bacteria, can be brought close to complete linkage equilibrium by pervasive recombination. On the other hand, the test did not reject the hypothesis of clonality when applied to a data set from the coding region of human mitochondrial DNA, despite its high level of ARH and homoplasy.
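
The statistic q described in the abstract above can be sketched as follows. This is only an illustration of the stated definition (two-state parsimony-informative sites divided by all polymorphic sites); the function name `informative_site_fraction` is hypothetical, gaps and ambiguity codes are assumed to be absent, and the Monte Carlo null distribution that the test compares q against is not reproduced here.

```python
from collections import Counter


def informative_site_fraction(alignment):
    """Return q for a list of equal-length aligned sequences.

    A site is polymorphic if it shows more than one state. It is counted as
    two-state parsimony-informative if it shows exactly two states and each
    state occurs in at least two sequences.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")

    polymorphic = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # monomorphic site, ignored
        polymorphic += 1
        if len(counts) == 2 and min(counts.values()) >= 2:
            informative += 1

    if polymorphic == 0:
        raise ValueError("alignment has no polymorphic sites")
    return informative / polymorphic


if __name__ == "__main__":
    toy_alignment = ["AAGT", "AAGA", "ACCA", "ACCA"]
    print(informative_site_fraction(toy_alignment))
```

In the toy alignment the first column is monomorphic, the second and third are two-state informative, and the fourth carries a singleton, so two of the three polymorphic sites count toward q.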

11.
We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
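
The "four gametes" condition mentioned in the abstract above can be made concrete with a short sketch. It is not the BNPP algorithm from the paper; it only shows the classical pairwise check for binary characters, under the assumption that the data form a 0/1 matrix with one row per haplotype and one column per character. The function names are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return the column pairs in which all four gametes 00, 01, 10, 11 occur."""
    if not matrix:
        return []
    n_sites = len(matrix[0])
    for row in matrix:
        if len(row) != n_sites or any(state not in (0, 1) for state in row):
            raise ValueError("expected a rectangular matrix of 0/1 states")

    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


def admits_perfect_phylogeny(matrix):
    """Binary characters fit a perfect phylogeny iff no pair shows four gametes."""
    return not conflicting_pairs(matrix)


if __name__ == "__main__":
    haplotypes = [
        [0, 0, 1],
        [0, 1, 1],
        [1, 0, 0],
        [1, 1, 0],
    ]
    print(conflicting_pairs(haplotypes))
    print(admits_perfect_phylogeny(haplotypes))
```

Counting how many characters appear in at least one conflicting pair gives the second parameter that the abstract's first algorithm treats as a constant.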

12.
By viewing the ancestral recombination graph as defining a sequence of trees, we show how possible evolutionary histories consistent with given data can be constructed using the minimum number of recombination events. In contrast to previously known methods, which yield only estimated lower bounds, our method of detecting recombination always gives the minimum number of recombination events if the right kind of rooted trees are used in our algorithm. A new lower bound can be defined if rooted trees with fewer constraints are used. As well as studying how often it actually is equal to the minimum, we test how this new lower bound performs in comparison to some other lower bounds. Our study indicates that the new lower bound is an improvement on earlier bounds. Also, using simulated data, we investigate how well our method can recover the actual site-specific evolutionary relationships. In the presence of recombination, using a single tree to describe the evolution of the entire locus clearly leads to lower average recovery percentages than does our method. Our study shows that recovering the actual local tree topologies can be done more accurately than estimating the actual number of recombination events.

13.
Modes of speciation and the neutral theory of biodiversity (total citations: 5; self-citations: 0; citations by others: 5)
Hubbell's neutral theory of biodiversity has generated much debate over the need for niches to explain biodiversity patterns. Discussion of the theory has focused on its neutrality assumption, i.e. the functional equivalence of species in competition and dispersal. Almost no attention has been paid to another critical aspect of the theory, the assumptions on the nature of the speciation process. In the standard version of the neutral theory each individual has a fixed probability to speciate. Hence, the speciation rate of a species is directly proportional to its abundance in the metacommunity. We argue that this assumption is not realistic for most speciation modes because speciation is an emergent property of complex processes at larger spatial and temporal scales and, consequently, speciation rate can either increase or decrease with abundance. Accordingly, the assumption that speciation rate is independent of abundance (each species has a fixed probability to speciate) is a more natural starting point in a neutral theory of biodiversity. Here we present a neutral model based on this assumption and we confront this new model with 20 large data sets of tree communities, expecting the new model to fit the data better than Hubbell's original model. We find, however, that the data sets are much better fitted by Hubbell's original model. This implies that species abundance data can discriminate between different modes of speciation, or, stated otherwise, that the mode of speciation has a large impact on the species abundance distribution. Our model analysis points out new ways to study how biodiversity patterns are shaped by the interplay between evolutionary processes (speciation, extinction) and ecological processes (competition, dispersal).

14.
Large amounts of population-scale genetic variation data are being collected in populations. One potentially important biological problem is to infer the population genealogical history from these genetic variation data. Partly due to recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, genealogy is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees, each for a short region of the genome and possibly with a different topology. Inference of genealogical history for a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve previous work. We first show that the "tree scan" method can be converted to a probabilistic inference method based on a hidden Markov model. We then focus on developing a novel local tree inference method called RENT that is both accurate and scalable to larger data. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden-Markov-model-based method is comparable with the original method in terms of accuracy. We also show that RENT is competitive with other methods in terms of inference accuracy: its inference error rate is often lower, and it can handle large data.

15.
Consequences of recombination on traditional phylogenetic analysis (total citations: 38; self-citations: 0; citations by others: 38)
Schierup MH  Hein J 《Genetics》2000,156(2):879-891
We investigate the shape of a phylogenetic tree reconstructed from sequences evolving under the coalescent with recombination. The motivation is that evolutionary inferences are often made from phylogenetic trees reconstructed from population data even though recombination may well occur (mtDNA or viral sequences) or does occur (nuclear sequences). We investigate the size and direction of biases when a single tree is reconstructed ignoring recombination. Standard software (PHYLIP) was used to construct the best phylogenetic tree from sequences simulated under the coalescent with recombination. With recombination present, the length of terminal branches and the total branch length are larger, and the time to the most recent common ancestor smaller, than for a tree reconstructed from sequences evolving with no recombination. The effects are pronounced even for small levels of recombination that may not be immediately detectable in a data set. The phylogenies when recombination is present superficially resemble phylogenies for sequences from an exponentially growing population. However, exponential growth has a different effect on statistics such as Tajima's D. Furthermore, ignoring recombination leads to a large overestimation of the substitution rate heterogeneity and the loss of the molecular clock. These results are discussed in relation to viral and mtDNA data sets.

16.
GARD: a genetic algorithm for recombination detection (total citations: 6; self-citations: 0; citations by others: 6)
MOTIVATION: Phylogenetic and evolutionary inference can be severely misled if recombination is not accounted for, hence screening for it should be an essential component of nearly every comparative study. The evolution of recombinant sequences cannot be properly explained by a single phylogenetic tree, but several phylogenies may be used to correctly model the evolution of non-recombinant fragments. RESULTS: We developed a likelihood-based model selection procedure that uses a genetic algorithm to search multiple sequence alignments for evidence of recombination breakpoints and identify putative recombinant sequences. GARD is an extensible and intuitive method that can be run efficiently in parallel. Extensive simulation studies show that the method nearly always outperforms other available tools, both in terms of power and accuracy, and that the use of GARD to screen sequences for recombination ensures good statistical properties for methods aimed at detecting positive selection. AVAILABILITY: Freely available at http://www.datamonkey.org/GARD/

17.
The use of map functions in multipoint mapping. (total citations: 4; self-citations: 2; citations by others: 2)
The analysis of multipoint data in humans involves detection of linkage, inferences about order, and estimation of map lengths. In order to calculate likelihoods, it is necessary to have predictive formulas for multiple recombination frequencies. In the present study the Markovian assumption of Morton and MacLean is generalized to give predictive formulas for multiple-region recombination using realistic map functions. The best-fitting map functions have been determined by fitting the nine-locus data of Morgan et al. and the seven-locus data of Weinstein on the Drosophila X chromosome. Two map functions fit the data better than other published functions: that of Rao et al. with a map parameter of P = .33 and a new function suggested in the present paper. The close agreement of the estimate of the mapping parameter with a previous estimate inferred from human male meiosis suggests that the map function is robust. A further improvement in the fit to the data can be obtained by the addition of a second parameter to reduce the expected number of multiple recombinants. By comparison with the map functions recommended in the present paper, the assumption of no interference gives a poor fit to the data.
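
As a point of reference for the abstract above, the no-interference baseline corresponds to Haldane's map function, r = (1 - e^(-2d)) / 2 for a map distance d in Morgans. The sketch below implements only that baseline, which the abstract reports as fitting poorly; it does not implement the map function of Rao et al. or the new function proposed in the paper, and the Python function names are hypothetical.

```python
import math


def haldane(d):
    """Recombination fraction for map distance d (Morgans), no interference."""
    if d < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_inverse(r):
    """Map distance (Morgans) for recombination fraction r, with 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


def two_interval_recombination(r12, r23):
    """Recombination fraction across two adjacent intervals, no interference."""
    return r12 + r23 - 2.0 * r12 * r23


if __name__ == "__main__":
    r12, r23 = haldane(0.10), haldane(0.25)
    print(r12, r23, two_interval_recombination(r12, r23))
```

Under this baseline, map distances add across adjacent intervals while recombination fractions combine through the two-interval formula, which is the kind of predictive formula for multiple-region recombination that the abstract generalizes to more realistic map functions.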

18.
Wiuf C  Posada D 《Genetics》2003,164(1):407-417
Recent experimental findings suggest that the assumption of a homogeneous recombination rate along the human genome is too naive. These findings point to block-structured recombination rates; certain regions (called hotspots) are more prone than other regions to recombination. In this report a coalescent model incorporating hotspot or block-structured recombination is developed and investigated analytically as well as by simulation. Our main results can be summarized as follows: (1) The expected number of recombination events is much lower in a model with pure hotspot recombination than in a model with pure homogeneous recombination, (2) hotspots give rise to large variation in recombination rates along the genome as well as in the number of historical recombination events, and (3) the size of a (nonrecombining) block in the hotspot model is likely to be overestimated grossly when estimated from SNP data. The results are discussed with reference to the current debate about block-structured recombination and, in addition, the results are compared to genome-wide variation in recombination rates. A number of new analytical results about the model are derived.

19.
The coalescent with recombination is a fundamental model to describe the genealogical history of DNA sequence samples from recombining organisms. Considering recombination as a process which acts along genomes and which creates sequence segments with shared ancestry, we study the influence of single recombination events upon tree characteristics of the coalescent. We focus on properties such as tree height and tree balance and quantify analytically the changes in these quantities incurred by recombination in terms of probability distributions. We find that changes in tree topology are often relatively mild under conditions of neutral evolution, while changes in tree height are on average quite large. Our results add to a quantitative understanding of the spatial coalescent and provide the neutral reference to which the impact by other evolutionary scenarios, for instance tree distortion by selective sweeps, can be compared.

20.
Analyses of the increasingly available genomic data continue to reveal the extent of hybridization and its role in the evolutionary diversification of various groups of species. We show, through extensive coalescent-based simulations of multilocus data sets on phylogenetic networks, how divergence times before and after hybridization events can result in incomplete lineage sorting with gene tree incongruence signatures identical to those exhibited by hybridization. Evolutionary analysis of such data under the assumption of a species tree model can miss all hybridization events, whereas analysis under the assumption of a species network model would grossly overestimate hybridization events. These issues necessitate a paradigm shift in evolutionary analysis under these scenarios, from a model that assumes a priori a single source of gene tree incongruence to one that integrates multiple sources in a unifying framework. We propose a framework of coalescence within the branches of a phylogenetic network and show how this framework can be used to detect hybridization despite incomplete lineage sorting. We apply the model to simulated data and show that the signature of hybridization can be revealed as long as the interval between the divergence times of the species involved in hybridization is not too small. We reanalyze a data set of 106 loci from 7 in-group Saccharomyces species for which a species tree with no hybridization has been reported in the literature. Our analysis supports the hypothesis that hybridization occurred during the evolution of this group, explaining a large amount of the incongruence in the data. Our findings show that an integrative approach to gene tree incongruence and its reconciliation is needed. Our framework will help in systematically analyzing genomic data for the occurrence of hybridization and elucidating its evolutionary role.
