Similar Articles
20 similar articles found.
1.
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
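A minimal sketch of the likelihood such a topology HMM assigns to an alignment, computed with the scaled forward algorithm. It assumes the per-column emission log-likelihoods for each candidate topology have already been computed (for example with Felsenstein's pruning algorithm) and assumes a simple transition model: keep the current topology with probability 1 - nu, otherwise switch uniformly to another one. The names `transition_matrix`, `forward_log_likelihood`, and `nu` are illustrative, not taken from the paper; an EM implementation would additionally need the backward pass to obtain expected transition counts for re-estimating nu.

```python
import math

def transition_matrix(num_topologies, nu):
    """Topology-switch probabilities between adjacent alignment columns.

    Hypothetical parameterization: keep the current topology with probability
    1 - nu, otherwise jump uniformly to one of the other topologies."""
    k = num_topologies
    return [[(1.0 - nu) if i == j else nu / (k - 1) for j in range(k)]
            for i in range(k)]

def forward_log_likelihood(emission_loglik, nu):
    """Log-likelihood of the alignment, summed over all topology paths.

    emission_loglik[t][s] is log P(column t | topology s, branch lengths)."""
    k = len(emission_loglik[0])
    A = transition_matrix(k, nu)
    log_lik = 0.0
    alpha = [1.0 / k] * k  # uniform prior over topologies (assumption)
    for t, row in enumerate(emission_loglik):
        m = max(row)  # shift by the column maximum to avoid underflow
        if t == 0:
            new = [alpha[j] * math.exp(row[j] - m) for j in range(k)]
        else:
            new = [sum(alpha[i] * A[i][j] for i in range(k)) * math.exp(row[j] - m)
                   for j in range(k)]
        scale = sum(new)
        alpha = [a / scale for a in new]  # rescale so values stay in range
        log_lik += math.log(scale) + m
    return log_lik
```

Rescaling at every column keeps the recursion numerically stable on long alignments, where unscaled forward probabilities would underflow.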

2.
The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes-Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.
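Distance methods of the kind DCM boosts take corrected pairwise distances as input. As a sketch of that input step only (the DCM decomposition itself is not shown), the Jukes-Cantor correction turns the observed proportion of differing sites p into d = -3/4 ln(1 - 4p/3). The function names are hypothetical, and the sketch assumes gap-free aligned sequences of equal length.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)

def jukes_cantor_distance(seq1, seq2):
    """JC69-corrected distance: expected substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `jukes_cantor_distance("ACGT", "ACGA")` has p = 0.25 and returns about 0.304, larger than p because the correction accounts for multiple substitutions at the same site.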

3.
Recombination is a common feature of many positive-strand RNA viruses, playing an important role in virus evolution. However, to date, there is limited understanding of the mechanisms behind the process. Utilising in vitro assays, we have previously shown that the template-switching event of recombination is a random and ubiquitous process that often leads to recombinant viruses with imprecise genomes containing sequence duplications. Subsequently, a process termed resolution, that has yet to be mechanistically studied, removes these duplicated sequences resulting in a virus population of wild type length genomes. Using defined imprecise recombinant viruses together with Oxford Nanopore and Illumina high throughput next generation sequencing technologies we have investigated the process of resolution. We show that genome resolution involves subsequent rounds of template-switching recombination with viral fitness resulting in the survival of a small subset of recombinant genomes. This alters our previously held understanding that recombination and resolution are independent steps of the process, and instead demonstrates that viruses undergo frequent and continuous recombination events over a prolonged period until the fittest viruses, predominantly those with wild type length genomes, dominate the population.

4.
5.
Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences. ML has long been known to be statistically consistent; that is, it converges on the correct tree with probability approaching 1 as the sequence length grows. However, the statistical consistency of AML has not been formally determined, despite informal remarks in literature dating back 20 years. In this short note we prove a general result that implies that AML is statistically inconsistent. In particular we show that AML can 'shrink' short edges in a tree, resulting in a tree that has no internal resolution as the sequence length grows. Our results apply to any number of taxa.
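To make the ML/AML distinction concrete, here is a sketch on the smallest interesting case, assuming a three-leaf star tree and the two-state symmetric (CFN) model rather than a DNA model. For one site, ML sums the joint probability over both possible ancestral states, whereas AML keeps only the best ancestral state. This illustrates the two objectives only; it does not reproduce the paper's inconsistency proof, and all names are hypothetical.

```python
import math

def cfn_prob(same, t):
    """Two-state symmetric model: P(same state) or P(different state) across an edge of length t."""
    e = math.exp(-2.0 * t)
    return (1.0 + e) / 2.0 if same else (1.0 - e) / 2.0

def site_terms(leaves, branch_lengths):
    """Joint probability of the leaf states for each possible ancestral (root) state."""
    terms = []
    for root in (0, 1):
        p = 0.5  # uniform stationary distribution at the root
        for leaf, t in zip(leaves, branch_lengths):
            p *= cfn_prob(leaf == root, t)
        terms.append(p)
    return terms

def ml_site_likelihood(leaves, branch_lengths):
    # ML: sum over the unobserved ancestral state
    return sum(site_terms(leaves, branch_lengths))

def aml_site_likelihood(leaves, branch_lengths):
    # AML: commit to the single most probable ancestral state
    return max(site_terms(leaves, branch_lengths))
```

Because AML replaces a sum with its largest term, its per-site value can never exceed the ML value, and summing the ML values over all eight leaf patterns gives exactly 1.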

6.
Most phylogenetic tree estimation methods assume that there is a single set of hierarchical relationships among sequences in a data set for all sites along an alignment. Mosaic sequences produced by past recombination events will violate this assumption and may lead to misleading results from a phylogenetic analysis due to the imposition of a single tree along the entire alignment. Therefore, the detection of past recombination is an important first step in an analysis. A Bayesian model for the changes in topology caused by recombination events is described here. This model relaxes the assumption of one topology for all sites in an alignment and uses the theory of hidden Markov models to facilitate calculations, the hidden states being the underlying topologies at each site in the data set. Changes in topology along the multiple sequence alignment are estimated using the maximum a posteriori (MAP) estimate. The performance of the MAP estimate is assessed by application of the model to data sets of four sequences, both simulated and real.
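One common way to obtain a MAP sequence of hidden topologies is the Viterbi algorithm, sketched below. This is an illustration under stated assumptions, not necessarily how the paper computes its MAP estimate: the switching probability nu is fixed (the Bayesian model would place a prior on it), must satisfy 0 < nu < 1, and switches are uniform over the other topologies. `viterbi_topologies` is a hypothetical name.

```python
import math

def viterbi_topologies(emission_loglik, nu):
    """Jointly most probable topology for each alignment column.

    emission_loglik[t][s]: log P(column t | topology s).
    nu: probability of switching topology between adjacent sites (0 < nu < 1)."""
    n, k = len(emission_loglik), len(emission_loglik[0])

    def log_trans(i, j):
        return math.log(1.0 - nu) if i == j else math.log(nu / (k - 1))

    score = [math.log(1.0 / k) + e for e in emission_loglik[0]]
    backptr = []
    for t in range(1, n):
        new_score, ptr = [], []
        for j in range(k):
            # best predecessor state for topology j at column t
            best_i = max(range(k), key=lambda i: score[i] + log_trans(i, j))
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j) + emission_loglik[t][j])
        score = new_score
        backptr.append(ptr)
    # trace back from the best final state
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With three columns strongly favouring topology 0 followed by three favouring topology 1 and a small nu, the returned path is `[0, 0, 0, 1, 1, 1]`; the switch point is where a recombination breakpoint would be inferred.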

7.
Due to genetic variation in the ancestor of two populations or two species, the divergence time for DNA sequences from two populations is variable along the genome. Within a genomic segment, all bases share the same divergence time when no recombination event has occurred to split them apart, because they share a single most recent common ancestor. The size of these segments of constant divergence depends on the recombination rate, but also on the speciation time, the effective population size of the ancestral population, as well as demographic effects and selection. Thus, inference of these parameters may be possible if we can decode the divergence times along a genomic alignment. Here, we present a new hidden Markov model that infers the changing divergence (coalescence) times along the genome alignment using a coalescent framework, in order to estimate the speciation time, the recombination rate, and the ancestral effective population size. The model is efficient enough to allow inference on whole-genome data sets. We first investigate the power and consistency of the model with coalescent simulations and then apply it to the whole-genome sequences of the two orangutan sub-species, Bornean (P. p. pygmaeus) and Sumatran (P. p. abelii) orangutans from the Orangutan Genome Project. We estimate the speciation time between the two sub-species to be thousand years ago and the effective population size of the ancestral orangutan species to be , consistent with recent results based on smaller data sets. We also report a negative correlation between chromosome size and ancestral effective population size, which we interpret as a signature of recombination increasing the efficacy of selection.
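To see why divergence varies along the genome, consider a simplified clean-split isolation model, assumed here for illustration: a diploid ancestral population of constant effective size $N_A$, a split $\tau$ generations ago, and no later migration. Two lineages sampled from the descendant populations cannot coalesce before $\tau$; further back, they coalesce at rate $1/(2N_A)$ per generation:

$$T = \tau + T_A,\qquad T_A \sim \operatorname{Exp}\!\left(\tfrac{1}{2N_A}\right),\qquad \mathbb{E}[T] = \tau + 2N_A .$$

With a per-site, per-generation mutation rate $\mu$ acting on both lineages, the expected divergence at a site is $\mathbb{E}[d] = 2\mu(\tau + 2N_A)$. Recombination lets $T_A$ take a fresh value in each genomic segment, which produces the variation in divergence along the alignment that the hidden Markov model decodes.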

8.
The coalescent with recombination is a fundamental model to describe the genealogical history of DNA sequence samples from recombining organisms. Considering recombination as a process which acts along genomes and which creates sequence segments with shared ancestry, we study the influence of single recombination events upon tree characteristics of the coalescent. We focus on properties such as tree height and tree balance and quantify analytically the changes in these quantities incurred by recombination in terms of probability distributions. We find that changes in tree topology are often relatively mild under conditions of neutral evolution, while changes in tree height are on average quite large. Our results add to a quantitative understanding of the spatial coalescent and provide the neutral reference to which the impact of other evolutionary scenarios, for instance tree distortion by selective sweeps, can be compared.
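As a baseline for the tree characteristics discussed above, the standard Kingman coalescent gives closed forms for a single non-recombining tree. While k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 (time in coalescent units), so the expected tree height is 2(1 - 1/n) and the expected total branch length is 2(1 + 1/2 + ... + 1/(n-1)). The sketch below computes these exactly; it does not model the effect of recombination, and the function names are hypothetical.

```python
from fractions import Fraction

def expected_tree_height(n):
    """E[T_MRCA] for a Kingman coalescent sample of size n (coalescent time units)."""
    # with k lineages, E[waiting time] = 1 / (k(k-1)/2) = 2 / (k(k-1))
    return sum(Fraction(2, k * (k - 1)) for k in range(2, n + 1))

def expected_total_length(n):
    """Expected total branch length: each k-lineage epoch contributes k * 2/(k(k-1))."""
    return sum(Fraction(2, k - 1) for k in range(2, n + 1))
```

For example, `expected_tree_height(10)` returns `Fraction(9, 5)`: tree height saturates near 2 as n grows, whereas total length keeps growing logarithmically.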

9.
The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.
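The core comparison behind model-based breakpoint search is whether splitting the alignment into separately modelled segments improves an information criterion enough to justify the extra parameters. The sketch below is a deliberately simplified stand-in, not the genetic-algorithm search described above: it exhaustively scans a single breakpoint, scores with plain AIC, and replaces per-segment phylogenies with a toy model in which each site carries a hypothetical 0/1 signal (for instance, whether a query matches one putative parent) with a per-segment Bernoulli rate. Counting the breakpoint position as one parameter is also an assumption.

```python
import math

def bernoulli_loglik(sites):
    """Maximized log-likelihood of a 0/1 segment under a single Bernoulli rate."""
    n, k = len(sites), sum(sites)
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if k < n:
        ll += (n - k) * math.log((n - k) / n)
    return ll

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

def best_single_breakpoint(sites, min_len=5):
    """Return (position, AIC improvement); position is None if no split helps."""
    base = aic(bernoulli_loglik(sites), 1)  # one rate for the whole alignment
    best_pos, best_score = None, base
    for b in range(min_len, len(sites) - min_len + 1):
        # two rates plus the breakpoint location itself
        split_ll = bernoulli_loglik(sites[:b]) + bernoulli_loglik(sites[b:])
        score = aic(split_ll, 3)
        if score < best_score:
            best_pos, best_score = b, score
    return best_pos, base - best_score
```

A genetic algorithm becomes worthwhile when several breakpoints are allowed, because the number of candidate configurations then grows combinatorially and exhaustive scanning is no longer practical.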

10.
Genomic homologous recombination in planta. (Cited 8 times: 1 self-citation, 7 by others)
S Gal, B Pisan, T Hohn, N Grimsley, B Hohn. The EMBO Journal, 1991, 10(6): 1571-1578
A system for monitoring intrachromosomal homologous recombination in whole plants is described. A multimer of cauliflower mosaic virus (CaMV) sequences, arranged such that CaMV could only be produced by recombination, was integrated into Brassica napus nuclear DNA. This setup allowed scoring of recombination events by the appearance of viral symptoms. The repeated homologous regions were derived from two different strains of CaMV so that different recombinant viruses (i.e. different recombination events) could be distinguished. In most of the transgenic plants, a single major virus species was detected. About half of the transgenic plants contained viruses of the same type, suggesting a hotspot for recombination. The remainder of the plants contained viruses with cross-over sites distributed throughout the rest of the homologous sequence. Sequence analysis of two recombinant molecules suggests that mismatch repair is linked to the recombination process.

11.
This paper proposes a graphical method for detecting interspecies recombination in multiple alignments of DNA sequences. A fixed-size window is moved along a given DNA sequence alignment. For every position, the marginal posterior probability over tree topologies is determined by means of a Markov chain Monte Carlo simulation. Two probabilistic divergence measures are plotted along the alignment, and are used to identify recombinant regions. The method is compared with established detection methods on a set of synthetic benchmark sequences and two real-world DNA sequence alignments.
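A sketch of the plotting step, assuming the per-window posterior distributions over topologies have already been estimated (for example as the fraction of MCMC samples visiting each topology). The Kullback-Leibler divergence between consecutive windows is used here as one illustrative divergence measure; the paper's two measures may be defined differently. Peaks in the resulting profile mark positions where the supported topology changes. All names are hypothetical.

```python
import math

def window_starts(alignment_length, window, step):
    """Start positions of fixed-size windows moved along the alignment."""
    return list(range(0, alignment_length - window + 1, step))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same topologies."""
    # eps guards against log(0) when q assigns zero mass to a topology
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def divergence_profile(posteriors):
    """Divergence between the topology posteriors of consecutive windows.

    posteriors[w][s] = P(topology s | data in window w)."""
    return [kl_divergence(posteriors[w], posteriors[w + 1])
            for w in range(len(posteriors) - 1)]
```

For four sequences there are three unrooted topologies, so each posterior is a vector of length three; a window pair whose posteriors agree gives a divergence of zero.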

12.
Large amounts of population-scale genetic variation data are being collected. One potentially important biological problem is to infer the genealogical history of a population from these data. Partly because of recombination, the genealogical history of a set of DNA sequences in a population usually cannot be represented by a single tree. Instead, it is better represented by a genealogical network, which is a compact representation of a set of correlated local genealogical trees, each covering a short region of the genome and possibly with a different topology. Inferring the genealogical history of a set of DNA sequences under recombination has many potential applications, including association mapping of complex diseases. In this paper, we present two new methods for reconstructing local tree topologies in the presence of recombination, which extend and improve earlier work. We first show that the "tree scan" method can be converted to a probabilistic inference method based on a hidden Markov model. We then focus on developing a novel local tree inference method called RENT that is both accurate and scalable to larger data. Through simulation, we demonstrate the usefulness of our methods by showing that the hidden-Markov-model-based method is comparable with the original method in terms of accuracy. We also show that RENT is competitive with other methods in terms of inference accuracy, that its inference error rate is often lower, and that it can handle large data sets.

13.
Summary

Using computer programs that analyze the evolutionary history and probability of relationship of protein sequences, we have investigated the gene duplication events that led to the present configuration of immunoglobulin C regions, with particular attention to the origins of the homology regions (domains) of the heavy chains. We conclude that all of the sequenced heavy chains share a common ancestor consisting of four domains and that the two shorter heavy chains, alpha and gamma, have independently lost most of the second domain. These conclusions allow us to align corresponding regions of these sequences for the purpose of deriving evolutionary trees. Three independent internal gene duplications are postulated to explain the observed pattern of relationships among the four domains: first a duplication of the ancestral single domain C region, followed by independent duplications of the resulting first and last domains. In these studies there was no evidence of crossing-over and recombination between ancestral chains of different classes; however, certain types of recombinations would not be detectable from the available sequence data.

14.
The ability of poxviruses to undergo intramolecular recombination within tandemly arranged homologous sequences can be used to generate chimeric genes and proteins. Genes containing regions of nucleotide homology will recombine to yield a single sequence composed of portions of both original genes. A recombinant virus containing two genes with a number of conserved regions will yield a population of recombinant viruses containing a spectrum of hybrid sequences derived by recombination between the original genes. This scheme has been used to generate hybrid human immunodeficiency virus type 1 env genes. Recombinant vaccinia viruses that contain two divergent env genes in tandem array have been constructed. In the absence of selective pressure to maintain both genes, recombination between conserved homologous regions in these genes generated a wide range of progeny, each of which expressed a novel variant polypeptide encoded by the newly created hybrid env gene. Poxvirus-mediated recombination may be applied to map type-specific epitopes, to create novel pharmaceuticals such as hybrid interferons, to study receptor-binding or enzyme substrate specificities, or to mimic the antigenic diversity found in numerous pathogens.

15.
Hao W. Gene, 2011, 481(2): 57-64
The evolution of influenza viruses is remarkably dynamic. Influenza viruses evolve rapidly in sequence and undergo frequent reassortment of different gene segments. Homologous recombination, although commonly seen as an important component of dynamic genome evolution in many other organisms, is believed to be rare in influenza. In this study, 256 gene segments from 32 influenza A genomes were examined for homologous recombination; three recombinant H1N1 strains were detected, most likely resulting from a single recombination event between two closely related parental sequences. These findings suggest that homologous recombination in influenza viruses tends to take place between strains sharing high sequence similarity. The three recombinant strains were isolated at different time periods and form a clade, indicating that recombinant strains could circulate. In addition, simulations showed that many recombinant sequences might not be detectable by currently existing recombination detection programs when the parental sequences are highly similar. Finally, possible ways to improve the accuracy of recombinant sequence detection in influenza are discussed.

16.
Minin VN, Dorman KS, Fang F, Suchard MA. Genetics, 2007, 175(4): 1773-1785
We present a Bayesian framework for inferring spatial preferences of recombination from multiple putative recombinant nucleotide sequences. Phylogenetic recombination detection has been an active area of research for the last 15 years. However, only recently have attempts been made to summarize information from several instances of recombination. We propose a hierarchical model that allows for simultaneous inference of recombination breakpoint locations and spatial variation in recombination frequency. The dual multiple change-point model for phylogenetic recombination detection resides at the lowest level of our hierarchy under the umbrella of a common prior on breakpoint locations. The hierarchical prior allows for information about spatial preferences of recombination to be shared among individual data sets. To overcome the sparseness of breakpoint data, dictated by the modest number of available recombinant sequences, we impose a priori a biologically relevant correlation structure on the log odds of recombination location via a Gaussian Markov random field hyperprior. To examine the capabilities of our model to recover spatial variation in recombination frequency, we simulate recombination from a predefined distribution of breakpoint locations. We then proceed with the analysis of 42 human immunodeficiency virus (HIV) intersubtype gag recombinants and identify a putative recombination hotspot.
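A Gaussian Markov random field prior shares strength between neighbouring positions by penalizing differences between their values. A minimal sketch, assuming an intrinsic first-order random-walk (RW1) structure with precision kappa on the recombination log odds; the exact GMRF used in the paper may differ, and the function names are hypothetical.

```python
import math

def rw1_log_density(x, kappa):
    """Unnormalized log density of an intrinsic first-order random-walk GMRF.

    Penalizes squared differences between neighbouring entries, so log odds
    at adjacent alignment positions are encouraged to be similar."""
    n = len(x)
    sq = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1))
    return 0.5 * (n - 1) * math.log(kappa) - 0.5 * kappa * sq

def log_odds_to_prob(x):
    """Map log odds to per-position breakpoint probabilities with the logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]
```

A smooth log-odds profile scores higher than a jagged one under the same kappa, which is how sparse breakpoint data at one position can inform estimates at nearby positions.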

17.
18.
Replicating poxviruses catalyze high-frequency recombination reactions by a process that is not well understood. Using transfected DNA substrates, we show that these viruses probably use a single-strand annealing recombination mechanism. Plasmids carrying overlapping portions of a luciferase gene expression cassette, together with luciferase assays, were first shown to provide an accurate method of assaying recombinant frequencies. We then transfected pairs of DNAs into virus-infected cells and monitored the efficiencies of linear-by-linear, linear-by-circle, and circle-by-circle recombination. These experiments showed that vaccinia virus recombination systems catalyze linear-by-linear reactions much more efficiently than circle-by-circle reactions, and circle-by-circle reactions more efficiently than linear-by-circle reactions. Reactions involving linear substrates required surprisingly little sequence identity: overlaps of only 16 bp still permitted approximately 4% recombinant production. Masking the homologies by adding unrelated DNA sequences to the ends of linear substrates inhibited recombination in a manner dependent upon the number of added sequences. Circular molecules were also recombined by replicating viruses, but at frequencies 15- to 50-fold lower than linear substrates. These results are consistent with mechanisms in which exonuclease or helicase processing of DNA ends permits recombinants to form through annealing of complementary single strands. Our data are not consistent with a model involving strand invasion reactions, because such reactions should favor mixtures of linear and circular substrates. We also noted that many of the reaction features seen in vivo were reproduced in a simple in vitro reaction requiring only purified vaccinia virus DNA polymerase, single-strand DNA binding protein, and pairs of linear substrates.
The 3'-to-5' exonuclease activity of poxviral DNA polymerases potentially catalyzes recombination in vivo.

19.

Background  

Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively little attention within recombination detection. Here, we extend a quartet-mapping-based recombination detection method to enable identification of recombinant sequences without prior specification of query or reference sequences. Through simulations, we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences.
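The basic quartet step can be sketched with the four-point condition, assuming distance-based scoring: for four sequences there are three possible unrooted splits, and the split ab|cd is favoured when d(a,b) + d(c,d) is the smallest of the three pairwise-distance sums. Quartet mapping builds its statistics on these three sums; the mapping and significance tests of the method above are not reproduced here, and the names below are hypothetical.

```python
def quartet_supports(dist, taxa):
    """Pairwise-distance sums for the three unrooted splits of a quartet.

    dist[(x, y)] is a symmetric distance stored under either key order;
    taxa = (a, b, c, d). A smaller sum means stronger support for that split."""
    a, b, c, d = taxa

    def D(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    return {
        ((a, b), (c, d)): D(a, b) + D(c, d),
        ((a, c), (b, d)): D(a, c) + D(b, d),
        ((a, d), (b, c)): D(a, d) + D(b, c),
    }

def best_quartet(dist, taxa):
    """Split with the smallest distance sum (four-point condition)."""
    sums = quartet_supports(dist, taxa)
    return min(sums, key=sums.get)
```

Along an alignment, computing this per window for every quartet containing a sequence shows whether that sequence's preferred split changes, which is the kind of signal used to flag it as a putative recombinant.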

20.
We consider the Wright–Fisher model for a population of $N$ individuals, each identified with a sequence of a finite number of sites, and single-crossover recombination between them. We trace back the ancestry of single individuals from the present population. In the $N \rightarrow \infty $ limit without rescaling of parameters or time, this ancestral process is described by a random tree, whose branching events correspond to the splitting of the sequence due to recombination. With the help of a decomposition of the trees into subtrees, we calculate the probabilities of the topologies of the ancestral trees. At the same time, these probabilities lead to a semi-explicit solution of the deterministic single-crossover equation. The latter is a discrete-time dynamical system that emerges from the Wright–Fisher model via a law of large numbers and has been waiting for a solution for many decades.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号