Similar articles
 20 similar articles found (search time: 15 ms)
1.
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition:transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.

2.
MOTIVATION: We introduce a dual multiple change-point (MCP) model for recombination detection among aligned nucleotide sequences. The dual MCP model is an extension of the model introduced previously by Suchard and co-workers. In the original single MCP model, one change-point process is used to model spatial phylogenetic variation. Here, we show that using two change-point processes, one for spatial variation of tree topologies and the other for spatial variation of substitution process parameters, increases recombination detection accuracy. Statistical analysis is done in a Bayesian framework using reversible jump Markov chain Monte Carlo sampling to approximate the joint posterior distribution of all model parameters. RESULTS: We use primate mitochondrial DNA data with simulated recombination break-points at specific locations to compare the two models. We also analyze two real HIV sequences to identify recombination break-points using the dual MCP model.

3.
SUMMARY: TOPALi is a new Java graphical analysis application that allows the user to identify recombinant sequences within a DNA multiple alignment (either automatically or via manual investigation). TOPALi allows a choice of three statistical methods to predict the positions of breakpoints due to past recombination. The breakpoint predictions are then used to identify putative recombinant sequences and their relationships to other sequences. In addition to its sophisticated interface, TOPALi can import many sequence formats, estimate and display phylogenetic trees and allow interactive analysis and/or automatic HTML report generation. AVAILABILITY: TOPALi is freely available from http://www.bioss.ac.uk/software.html

4.
The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.

5.
Bayesian phylogeographic methods simultaneously integrate geographical and evolutionary modelling, and have demonstrated value in assessing spatial spread patterns of measurably evolving organisms. We improve on existing phylogeographic methods by combining information from multiple phylogeographic datasets in a hierarchical setting. Consider N exchangeable datasets or strata consisting of viral sequences and locations, each evolving along its own phylogenetic tree and according to a conditionally independent geographical process. At the hierarchical level, a random graph summarizes the overall dispersion process by informing which migration rates between sampling locations are likely to be relevant in the strata. This approach provides an efficient and improved framework for analysing inherently hierarchical datasets. We first examine the evolutionary history of multiple serotypes of dengue virus in the Americas to showcase our method. Additionally, we explore an application to intrahost HIV evolution across multiple patients.

6.
We propose a novel method for detecting sites of molecular recombination in multiple alignments. Our approach is a compromise between previous extremes of computationally prohibitive but mathematically rigorous methods and imprecise heuristic methods. Using a combined algorithm for estimating tree structure and hidden Markov model parameters, our program detects changes in phylogenetic tree topology over a multiple sequence alignment. We evaluate our method on benchmark datasets from previous studies on two recombinant pathogens, Neisseria and HIV-1, as well as simulated data. We show that we are not only able to detect recombinant regions of vastly different sizes but also the location of breakpoints with great accuracy. We show that our method does well inferring recombination breakpoints while at the same time maintaining practicality for larger datasets. In all cases, we confirm the breakpoint predictions of previous studies, and in many cases we offer novel predictions.

7.
Phylogeneticists have developed several statistical methods to infer recombination among molecular sequences that are evolutionarily related. Of these methods, Markov change-point models currently provide the most coherent framework. Yet, the Markov assumption is faulty in that the inferred relatedness of homologous sequences across regions divided by recombinant events is not independent, particularly for nonrecombinant sequences as they share the same history. To correct this limitation, we introduce a novel random tips (RT) model. The model springs from the idea that a recombinant sequence inherits its characters from an unknown number of ancestral full-length sequences, of which one only observes the incomplete portions. The RT model decomposes recombinant sequences into their ancestral portions and then augments each portion onto the data set as unique partially observed sequences. This data augmentation generates a random number of sequences related to each other through a single inferable tree with the same random number of tips. While intuitively pleasing, this single tree corrects the independence assumptions plaguing previous methods while permitting the detection of recombination. The single tree also allows for inference of the relative times of recombination events and generalizes to incorporate multiple recombinant sequences. This generalization answers important questions with which previous models struggle. For example, we demonstrate that a group of human immunodeficiency type 1 recombinant viruses from Argentina, previously thought to have the same recombinant history, actually consist of 2 groups: one, a clonal expansion of a reference sequence and another that predates the formation of the reference sequence. In another example, we demonstrate that 2 hepatitis B virus recombinant strains share similar splicing locations, suggesting a common descent of the 2 viruses.
We implement and run both examples in a software package called StepBrothers, freely available to interested parties.

8.
MOTIVATION: Recombination plays an important role in the evolution of many pathogens, such as HIV or malaria. Despite substantial prior work, there is still a pressing need for efficient and effective methods of detecting recombination and analyzing recombinant sequences. RESULTS: We introduce Recco, a novel fast method that, given a multiple sequence alignment, scores the cost of obtaining one of the sequences from the others by mutation and recombination. The algorithm comes with an illustrative visualization tool for locating recombination breakpoints. We analyze the sequence alignment with respect to all choices of the parameter alpha weighting recombination cost against mutation cost. The analysis of the resulting cost curve yields additional information as to which sequence might be recombinant. On random genealogies Recco is comparable in its power of detecting recombination with the algorithm Geneconv (Sawyer, 1989). For specific relevant recombination scenarios Recco significantly outperforms Geneconv.
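
The cost idea in the abstract above can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not Recco's actual implementation: the function name `copy_cost` is hypothetical, sequences are assumed to be gap-free strings of equal length, and the weighting (each mismatch costs alpha, each switch between source sequences costs 1 - alpha) is an assumed convention for how alpha trades mutation against recombination.

```python
def copy_cost(target, sources, alpha):
    """Cheapest way to build `target` by copying from `sources`, column by column.

    Assumed cost model (illustrative only): a mismatch against the source
    being copied costs `alpha`; switching to another source costs `1 - alpha`.
    Returns (total_cost, path), where path[i] is the source index at column i.
    """
    n, k = len(target), len(sources)
    mut, rec = alpha, 1.0 - alpha

    # cost[j] = cheapest cost of explaining columns 0..i while copying source j at i
    cost = [mut * (s[0] != target[0]) for s in sources]
    back = []  # back[i-1][j] = source used at column i-1 on the best path ending in j

    for i in range(1, n):
        best_prev = min(range(k), key=cost.__getitem__)
        new_cost, ptr = [], []
        for j in range(k):
            stay = cost[j]
            switch = cost[best_prev] + rec
            if stay <= switch:
                prev, c = j, stay
            else:
                prev, c = best_prev, switch
            new_cost.append(c + mut * (sources[j][i] != target[i]))
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)

    # Trace the cheapest path back from the last column.
    j = min(range(k), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return total, path
```

For the target "AAAATTTT" with sources "AAAAAAAA" and "TTTTTTTT" and alpha = 0.5, the cheapest path copies the first source for four columns and then switches to the second, so a change of source index in `path` marks a candidate breakpoint. Re-running over a grid of alpha values gives a rough analogue of the cost curve the abstract mentions.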

9.
Recombinant HIV-1 genomes contribute significantly to the diversity of variants within the HIV/AIDS pandemic. It is assumed that some of these mosaic genomes may have novel properties that have led to their prevalence, particularly in the case of the circulating recombinant forms (CRFs). In regions of the HIV-1 genome where recombination has a tendency to convey a selective advantage to the virus, we predict that the distribution of breakpoints--the identifiable boundaries that delimit the mosaic structure--will deviate from the underlying null distribution. To test this hypothesis, we generate a probabilistic model of HIV-1 copy-choice recombination and compare the predicted breakpoint distribution to the distribution from the HIV/AIDS pandemic. Across much of the HIV-1 genome, we find that the observed frequencies of inter-subtype recombination are predicted accurately by our model. This observation strongly indicates that in these regions a probabilistic model, dependent on local sequence identity, is sufficient to explain breakpoint locations. In regions where there is a significant over- (either side of the env gene) or under- (short regions within gag, pol, and most of env) representation of breakpoints, we infer natural selection to be influencing the recombination pattern. The paucity of recombination breakpoints within most of the envelope gene indicates that recombinants generated in this region are less likely to be successful. The breakpoints at a higher frequency than predicted by our model are approximately at either side of env, indicating increased selection for these recombinants as a consequence of this region, or at least part of it, having a tendency to be recombined as an entire unit. Our findings thus provide the first clear indication of the existence of a specific portion of the genome that deviates from a probabilistic null model for recombination. 
This suggests that, despite the wide diversity of recombinant forms seen in the viral population, only a minority of recombination events appear to be of significance to the evolution of HIV-1.

10.
11.

Background  

Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively less attention in the recombination detection framework. Here, we extend a quartet-mapping based recombination detection method to enable identification of recombinant sequences without prior specification of either query or reference sequences. Through simulations we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences.

12.
P. Andolfatto, J. D. Wall, M. Kreitman. Genetics, 1999, 153(3): 1297-1311
The existence of temporally stable frequency clines for In(2L)t in natural populations of Drosophila melanogaster suggests a role for selection in the maintenance of this polymorphism. We have collected nucleotide polymorphism data from the proximal breakpoint junction regions of In(2L)t to infer its evolutionary history. The finding of a novel LINE-like element near the In(2L)t breakpoint junction in sampled inverted chromosomes supports a transposable element-mediated origin for this inversion. An analysis of nucleotide variation in a Costa Rican population sample of standard and inverted chromosomes indicates a unique and relatively recent origin for In(2L)t. Additional In(2L)t alleles from three geographically diverse populations reveal no detectable geographic differentiation. Low levels of In(2L)t nucleotide polymorphism suggest a recent increase in the inversion's frequency in tropical populations. An unusual feature of our sample of standard alleles is a marked heterogeneity in levels of linkage disequilibrium among polymorphic sites across the breakpoint region. We introduce a test of neutral equilibrium haplotype structure that corrects both for multiple tests and for an arbitrarily chosen window size. It reveals that an approximately 1.4-kb region immediately spanning the breakpoint has fewer haplotypes than expected under the neutral model, given the expected level of recombination in this genomic region. Certain features of our data suggest that the unusual pattern in standard chromosomes is the product of selection rather than demography.

13.
Detectability of individual animals is highly variable and nearly always < 1; imperfect detection must be accounted for to reliably estimate population sizes and trends. Hierarchical models can simultaneously estimate abundance and effective detection probability, but there are several different mechanisms that cause variation in detectability. Neglecting temporary emigration can lead to biased population estimates because availability and conditional detection probability are confounded. In this study, we extend previous hierarchical binomial mixture models to account for multiple sources of variation in detectability. The state process of the hierarchical model describes ecological mechanisms that generate spatial and temporal patterns in abundance, while the observation model accounts for the imperfect nature of counting individuals due to temporary emigration and false absences. We illustrate our model’s potential advantages, including the allowance of temporary emigration between sampling periods, with a case study of southern red-backed salamanders Plethodon serratus. We fit our model and a standard binomial mixture model to counts of terrestrial salamanders surveyed at 40 sites during 3–5 surveys each spring and fall 2010–2012. Our models generated similar parameter estimates to standard binomial mixture models. Aspect was the best predictor of salamander abundance in our case study; abundance increased as aspect became more northeasterly. Increased time-since-rainfall strongly decreased salamander surface activity (i.e. availability for sampling), while higher amounts of woody cover objects and rocks increased conditional detection probability (i.e. probability of capture, given an animal is exposed to sampling). By explicitly accounting for both components of detectability, we increased congruence between our statistical modeling and our ecological understanding of the system. 
We stress the importance of choosing survey locations and protocols that maximize species availability and conditional detection probability to increase population parameter estimate reliability.
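
As a point of reference for the abstract above, the standard binomial mixture model that it extends can be written down in a few lines. This is a minimal sketch, assuming Poisson-distributed site abundance N with mean `lam` and repeated counts that are each Binomial(N, p). It does not include the temporary-emigration component the authors add, and the function names and the truncation bound `n_max` are illustrative choices.

```python
import math


def poisson_pmf(n, lam):
    """P(N = n) for a Poisson abundance with mean lam (lam > 0)."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def binom_pmf(y, n, p):
    """P(Y = y) when each of n animals is detected independently with probability p."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of the repeated counts at one site.

    The unknown abundance N is summed out, from the largest observed count
    up to n_max (a truncation of the infinite sum, assumed large enough).
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        like = poisson_pmf(n, lam)
        for y in counts:
            like *= binom_pmf(y, n, p)
        total += like
    return total
```

Two checks show what the model implies. With perfect detection (p = 1) the counts equal N, so the likelihood reduces to the Poisson probability of the count. With a single survey, the count is a thinned Poisson with mean lam * p, which is why abundance and detection cannot be separated without repeated surveys.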

14.
Haplotype inference from phase-ambiguous multilocus genotype data is an important task for both disease-gene mapping and studies of human evolution. We report a novel haplotype-inference method based on a coalescence-guided hierarchical Bayes model. In this model, a hierarchical structure is imposed on the prior haplotype frequency distributions to capture the similarities among modern-day haplotypes attributable to their common ancestry. As a consequence, the model both allows distinct haplotypes to have different a priori probabilities according to the inferred hierarchical ancestral structure and results in a proper joint posterior distribution for all the parameters of interest. A Markov chain-Monte Carlo scheme is designed to draw from this posterior distribution. By using coalescence-based simulation and empirically generated data sets (Whitehead Institute's inflammatory bowel disease data sets and HapMap data sets), we demonstrate the merits of the new method in comparison with HAPLOTYPER and PHASE, with or without the presence of recombination hotspots and missing genotypes.

15.
Maps depicting cancer incidence rates have become useful tools in public health research, giving valuable information about the spatial variation in rates of disease. Typically, these maps are generated using count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related databases, it is becoming easier to obtain exact spatial locations for the cancer cases and suitable control subjects. The use of such point data allows us to adjust for individual-level covariates, such as age and smoking status, when estimating the spatial variation in disease risk. Unfortunately, such covariate information is often subject to missingness. We propose a method for mapping cancer risk when covariates are not completely observed. We model these data using a logistic generalized additive model. Estimates of the linear and non-linear effects are obtained using a mixed effects model representation. We develop an EM algorithm to account for missing data and the random effects. Since the expectation step involves an intractable integral, we estimate the E-step with a Laplace approximation. This framework provides a general method for handling missing covariate values when fitting generalized additive models. We illustrate our method through an analysis of cancer incidence data from Cape Cod, Massachusetts. These analyses demonstrate that standard complete-case methods can yield biased estimates of the spatial variation of cancer risk.

16.
We have studied the recombination of plasmids bearing bom and cer sites. The bom (basis of mobilization) site is required for conjugative transfer, while the cer (ColE1 resolution) site is involved in the resolution of plasmid multimers, which increases plasmid stability. We constructed a pair of parent plasmids in such a way as to allow us to select clones containing recombinant plasmids directly. Clone selection was based on the McrA sensitivity of recipient host DNA modified by M.Ecl18kI, which is encoded by one of the parent plasmids. The recombinant plasmid contains segments originating from both parental DNAs, which are bounded by bom and cer sites. Its structure is in accordance with our previously proposed model for recombination mediated by bom and cer sequences. The frequency of recombinant plasmid formation coincided with the frequency of recombination at the bom site. We also show that bom-mediated recombination in trans, unlike in cis, is independent of other genetic determinants on the conjugative plasmids.

17.
Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partitions and estimation of one evolutionary history, maximizing the explanatory power of the data. Consensus/independence approaches endorse a two-step procedure where partitions are analyzed independently and then a consensus is determined from the multiple results. Mixtures across the model space of a strict combined-data approach and a priori independent parameters are popular methods to integrate these methods. We propose an alternative middle ground by constructing a Bayesian hierarchical phylogenetic model. Our hierarchical framework enables researchers to pool information across data partitions to improve estimate precision in individual partitions while permitting estimation and testing of tendencies in across-partition quantities. Such across-partition quantities include the distribution from which individual topologies relating the sequences within a partition are drawn. We propose standard hierarchical priors on continuous evolutionary parameters across partitions, while the structure on topologies varies depending on the research problem. We illustrate our model with three examples. We first explore the evolutionary history of the guinea pig (Cavia porcellus) using alignments of 13 mitochondrial genes. The hierarchical model returns substantially more precise continuous parameter estimates than an independent parameter approach without losing the salient features of the data. Second, we analyze the frequency of horizontal gene transfer using 50 prokaryotic genes. We assume an unknown species-level topology and allow individual gene topologies to differ from this with a small estimable probability. Simultaneously inferring the species and individual gene topologies returns a transfer frequency of 17%. We also examine HIV sequences longitudinally sampled from HIV+ patients. 
We ask whether posttreatment development of CCR5 coreceptor virus represents concerted evolution from middisease CXCR4 virus or reemergence of initial infecting CCR5 virus. The hierarchical model pools partitions from multiple unrelated patients by assuming that the topology for each patient is drawn from a multinomial distribution with unknown probabilities. Preliminary results suggest evolution and not reemergence.

18.
Since recombination leads to the generation of mosaic genomes that violate the assumption of traditional phylogenetic methods that sequence evolution can be accurately described by a single tree, results and conclusions based on phylogenetic analysis of data sets including recombinant sequences can be severely misleading. Many methods are able to adequately detect recombination between diverse sequences, for example between different HIV-1 subtypes. More problematic is the identification of recombinants among closely related sequences such as a viral population within a host. We describe a simple algorithmic procedure that enables detection of intra-host recombinants based on split-decomposition networks and a robust statistical test for recombination. By applying this algorithm to several published HIV-1 data sets we conclude that intra-host recombination was significantly underestimated in previous studies and that up to one-third of the env sequences longitudinally sampled from a given subject can be of recombinant origin. The results show that our procedure can be a valuable exploratory tool for detection of recombinant sequences before phylogenetic analysis, and also suggest that HIV-1 recombination in vivo is far more frequent and significant than previously thought.

19.
Gene conversion is a recombinatorial mechanism which transfers genetic information from a donor into a recipient gene. A case of gene conversion between immunoglobulin VH region genes was analysed and palindromic sequences were found to be located near to the left recombinatorial breakpoint, which also is flanked by a direct repeat sequence. We performed a computer search for palindromes and direct repeats in the published sequences of eucaryotic genes which had been involved in gene conversion. In these sequences, the palindrome with the best or second best quality is located near to a breakpoint of recombination. A correlation of recombination breakpoints with direct repeats was not observed. This suggests that gene conversion is promoted by palindromic sequences.
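
A computer search for palindromes of the kind described above can be sketched as follows. The abstract does not say how palindrome "quality" was scored, so this example makes a simplifying assumption: it reports only perfect palindromes in the molecular-biology sense (a stretch that equals its own reverse complement), found by expanding outward from each position. The name `find_palindromes` and the `min_len` threshold are illustrative.

```python
# Watson-Crick partners; any other symbol (e.g. N) never pairs.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_palindromes(seq, min_len=6):
    """Return (start, length) of maximal perfect reverse-complement palindromes.

    Such palindromes always have even length, so the search expands around
    each boundary between two adjacent bases.
    """
    seq = seq.upper()
    hits = []
    for center in range(1, len(seq)):
        left, right = center - 1, center
        while left >= 0 and right < len(seq) and PAIR.get(seq[left]) == seq[right]:
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 1, length))
    return hits
```

In the sequence "CCGAATTCAA", for instance, the only hit of length six or more is GAATTC starting at position 2 (0-based). Comparing hit positions against known breakpoint coordinates would be the next step; a scoring scheme that tolerates mismatches, as the word "quality" suggests, would need a different search.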

20.
Since its initial identification in St. Petersburg, Russia, the recombinant hepatitis C virus (HCV) 2k/1b has been isolated from several countries throughout Eurasia. The 2k/1b strain is the only recombinant HCV to have spread widely, raising questions about the epidemiological background in which it first appeared. In order to further understand the circumstances by which HCV recombinants might be formed and spread, we estimated the date of the recombination event that generated the 2k/1b strain using a Bayesian phylogenetic approach. Our study incorporates newly isolated 2k/1b strains from Amsterdam, The Netherlands, and has employed a hierarchical Bayesian framework to combine information from different genomic regions. We estimate that 2k/1b originated sometime between 1923 and 1956, substantially before the first detection of the strain in 1999. The timescale and the geographic spread of 2k/1b suggest that it originated in the former Soviet Union at about the time that the world's first centralized national blood transfusion and storage service was being established. We also reconstructed the epidemic history of 2k/1b using coalescent theory-based methods, matching patterns previously reported for other epidemic HCV subtypes. This study demonstrates the practicality of jointly estimating dates of recombination from flanking regions of the breakpoint and further illustrates that rare genetic-exchange events can be particularly informative about the underlying epidemiological processes.
