Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Recombination is an important evolutionary mechanism responsible for creating the patterns of haplotype variation observable in human populations. Recently, there has been extensive research on understanding the fine-scale variation in recombination across the human genome using DNA polymorphism data. Historical recombination events leave signature patterns in haplotype data. A nonparametric approach for estimating the number of historical recombination events is to compute the minimum number of recombination events in the history of a set of haplotypes. In this paper, we provide new and improved methods for computing lower bounds on the minimum number of recombination events. These methods are shown to detect a higher number of recombination events for a haplotype dataset from a region in the lipoprotein lipase gene than previous lower bounds. We apply our methods to two datasets for which recombination hotspots have been experimentally determined and demonstrate a high density of detectable recombination events in the regions annotated as recombination hotspots. The programs implementing the methods in this paper are available at www.cs.ucsd.edu/users/vibansal/RecBounds/.

2.
3.
Evolutionary processes such as hybridisation, lateral gene transfer, and recombination are all key factors in shaping the structure of genes and genomes. However, since such processes are not always best represented by trees, there is now considerable interest in using more general networks instead. For example, in recent studies it has been shown that networks can be used to provide lower bounds on the number of recombination events and also for the number of lateral gene transfers that took place in the evolutionary history of a set of molecular sequences. In this paper we describe the theoretical performance of some related bounds that result when merging pairs of trees into networks.

4.
The variation of the recombination rate along chromosomal DNA is one of the important determinants of the patterns of linkage disequilibrium. A number of inferential methods have been developed which estimate the recombination rate and its variation from population genetic data. The majority of these methods are based on modelling the genealogical process underlying a sample of DNA sequences and thus explicitly include a model of the demographic process. Here we propose a different inferential procedure based on a previously introduced framework where recombination is modelled as a point process along a DNA sequence. The approach infers regions containing putative hotspots based on the inferred minimum number of recombination events; it thus depends only indirectly on the underlying population demography. A Poisson point process model with local rates is then used to infer patterns of recombination rate estimation in a fully Bayesian framework. We illustrate this new approach by applying it to several population genetic datasets, including a region with an experimentally confirmed recombination hotspot.

5.
Bounds on the minimum number of recombination events in a sample history (total citations: 11; self-citations: 0; citations by others: 11)
Myers SR, Griffiths RC. Genetics, 2003, 163(1): 375-394
Recombination is an important evolutionary factor in many organisms, including humans, and understanding its effects is an important task facing geneticists. Detecting past recombination events is thus important; this article introduces statistics that give a lower bound on the number of recombination events in the history of a sample, on the basis of the patterns of variation in the sample DNA. Such lower bounds are appropriate, since many recombination events in the history are typically undetectable, so the true number of historical recombinations is unobtainable. The statistics can be calculated quickly by computer and improve upon the earlier bound of Hudson and Kaplan (1985). A method is developed to combine bounds on local regions in the data to produce more powerful improved bounds. The method is flexible to different models of recombination occurrence. The approach gives recombination event bounds between all pairs of sites, to help identify regions with more detectable recombinations, and these bounds can be viewed graphically. Under coalescent simulations, there is a substantial improvement over the earlier method (of up to a factor of 2) in the expected number of recombination events detected by one of the new minima, across a wide range of parameter values. The method is applied to data from a region within the lipoprotein lipase gene and the amount of detected recombination is substantially increased. Further, there is strong clustering of detected recombination events in an area near the center of the region. A program implementing these statistics, which was used for this article, is available from http://www.stats.ox.ac.uk/mathgen/programs.html.
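
For orientation, the earlier bound that this abstract improves upon can be sketched in a few lines. The code below is an illustrative sketch of the classical four-gamete bound usually attributed to Hudson and Kaplan (1985), not the new statistics of Myers and Griffiths and not their program. It assumes biallelic sites coded as `0`/`1`, haplotypes given as equal-length strings, and no missing data; the function names are hypothetical.

```python
from itertools import combinations


def four_gametes(haps, i, j):
    """Return True if sites i and j show all four gametes (00, 01, 10, 11)."""
    # Assumes biallelic sites, so at most four distinct allele pairs exist.
    return len({(h[i], h[j]) for h in haps}) == 4


def hudson_kaplan_bound(haps):
    """Lower bound on recombination events from incompatible site pairs.

    Each incompatible pair (i, j) implies at least one recombination
    between sites i and j. The bound is the largest number of such
    intervals that overlap in at most an endpoint, found greedily.
    """
    n_sites = len(haps[0])
    intervals = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gametes(haps, i, j)
    ]
    # Classic interval scheduling: process intervals by right endpoint.
    intervals.sort(key=lambda iv: iv[1])
    count, last_right = 0, -1
    for left, right in intervals:
        if left >= last_right:  # touching at one site still counts as disjoint
            count += 1
            last_right = right
    return count
```

Calling `hudson_kaplan_bound(["00", "01", "10", "11"])` finds one incompatible pair of sites and therefore reports a single detectable event. The greedy pass is a design choice that keeps the sketch short; it returns the maximum number of non-overlapping incompatible intervals.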

6.
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition:transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.

7.
Trisomy is the most common genetic abnormality in humans and is the leading cause of mental retardation. Although molecular studies that use a large number of highly polymorphic markers have been undertaken to understand the recombination patterns for chromosome abnormalities, there is a lack of multilocus approaches to incorporating crossover interference in the analysis of human trisomy data. In the present article, we develop two statistical methods that simultaneously use all genetic information in trisomy data. The first approach relies on a general relationship between multilocus trisomy probabilities and multilocus ordered-tetrad probabilities. Under the assumption that no more than one chiasma exists in each marker interval, we describe how to use the expectation-maximization algorithm to examine the probability distribution of the recombination events underlying meioses that lead to trisomy. One limitation of the first approach is that the amount of computation increases exponentially with the number of markers. The second approach models the crossover process as a χ² model. We describe how to use hidden Markov models to evaluate multilocus trisomy probabilities. Our methods are applicable when both parents are available or when only the nondisjoining parent is available. For both methods, genetic distances among a set of markers can be estimated and the pattern of overall chiasma distribution can be inspected for differences in recombination between meioses exhibiting trisomy and normal meioses. We illustrate the proposed approaches through their application to a set of trisomy 21 data.

8.
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.

9.
Wiuf C. Genetics, 2004, 166(1): 537-545
In this study compatibility with a tree for unphased genotype data is discussed. If the data are compatible with a tree, the data are consistent with an assumption of no recombination in its evolutionary history. Further, it is said that there is a solution to the perfect phylogeny problem; i.e., for each individual a pair of haplotypes can be defined and the set of all haplotypes can be explained without invoking recombination. A new algorithm to decide whether or not a sample is compatible with a tree is derived. The new algorithm relies on an equivalence relation between sites that mutually determine the phase of each other. (The previous algorithm was based on advanced graph theoretical tools.) The equivalence relation is used to derive the number of solutions to the perfect phylogeny problem. Further, a series of statistics, R(j)(M), j ≥ 2, are defined. These can be used to detect recombination events in the sample's history and to divide the sample into regions that are compatible with a tree. The new statistics are applied to real data from human genes. The results from this application are discussed with reference to recent suggestions that recombination in the human genome is highly heterogeneous.

10.
11.
Given a set D of input sequences, a genealogy for D can be constructed backward in time using such evolutionary events as mutation, coalescent, and recombination. An ancestral configuration (AC) can be regarded as the multiset of all sequences present at a particular point in time in a possible genealogy for D. The complexity of computing the likelihood of observing D depends heavily on the total number of distinct ACs of D and, therefore, it is of interest to estimate that number. For D consisting of binary sequences of finite length, we consider the problem of enumerating exactly all distinct ACs. We assume that the root sequence type is known and that the mutation process is governed by the infinite-sites model. When there is no recombination, we construct a general method of obtaining closed-form formulas for the total number of ACs. The enumeration problem becomes much more complicated when recombination is involved. In that case, we devise a method of enumeration based on counting contingency tables and construct a dynamic programming algorithm for the approach. Last, we describe a method of counting the number of ACs that can appear in genealogies with less than or equal to a given number R of recombinations. Of particular interest is the case in which R is close to the minimum number of recombinations for D.

12.
Perfect phylogenetic networks with recombination (total citations: 1; self-citations: 0; citations by others: 1)
The perfect phylogeny problem is a classical problem in evolutionary tree construction. In this paper, we propose a new model called phylogenetic network with recombination that takes recombination events into account. We show that the problem of finding a perfect phylogenetic network with the minimum number of recombination events is NP-hard; we also present an efficient polynomial time algorithm for an interesting restricted version of the problem.
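
As background for the perfect phylogeny problem named here, a standard result for binary characters is that a set of haplotypes fits a tree with no recombination (root unknown) exactly when no pair of sites shows all four allele combinations. The sketch below illustrates only that pairwise check; it is not the network algorithm of this paper. It assumes biallelic `0`/`1` sites, equal-length haplotype strings, and no missing data, and the function names are hypothetical.

```python
from itertools import combinations


def conflicting_site_pairs(haps):
    """List site pairs (i, j) at which all four gametes 00, 01, 10, 11 occur."""
    n_sites = len(haps[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if len({(h[i], h[j]) for h in haps}) == 4
    ]


def admits_perfect_phylogeny(haps):
    """True if no pair of sites conflicts, i.e. a tree can explain the data."""
    return not conflicting_site_pairs(haps)
```

For example, `admits_perfect_phylogeny(["000", "100", "110", "111"])` holds because the sites are nested, whereas any input for which `conflicting_site_pairs` is non-empty needs at least one recombination (or recurrent mutation) to explain it.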

13.

Background

Tomato-infecting begomoviruses are widely distributed across the world and cause diseases of high economic impact on a wide range of agriculturally important crops. Though recombination plays a pivotal role in the diversification and evolution of these viruses, it is currently unknown whether there are differences in the number and quality of recombination events amongst different tomato-infecting begomovirus species. To examine this we sought to characterize the recombination events, estimate the frequency of recombination, and map recombination hotspots in tomato-infecting begomoviruses of South and Southeast Asia.

Results

Different methods used for recombination breakpoint analysis provided strong evidence for the presence of recombination events in the majority of the sequences analyzed. However, there was clear evidence for the absence or a low number of recombination events in viruses reported from North India. In addition, we provide evidence for a non-random distribution of recombination events, with the highest frequency of recombination being mapped to the N-terminal portion of Rep.

Conclusion

The variable recombination observed in these viruses indicates that not all begomoviruses are equally prone to recombination. Distribution of recombination hotspots was found to be reliant on the relatedness of the genomic region involved in the exchange. Overall, the frequency of phylogenetic violations and the number of recombination events decreased with increasing parental sequence diversity. These findings provide valuable new information for understanding the diversity and evolution of tomato-infecting begomoviruses in Asia.

14.
In this paper, we extend the theoretical treatment of the Moran model of genetic drift with recombination and mutation, which was previously introduced by us for the case of two loci, to the case of n loci. Recombination, when considered in the Wright–Fisher model, makes it considerably less tractable. In the works of Griffiths, Hudson and Kaplan and their colleagues important properties were established using the coalescent approach. Other more recent approaches form a body of work to which we would like to contribute. The specific framework used in our paper allows finding closed-form relationships, which however are limited to a set of distributions, which jointly characterize allelic states at a number of loci at the same or different chromosome(s) but which do not jointly characterize allelic states at a single locus on two or more chromosomes. However, the system is sufficiently rich to allow computing, albeit in general numerically, all possible multipoint linkage disequilibria under recombination, mutation and drift. We explore the algorithms enabling construction of the transition probability matrices of the Markov chain describing the process. We find that asymptotically the effects of recombination become indistinguishable, at least as characterized by the set of distributions we consider, from the effects of mutation and drift. Mathematically, the results are based on the foundations of the theory of semigroups of operators. This approach allows generalization to any Markov-type mutation model. Based on these fundamental results, we explore the rates of convergence to the limit distribution, using Dobrushin’s coefficient and spectral gap.

15.
16.
17.
Retracing the trajectories of past genetic events is crucial to understand the structure of the genome, both in individuals and across populations. A haplotype describes a string of polymorphic sites along a DNA segment. Haplotype diversity is due to mutations creating new variants, and to recombinations and gene conversions that mix and redistribute these variants among individual chromosomes in populations. A number of studies have revealed a relatively simple pattern of haplotype diversity in the human genome, dominated by a few common haplotypes representing founder ancestral ones. New haplotypes are usually rare and have a limited geographic distribution. We propose a method to derive a new haplotype from a set of putative ancestral haplotypes, once mutations are in place, through minimal recombination and gene conversion pathways. We describe classes of pathways that represent the whole set of minimal pathways leading to a new haplotype. We show that obtaining this set of pathways can be represented as a problem of finding "secondary structures" of minimum energy. We present a polynomial algorithm solving this folding problem.

18.
By viewing the ancestral recombination graph as defining a sequence of trees, we show how possible evolutionary histories consistent with given data can be constructed using the minimum number of recombination events. In contrast to previously known methods, which yield only estimated lower bounds, our method of detecting recombination always gives the minimum number of recombination events if the right kind of rooted trees are used in our algorithm. A new lower bound can be defined if rooted trees with fewer constraints are used. As well as studying how often it actually is equal to the minimum, we test how this new lower bound performs in comparison to some other lower bounds. Our study indicates that the new lower bound is an improvement on earlier bounds. Also, using simulated data, we investigate how well our method can recover the actual site-specific evolutionary relationships. In the presence of recombination, using a single tree to describe the evolution of the entire locus clearly leads to lower average recovery percentages than does our method. Our study shows that recovering the actual local tree topologies can be done more accurately than estimating the actual number of recombination events.

19.
Allelic dimorphism is a characteristic feature of the Plasmodium falciparum msp1 gene encoding the merozoite surface protein 1, a strong malaria vaccine candidate. Meiotic recombination is a major mechanism for the generation of msp1 allelic diversity. Potential recombination sites have previously been mapped to specific regions within msp1 (a 5' 1-kb region and a 3' 0.4-kb region) with no evidence for recombination events in a central 3.5-kb region. However, evidence for the lack of recombination events is circumstantial and inconclusive because the number of msp1 sequences analysed is limited, and the frequency of recombination events has not been addressed previously in a high transmission area, where the frequency of meiotic recombination is expected to be high. In the present study, we have mapped potential allelic recombination sites in 34 full-length msp1 sequences, including 24 new sequences, from various geographic origins. We also investigated recombination events in blocks 6 to 16 by population genetic analysis of P. falciparum populations in Tanzania, where malaria transmission is intense. The results clearly provide no evidence of recombination events occurring between the two major msp1 allelic types, K1-type and Mad20-type, in the central region, but do show recombination events occurring throughout the entire gene within sequences of the Mad20-type. Thus, the present study indicates that allelic dimorphism of msp1 greatly affects inter-allelic recombination events, highlighting a unique feature of allelic diversity of P. falciparum msp1.

20.
A comparison of estimators of the population recombination rate (total citations: 15; self-citations: 0; citations by others: 15)
Three new estimators of the population recombination rate C = 4Nr are introduced. These estimators summarize the data using the number of distinct haplotypes and the estimated minimum number of recombination events, then calculate the value of C that maximizes the likelihood of obtaining the summarized data. They are compared with a number of previously proposed estimators of the recombination rate. One of the newly proposed estimators is generally better than the others for the parameter values considered here, while the three programs that calculate maximum-likelihood estimates give conflicting results.
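
The two quantities named in this abstract are simple to compute once inputs are fixed. The sketch below only illustrates the definition C = 4Nr and one of the summary statistics (the number of distinct haplotypes); it does not implement the likelihood-based estimators themselves. It assumes N is the population size and r the per-generation recombination rate for the region, which is the conventional reading of these symbols, and that haplotypes are given as strings; the function names are hypothetical.

```python
def population_recombination_rate(n, r):
    """Return C = 4 * N * r for population size n and recombination rate r."""
    return 4 * n * r


def count_distinct_haplotypes(haps):
    """Number of distinct haplotype strings in the sample."""
    return len(set(haps))
```

With the sample `["00", "01", "00"]`, `count_distinct_haplotypes` reports two distinct haplotypes; the estimators described above would then search for the value of C that makes such a summary most likely.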
