Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
We consider the following problem: Given a set of binary sequences, determine lower bounds on the minimum number of recombinations required to explain the history of the sample, under the infinite-sites model of mutation. The problem has implications for finding recombination hotspots and for the Ancestral Recombination Graph reconstruction problem. Hudson and Kaplan gave a lower bound based on the four-gamete test. In practice, their bound R_m often greatly underestimates the minimum number of recombinations. The problem was recently revisited by Myers and Griffiths, who introduced two new lower bounds R_h and R_s which are provably better, and also yield good bounds in practice. However, the worst-case complexities of their procedures for computing R_h and R_s are exponential and super-exponential, respectively. In this paper, we show that the number of nontrivial connected components, R_c, in the conflict graph for a given set of sequences, computable in time O(nm^2), is also a lower bound on the minimum number of recombination events. We show that in many cases, R_c is a better bound than R_h. The conflict graph was used by Gusfield et al. to obtain a polynomial time algorithm for the galled tree problem, which is a special case of the Ancestral Recombination Graph (ARG) reconstruction problem. Our results also offer some insight into the structural properties of this graph and are of interest for the general Ancestral Recombination Graph reconstruction problem.
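
The bound R_c described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that sequences are equal-length strings over "0" and "1", that two sites conflict when they fail the four-gamete test (all of 00, 01, 10 and 11 occur), and that a nontrivial component is one with at least two sites. The function names are hypothetical.

```python
from itertools import combinations


def sites_conflict(seqs, a, b):
    """Four-gamete test: sites a and b conflict if all four gametes occur."""
    return len({(s[a], s[b]) for s in seqs}) == 4


def count_nontrivial_components(seqs):
    """Count conflict-graph components that contain at least one edge.

    Checking every pair of m sites over n sequences costs O(n * m^2),
    matching the running time quoted in the abstract.
    """
    m = len(seqs[0]) if seqs else 0
    parent = list(range(m))  # union-find forest over sites

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    has_edge = [False] * m
    for a, b in combinations(range(m), 2):
        if sites_conflict(seqs, a, b):
            has_edge[a] = has_edge[b] = True
            parent[find(a)] = find(b)

    # Isolated sites (no conflicts) form trivial components and are skipped.
    return len({find(x) for x in range(m) if has_edge[x]})
```

For example, four sequences "00", "01", "10", "11" give a single conflicting pair of sites, so the sketch reports one nontrivial component.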

2.
By viewing the ancestral recombination graph as defining a sequence of trees, we show how possible evolutionary histories consistent with given data can be constructed using the minimum number of recombination events. In contrast to previously known methods, which yield only estimated lower bounds, our method of detecting recombination always gives the minimum number of recombination events if the right kind of rooted trees is used in our algorithm. A new lower bound can be defined if rooted trees with fewer constraints are used. As well as studying how often it actually is equal to the minimum, we test how this new lower bound performs in comparison to some other lower bounds. Our study indicates that the new lower bound is an improvement on earlier bounds. Also, using simulated data, we investigate how well our method can recover the actual site-specific evolutionary relationships. In the presence of recombination, using a single tree to describe the evolution of the entire locus clearly leads to lower average recovery percentages than does our method. Our study shows that recovering the actual local tree topologies can be done more accurately than estimating the actual number of recombination events.

3.
Recombination is an important evolutionary mechanism responsible for creating the patterns of haplotype variation observable in human populations. Recently, there has been extensive research on understanding the fine-scale variation in recombination across the human genome using DNA polymorphism data. Historical recombination events leave signature patterns in haplotype data. A nonparametric approach for estimating the number of historical recombination events is to compute the minimum number of recombination events in the history of a set of haplotypes. In this paper, we provide new and improved methods for computing lower bounds on the minimum number of recombination events. These methods are shown to detect a higher number of recombination events for a haplotype dataset from a region in the lipoprotein lipase gene than previous lower bounds. We apply our methods to two datasets for which recombination hotspots have been experimentally determined and demonstrate a high density of detectable recombination events in the regions annotated as recombination hotspots. The programs implementing the methods in this paper are available at www.cs.ucsd.edu/users/vibansal/RecBounds/.

4.
Meiotic recombination is a fundamental biological event and one of the principal evolutionary forces responsible for shaping genetic variation within species. In addition to its fundamental role, recombination is central to several critical applied problems. The most important example is "association mapping" in populations, which is widely hoped to help find genes that influence genetic diseases (Carlson et al., 2004; Clark, 2003). Hence, a great deal of recent attention has focused on problems of inferring the historical derivation of sequences in populations when both mutations and recombinations have occurred. In the algorithms literature, most of that recent work has been directed to single-crossover recombination. However, gene-conversion is an important, and more common, form of (two-crossover) recombination which has been much less investigated in the algorithms literature. In this paper, we explicitly incorporate gene-conversion into discrete methods to study historical recombination. We are concerned with algorithms for identifying and locating the extent of historical crossing-over and gene-conversion (along with single-nucleotide mutation), and problems of constructing full putative histories of those events. The novel technical issues concern the incorporation of gene-conversion into recently developed discrete methods (Myers and Griffiths, 2003; Song et al., 2005) that compute lower and upper-bound information on the amount of needed recombination without gene-conversion. We first examine the most natural extension of the lower bound methods from Myers and Griffiths (2003), showing that the extension can be computed efficiently, but that this extension can only yield weak lower bounds. We then develop additional ideas that lead to higher lower bounds, and show how to solve, via integer-linear programming, a more biologically realistic version of the lower bound problem. 
We also show how to compute effective upper bounds on the number of needed single-crossovers and gene-conversions, along with explicit networks showing a putative history of mutations, single-crossovers and gene-conversions. Both lower and upper bound methods can handle data with missing entries, and the upper bound method can be used to infer missing entries with high accuracy. We validate the significance of these methods by showing that they can be effectively used to distinguish simulation-derived sequences generated without gene-conversion from sequences that were generated with gene-conversion. We apply the methods to recently studied sequences of Arabidopsis thaliana, identifying many more regions in the sequences than were previously identified (Plagnol et al., 2006), where gene-conversion may have played a significant role. Demonstration software is available at www.csif.cs.ucdavis.edu/~gusfield.

5.
Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the IRiS algorithm, which detects past recombination events from extant sequences and specifies the location of each recombination and which sequences are the recombinants. We have validated and calibrated IRiS for the human genome using coalescent simulations replicating standard human demographic history and a variable recombination rate model, and we have fine-tuned IRiS parameters to simultaneously optimize for false discovery rate, sensitivity, and accuracy in placing the recombination events in the sequence. Newer recombinations overwrite traces of past ones, and our results indicate that more recent recombinations are detected by IRiS with greater sensitivity. IRiS analysis of the MS32 region, previously studied using sperm typing, showed good concordance with estimated recombination rates. We also applied IRiS to haplotypes for 18 X-chromosome regions in HapMap Phase 3 populations. Recombination events detected for each individual were recoded as binary allelic states and combined into recotypes. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across genomes and that, for the first time, it will allow the use of recombination as a genetic marker to study human genetic variation.

6.
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition/transversion ratio heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.

7.
An analysis of the genome structure of soybean cultivars was conducted to determine if cultivars are composed of large regions of chromosomes inherited intact from one parent (indicative of minimal recombination) or if the chromosomes are a mixture of one parent's DNA interspersed with the DNA from the other parent (indicative of maximal recombination). Twenty-one single-cross-derived and 5 single-backcross-derived soybean cultivars and their immediate parents (47 genotypes) were analyzed at 89 RFLP loci to determine the minimal number and distribution of recombination events detected. Cultivars derived from single-cross and single-backcross breeding programs showed an average of 5.2 and 8.0 recombination events per cultivar, respectively. A homogeneity Chi-square test based upon a Poisson distribution of recombination events across 13 linkage groups indicated that the number of recombinations observed among linkage groups was random for the single-cross cultivars, but not for the single-backcross-derived cultivars. A two-tailed t-test demonstrated that for some linkage groups, the number of recombinations per map unit exceeded the confidence interval developed from a t-distribution of recombinations standardized for map unit distance. Paired t-tests of the number of recombinations observed between linkage-group ends and the mid-portion of the linkage groups indicated that during the development of the cultivars analyzed in this study more recombinations were associated with the ends of linkage groups than with the middle region. Detailed analysis of each linkage group revealed that large portions of linkage groups D, F, and G were inherited intact from one parent in several cultivars. A portion of linkage group G, in contrast, showed more recombination events than expected, based on genetic distance. 
These analyses suggest that breeders may have selected against recombination events where agronomically favorable combinations of alleles are present in one parent, and for recombination in areas where agronomically favorable combinations of alleles are not present in either parent. Names are necessary to report factually on the available data; however, the USDA neither guarantees nor warrants the standard of the product, and the use of the name by the USDA implies no approval of the product to the exclusion of others that may also be available. Contribution of the Midwest Area, USDA-ARS, Project No. 3236 of the Iowa Agriculture and Home Economics Experiment Station, Ames, IA 50011. Journal Paper No. J-16533.

8.
Lefebvre JF, Labuda D. Genetics 2008, 178(4):2069-2079
In this article we present a new heuristic approach (informative recombinations, InfRec) to analyze recombination density at the sequence level. InfRec is intuitive and easy and combines previously developed methods that (i) resolve genotypes into haplotypes, (ii) estimate the minimum number of recombinations, and (iii) evaluate the fraction of informative recombinations. We tested this approach in its sliding-window version on 117 genes from the SeattleSNPs program, resequenced in 24 African-Americans (AAs) and 23 European-Americans (EAs). We obtained population recombination rate estimates (ρ_obs) of 0.85 and 0.37 kb^-1 in AAs and EAs, respectively. Coalescence simulations indicated that these values account for both the recombinations and the gene conversions in the history of the sample. The intensity of ρ_obs varied considerably along the sequence, revealing the presence of recombination hotspots. Overall, we observed approximately 80% of recombinations in one-third and approximately 50% in only 10% of the sequence. InfRec performance, tested on published simulated and additional experimental data sets, was similar to that of other hotspot detection methods. Fast, intuitive, and visual, InfRec is not constrained by sample size limitations. It facilitates understanding data and provides a simple and flexible tool to analyze recombination intensity along the sequence.

9.
M. Xiong, S. W. Guo. Genetics 1997, 145(4):1201-1218
With the increasing popularity of QTL mapping in economically important animals and experimental species, the need for statistical methodology for fine-scale QTL mapping becomes increasingly urgent. The ability to disentangle several linked QTL depends on the number of recombination events. An obvious approach to increasing the number of recombination events is to increase sample size, but this approach is often constrained by resources. Moreover, increasing the sample size beyond a certain point will not further reduce the length of the confidence interval for QTL map locations. The alternative approach is to use historical recombinations. We use analytical methods to examine the properties of fine QTL mapping using historical recombinations that are accumulated through repeated intercrossing from an F2 population. We demonstrate that, using the historical recombinations, both simple and multiple regression models can significantly reduce the lengths of support intervals for estimated QTL map locations and the variances of estimated QTL map locations. We also demonstrate that, while the simple regression model using historical recombinations does not reduce the variances of the estimated additive and dominant effects, the multiple regression model does. We further determine the power and threshold values for both the simple and multiple regression models. In addition, we calculate the Kullback-Leibler distance and Fisher information for the simple regression model, in the hope of better understanding the advantages and disadvantages of using historical recombinations relative to F2 data.

10.
The candidate region for the Huntington disease (HD) gene has been narrowed down to a 2.2-Mb region between D4S10 and D4S98 on the short arm of chromosome 4. To map the HD gene within this candidate region 65 Dutch HD families were studied. In total 338 informative meioses were analyzed and 11 multiple informative crossovers were detected. Assuming a minimum number of recombinations and no double recombinations, our multiple informative crossovers are consistent with one specific genetic order for 12 loci: D4S10-(D4S81, D4S126)-D4S125-(D4S127, D4S95)-D4S43-(D4S115, D4S96, D4S111, D4S90, D4S141). This is in agreement with the known data derived from similar and other methods. The loci between brackets could not be mapped relative to each other. In our family material, two informative three-point marker recombination events were detected in the proximal HD candidate region, which are also informative for HD. Both recombination events map the HD gene distal to D4S81 and most likely distal to D4S125, narrowing down the HD candidate region to a 1.7-Mb region between D4S125 and D4S98.

11.
Some statistical properties of samples of DNA sequences are studied under an infinite-site neutral model with recombination. The two quantities of interest are R, the number of recombination events in the history of a sample of sequences, and R_M, the number of recombination events that can be parsimoniously inferred from a sample of sequences. Formulas are derived for the mean and variance of R. In contrast to R, R_M can be determined from the sample. Since no formulas are known for the mean and variance of R_M, they are estimated with Monte Carlo simulations. It is found that R_M is often much less than R; therefore, the number of recombination events may be greatly underestimated in a parsimonious reconstruction of the history of a sample. The statistic R_M can be used to estimate the product of the recombination rate and the population size or, if the recombination rate is known, to estimate the population size. To illustrate this, DNA sequences from the Adh region of Drosophila melanogaster are used to estimate the effective population size of this species.
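
The abstract states that the parsimony statistic can be determined from the sample but does not spell out the procedure. The sketch below follows the commonly described approach, which is an assumption here rather than something taken from the abstract: mark every pair of sites that fails the four-gamete test as an interval that must contain a recombination, then greedily place the fewest breakpoints that hit every interval. Sequences are assumed to be equal-length strings over "0" and "1", and the function name is hypothetical.

```python
def parsimony_recombination_count(seqs):
    """Minimum number of breakpoints hitting every four-gamete interval."""
    m = len(seqs[0]) if seqs else 0

    # Interval (i, j) means: at least one recombination between sites i and j.
    intervals = [
        (i, j)
        for i in range(m)
        for j in range(i + 1, m)
        if len({(s[i], s[j]) for s in seqs}) == 4
    ]

    # Greedy interval stabbing: process intervals by right endpoint and
    # place a breakpoint as far right as possible when one is still needed.
    intervals.sort(key=lambda ij: ij[1])
    count = 0
    last_cut = -1  # index g denotes the gap between site g and site g + 1
    for i, j in intervals:
        if i > last_cut:      # no existing breakpoint lies inside (i, j)
            count += 1
            last_cut = j - 1  # gap immediately to the left of site j
    return count
```

Placing each breakpoint at the rightmost gap of the interval that forces it is what lets one breakpoint cover as many later, overlapping intervals as possible.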

12.
Wiuf C. Genetics 2004, 166(1):537-545
In this study compatibility with a tree for unphased genotype data is discussed. If the data are compatible with a tree, the data are consistent with an assumption of no recombination in its evolutionary history. Further, it is said that there is a solution to the perfect phylogeny problem; i.e., for each individual a pair of haplotypes can be defined and the set of all haplotypes can be explained without invoking recombination. A new algorithm to decide whether or not a sample is compatible with a tree is derived. The new algorithm relies on an equivalence relation between sites that mutually determine the phase of each other. (The previous algorithm was based on advanced graph theoretical tools.) The equivalence relation is used to derive the number of solutions to the perfect phylogeny problem. Further, a series of statistics, R_M^(j), j ≥ 2, are defined. These can be used to detect recombination events in the sample's history and to divide the sample into regions that are compatible with a tree. The new statistics are applied to real data from human genes. The results from this application are discussed with reference to recent suggestions that recombination in the human genome is highly heterogeneous.

13.
An improved linkage map for human chromosome 19 containing 35 short tandem repeat polymorphisms (STRPs) and one VNTR (D19S20) was constructed. The map included 12 new (GATA)n tetranucleotide STRPs. Although total lengths of the male (114 cM) and female (128 cM) maps were similar, at both ends of the chromosome male recombination exceeded female recombination, while in the interior portion of the map female recombination was in excess. Cosmid clones containing the STRP sequences were identified and were positioned along the chromosome by fluorescent in situ hybridization. Four rounds of careful checking and removal of genotyping errors allowed biologically relevant conclusions to be made concerning the numbers and distributions of recombination events on chromosome 19. The average numbers of recombinations per chromosome matched closely the lengths of the genetic maps computed by using the program CRIMAP. Significant numbers of chromosomes with zero, one, two, or three recombinations were detected as products of both female and male meioses. On the basis of the total number of observed pairs of recombination events in which only a single informative marker was situated between the two recombinations, a maximal estimate for the rate of meiotic STRP "gene" conversion without recombination was calculated as 3 × 10^-4 per meiosis. For distances up to 30 cM between recombinations, many fewer chromosomes which had undergone exactly two recombinations were observed than were expected on the basis of the assumption of independent recombination locations. This strong new evidence for human meiotic interference will help to improve the accuracy of interpretation of clinical DNA test results involving polymorphisms flanking a genetic abnormality.

14.
Genome regions containing multiple copies of homologous genes, such as the immunoglobulin (Ig) heavy-chain constant-region (IGHC) locus, are often unstable and give rise to duplicated and deleted haplotypes. Analysis of such processes is fundamental to understanding the mechanisms of evolution of multigene families. In the IGHC region, a number of single and multiple gene deletions, derived from either unequal crossing-over or looping-out excision, have been described. To study these haplotypes at the population level, a simple and efficient method for preparing large numbers of DNA samples suitable for pulsed-field gel electrophoresis (PFGE) analysis was set up, and a sample of 110 blood donors was screened. Deletions were found to be frequent, as expected on the basis of previous serological surveys for homozygotes. Furthermore, a number of multigene duplications, never identified before, were detected. The total frequency of individuals bearing rearranged IGHC haplotypes was 10%. The genes involved in these deletions and duplications were assessed by densitometric analysis of standard Southern blots hybridized with several IGHC probes; two types of deletion and two types of duplication could thus be characterized. These data provide further evidence of the instability of the IGHC locus and demonstrate that unequal crossing-over is the most likely origin of rearranged IGHC haplotypes; they also suggest that such recombination events may be relatively frequent. Moreover, the simplicity and effectiveness of the large-scale PFGE screening approach will be of great help in the study of multigene families and of other loci involved in aberrant recombinations.

15.
Didelot X, Lawson D, Darling A, Falush D. Genetics 2010, 186(4):1435-1449
Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Monte Carlo Markov chain algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.

16.
In representing the evolutionary history of a set of binary DNA sequences by a connected graph, a set theoretical approach is introduced for studying recombination events. We show that set theoretical constraints have direct implications on the number of recombination events. We define a new lower bound on the number of recombination events and demonstrate the usefulness of our new approach through several explicit examples.

17.
Plasma cells secrete immunoglobulins other than immunoglobulin M (IgM) after a deletion and recombination in which a portion of the immunoglobulin heavy-chain locus (IgH), from the 5'-flanking region of the μ constant-region gene (Cμ) to the 5'-flanking region of the secreted heavy-chain constant-region gene (CH), is deleted. The recombination step is believed to be targeted via switch regions, stretches of repetitive DNA which lie in the 5' flank of all CH genes except δ. Although serum levels of IgD are very low, particularly in the mouse, IgD-secreting plasmacytomas of BALB/c and C57BL/6 mice are known. In an earlier study of two BALB/c IgD-secreting hybridomas, we reported that both had deleted the Cμ gene, and we concluded that this deletion was common in the normal generation of IgD-secreting cells. To learn how such switch recombinations occur in the absence of a switch region upstream of the Cδ1 exon, we isolated seven more BALB/c and two C57BL/6 IgD-secreting hybridomas. We determined the DNA sequences of the switch recombination junctions in eight of these hybridomas as well as that of the C57BL/6 hybridoma B1-8.δ1 and of the BALB/c, IgD-secreting plasmacytoma TEPC 1033. All of the lines had deleted the Cμ gene, and three had deleted the Cδ1 exon in the switch recombination event. The δ switch recombination junction sequences were similar to those of published productive switch recombinations occurring 5' to other heavy-chain genes, suggesting that nonhomologous, illegitimate recombination is utilized whenever the heavy-chain switch region is involved in recombination.

18.
Given a set D of input sequences, a genealogy for D can be constructed backward in time using such evolutionary events as mutation, coalescent, and recombination. An ancestral configuration (AC) can be regarded as the multiset of all sequences present at a particular point in time in a possible genealogy for D. The complexity of computing the likelihood of observing D depends heavily on the total number of distinct ACs of D and, therefore, it is of interest to estimate that number. For D consisting of binary sequences of finite length, we consider the problem of enumerating exactly all distinct ACs. We assume that the root sequence type is known and that the mutation process is governed by the infinite-sites model. When there is no recombination, we construct a general method of obtaining closed-form formulas for the total number of ACs. The enumeration problem becomes much more complicated when recombination is involved. In that case, we devise a method of enumeration based on counting contingency tables and construct a dynamic programming algorithm for the approach. Last, we describe a method of counting the number of ACs that can appear in genealogies with less than or equal to a given number R of recombinations. Of particular interest is the case in which R is close to the minimum number of recombinations for D.

19.
H. Innan, F. Tajima, R. Terauchi, N. T. Miyashita. Genetics 1996, 143(4):1761-1770
Nucleotide variation in the Adh region of the wild plant Arabidopsis thaliana was analyzed in 17 ecotypes sampled worldwide to investigate DNA polymorphism in natural plant populations. The investigated 2.4-kb Adh region was divided into four blocks by intragenic recombinations between two parental sequence types that diverged 6.3 million years (Myr) ago, if the nucleotide mutation rate μ = 10^-9 is assumed. Within each block, dimorphism of segregating variations was observed with intermediate frequencies, which caused a substantial amount of nucleotide variation in A. thaliana at the species level. The first recombination introduced the divergent variation that resulted in dimorphism in this plant species ~3.3 Myr ago, and three subsequent intragenic recombinations have occurred sporadically in ~1.1-Myr intervals. It was shown that there was only a limited number (six) of sequence types in this species and that no clear association was observed between sequence type and geographic origin. Taken together, these results suggest that A. thaliana has spread over the world only recently. It can be concluded that recombination played an important role in the evolutionary history of A. thaliana, especially through the generation of DNA polymorphism in the natural populations of this plant species.

20.
Genetic recombination is considered to be a very frequent phenomenon among enteroviruses (Family Picornaviridae, Genus Enterovirus). However, the recombination patterns may differ between enterovirus species and between types within species. Enterovirus C (EV-C) species contains 21 types. In the capsid coding P1 region, the types of EV-C species cluster further into three sub-groups (designated here as A–C). In this study, the recombination pattern of EV-C species sub-group B that contains types CVA-21, CVA-24, EV-C95, EV-C96 and EV-C99 was determined using partial 5′UTR and VP1 sequences of enterovirus strains isolated during poliovirus surveillance and previously published complete genome sequences. Several inter-typic recombination events were detected. Furthermore, the analyses suggested that inter-typic recombination events have occurred mainly within the distinct sub-groups of EV-C species. Only sporadic recombination events between EV-C species sub-group B and other EV-C sub-groups were detected. In addition, strict recombination barriers were inferred for CVA-21 genotype C and CVA-24 variant strains. These results suggest that the frequency of inter-typic recombinations, even within species, may depend on the phylogenetic position of the given viruses.
