首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
We consider the following problem: Given a set of binary sequences, determine lower bounds on the minimum number of recombinations required to explain the history of the sample, under the infinite-sites model of mutation. The problem has implications for finding recombination hotspots and for the Ancestral Recombination Graph reconstruction problem. Hudson and Kaplan gave a lower bound based on the four-gamete test. In practice, their bound R/sub m/ often greatly underestimates the minimum number of recombinations. The problem was recently revisited by Myers and Griffiths, who introduced two new lower bounds R/sub h/ and R/sub s/ which are provably better, and also yield good bounds in practice. However, the worst-case complexities of their procedures for computing R/sub h/ and R/sub s/ are exponential and super-exponential, respectively. In this paper, we show that the number of nontrivial connected components, R/sub c/, in the conflict graph for a given set of sequences, computable in time 0(nm/sup 2/), is also a lower bound on the minimum number of recombination events. We show that in many cases, R/sub c/ is a better bound than R/sub h/. The conflict graph was used by Gusfield et al. to obtain a polynomial time algorithm for the galled tree problem, which is a special case of the Ancestral Recombination Graph (ARG) reconstruction problem. Our results also offer some insight into the structural properties of this graph and are of interest for the general Ancestral Recombination Graph reconstruction problem.  相似文献   

2.
The problem of inferring haplotype phase from a population of genotypes has received a lot of attention recently. This is partly due to the observation that there are many regions on human genomic DNA where genetic recombination is rare (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). A Haplotype Map project has been announced by NIH to identify and characterize populations in terms of these haplotypes. Recently, Gusfield introduced the perfect phylogeny haplotyping problem, as an algorithmic implication of the no-recombination in long blocks observation, together with the standard population-genetic assumption of infinite sites. Gusfield's solution based on matroid theory was followed by direct theta(nm2) solutions that use simpler techniques (Bafna et al., 2003; Eskin et al., 2003), and also bound the number of solutions to the PPH problem. In this short note, we address two questions that were left open. First, can the algorithms of Bafna et al. (2003) and Eskin et al. (2003) be sped-up to O(nm + m2) time, which would imply an O(nm) time-bound for the PPH problem? Second, if there are multiple solutions, can we find one that is most parsimonious in terms of the number of distinct haplotypes. We give reductions that suggests that the answer to both questions is "no." For the first problem, we show that computing the output of the first step (in either method) is equivalent to Boolean matrix multiplication. Therefore, the best bound we can presently achieve is O(nm(omega-1)), where omega < or = 2.52 is the exponent of matrix multiplication. Thus, any linear time solution to the PPH problem likely requires a different approach. For the second problem of computing a PPH solution that minimizes the number of distinct haplotypes, we show that the problem is NP-hard using a reduction from Vertex Cover (Garey and Johnson, 1979).  相似文献   

3.
This paper examines some of the rich structure of the syntenic distance model of evolutionary distance, introduced by Ferretti et al. (1996). The syntenic distance between two genomes is the minimum number of fissions, fusions, and translocations required to transform one into the other, ignoring gene order within chromosomes. We prove that the previously unanalyzed algorithm given by Ferretti et al. (1996) is a 2-approximation and no better, and that, further, it always outperforms the algorithm presented by DasGupta et al. (1998). We also prove the same results for an improved version of the Ferretti et al. algorithm. We then prove a number of properties which give insight into the structure of optimal move sequences. We give instances in which any move sequence working solely within connected components is nearly twice optimal and prove a general lower bound based on the spread of genes from each chromosome. We then prove a monotonicity property for the syntenic distance, and bound the difficulty of the hardest instance of any size. We discuss the results of implementing these algorithms and testing them on real and simulated synteny data.  相似文献   

4.
It is known (Reidys et al., 1997b. Bull. Math. Biol. 59(2), 339-397) that for any two secondary structures S,S' there exists an RNA sequence compatible with both, and that this result does not extend to more than two secondary structures. Indeed, a simple formula for the number of RNA sequences compatible with secondary structures S,S' plays a role in the algorithms of Flamm et al. (2001. RNA 7, 254-265) and of Abfalter et al. (2003. Proceedings of the German Conference on Bioinformatics, ) to design an RNA switch. Here we show that a natural extension of this problem is NP-complete. Unless P=NP, there is no polynomial time algorithm, which when given secondary structures S1,...,S(k), for k4, determines the least number of positions, such that after removal of all base pairs incident to these positions there exists an RNA nucleotide sequence compatible with the given secondary structures. We also consider a restricted version of this problem with a "fixed maximum" number of possible stars and show that it has a simple polynomial time solution.  相似文献   

5.
Bounds on the minimum number of recombination events in a sample history   总被引:11,自引:0,他引:11  
Myers SR  Griffiths RC 《Genetics》2003,163(1):375-394
Recombination is an important evolutionary factor in many organisms, including humans, and understanding its effects is an important task facing geneticists. Detecting past recombination events is thus important; this article introduces statistics that give a lower bound on the number of recombination events in the history of a sample, on the basis of the patterns of variation in the sample DNA. Such lower bounds are appropriate, since many recombination events in the history are typically undetectable, so the true number of historical recombinations is unobtainable. The statistics can be calculated quickly by computer and improve upon the earlier bound of Hudson and Kaplan 1985. A method is developed to combine bounds on local regions in the data to produce more powerful improved bounds. The method is flexible to different models of recombination occurrence. The approach gives recombination event bounds between all pairs of sites, to help identify regions with more detectable recombinations, and these bounds can be viewed graphically. Under coalescent simulations, there is a substantial improvement over the earlier method (of up to a factor of 2) in the expected number of recombination events detected by one of the new minima, across a wide range of parameter values. The method is applied to data from a region within the lipoprotein lipase gene and the amount of detected recombination is substantially increased. Further, there is strong clustering of detected recombination events in an area near the center of the region. A program implementing these statistics, which was used for this article, is available from http://www.stats.ox.ac.uk/mathgen/programs.html.  相似文献   

6.
By viewing the ancestral recombination graph as defining a sequence of trees, we show how possible evolutionary histories consistent with given data can be constructed using the minimum number of recombination events. In contrast to previously known methods, which yield only estimated lower bounds, our method of detecting recombination always gives the minimum number of recombination events if the right kind of rooted trees are used in our algorithm. A new lower bound can be defined if rooted trees with fewer constraints are used. As well as studying how often it actually is equal to the minimum, we test how this new lower bound performs in comparison to some other lower bounds. Our study indicates that the new lower bound is an improvement on earlier bounds. Also, using simulated data, we investigate how well our method can recover the actual site-specific evolutionary relationships. In the presence of recombination, using a single tree to describe the evolution of the entire locus clearly leads to lower average recovery percentages than does our method. Our study shows that recovering the actual local tree topologies can be done more accurately than estimating the actual number of recombination events.  相似文献   

7.
Recombination is an important evolutionary mechanism responsible for creating the patterns of haplotype variation observable in human populations. Recently, there has been extensive research on understanding the fine-scale variation in recombination across the human genome using DNA polymorphism data. Historical recombination events leave signature patterns in haplotype data. A nonparametric approach for estimating the number of historical recombination events is to compute the minimum number of recombination events in the history of a set of haplotypes. In this paper, we provide new and improved methods for computing lower bounds on the minimum number of recombination events. These methods are shown to detect a higher number of recombination events for a haplotype dataset from a region in the lipoprotein lipase gene than previous lower bounds. We apply our methods to two datasets for which recombination hotspots have been experimentally determined and demonstrate a high density of detectable recombination events in the regions annotated as recombination hotspots. The programs implementing the methods in this paper are available at www.cs.ucsd.edu/users/vibansal/RecBounds/.  相似文献   

8.
Sequencing by hybridization is a method for reconstructing a DNA sequence based on its k-mer content. This content, called the spectrum of the sequence, can be obtained from hybridization with a universal DNA chip. However, even with a sequencing chip containing all 4(9) 9-mers and assuming no hybridization errors, only about 400-bases-long sequences can be reconstructed unambiguously. Drmanac et al. (1989) suggested sequencing long DNA targets by obtaining spectra of many short overlapping fragments of the target, inferring their relative positions along the target, and then computing spectra of subfragments that are short enough to be uniquely recoverable. Drmanac et al. do not treat the realistic case of errors in the hybridization process. In this paper, we study the effect of such errors. We show that the probability of ambiguous reconstruction in the presence of (false negative) errors is close to the probability in the errorless case. More precisely, the ratio between these probabilities is 1 + O(p = (1 - p)(4). 1 = d) where d is the average length of subfragments, and p is the probability of a false negative. We also obtain lower and upper bounds for the probability of unambiguous reconstruction based on an errorless spectrum. For realistic chip sizes, these bounds are tighter than those given by Arratia et al. (1996). Finally, we report results on simulations with real DNA sequences, showing that even in the presence of 50% false negative errors, a target of cosmid length can be recovered with less than 0.1% miscalled bases.  相似文献   

9.
Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ~70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm.  相似文献   

10.
11.
We show that a family of prokaryotic repetitive sequences, called REP (repetitive extragenic palindromic), (Stern et al., 1984) is involved in the formation of chromosomal rearrangements such as duplications. The join-points of seven RecA+ tandem duplications previously characterized in Salmonella typhimurium, that fuse the hisD gene to distant foreign promoters, were cloned and sequenced. In all seven cases they are shown to have originated by recombination between distant REP sequences. Importantly, several join-points had also occurred at REP sequences even in a RecA-background. Thus, REPs can recombine with each other by a RecA(-)-independent mechanism involved in the generation of chromosomal rearrangements. While all RecA+ duplications analysed resulted from recombination between REP sequences, some RecA-duplications did occur also outside of REP sequences, in one case by recombination within a 7 bp homology. Possible roles for the known interaction between DNA gyrase and REP in chromosomal rearrangements are discussed.  相似文献   

12.
13.
In representing the evolutionary history of a set of binary DNA sequences by a connected graph, a set theoretical approach is introduced for studying recombination events. We show that set theoretical constraints have direct implications on the number of recombination events. We define a new lower bound on the number of recombination events and demonstrate the usefulness of our new approach through several explicit examples.  相似文献   

14.
The genetic nature and biological effects of recombination between porcine endogenous retroviruses (PERV) were studied. An infectious molecular clone was generated from a high-titer, human-tropic PERV isolate, PERV-A 14/220 (B. A. Oldmixon, et al. J. Virol. 76:3045-3048, 2002; T. A. Ericsson et al. Proc. Natl. Acad. Sci. USA 100:6759-6764, 2003). To analyze this sequence and 15 available full-length PERV nucleotide sequences, we developed a sequence comparison program, LOHA(TM) to calculate local sequence homology between two sequences. This analysis determined that PERV-A 14/220 arose by homologous recombination of a PERV-C genome replacing an 850-bp region around the pol-env junction with that of a PERV-A sequence. This 850-bp PERV-A sequence encompasses the env receptor binding domain, thereby conferring a wide host range including human cells. In addition, we determined that multiple regions derived from PERV-C are responsible for the increased infectious titer of PERV-A 14/220. Thus, a single recombination event may be a fast and effective way to generate high-titer, potentially harmful PERV. Further, local homology and phylogenetic analyses between 16 full-length sequences revealed evidence for other recombination events in the past that give rise to other PERV genomes that possess the PERV-A, but not the PERV-B, env gene. These results indicate that PERV-A env is more prone to recombination with heterogeneous backbone genomes than PERV-B env. Such recombination events that generate more active PERV-A appear to occur in pigs rather frequently, which increases the potential risk of zoonotic PERV transmission. In this context, pigs lacking non-human-tropic PERV-C would be more suitable as donor animals for clinical xenotransplantation.  相似文献   

15.
Several algorithms have been proposed for determining the centre of rotation of ball joints. These algorithms are used rather to locate the hip joint centre. Few studies have focused on the determination of the glenohumeral joint centre. However, no studies have assessed the accuracy and repeatability of functional methods for glenohumeral joint centre.This paper aims at evaluating the accuracy and the repeatability with which the glenohumeral joint rotation centre (GHRC) can be estimated in vivo by functional methods. The reference joint centre is the glenohumeral anatomical centre obtained by medical imaging. Five functional methods were tested: the algorithm of Gamage and Lasenby (2002), bias compensated (Halvorsen, 2003), symmetrical centre of rotation estimation (Ehrig et al., 2006), normalization method (Chang and Pollard, 2007), helical axis (Woltring et al., 1985). The glenohumeral anatomical centre (GHAC) was deduced from the fitting of the humeral head.Four subjects performed three cycles of three different movements (flexion/extension, abduction/adduction and circumduction). For each test, the location of the glenohumeral joint centre was estimated by the five methods. Analyses focused on the 3D location, on the repeatability of location and on the accuracy by computing the Euclidian distance between the estimated GHRC and the GHAC. For all the methods, the error repeatability was inferior to 8.25 mm. This study showed that there are significant differences between the five functional methods. The smallest distance between the estimated joint centre and the centre of the humeral head was obtained with the method of Gamage and Lasenby (2002).  相似文献   

16.
Current sitewise methods for detecting positive selection on gene sequences (the de facto standard being the CODEML method (Yang et al., 2000)) assume no recombination. This paper presents simulation results indicating that violation of this assumption can lead to false positive detection of sites undergoing positive selection. Through the use of population-scaled mutation and recombination rates, simulations can be performed that permit the generation of appropriate null distributions corresponding to neutral expectations in the presence of recombination, thereby allowing for a more accurate estimation of positive selection.  相似文献   

17.
Propionibacterium acnes is an anaerobic Gram-positive bacterium that has been linked to a wide range of opportunistic human infections and conditions, most notably acne vulgaris (I. Kurokawa et al., Exp. Dermatol. 18:821-832, 2009). We now present the whole-genome sequences of three P. acnes strains from the type IA(2) cluster which were recovered from ophthalmic infections (A. McDowell et al., Microbiology 157:1990-2003, 2011).  相似文献   

18.

Background  

Multiple sequence alignment is fundamental. Exponential growth in computation time appears to be inevitable when an optimal alignment is required for many sequences. Exact costs of optimum alignments are therefore rarely computed. Consequently much effort has been invested in algorithms for alignment that are heuristic, or explore a restricted class of solutions. These give an upper bound on the alignment cost, but it is equally important to determine the quality of the solution obtained. In the absence of an optimal alignment with which to compare, lower bounds may be calculated to assess the quality of the alignment. As more effort is invested in improving upper bounds (alignment algorithms), it is therefore important to improve lower bounds as well. Although numerous cost metrics can be used to determine the quality of an alignment, many are based on sum-of-pairs (SP) measures and their generalizations.  相似文献   

19.
The usage of preferred codons in Drosophila melanogaster is reduced in regions of lower recombination. This is consistent with population genetics theory, whereby the effectiveness of selection on multiple targets is limited by stochastic effects caused by linkage. However, because the selectively preferred codons in D. melanogaster end in C or G, it has been argued that base-composition-biasing effects of recombination can account for the observed relationship between preferred codon usage and recombination rate (Marais et al., 2003). Here, we show that the correlation between base composition (of protein-coding and intron regions) and recombination rate holds only for lower values of the latter. This is consistent with a Hill-Robertson interference model and does not support a model whereby the entire effect of recombination on codon usage can be attributed to its potential role in generating compositional bias.  相似文献   

20.
Stigter D 《Biophysical chemistry》2004,110(1-2):171-178
Brewer et al. (Biophys. J. 85 (2003) 2519-2524) have studied the compaction of dsDNA in a double flow cell by observing the extension of stained DNA tethered in buffer solutions with or without Abf2p. They use a Langmuir adsorption model in which one Abf2p molecule adsorbs on one site on the DNA, and the binding constant, K, is given as the ratio of the experimental rates of adsorption and desorption. This paper presents an improved interpretation. Instead of Langmuir adsorption we use the more appropriate McGhee-von Hippel (J. Mol. Biol. 86 (1974) 469-489) theory for the adsorption of large ligands to a one-dimensional lattice. We assume that each adsorbed molecule shortens the effective contour length of DNA by the foot print of Abf2p of 27 base pairs. When Abf2p adsorbs to DNA stretched in the flowing buffer solution, we account for a tension effect that decreases the adsorption rate and the binding constant by a factor of 2 to 4. The data suggest that the accessibility to Abf2p decreases significantly with increasing compaction of DNA, resulting in a lower adsorption rate and a lower binding constant. The kinetics reported by Brewer et al. (Biophys. J. 85 (2003) 2519-2524) lead to a binding constant K=3.6 x 10(6) M(-1) at the beginning, and to K=5 x 10(5) M(-1) near the end of a compaction run, more than an order of magnitude lower than the value K=2.57 x 10(7) M(-1) calculated by Brewer et al. (Biophys. J. 85 (2003) 2519-2524).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号