首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 178 毫秒
1.
An algorithm to enumerate sorting reversals for signed permutations.   总被引:1,自引:0,他引:1  
The rearrangement distance between single-chromosome genomes can be estimated as the minimum number of inversions required to transform the gene ordering observed in one into that observed in the other. This measure, known as "inversion distance," can be computed as the reversal distance between signed permutations. During the past decade, much progress has been made both on the problem of computing reversal distance and on the related problem of finding a minimum-length sequence of reversals, which is known as "sorting by reversals." For most problem instances, however, many minimum-length sequences of reversals exist, and in the absence of auxiliary information, no one is of greater value than the others. The problem of finding all minimum-length sequences of reversals is thus a natural generalization of sorting by reversals, yet it has received little attention. This problem reduces easily to the problem of finding all "sorting reversals" of one permutation with respect to another - that is, all reversals rho such that, if rho is applied to one permutation, then the reversal distance of that permutation from the other is decreased. In this paper, an efficient algorithm is derived to solve the problem of finding all sorting reversals, and experimental results are presented indicating that, while the new algorithm does not represent a significant improvement in asymptotic terms (it takes O(n(3)) time, for permutations of size n; the problem can now be solved by brute force in Theta(n(3)) time), it performs dramatically better in practice than the best known alternative. An implementation of the algorithm is available at www.cse.ucsc.edu/~acs.  相似文献   

2.
Based on a large repertoire of chromosomal rearrangement operations, the genomic distance d between two genomes with chi(r) and chi(b) linear chromosomes, respectively, both containing the same (or orthologous) n genes or markers, is d=n + max(chi(r),chi(b))-c, where c is the number of cycles in the breakpoint graph of the two genomes. In this paper, we study the exact probability distribution of c. We derive the expectation and variance, and show that, in the limit, the expectation of d is n - (2chirchib)/(2chir+2chib(-1)) - 1/2ln (n + max(chir,chib))/(chir+chib).  相似文献   

3.

Background

One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form \(\pi =(\pi _1 \pi _2 \ldots \pi _n)\), and in that case we can consider without loss of generality that one of them is the identity permutation \(\iota _n =(1 2 \ldots n)\), and that we just need to sort the other (i.e., transform it into \(\iota _n\)). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location; and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them considers that the number of genes involved, in a reversal or a transposition, is never greater than two, which is known as the problem of sorting by super short operations (or SSOs).

Results and conclusions

All problems considering SSOs in permutations have been shown to be in \(\mathsf {P}\), except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.
  相似文献   

4.
We study the probability distribution of the distance d = n + chi - kappa - psi between two genomes with n markers distributed on chi chromosomes and with breakpoint graphs containing kappa cycles and psi "good" paths, under the hypothesis of random gene order. We interpret the random order assumption in terms of a stochastic method for constructing the bicolored breakpoint graph. We show that the limiting expectation of E[d] = n - 1/2chi - 1/2 log n+chi/2chi. We also calculate the variance, the effect of different numbers of chromosomes in the two genomes, and the number of plasmids, or circular chromosomes, generated by the random breakpoint graph construction. A more realistic model allows intra- and interchromosomal operations to have different probabilities, and simulations show that for a fixed number of rearrangements, kappa and d depend on the relative proportions of the two kinds of operation.  相似文献   

5.
Sorting by weighted reversals, transpositions, and inverted transpositions.   总被引:1,自引:0,他引:1  
During evolution, genomes are subject to genome rearrangements that alter the ordering and orientation of genes on the chromosomes. If a genome consists of a single chromosome (like mitochondrial, chloroplast, or bacterial genomes), the biologically relevant genome rearrangements are (1) inversions--also called reversals--where a section of the genome is excised, reversed in orientation, and reinserted and (2) transpositions, where a section of the genome is excised and reinserted at a new position in the genome; if this also involves an inversion, one speaks of an inverted transposition. To reconstruct ancient events in the evolutionary history of organisms, one is interested in finding an optimal sequence of genome rearrangements that transforms a given genome into another genome. It is well known that this problem is equivalent to the problem of "sorting" a signed permutation into the identity permutation. In this paper, we provide a 1.5-approximation algorithm for sorting by weighted reversals, transpositions and inverted transpositions for biologically realistic weights.  相似文献   

6.
CTRD is a software for computing translocation distance between genomes. It takes two genomes as its input and tests whether one genome can be transformed into the other. If possible, it computes the translocation distance between two genomes, and gives the translocation operation serial. We adopt the fastest known O(n(2)log n) algorithm. Our contributions include (1) give a necessary and sufficient condition to ensure that one genome can be transformed into the other for translocation operations, and (2) develop a software using the fastest known O(n(2)log n) algorithm.  相似文献   

7.
We use the finite Markov chain embedding technique to obtain the distribution of the number of cycles in the breakpoint graph of a random uniform signed permutation. This further gives a very good approximation of the distribution of the reversal distance between two random genomes.  相似文献   

8.
Motivated by the trend of genome sequencing without completing the sequence of the whole genomes, a problem on filling an incomplete multichromosomal genome (or scaffold) I with respect to a complete target genome G was studied. The objective is to minimize the resulting genomic distance between I' and G, where I' is the corresponding filled scaffold. We call this problem the onesided scaffold filling problem. In this paper, we conduct a systematic study for the scaffold filling problem under the breakpoint distance and its variants, for both unichromosomal and multichromosomal genomes (with and without gene repetitions). When the input genome contains no gene repetition (i.e., is a fragment of a permutation), we show that the two-sided scaffold filling problem (i.e., G is also incomplete) is polynomially solvable for unichromosomal genomes under the breakpoint distance and for multichromosomal genomes under the genomic (or DCJ--Double-Cut-and-Join) distance. However, when the input genome contains some repeated genes, even the one-sided scaffold filling problem becomes NP-complete when the similarity measure is the maximum number of adjacencies between two sequences. For this problem, we also present efficient constant-factor approximation algorithms: factor-2 for the general case and factor 1.33 for the one-sided case.  相似文献   

9.
Pranav Kumar  G. Sahoo 《Genomics》2018,110(3):149-153
An important tool for comparing genome analysis is the rearrangement event that can transform one given genome into other. For finding minimum sequence of fission and fusion, we have proposed here an algorithm and have shown a transformation example for converting the source genome into the target genome. The proposed algorithm comprises of circular sequence i.e. “cycle graph” in place of mapping. The main concept of algorithm is based on optimal result of permutation. These sorting processes are performed in constant running time by showing permutation in the form of cycle. In biological instances it has been observed that transposition occurs half of the frequency as that of reversal. In this paper we are not dealing with reversal instead commencing with the rearrangement of fission, fusion as well as transposition.  相似文献   

10.
The total order of genes or markers on a chromosome is crucial for most comparative genomics studies. However, current gene mapping efforts might only suffice to provide a partial order of the genes on a chromosome. Several different genes or markers might be mapped at the same position due to the low resolution of gene mapping or missing data. Moreover, conflicting datasets might give rise to the ambiguity of gene order. In this paper, we consider the reversal distance and breakpoint distance problems for partially ordered genomes. We first prove that these problems are nondeterministic polynomial-time (NP)-hard, and then give an efficient heuristic algorithm to compute the breakpoint distance between partially ordered genomes. The algorithm is based on an efficient approximation algorithm for a natural generalization of the well-known feedback vertex set problem, and has been tested on both simulated and real biological datasets. The experimental results demonstrate that our algorithm is quite effective for estimating the breakpoint distance between partially ordered genomes and for inferring the gene (total) order.  相似文献   

11.
The list of species whose complete DNA sequence have been read is growing steadily, and it is believed that comparative genomics is in its early days. Permutations patterns (groups of genes in some "close" proximity) on gene sequences of genomes across species is being studied under different models, to cope with this explosion of data. The challenge is to (intelligently and efficiently) analyze the genomes in the context of other genomes. In this paper, we present a generalized model that uses three notions, gapped permutation patterns (with gap g), genome clusters, via quorum, K>1, parameter, and, possible multiplicity in the patterns. The task is to automatically discover all permutation patterns (with possible multiplicity), that occur with gap g in at least K of the given m genomes. We present (log mN (I) + /Sigma/log/Sigma/N (O)) time algorithm where m is the number of sequences, each defined on Sigma, N (I) is the size of the input and N (O) is the size of the maximal gene clusters that appear in at least K of the m genomes.  相似文献   

12.
Yue F  Zhang M  Tang J 《BMC genomics》2008,9(Z2):S15

Background

Because of the advent of high-throughput sequencing and the consequent reduction in the cost of sequencing, many organisms have been completely sequenced and most of their genes identified. It thus has become possible to represent whole genomes as ordered lists of gene identifiers and to study the rearrangement of these entities through computational means. As a result, genome rearrangement data has attracted increasing attentions from both biologists and computer scientists as a new type of data for phylogenetic analysis. The main events of genome rearrangements include inversions, transpositions and transversions. To date, GRAPPA and MGR are the most accurate methods for rearrangement phylogeny, both assuming inversion as the only event. However, due to the complexity of computing transposition distance, it is very difficult to analyze datasets when transpositions are dominant.

Results

We extend GRAPPA to handle transpositions. The new method is named GRAPPA-TP, with two major extensions: a heuristic method to estimate transposition distance, and a new transposition median solver for three genomes. Although GRAPPA-TP uses a greedy approach to compute the transposition distance, it is very accurate when genomes are relatively close. The new GRAPPA-TP is available from http://phylo.cse.sc.edu/.

Conclusion

Our extensive testing using simulated datasets shows that GRAPPA-TP is very accurate in terms of ancestor genome inference and phylogenetic reconstruction. Simulation results also suggest that model match is critical in genome rearrangement analysis: it is not accurate to simulate transpositions with other events including inversions.
  相似文献   

13.
Genome rearrangement is an important area in computational biology and bioinformatics. The translocation operation is one of the popular operations for genome rearrangement. It was proved that computing the unsigned translocation distance is NP-hard. In this paper, we present a (1.5 + epsilon)-approximation algorithm for computing unsigned translocation distance which improves upon the best known 1.75-ratio. The running time of our algorithm is O(n2 + (4/epsilon)1.5 square root log(4/epsilon )2(4/epsilon), where n is the total number of genes in the genome.  相似文献   

14.
《Gene》1996,172(1):GC11-GC17
Algorithms inspired by comparative genomics calculate an edit distance between two linear orders based on elementary edit operations such as inversion, transposition and reciprocal translocation. All operations are generally assigned the same weight, simply by default, because no systematic empirical studies exist verifying whether algorithmic outputs involve realistic proportion of each. Nor de we have data on how weights should vary with the length of the inverted or transposed segment of the chromosome. In this paper, we present a rapid algorithm that allows each operation to take on a range of weights, producing an relatively tight bound on the distance between single-chromosome genomes, by means of a greedy search with look-ahead. The efficiency of this algorithm allows us to test random genomes for each parameter setting, to detect gene order similarity and to infer the parameter values most appropriate to the phylogenetic domain under study. We apply this method to genome segments in which the sa me gene order is conserved in Escherichia coli and Bacillus subtilis, as well as to the gene order in human versus Drosophila mitochondrial genomes. In both cases, we conclude that it is most appropriate to assign somewhat more than twice the weight to transpositions and inverted transpositions than to inversions. We also explore segment-length weighting for fungal mitochondrial gene orders.  相似文献   

15.
In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods ( [1], [2]) have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G_1 and G_2: first, one establishes a oneto- one correspondence between genes of G_1 and genes of G_2 ; second, once this correspondence is established, it defines explicitly a permutation and it is then possible to quantify their similarity using classical measures defined for permutations, like the number of breakpoints. Hence these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes, and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, to compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy are deleted in both genomes for each gene family, and the matching model, that computes a maximal correspondence for each gene family. We show that for these two models, and for three (dis)similarity measures on permutations, namely the number of common intervals, the maximum adjacency disruption (MAD) number and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete, and even APXhard for the MAD number and SAD number.  相似文献   

16.
MOTIVATION: Finding genomic distance based on gene order is a classic problem in genome rearrangements. Efficient exact algorithms for genomic distances based on inversions and/or translocations have been found but are complicated by special cases, rare in simulations and empirical data. We seek a universal operation underlying a more inclusive set of evolutionary operations and yielding a tractable genomic distance with simple mathematical form. RESULTS: We study a universal double-cut-and-join operation that accounts for inversions, translocations, fissions and fusions, but also produces circular intermediates which can be reabsorbed. The genomic distance, computable in linear time, is given by the number of breakpoints minus the number of cycles (b-c) in the comparison graph of the two genomes; the number of hurdles does not enter into it. Without changing the formula, we can replace generation and re-absorption of a circular intermediate by a generalized transposition, equivalent to a block interchange, with weight two. Our simple algorithm converts one multi-linear chromosome genome to another in the minimum distance.  相似文献   

17.
MOTIVATION: The double cut and join operation (abbreviated as DCJ) has been extensively used for genomic rearrangement. Although the DCJ distance between signed genomes with both linear and circular (uni- and multi-) chromosomes is well studied, the only known result for the NP-complete unsigned DCJ distance problem is an approximation algorithm for unsigned linear unichromosomal genomes. In this article, we study the problem of computing the DCJ distance on two unsigned linear multichromosomal genomes (abbreviated as UDCJ). RESULTS: We devise a 1.5-approximation algorithm for UDCJ by exploiting the distance formula for signed genomes. In addition, we show that UDCJ admits a weak kernel of size 2k and hence an FPT algorithm running in O(2(2k)n) time.  相似文献   

18.
MOTIVATION: A one-to-one correspondence between the sets of genes in the two genomes being compared is necessary for the notions of breakpoint and reversal distances. To compare genomes where there are paralogous genes, Sankoff formulated the exemplar distance problem as a general version of the genome rearrangement problem. Unfortunately, the problem is NP-hard even for the breakpoint distance. RESULTS: This paper proposes a divide-and-conquer approach for calculating the exemplar breakpoint distance between two genomes with multiple gene families. The combination of our approach and Sankoff's branch-and-bound technique leads to a practical program to answer this question. Tests with both simulated and real datasets show that our program is much more efficient than the existing program that is based only on the branch-and-bound technique. AVAILABILITY: Code for the program is available from the authors.  相似文献   

19.
A central problem in genome rearrangement is finding a most parsimonious rearrangement scenario using certain rearrangement operations. An important problem of this type is sorting a signed genome by reversals and translocations (SBRT). Hannenhalli and Pevzner presented a duality theorem for SBRT which leads to a polynomial time algorithm for sorting a multi-chromosomal genome using a minimum number of reversals and translocations. However, there is one case for which their theorem and algorithm fail. We describe that case and suggest a correction to the theorem and the polynomial algorithm. The solution of SBRT uses a reduction to the problem of sorting a signed permutation by reversals (SBR). The best extant algorithms for SBR require quadratic time. The common approach to solve SBR is by finding a safe reversal using the overlap graph or the interleaving graph of a permutation. We describe a family of signed permutations which proves a quadratic lower bound on the number of affected vertices in the overlap/interleaving graph during any optimal sorting scenario. This implies, in particular, an Omega(n3) lower bound for Bergeron's algorithm.  相似文献   

20.
Genome rearrangement is an important area in computational biology and bioinformatics. The translocation operation is one of the popular operations for genome rearrangement. It was proved that computing the unsigned translocation distance is NP-hard. In this paper, we present a (1.5+epsiv)-approximation algorithm for computing the unsigned translocation distance that improves upon the best known 1.75-ratio. The runtime of our algorithm is O(n2+(4/epsiv)1.5radic(log(4/epsiv)24/epsiv)), where n is the total number of genes in the genome.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号