首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 85 毫秒
1.
Sorting permutations by transpositions is an important problem in genome rearrangements. A transposition is a rearrangement operation in which a segment is cut out of the permutation and pasted in a different location. The complexity of this problem is still open and it has been a 10-year-old open problem to improve the best known 1.5-approximation algorithm. In this paper, we provide a 1.375-approximation algorithm for sorting by transpositions. The algorithm is based on a new upper bound on the diameter of 3-permutations. In addition, we present some new results regarding the transposition diameter: We improve the lower bound for the transposition diameter of the symmetric group and determine the exact transposition diameter of simple permutations  相似文献   

2.
Yue F  Zhang M  Tang J 《BMC genomics》2008,9(Z2):S15

Background

Because of the advent of high-throughput sequencing and the consequent reduction in the cost of sequencing, many organisms have been completely sequenced and most of their genes identified. It thus has become possible to represent whole genomes as ordered lists of gene identifiers and to study the rearrangement of these entities through computational means. As a result, genome rearrangement data has attracted increasing attentions from both biologists and computer scientists as a new type of data for phylogenetic analysis. The main events of genome rearrangements include inversions, transpositions and transversions. To date, GRAPPA and MGR are the most accurate methods for rearrangement phylogeny, both assuming inversion as the only event. However, due to the complexity of computing transposition distance, it is very difficult to analyze datasets when transpositions are dominant.

Results

We extend GRAPPA to handle transpositions. The new method is named GRAPPA-TP, with two major extensions: a heuristic method to estimate transposition distance, and a new transposition median solver for three genomes. Although GRAPPA-TP uses a greedy approach to compute the transposition distance, it is very accurate when genomes are relatively close. The new GRAPPA-TP is available from http://phylo.cse.sc.edu/.

Conclusion

Our extensive testing using simulated datasets shows that GRAPPA-TP is very accurate in terms of ancestor genome inference and phylogenetic reconstruction. Simulation results also suggest that model match is critical in genome rearrangement analysis: it is not accurate to simulate transpositions with other events including inversions.
  相似文献   

3.

Background

One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form \(\pi =(\pi _1 \pi _2 \ldots \pi _n)\), and in that case we can consider without loss of generality that one of them is the identity permutation \(\iota _n =(1 2 \ldots n)\), and that we just need to sort the other (i.e., transform it into \(\iota _n\)). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location; and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them considers that the number of genes involved, in a reversal or a transposition, is never greater than two, which is known as the problem of sorting by super short operations (or SSOs).

Results and conclusions

All problems considering SSOs in permutations have been shown to be in \(\mathsf {P}\), except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.
  相似文献   

4.
In genome duplexes that exceed 100 kb the frequency distributions of their trinucleotides (triplet profiles) are the same in both strands. This remarkable symmetry, sometimes called Chargaff's second parity rule, is not the result of base pairing, but can be explained as the result of countless inversions and inverted transpositions that occurred throughout evolution (G. Albrecht-Buehler, 2006, Proc. Natl. Acad. Sci. USA 103, 17828-17833). Furthermore, comparing the triplet profiles of genomes from a large number of different taxa and species revealed that they were not only strand-symmetrical, but even surprisingly similar to one another (majority profile; G. Albrecht-Buehler, 2007, Genomics 89, 596-601). The present article proposes that the same inversion/transposition mechanism(s) that created the strand symmetry may also explain the existence of the majority profile. Thus they may be key factors in the creation of an almost universal "format" in which genome sequences are written. One may speculate that this universality of genome format may facilitate horizontal gene transfer and, thus, accelerate evolution.  相似文献   

5.
One possible model to study genome evolution is to represent genomes as permutations of genes and compute distances based on the minimum number of certain operations (rearrangements) needed to transform one permutation into another. Under this model, the shorter the distance, the closer the genomes are. Two operations that have been extensively studied are the reversal and the transposition. A reversal is an operation that reverses the order of the genes on a certain portion of the permutation. A transposition is an operation that "cuts" a certain portion of the permutation and "pastes" it elsewhere in the same permutation. In this note, we show that the reversal and transposition distance of the signed permutation pi(n) = (-1 -2.-(n - 1)-n) with respect to the identity is left floor n/2 right floor + 2 for all n>or=3. We conjecture that this value is the diameter of the permutation group under these operations.  相似文献   

6.
We study three classical problems of genome rearrangement--sorting, halving, and the median problem--in a restricted double cut and join (DCJ) model. In the DCJ model, introduced by Yancopoulos et al., we can represent rearrangement events that happen in multichromosomal genomes, such as inversions, translocations, fusions, and fissions. Two DCJ operations can mimic transpositions or block interchanges by first extracting an appropriate segment of a chromosome, creating a temporary circular chromosome, and then reinserting it in its proper place. In the restricted model, we are concerned with multichromosomal linear genomes and we require that each circular excision is immediately followed by its reincorporation. Existing linear-time DCJ sorting and halving algorithms ignore this reincorporation constraint. In this article, we propose a new algorithm for the restricted sorting problem running in O(n log n) time, thus improving on the known quadratic time algorithm. We solve the restricted halving problem and give an algorithm that computes a multilinear halved genome in linear time. Finally, we show that the restricted median problem is NP-hard as conjectured.  相似文献   

7.
Abstract.— The role played by gene transpositions during the evolution of eukaryotic genomes is still poorly understood and indeed has been analyzed in detail only in nematodes. In Drosophila , a limited number of transpositions have been detected by comparing the chromosomal location of genes between different species. The relative importance of gene transposition versus other types of chromosomal rearrangements, for example, inversions, has not yet been evaluated. Here, we use physical mapping to perform an extensive search for long-distance gene transpositions and assess their impact during the evolution of the Drosophila genome. We compare the relative order of 297 molecular markers that cover 60% of the euchromatic fraction of the genome between two related Drosophila species and conclude that the frequency of gene transpositions is very low, namely one order of magnitude lower than that of nematodes. In addition, gene transpositions seem to be events almost exclusively associated with genes of repetitive nature such as the Histone gene complex ( HIS-C ).  相似文献   

8.
MOTIVATION: In spite of a well-known fact that genome rearrangements are supposed to be viewed in the light of the evolutionary relationships within and between the species involved, no formal underlying framework based on the evolutionary considerations for treating the questions arising in the area has been proposed. If such an underlying framework is provided, all the basic questions in the area can be posed in a biologically more appropriate and useful form: e.g., the similarity between two genomes can then be computed via the nearest ancestor, rather than 'directly', ignoring the evolutionary connections. RESULTS: We outline an evolution-based general framework for answering questions related to the multiple genome rearrangement. In the proposed model, the evolutionary genome graph (EG-graph) encapsulates an evolutionary history of a genome family. For a set of all EG-graphs, we introduce a family of similarity measures, each defined via a fixed set of genome transformations. Given a set of genomes and restricting ourselves to the transpositions, an algorithm for constructing an EG-graph is presented. We also present the experimental results in the form of an EG-graph for a set of concrete genomes (for several species). This EG-graph turns out to be very close to the corresponding known phylogenetic tree.  相似文献   

9.
An algorithm to enumerate sorting reversals for signed permutations.   总被引:1,自引:0,他引:1  
The rearrangement distance between single-chromosome genomes can be estimated as the minimum number of inversions required to transform the gene ordering observed in one into that observed in the other. This measure, known as "inversion distance," can be computed as the reversal distance between signed permutations. During the past decade, much progress has been made both on the problem of computing reversal distance and on the related problem of finding a minimum-length sequence of reversals, which is known as "sorting by reversals." For most problem instances, however, many minimum-length sequences of reversals exist, and in the absence of auxiliary information, no one is of greater value than the others. The problem of finding all minimum-length sequences of reversals is thus a natural generalization of sorting by reversals, yet it has received little attention. This problem reduces easily to the problem of finding all "sorting reversals" of one permutation with respect to another - that is, all reversals rho such that, if rho is applied to one permutation, then the reversal distance of that permutation from the other is decreased. In this paper, an efficient algorithm is derived to solve the problem of finding all sorting reversals, and experimental results are presented indicating that, while the new algorithm does not represent a significant improvement in asymptotic terms (it takes O(n(3)) time, for permutations of size n; the problem can now be solved by brute force in Theta(n(3)) time), it performs dramatically better in practice than the best known alternative. An implementation of the algorithm is available at www.cse.ucsc.edu/~acs.  相似文献   

10.
Sorting by reciprocal translocations via reversals theory.   总被引:1,自引:0,他引:1  
The understanding of genome rearrangements is an important endeavor in comparative genomics. A major computational problem in this field is finding a shortest sequence of genome rearrangements that transforms, or sorts, one genome into another. In this paper we focus on sorting a multi-chromosomal genome by translocations. We reveal new relationships between this problem and the well studied problem of sorting by reversals. Based on these relationships, we develop two new algorithms for sorting by reciprocal translocations, which mimic known algorithms for sorting by reversals: a score-based method building on Bergeron's algorithm, and a recursive procedure similar to the Berman-Hannenhalli method. Though their proofs are more involved, our procedures for reciprocal translocations match the complexities of the original ones for reversals.  相似文献   

11.
The plastid genome from subclover, Trifolium subterraneum, is unusual in a variety of respects, compared with other land-plant chloroplast DNAs. Gene mapping of subclover chloroplast DNA reveals major structural reorganization of the genome. Ten clusters of genes are rearranged in both order and orientation. Eight large inversions are sufficient to explain this reorganization; however, the actual evolutionary changes may have been more complex. For example, a fine-scale analysis of a set of ribosomal protein genes reveals the occurrence of insertions, deletions, and transpositions. Associated with this unusually unstable genome are two structural features potentially involved in the rearrangements. A dispersed family of repeats, with each element about 1 kb in length, is present in at least six copies. A survey of a wide taxonomic range of species indicates that these elements are unique to the chloroplast DNAs of subclover and two closely related species. Several of the repeated elements are associated with genomic rearrangements, and one repeat is inserted within a normally highly conserved series of genes. This set of dispersed repeats may be the first family of transposable elements found in any organelle genome. In addition, the subclover genome is much larger than those in other closely related legumes, even when one takes into account the presence of the repeated elements. Some of the extra DNA has no sequence similarity to other chloroplast genomes and may represent insertion of DNA from another genome. These unusual features are not found in the structurally stable chloroplast genomes of other vascular plants and may, therefore, be implicated in the rapid and major reorganization of the chloroplast DNA in subclover.  相似文献   

12.
Markov AV  Zakharov IA 《Genetika》2006,42(11):1547-1557
Relative frequencies of large and small genome rearrangements (inversions and transpositions) in the evolution of prokaryotic genomes can be evaluated using the ratio between the index S (the ratio of the number of identical pairs of neighboring genes in two genomes to the total number of genes in the sample of interest) and 1 - 6 x L/n, where L is the mean difference in intergenic distances and n is the number of genes in the sample. The S value uniformly decreases with the fixation of genome rearrangements, while the decrease rate of I - 6 x L/n is determined by the rearrangement size. Specifically, large inversions and transpositions lead to a dramatic decrease in the index value, while small rearrangements result in an insignificant decrease. The ratio between these indices was computed for twenty pairs of closely related species belonging to different groups of bacteria and archaea. The pairs examined strongly differed in the relative frequency of large and small rearrangements. However, computer simulation showed that the total variation can be reproduced with the same input parameters of the model. This means that the differences observed can be stochastic and can be interpreted without assuming different mechanisms and factors of genome rearrangements for different groups of prokaryotes. Relative frequencies of large and small rearrangements displayed no noticeable correlations with taxonomic position, total rate of rearrangement fixation, habitation conditions, and the abundance of transposons and repetitive sequences. It is suggested that, in some cases, phage activity increases the frequency of large genome rearrangements.  相似文献   

13.
14.
The problem of sorting by transpositions asks for a sequence of adjacent interval exchanges that sorts a permutation and is of the shortest possible length. The distance of the permutation is defined as the length of such a sequence. Despite the apparently intuitive nature of this problem, introduced in 1995 by Bafna and Pevzner, the complexity of both finding an optimal sequence and computing the distance remains open today. In this paper, we establish connections between two different graph representations of permutations, which allows us to compute the distance of a few nontrivial classes of permutations in linear time and space, bypassing the use of any graph structure. By showing that every permutation can be obtained from one of these classes, we prove a new tight upper bound on the transposition distance. Finally, we give improved bounds on some other families of permutations and prove formulas for computing the exact distance of other classes of permutations, again in polynomial time  相似文献   

15.
J D Palmer  W F Thompson 《Cell》1982,29(2):537-550
We examined the arrangement of sequences common to seven angiosperm chloroplast genomes. The chloroplast DNAs of spinach, petunia and cucumber are essentially colinear. They share with the corn chloroplast genome a large inversion of approximately 50 kb relative to the genomes of three legumes--mung bean, pea and broad bean. There is one additional rearrangement, a second, smaller inversion within the 50 kb inversion, which is specific to the corn genome. These two changes are the only detectable rearrangements that have occurred during the evolution of the species examined (corn, spinach, petunia, cucumber and mung bean) whose chloroplast genomes contain a large inverted repeat sequence of 22-25 kb. In contrast, we find extensive sequence rearrangements in comparing the pea and broad bean genomes, both of which have deleted one entire segment of the inverted repeat, and also in comparing each of these to the mung bean genome. Thus there is a relatively stable arrangement of sequences in those genomes with the inverted repeat and a much more dynamic arrangement in those that have lost it. We discuss several explanations for this correlation, including the possibility that the inverted repeat may play a direct role in maintaining a conserved arrangement of chloroplast DNA sequences.  相似文献   

16.
Relative frequencies of large and small genome rearrangements (inversions and transpositions) in the evolution of prokaryotic genomes can be evaluated using the ratio between the index S (the ratio of the number of identical pairs of neighboring genes in two genomes to the total number of genes in the sample of interest) and 1–6L/n, where L is the mean difference in intergenic distances and n is the number of genes in the sample. The S value uniformly decreases with the fixation of genome rearrangements, while the decrease rate of 1–6L/n is determined by the rearrangement size. Specifically, large inversions and transpositions lead to a dramatic decrease in the index value, while small rearrangements result in an insignificant decrease. The ratio between these indices was computed for twenty pairs of closely related species belonging to different groups of bacteria and archaea. The pairs examined strongly differed in the relative frequency of large and small rearrangements. However, computer simulation showed that the total variation can be reproduced with the same input parameters of the model. This means that the differences observed can be stochastic and can be interpreted without assuming different mechanisms and factors of genome rearrangements for different groups of prokaryotes. Relative frequencies of large and small rearrangements displayed no noticeable correlations with taxonomic position, total rate of rearrangement fixation, habitation conditions, and the abundance of transposons and repetitive sequences. It is suggested that, in some cases, phage activity increases the frequency of large genome rearrangements.  相似文献   

17.
A central problem in genome rearrangement is finding a most parsimonious rearrangement scenario using certain rearrangement operations. An important problem of this type is sorting a signed genome by reversals and translocations (SBRT). Hannenhalli and Pevzner presented a duality theorem for SBRT which leads to a polynomial time algorithm for sorting a multi-chromosomal genome using a minimum number of reversals and translocations. However, there is one case for which their theorem and algorithm fail. We describe that case and suggest a correction to the theorem and the polynomial algorithm. The solution of SBRT uses a reduction to the problem of sorting a signed permutation by reversals (SBR). The best extant algorithms for SBR require quadratic time. The common approach to solve SBR is by finding a safe reversal using the overlap graph or the interleaving graph of a permutation. We describe a family of signed permutations which proves a quadratic lower bound on the number of affected vertices in the overlap/interleaving graph during any optimal sorting scenario. This implies, in particular, an Omega(n3) lower bound for Bergeron's algorithm.  相似文献   

18.
S Sun  R Ke  D Hughes  M Nilsson  DI Andersson 《PloS one》2012,7(8):e42639
Genome rearrangements have important effects on bacterial phenotypes and influence the evolution of bacterial genomes. Conventional strategies for characterizing rearrangements in bacterial genomes rely on comparisons of sequenced genomes from related species. However, the spectra of spontaneous rearrangements in supposedly homogenous and clonal bacterial populations are still poorly characterized. Here we used 454 pyrosequencing technology and a 'split mapping' computational method to identify unique junction sequences caused by spontaneous genome rearrangements in chemostat cultures of Salmonella enterica Var. Typhimurium LT2. We confirmed 22 unique junction sequences with a junction microhomology more than 10 bp and this led to an estimation of 51 true junction sequences, of which 28, 12 and 11 were likely to be formed by deletion, duplication and inversion events, respectively. All experimentally confirmed rearrangements had short inverted (inversions) or direct (deletions and duplications) homologous repeat sequences at the endpoints. This study demonstrates the feasibility of genome wide characterization of spontaneous genome rearrangements in bacteria and the very high steady-state frequency (20-40%) of rearrangements in bacterial populations.  相似文献   

19.
Many approaches to compute the genomic distance are still limited to genomes with the same content, without duplicated markers. However, differences in the gene content are frequently observed and can reflect important evolutionary aspects. While duplicated markers can hardly be handled by exact models, when duplicated markers are not allowed, a few polynomial time algorithms that include genome rearrangements, insertions and deletions were already proposed. In an attempt to improve these results, in the present work we give the first linear time algorithm to compute the distance between two multichromosomal genomes with unequal content, but without duplicated markers, considering insertions, deletions and double cut and join (DCJ) operations. We derive from this approach algorithms to sort one genome into another one also using DCJ operations, insertions and deletions. The optimal sorting scenarios can have different compositions and we compare two types of sorting scenarios: one that maximizes and one that minimizes the number of DCJ operations with respect to the number of insertions and deletions. We also show that, although the triangle inequality can be disrupted in the proposed genomic distance, it is possible to correct this problem adopting a surcharge on the number of non-common markers. We use our method to analyze six species of Rickettsia, a group of obligate intracellular parasites, and identify preliminary evidence of clusters of deletions.  相似文献   

20.
《Gene》1996,172(1):GC11-GC17
Algorithms inspired by comparative genomics calculate an edit distance between two linear orders based on elementary edit operations such as inversion, transposition and reciprocal translocation. All operations are generally assigned the same weight, simply by default, because no systematic empirical studies exist verifying whether algorithmic outputs involve realistic proportion of each. Nor de we have data on how weights should vary with the length of the inverted or transposed segment of the chromosome. In this paper, we present a rapid algorithm that allows each operation to take on a range of weights, producing an relatively tight bound on the distance between single-chromosome genomes, by means of a greedy search with look-ahead. The efficiency of this algorithm allows us to test random genomes for each parameter setting, to detect gene order similarity and to infer the parameter values most appropriate to the phylogenetic domain under study. We apply this method to genome segments in which the sa me gene order is conserved in Escherichia coli and Bacillus subtilis, as well as to the gene order in human versus Drosophila mitochondrial genomes. In both cases, we conclude that it is most appropriate to assign somewhat more than twice the weight to transpositions and inverted transpositions than to inversions. We also explore segment-length weighting for fungal mitochondrial gene orders.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号