首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
The order of genes in the genomes of species can change during evolution and can provide information about their phylogenetic relationship. An interesting method to infer the phylogenetic relationship from the gene orders is to use different types of rearrangement operations and to find possible rearrangement scenarios using these operations. One of the most common rearrangement operations is reversals, which reverse the order of a subset of neighbored genes. In this paper, we study the problem to find the ancestral gene order for three species represented by their gene orders. The rearrangement scenario should use a minimal number of reversals and no other rearrangement operations. This problem is called the Median problem and is known to be NP--complete. In this paper, we describe a heuristic algorithm for finding solutions to the Median problem that searches for rearrangement scenarios with the additional property that gene groups should not be destroyed by reversal operations. The concept of conserved intervals for signed permutations is used to describe such gene groups. We show experimentally, for different types of test problems, that the proposed algorithm produces very good results compared to other algorithms for the Median problem. We also integrate our reversal selection procedure into the well-known MGR and GRAPPA algorithms and show that they achieve a significant speedup while obtaining solutions of the same quality as the original algorithms on the test problems.  相似文献   

2.
SUMMARY: We present the web-based program CREx for heuristically determining pairwise rearrangement events in unichromosomal genomes. CREx considers transpositions, reverse transpositions, reversals and tandem-duplication-random-loss (TDRL) events. It supports the user in finding parsimonious rearrangement scenarios given a phylogenetic hypothesis. CREx is based on common intervals, which reflect genes that appear consecutively in several of the input gene orders. AVAILABILITY: CREx is freely available at http://pacosy.informatik.uni-leipzig.de/crex  相似文献   

3.
A central problem in genome rearrangement is finding a most parsimonious rearrangement scenario using certain rearrangement operations. An important problem of this type is sorting a signed genome by reversals and translocations (SBRT). Hannenhalli and Pevzner presented a duality theorem for SBRT which leads to a polynomial time algorithm for sorting a multi-chromosomal genome using a minimum number of reversals and translocations. However, there is one case for which their theorem and algorithm fail. We describe that case and suggest a correction to the theorem and the polynomial algorithm. The solution of SBRT uses a reduction to the problem of sorting a signed permutation by reversals (SBR). The best extant algorithms for SBR require quadratic time. The common approach to solve SBR is by finding a safe reversal using the overlap graph or the interleaving graph of a permutation. We describe a family of signed permutations which proves a quadratic lower bound on the number of affected vertices in the overlap/interleaving graph during any optimal sorting scenario. This implies, in particular, an Omega(n3) lower bound for Bergeron's algorithm.  相似文献   

4.
We propose new algorithms for computing pairwise rearrangement scenarios that conserve the combinatorial structure of genomes. More precisely, we investigate the problem of sorting signed permutations by reversals without breaking common intervals. We describe a combinatorial framework for this problem that allows us to characterize classes of signed permutations for which one can compute, in polynomial time, a shortest reversal scenario that conserves all common intervals. In particular, we define a class of permutations for which this computation can be done in linear time with a very simple algorithm that does not rely on the classical Hannenhalli-Pevzner theory for sorting by reversals. We apply these methods to the computation of rearrangement scenarios between permutations obtained from 16 synteny blocks of the X chromosomes of the human, mouse, and rat  相似文献   

5.
In comparative genomics, gene order data is often modeled as signed permutations. A classical problem for genome comparison is to detect common intervals in permutations, that is, genes that are colocalized in several species, indicating that they remained grouped during evolution. A second largely studied problem related to gene order is to compute a minimum scenario of reversals that transforms a signed permutation into another. Several studies began to mix the two problems and it was observed that their results are not always compatible: Often, parsimonious scenarios of reversals break common intervals. If a scenario does not break any common interval, it is called perfect. In two recent studies, Berard et al. defined a class of permutations for which building a perfect scenario of reversals sorting a permutation was achieved in polynomial time and stated as an open question whether it is possible to decide, given a permutation, if there exists a minimum scenario of reversals that is perfect. In this paper, we give a solution to this problem and prove that this widens the class of permutations addressed by the aforementioned studies. We implemented and tested this algorithm on gene order data of chromosomes from several mammal species and we compared it to other methods. The algorithm helps to choose among several possible scenarios of reversals and indicates that the minimum scenario of reversals is not always the most plausible  相似文献   

6.
An algorithm to enumerate sorting reversals for signed permutations.   总被引:1,自引:0,他引:1  
The rearrangement distance between single-chromosome genomes can be estimated as the minimum number of inversions required to transform the gene ordering observed in one into that observed in the other. This measure, known as "inversion distance," can be computed as the reversal distance between signed permutations. During the past decade, much progress has been made both on the problem of computing reversal distance and on the related problem of finding a minimum-length sequence of reversals, which is known as "sorting by reversals." For most problem instances, however, many minimum-length sequences of reversals exist, and in the absence of auxiliary information, no one is of greater value than the others. The problem of finding all minimum-length sequences of reversals is thus a natural generalization of sorting by reversals, yet it has received little attention. This problem reduces easily to the problem of finding all "sorting reversals" of one permutation with respect to another - that is, all reversals rho such that, if rho is applied to one permutation, then the reversal distance of that permutation from the other is decreased. In this paper, an efficient algorithm is derived to solve the problem of finding all sorting reversals, and experimental results are presented indicating that, while the new algorithm does not represent a significant improvement in asymptotic terms (it takes O(n(3)) time, for permutations of size n; the problem can now be solved by brute force in Theta(n(3)) time), it performs dramatically better in practice than the best known alternative. An implementation of the algorithm is available at www.cse.ucsc.edu/~acs.  相似文献   

7.
MOTIVATION: The total order of the genes or markers on a chromosome inherent in its representation as a signed per-mutation must often be weakened to a partial order in the case of real data. This is due to lack of resolution (where several genes are mapped to the same chromosomal position) to missing data from some of the datasets used to compile a gene order, and to conflicts between these datasets. The available genome rearrangement algorithms, however, require total orders as input. A more general approach is needed to handle rearrangements of gene partial orders. RESULTS: We formalize the uncertainty in gene order data by representing a chromosome from each genome as a partial order, summarized by a directed acyclic graph (DAG). The rearrangement problem is then to infer a minimal sequence of reversals for transforming any topological sort of one DAG to any one of the other DAG. Each topological sort represents a possible linearization compatible with all the datasets on the chromosome. The set of all possible topological sorts is embedded in each DAG by appropriately augmenting the edge set, so that it becomes a general directed graph (DG). The DGs representing chromosomes of two genomes are combined to produce a bicoloured graph from which we extract a maximal decomposition into alternating coloured cycles, and from which, in turn, an optimal sequence of reversals can usually be identified. We test this approach on simulated incomplete comparative maps and on cereal chromosomal maps drawn from the Gramene browser.  相似文献   

8.
The common intervals of two permutations on n elements are the subsets of terms contiguous in both permutations. They constitute the most basic representation of conserved local order. We use d, the size of the symmetric difference (the complement of the common intervals) of the two subsets of 2({1,n}) thus determined by two permutations, as an evolutionary distance between the gene orders represented by the permutations. We consider the Steiner Tree problem in the space (2({1,n}), d) as the basis for constructing phylogenetic trees, including ancestral gene orders. We extend this to genomes with unequal gene content and to genomes containing gene families. Applied to streptophyte phylogeny, our method does not support the positioning of the complex algae Charales as a sister group to the land plants. Simulations show that the method, though unmotivated by any specific model of genome rearrangement, accurately reconstructs a tree from artificial genome data generated by random inversions deriving each genome from its ancestor on this tree.  相似文献   

9.
In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to specify gene clusters--groups of genes that are co-located in a set of genomes. Several approaches have been proposed to reconstruct putative ancestral gene clusters based on the gene order of contemporary species. One prevalent and natural reconstruction criterion is consistency: For a set of reconstructed gene clusters, there should exist a gene order that comprises all given clusters. For permutation-based gene cluster models, efficient methods exist to verify this condition. In this article, we discuss the consistency problem for different gene cluster models on sequences with restricted gene multiplicities. Our results range from linear-time algorithms for the simple model of adjacencies to NP-completeness proofs for more complex models like common intervals.  相似文献   

10.
Genome rearrangement with gene families   总被引:1,自引:0,他引:1  
MOTIVATION: The theory and practice of genome rearrangement analysis breaks down in the biologically widespread contexts where each gene may be present in a number of copies, not necessarily contiguous. In some of these contexts it is, however, appropriate to ask which members of each gene family in two genomes G and H, lengths lG and lH, are its true exemplars, i.e. which best reflect the original position of the ancestral gene in the common ancestor genome. This entails a search for the two exemplar strings of same length n (= number of gene families, including singletons), having the smallest possible rearrangement distance: the exemplar distance. RESULTS: A branch and bound algorithm calculates these distances efficiently when based on easily calculated traditional rearrangement distances, such as signed reversals distance or breakpoint distance, which also satisfy a property of monotonicity in the number of genes. Simulations show that in two random genomes, the expected exemplar distance/n is sensitive to the number and size of gene families, but approaches 1 as the number of singleton families increases. When the basic rearrangement distance is just the number of breakpoints, the expected cost of computing the exemplar breakpoints distance (EBD), as measured by total calls to the underlying breakpoint distance routine, is highly dependent on both n and the configuration of gene families. On the other hand, basing exemplar distance on exemplar reversals distance (ERD), the expected computing cost depends on the configuration of gene families but is not sensitive to n. AVAILABILITY: Code for EBD and ERD is available from the author or may be accessed at http://www.crm.umontreal.ca/viart/exemplar_di s. html CONTACT: sankoff@ere.umontreal.ca.  相似文献   

11.
Multiple genome rearrangement methodology facilitates the inference of animal phylogeny from gene orders on the mitochondrial genome. The breakpoint distance is preferable to other, highly correlated but computationally more difficult, genomic distances when applied to these data. A number of theories of metazoan evolution are compared to phylogenies reconstructed by ancestral genome optimization, using a minimal total breakpoints criterion. The notion of unambiguously reconstructed segments is introduced as a way of extracting the invariant aspects of multiple solutions for a given ancestral genome; this enables a detailed reconstruction of the evolution of non-tRNA mitochondrial gene order. Received: 15 July 1998 / Accepted: 5 March 1999  相似文献   

12.
Canine tricuspid valve malformation (CTVM) maps to canine chromosome 9 (CFA9), in a region syntenic with gene-dense human chromosome 17q. To define synteny blocks, we analyzed 148 markers on CFA9 using radiation hybrid mapping and established a four-way comparative map for human, mouse, rat, and dog. We identified a large number of rearrangements, allowing us to reconstruct the evolutionary history of individual synteny blocks and large chromosomal segments. A most parsimonious rearrangement scenario for all four species reveals that human chromosome 17q differs from CFA9 and the syntenic rodent chromosomes through two macroreversals of 9.2 and 23 Mb. Compared to a recovered ancestral gene order, CFA9 has undergone 11 reversals of <3 Mb and 2 reversals of >3 Mb. Interspecies reuse of breakpoints for micro- and macrorearrangements was observed. Gene order and content of the ctvm interval are best extrapolated from murine data, showing that multispecies genome rearrangement scenarios contribute to identifying gene content in canine mapping studies.  相似文献   

13.
The order of genes in genomes provides extensive information. In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to identify gene clusters—groups of genes that are colocated in a set of genomes. We introduce a unified approach to model gene clusters and define the problem of labeling the inner nodes of a given phylogenetic tree with sets of gene clusters. Our optimization criterion in this context combines two properties: parsimony, i.e., the number of gains and losses of gene clusters has to be minimal, and consistency, i.e., for each ancestral node, there must exist at least one potential gene order that contains all the reconstructed clusters. We present and evaluate an exact algorithm to solve this problem. Despite its exponential worst-case time complexity, our method is suitable even for large-scale data. We show the effectiveness and efficiency on both simulated and real data.  相似文献   

14.
A preliminary step to most comparative genomics studies is the annotation of chromosomes as ordered sequences of genes. Different genetic mapping techniques often give rise to different maps with unequal gene content and sets of unordered neighboring genes. Only partial orders can thus be obtained from combining such maps. However, once a total order O is known for a given genome, it can be used as a reference to order genes of a closely related species characterized by a partial order P. Our goal is to find a linearization of P that is as close as possible to O, in term of a given genomic distance. We first prove NP-completeness complexity results considering the breakpoint and the common interval distances. We then focus on the breakpoint distance and give a dynamic programming algorithm whose running time is exponential for general partial orders, but polynomial when the partial order is derived from a bounded number of genetic maps. A time-efficient greedy heuristic is then given for the general case and is empirically shown to produce solutions within 10% of the optimal solution, on simulated data. Applications to the analysis of grass genomes are presented.  相似文献   

15.
Retracing the trajectories of past genetic events is crucial to understand the structure of the genome, both in individuals and across populations. A haplotype describes a string of polymorphic sites along a DNA segment. Haplotype diversity is due to mutations creating new variants, and to recombinations and gene conversions that mix and redistribute these variants among individual chromosomes in populations. A number of studies have revealed a relatively simple pattern of haplotype diversity in the human genome, dominated by a few common haplotypes representing founder ancestral ones. New haplotypes are usually rare and have a limited geographic distribution. We propose a method to derive a new haplotype from a set of putative ancestral haplotypes, once mutations in place, through minimal recombination and gene conversion pathways. We describe classes of pathways that represent the whole set of minimal pathways leading to a new haplotype. We show that obtaining this set of pathways can be represented as a problem of finding "secondary structures" of minimum energy. We present a polynomial algorithm solving this folding problem.  相似文献   

16.

Background  

A classical problem in studying genome rearrangements is understanding the series of rearrangement events involved in transforming one genome into another in accordance with the parsimonious principle when two genomes with the same set of genes differ in gene order. The most studied event is the reversal, but an increasing number of reports have considered reversals along with other genome rearrangement events. Some recent studies have investigated the use of reversals and block-interchanges simultaneously with a weight proportion of 1:2. However, there has been less progress towards exploring additional combinations of weights.  相似文献   

17.
MOTIVATION: Algorithms for phylogenetic tree reconstruction based on gene order data typically repeatedly solve instances of the reversal median problem (RMP) which is to find for three given gene orders a fourth gene order (called median) with a minimal sum of reversal distances. All existing algorithms of this type consider only one median for each RMP instance even when a large number of medians exist. A careful selection of one of the medians might lead to better phylogenetic trees. RESULTS: We propose a heuristic algorithm amGRP for solving the multiple genome rearrangement problem (MGRP) by repeatedly solving instances of the RMP taking all medians into account. Algorithm amGRP uses a branch-and-bound method that branches over medians from a selected subset of all medians for each RMP instance. Different heuristics for selecting the subsets have been investigated. To show that the medians for RMP vary strongly with respect to different properties that are likely to be relevant for phylogenetic tree reconstruction, the set of all medians has been investigated for artificial datasets and mitochondrial DNA (mtDNA) gene orders. Phylogenetic trees have been computed for a large set of randomly generated gene orders and two sets of mtDNA gene order data for different animal taxa with amGRP and with two standard approaches for solving the MGRP (GRAPPA-DCM and MGR). The results show that amGRP outperforms both other methods with respect to solution quality and computation time on the test data. AVAILABILITY: The source code of amGRP, additional results and the test instances used in this paper are freely available from the authors.  相似文献   

18.
ABSTRACT: BACKGROUND: The ancestries of genes form gene trees which do not necessarily have the same topology as the species tree due to incomplete lineage sorting. Available algorithms determining the probability of a gene tree given a species tree require exponential computational runtime. RESULTS: In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown. CONCLUSIONS: Polynomial algorithms for calculating ranked gene tree probabilities may become useful in developing methodology to infer species trees based on a collection of gene trees, leading to a more accurate reconstruction of ancestral species relationships.  相似文献   

19.
We review the combinatorial optimization problems in calculating edit distances between genomes and phylogenetic inference based on minimizing gene order changes. With a view to avoiding the computational cost and the "long branches attract" artifact of some tree-building methods, we explore the probabilization of genome rearrangement models prior to developing a methodology based on branch-length invariants. We characterize probabilistically the evolution of the structure of the gene adjacency set for reversals on unsigned circular genomes and, using a nontrivial recurrence relation, reversals on signed genomes. Concepts from the theory of invariants developed for the phylogenetics of homologous gene sequences can be used to derive a complete set of linear invariants for unsigned reversals, as well as for a mixed rearrangement model for signed genomes, though not for pure transposition or pure signed reversal models. The invariants are based on an extended Jukes-Cantor semigroup. We illustrate the use of these invariants to relate mitochondrial genomes from a number of invertebrate animals.  相似文献   

20.
Most reported examples of change in vertebrate mitochondrial (mt) gene order could be explained by a tandem duplication followed by random loss of redundant genes (tandem duplication-random loss [TDRL] model). Under this model of evolution, independent loss of genes arising from a single duplication in an ancestral species are predicted, and remnant pseudogenes expected, intermediate states that may remain in rearranged genomes. However, evidence for this is rare and largely scattered across vertebrate lineages. Here, we report new derived mt gene orders in the vertebrate "WANCY" region of four closely related caecilian amphibians. The novel arrangements found in this genomic region (one of them is convergent with the derived arrangement of marsupials), presence of pseudogenes, and positions of intergenic spacers fully satisfy predictions from the TDRL model. Our results, together with comparative data for the available vertebrate complete mt genomes, provide further evidence that the WANCY genomic region is a hotspot for gene order rearrangements and support the view that TDRL is the dominant mechanism of gene order rearrangement in vertebrate mt genomes. Convergent gene rearrangements are not unlikely in hotspots of gene order rearrangement by TDRL.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号