Similar literature
20 similar documents found.
1.
ABSTRACT: BACKGROUND: Traditional algorithms to solve the problem of sorting by signed reversals output just one optimal solution while the space of all optimal solutions can be huge. A so-called trace represents a group of solutions which share the same set of reversals that must be applied to sort the original permutation following a partial ordering. By using traces, we therefore can represent the set of optimal solutions in a more compact way. Algorithms for enumerating the complete set of traces of solutions were developed. However, due to their exponential complexity, their practical use is limited to small permutations. A partial enumeration of traces is a sampling of the complete set of traces and can be an alternative for the study of distinct evolutionary scenarios of big permutations. Ideally, the sampling should be done uniformly from the space of all optimal solutions. This is however conjectured to be #P-complete. RESULTS: We propose and evaluate three algorithms for producing a sampling of the complete set of traces that instead can be shown in practice to preserve some of the characteristics of the space of all solutions. The first algorithm (RA) performs the construction of traces through a random selection of reversals on the list of optimal 1-sequences. The second algorithm (DFALT) consists in a slight modification of an algorithm that performs the complete enumeration of traces. Finally, the third algorithm (SWA) is based on a sliding window strategy to improve the enumeration of traces. All proposed algorithms were able to enumerate traces for permutations with up to 200 elements. CONCLUSIONS: We analysed the distribution of the enumerated traces with respect to their height and average reversal length. Various works indicate that the reversal length can be an important aspect in genome rearrangements. The algorithms RA and SWA show a tendency to lose traces with high average reversal length. 
Such traces are however rare, and qualitatively our results show that, for testable-sized permutations, the algorithms DFALT and SWA produce distributions which approximate the reversal length distributions observed with a complete enumeration of the set of traces.
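
The basic operation behind sorting by signed reversals can be sketched as follows. This is only a minimal illustration of a single reversal and of the average reversal length of a scenario; the function names and the example permutation are assumptions made here, and none of the sampling algorithms (RA, DFALT, SWA) is reproduced.

```python
def apply_reversal(perm, i, j):
    """Return a copy of the signed permutation `perm` in which the segment
    perm[i..j] (0-based, inclusive) is reversed in order and flipped in sign."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


def average_reversal_length(scenario):
    """Average number of elements touched per reversal in a list of (i, j) pairs."""
    return sum(j - i + 1 for i, j in scenario) / len(scenario)


# Hypothetical example: one sorting scenario for [3, -1, 2]
# (not claimed to be optimal).
perm = [3, -1, 2]
scenario = [(0, 1), (1, 2), (1, 1)]
for i, j in scenario:
    perm = apply_reversal(perm, i, j)

print(perm)                               # [1, 2, 3]
print(average_reversal_length(scenario))  # 5/3
```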

2.
We propose new algorithms for computing pairwise rearrangement scenarios that conserve the combinatorial structure of genomes. More precisely, we investigate the problem of sorting signed permutations by reversals without breaking common intervals. We describe a combinatorial framework for this problem that allows us to characterize classes of signed permutations for which one can compute, in polynomial time, a shortest reversal scenario that conserves all common intervals. In particular, we define a class of permutations for which this computation can be done in linear time with a very simple algorithm that does not rely on the classical Hannenhalli-Pevzner theory for sorting by reversals. We apply these methods to the computation of rearrangement scenarios between permutations obtained from 16 synteny blocks of the X chromosomes of the human, mouse, and rat.

3.
In comparative genomics, gene order data is often modeled as signed permutations. A classical problem for genome comparison is to detect common intervals in permutations, that is, genes that are colocalized in several species, indicating that they remained grouped during evolution. A second widely studied problem related to gene order is to compute a minimum scenario of reversals that transforms a signed permutation into another. Several studies began to mix the two problems and it was observed that their results are not always compatible: often, parsimonious scenarios of reversals break common intervals. If a scenario does not break any common interval, it is called perfect. In two recent studies, Berard et al. defined a class of permutations for which building a perfect scenario of reversals sorting a permutation was achieved in polynomial time and stated as an open question whether it is possible to decide, given a permutation, if there exists a minimum scenario of reversals that is perfect. In this paper, we give a solution to this problem and prove that this widens the class of permutations addressed by the aforementioned studies. We implemented and tested this algorithm on gene order data of chromosomes from several mammal species and we compared it to other methods. The algorithm helps to choose among several possible scenarios of reversals and indicates that the minimum scenario of reversals is not always the most plausible.

4.
The problem of sorting by transpositions asks for a sequence of adjacent interval exchanges that sorts a permutation and is of the shortest possible length. The distance of the permutation is defined as the length of such a sequence. Despite the apparently intuitive nature of this problem, introduced in 1995 by Bafna and Pevzner, the complexity of both finding an optimal sequence and computing the distance remains open today. In this paper, we establish connections between two different graph representations of permutations, which allows us to compute the distance of a few nontrivial classes of permutations in linear time and space, bypassing the use of any graph structure. By showing that every permutation can be obtained from one of these classes, we prove a new tight upper bound on the transposition distance. Finally, we give improved bounds on some other families of permutations and prove formulas for computing the exact distance of other classes of permutations, again in polynomial time.
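
To make the definition of the transposition distance concrete, here is a brute-force sketch that exchanges adjacent intervals and finds the distance by breadth-first search. It is exponential and only usable on very small permutations; it is an illustration of the problem statement written for this listing, not the linear-time method of the paper, and the function names are assumptions.

```python
from collections import deque


def apply_transposition(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k] (0 <= i < j < k <= n)."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def transposition_distance(perm):
    """Length of a shortest sorting sequence of transpositions, by exhaustive BFS."""
    start, target = tuple(perm), tuple(sorted(perm))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        if current == target:
            return dist
        n = len(current)
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    nxt = apply_transposition(current, i, j, k)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, dist + 1))
    return None  # not reached: every permutation can be sorted


print(transposition_distance([3, 1, 2]))  # 1
print(transposition_distance([3, 2, 1]))  # 2
```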

5.
Permutations on strings representing gene clusters on genomes have been studied earlier by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004), and the idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representation and detection of gene clusters in multiple genomes, using PQ trees (Booth and Lueker, 1976): this describes the inner structure and the relations between clusters succinctly, aids in filtering meaningful from apparently meaningless clusters, and also gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal π pattern (Eres et al., 2003) and that each subgraph of the PQ tree corresponds to a nonmaximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole genome datasets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees; with the E. coli K-12 and B. subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters; and when comparing eight different chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. Further, we show specific instances of functionally related genes in two of the cases.

6.
Haplotyping as perfect phylogeny: a direct approach.   Cited by: 4 (self-citations: 0, citations by others: 4)
A full haplotype map of the human genome will prove extremely valuable as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A haplotype map project has been announced by NIH. The biological key to that project is the surprising fact that some human genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). In this paper we explore the algorithmic implications of the no-recombination in long blocks observation, for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, motivates a model of haplotype evolution where the haplotypes in a population are assumed to evolve along a coalescent, which as a rooted tree is a perfect phylogeny. We consider the following algorithmic problem, called the perfect phylogeny haplotyping problem (PPH), which was introduced by Gusfield (2002) - given n genotypes of length m each, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? The approach taken by Gusfield (2002) to solve this problem reduces it to established, deep results and algorithms from matroid and graph theory. Although that reduction is quite simple and the resulting algorithm nearly optimal in speed, taken as a whole that approach is quite involved, and in particular, challenging to program. Moreover, anyone wishing to fully establish, by reading existing literature, the correctness of the entire algorithm would need to read several deep and difficult papers in graph and matroid theory. 
However, as stated by Gusfield (2002), many simplifications are possible and the list of "future work" in Gusfield (2002) began with the task of developing a simpler, more direct, yet still efficient algorithm. This paper accomplishes that goal, for both the rooted and unrooted PPH problems. It establishes a simple, easy-to-program, O(nm^2)-time algorithm that determines whether there is a PPH solution for input genotypes and produces a linear-space data structure to represent all of the solutions. The approach allows complete, self-contained proofs. In addition to algorithmic simplicity, the approach here makes the representation of all solutions more intuitive than in Gusfield (2002), and solves another goal from that paper, namely, to prove a nontrivial upper bound on the number of PPH solutions, showing that that number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed.

7.
A fundamental problem arising in evolutionary molecular biology is to discover the locations of gene duplications and multiple gene duplication episodes based on phylogenetic information. The solutions to the MULTIPLE GENE DUPLICATION problems can provide useful clues to place the gene duplication events onto the locations of a species tree and to expose the multiple gene duplication episodes. In this paper, we study two variations of the MULTIPLE GENE DUPLICATION problems: the EPISODE-CLUSTERING (EC) problem and the MINIMUM EPISODES (ME) problem. For the EC problem, we improve the results of Burleigh et al. with an optimal linear-time algorithm. For the ME problem, on the basis of the algorithm presented by Bansal and Eulenstein, we propose an optimal linear-time algorithm.

8.
The problem of inferring haplotype phase from a population of genotypes has received a lot of attention recently. This is partly due to the observation that there are many regions on human genomic DNA where genetic recombination is rare (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). A Haplotype Map project has been announced by NIH to identify and characterize populations in terms of these haplotypes. Recently, Gusfield introduced the perfect phylogeny haplotyping problem, as an algorithmic implication of the no-recombination in long blocks observation, together with the standard population-genetic assumption of infinite sites. Gusfield's solution based on matroid theory was followed by direct Θ(nm^2) solutions that use simpler techniques (Bafna et al., 2003; Eskin et al., 2003), and also bound the number of solutions to the PPH problem. In this short note, we address two questions that were left open. First, can the algorithms of Bafna et al. (2003) and Eskin et al. (2003) be sped up to O(nm + m^2) time, which would imply an O(nm) time bound for the PPH problem? Second, if there are multiple solutions, can we find one that is most parsimonious in terms of the number of distinct haplotypes? We give reductions that suggest that the answer to both questions is "no." For the first problem, we show that computing the output of the first step (in either method) is equivalent to Boolean matrix multiplication. Therefore, the best bound we can presently achieve is O(nm^(ω-1)), where ω ≤ 2.52 is the exponent of matrix multiplication. Thus, any linear time solution to the PPH problem likely requires a different approach. For the second problem of computing a PPH solution that minimizes the number of distinct haplotypes, we show that the problem is NP-hard using a reduction from Vertex Cover (Garey and Johnson, 1979).

9.
This paper examines some of the rich structure of the syntenic distance model of evolutionary distance, introduced by Ferretti et al. (1996). The syntenic distance between two genomes is the minimum number of fissions, fusions, and translocations required to transform one into the other, ignoring gene order within chromosomes. We prove that the previously unanalyzed algorithm given by Ferretti et al. (1996) is a 2-approximation and no better, and that, further, it always outperforms the algorithm presented by DasGupta et al. (1998). We also prove the same results for an improved version of the Ferretti et al. algorithm. We then prove a number of properties which give insight into the structure of optimal move sequences. We give instances in which any move sequence working solely within connected components is nearly twice optimal and prove a general lower bound based on the spread of genes from each chromosome. We then prove a monotonicity property for the syntenic distance, and bound the difficulty of the hardest instance of any size. We discuss the results of implementing these algorithms and testing them on real and simulated synteny data.

10.
In this paper, based on the maximum neural network, we propose a new parallel algorithm for the bipartite subgraph problem that helps the maximum neural network escape from local minima by including transient chaotic neurodynamics. The goal of the bipartite subgraph problem, which is an NP-complete problem, is to remove the minimum number of edges in a given graph such that the remaining graph is a bipartite graph. Lee et al. presented a parallel algorithm using the maximum neural model (winner-take-all neuron model) for this NP-complete problem. The maximum neural model always guarantees a valid solution and greatly reduces the search space without the burden of parameter tuning. However, the model has a tendency to converge to a local minimum easily because it is based on the steepest descent method. By adding a negative self-feedback to the maximum neural network, we propose a new parallel algorithm that introduces richer and more flexible chaotic dynamics and can prevent the network from getting stuck at local minima. After the chaotic dynamics vanishes, the proposed algorithm is fundamentally governed by the gradient descent dynamics and usually converges to a stable equilibrium point. The proposed algorithm has the advantages of both the maximum neural network and the chaotic neurodynamics. A large number of instances have been simulated to verify the proposed algorithm. The simulation results show that our algorithm finds optimum or near-optimum solutions for the bipartite subgraph problem and outperforms the best existing parallel algorithms.

11.
The order of genes in the genomes of species can change during evolution and can provide information about their phylogenetic relationship. An interesting method to infer the phylogenetic relationship from the gene orders is to use different types of rearrangement operations and to find possible rearrangement scenarios using these operations. One of the most common rearrangement operations is the reversal, which reverses the order of a subset of neighboring genes. In this paper, we study the problem of finding the ancestral gene order for three species represented by their gene orders. The rearrangement scenario should use a minimal number of reversals and no other rearrangement operations. This problem is called the Median problem and is known to be NP-complete. In this paper, we describe a heuristic algorithm for finding solutions to the Median problem that searches for rearrangement scenarios with the additional property that gene groups should not be destroyed by reversal operations. The concept of conserved intervals for signed permutations is used to describe such gene groups. We show experimentally, for different types of test problems, that the proposed algorithm produces very good results compared to other algorithms for the Median problem. We also integrate our reversal selection procedure into the well-known MGR and GRAPPA algorithms and show that they achieve a significant speedup while obtaining solutions of the same quality as the original algorithms on the test problems.

12.
Molecular solutions for the subset-sum problem on DNA-based supercomputing   Cited by: 9 (self-citations: 0, citations by others: 9)
Chang WL, Ho MS, Guo M. Bio Systems, 2004, 73(2): 117-130
In this paper our main purpose is to give molecular solutions for the subset-sum problem. To achieve this, we propose a DNA-based algorithm for an n-bit parallel adder and a DNA-based algorithm for an n-bit parallel comparator to formally verify our designed molecular solutions for the subset-sum problem.
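
For readers unfamiliar with the subset-sum problem itself, the following is a conventional (silicon, not DNA-based) sketch that decides whether some subset of the given values adds up to a target. It assumes non-negative integers and is not related to the molecular adder and comparator of the paper; the function name is an assumption.

```python
def subset_sum(values, target):
    """Return True if some subset of the non-negative integers `values`
    sums exactly to `target`."""
    reachable = {0}  # sums obtainable from the values seen so far
    for v in values:
        # Either skip v (keep existing sums) or add it to each existing sum.
        reachable |= {s + v for s in reachable if s + v <= target}
    return target in reachable


print(subset_sum([3, 5, 7], 12))  # True  (5 + 7)
print(subset_sum([3, 5, 7], 11))  # False
```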

13.
We consider the problem of coordinating a team of agents that have to collect disseminated resources in an unknown environment. We are interested in approaches in which agents collectively explore the environment and build paths between home and resources. The originality of our approach is to simultaneously build an artificial potential field (APF) around the agents' home while foraging. We propose a multi-agent model defining a distributed and asynchronous version of the wavefront algorithm of Barraquand et al. Agents need only to mark and read integers locally on a grid, that is, their environment. We prove that the construction converges to the optimal APF. This allows the definition of a complete parameter-free foraging algorithm, called c-marking agents. The algorithm is evaluated by simulation, while varying the foraging settings. Then we compare our approach to a pheromone-based algorithm. Finally, we discuss requirements for implementation in robotics.
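
The potential field that the agents are said to converge to can be pictured with the classic centralized wavefront expansion: a breadth-first search from the home cell that labels each free cell with its grid distance to home. The sketch below is that centralized version only, under an assumed grid encoding ('#' for obstacles, '.' for free cells); it does not model the distributed, asynchronous marking by agents described in the abstract.

```python
from collections import deque


def wavefront(grid, home):
    """Label every reachable free cell with its 4-neighbour distance to `home`.

    grid: list of equal-length strings, '#' = obstacle, anything else = free.
    home: (row, col) of the home cell. Returns a dict {(row, col): distance}.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {home: 0}
    queue = deque([home])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


field = wavefront(["..#", "...", "#.."], home=(0, 0))
print(field[(2, 2)])  # 4
```

An agent standing on any labelled cell can return home by repeatedly stepping to a neighbour with a smaller label.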

14.
In this paper, by introducing stochastic dynamics into an optimal competitive Hopfield network model (OCHOM), we propose a new algorithm that permits temporary energy increases, which help the OCHOM escape from local minima. The goal of the maximum cut problem, which is an NP-complete problem, is to partition the node set of an undirected graph into two parts in order to maximize the cardinality of the set of edges cut by the partition. The problem has many important applications, including the design of VLSI circuits and the design of communication networks. Recently, Galán-Marín et al. proposed the OCHOM, which can guarantee convergence to a global/local minimum of the energy function, and performs better than the other competitive neural approaches. However, the OCHOM has no mechanism to escape from local minima. The proposed algorithm introduces stochastic dynamics that help the OCHOM escape from local minima, and it is applied to the maximum cut problem. A number of instances have been simulated to verify the proposed algorithm.
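
The objective of the maximum cut problem, and the kind of purely greedy search that can stall in a local optimum, can be sketched as follows. This is a plain single-node-flip local search written for illustration; it is neither the OCHOM nor the stochastic variant proposed in the abstract, and all names are assumptions.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def local_search(edges, side):
    """Flip single nodes while doing so strictly increases the cut.

    `side` maps node -> 0 or 1. The result is a local optimum, which need
    not be a maximum cut; escaping it would require accepting worse moves.
    """
    side = dict(side)
    improved = True
    while improved:
        improved = False
        for node in side:
            before = cut_size(edges, side)
            side[node] ^= 1              # tentatively move the node
            if cut_size(edges, side) > before:
                improved = True          # keep the improving move
            else:
                side[node] ^= 1          # undo a non-improving move
    return side


triangle = [(0, 1), (1, 2), (0, 2)]
result = local_search(triangle, {0: 0, 1: 0, 2: 0})
print(cut_size(triangle, result))  # 2
```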

15.
Protein sequence design is a natural inverse problem to protein structure prediction: given a target structure in three dimensions, we wish to design an amino acid sequence that is likely to fold to it. A model of Sun, Brem, Chan, and Dill casts this problem as an optimization on a space of sequences of hydrophobic (H) and polar (P) monomers; the goal is to find a sequence that achieves a dense hydrophobic core with few solvent-exposed hydrophobic residues. Sun et al. developed a heuristic method to search the space of sequences, without a guarantee of optimality or near-optimality; Hart subsequently raised the computational tractability of constructing an optimal sequence in this model as an open question. Here we resolve this question by providing an efficient algorithm to construct optimal sequences; our algorithm has a polynomial running time, and performs very efficiently in practice. We illustrate the implementation of our method on structures drawn from the Protein Data Bank. We also consider extensions of the model to larger amino acid alphabets, as a way to overcome the limitations of the binary H/P alphabet. We show that for a natural class of arbitrarily large alphabets, it remains possible to design optimal sequences efficiently. Finally, we analyze some of the consequences of this sequence design model for the study of evolutionary fitness landscapes. A given target structure may have many sequences that are optimal in the model of Sun et al.; following a notion raised by the work of J. Maynard Smith, we can ask whether these optimal sequences are "connected" by successive point mutations. We provide a polynomial-time algorithm to decide this connectedness property, relative to a given target structure. We develop the algorithm by first solving an analogous problem expressed in terms of submodular functions, a fundamental object of study in combinatorial optimization.

16.
RNA shapes, introduced by Giegerich et al. (2004), provide a useful classification of the branching complexity for RNA secondary structures. In this paper, we derive an exact value for the asymptotic number of RNA shapes, by relying on an elegant relation between non-ambiguous, context-free grammars, and generating functions. Our results provide a theoretical upper bound on the length of RNA sequences amenable to probabilistic shape analysis (Steffen et al., 2006; Voss et al., 2006), under the assumption that any base can basepair with any other base. Since the relation between context-free grammars and asymptotic enumeration is simple, yet not well-known in bioinformatics, we give a self-contained presentation with illustrative examples. Additionally, we prove a surprising 1-to-1 correspondence between π-shapes and Motzkin numbers.
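
For reference, the Motzkin numbers mentioned above can be generated from their standard recurrence M(n) = M(n-1) + sum over k of M(k) * M(n-2-k), with M(0) = M(1) = 1. The sketch below only tabulates the sequence; it does not construct the correspondence with π-shapes proved in the paper, and the function name is an assumption.

```python
def motzkin(n_max):
    """Return the list [M(0), M(1), ..., M(n_max)] of Motzkin numbers."""
    m = [1] * (n_max + 1)  # M(0) = M(1) = 1; later entries are overwritten
    for n in range(2, n_max + 1):
        m[n] = m[n - 1] + sum(m[k] * m[n - 2 - k] for k in range(n - 1))
    return m


print(motzkin(7))  # [1, 1, 2, 4, 9, 21, 51, 127]
```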

17.
It is known (Reidys et al., 1997b. Bull. Math. Biol. 59(2), 339-397) that for any two secondary structures S, S' there exists an RNA sequence compatible with both, and that this result does not extend to more than two secondary structures. Indeed, a simple formula for the number of RNA sequences compatible with secondary structures S, S' plays a role in the algorithms of Flamm et al. (2001. RNA 7, 254-265) and of Abfalter et al. (2003. Proceedings of the German Conference on Bioinformatics) to design an RNA switch. Here we show that a natural extension of this problem is NP-complete. Unless P=NP, there is no polynomial time algorithm which, when given secondary structures S_1, ..., S_k, for k ≥ 4, determines the least number of positions such that, after removal of all base pairs incident to these positions, there exists an RNA nucleotide sequence compatible with the given secondary structures. We also consider a restricted version of this problem with a "fixed maximum" number of possible stars and show that it has a simple polynomial time solution.

18.
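This paper introduces several new concepts, including the regular tetrahedral lattice (D lattice for short), D lattice points, and the degeneracy of D lattice points, and systematically studies the counting and distribution of the regular tetrahedral lattice. By both induction and combinatorics, we derive general formulas for the number of D lattice points at the vertices, on the edges, on the faces, in the interior, and in total for a regular tetrahedron, and we obtain an expression for the degeneracy of D lattice points. We compile a table of the base composition of D lattice points and summarize their distribution. We explain empirical equation [1], compute the degeneracy of the DNA molecules of the 46 human chromosomes, and discuss the ordering patterns of DNA in different species.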

19.
In this study, we address a job sequencing and tool switching problem arising in flexible manufacturing systems. We consider the single machine problem of minimizing total flow time. We prove that the problem is NP-hard in the strong sense and show that the tool switching problem is polynomially solvable for a given sequence. We propose a branch-and-bound algorithm whose efficiency is improved by precedence relations and several lower and upper bounding techniques. Our computational results reveal that the branch and bound approach produces optimal solutions in reasonable times for moderate sized problems. Our upper bounds produce very satisfactory solutions; therefore they can be an attractive alternative to solve larger sized problems.

20.
We make a novel contribution to the theory of biopolymer folding, by developing an efficient algorithm to compute the number of locally optimal secondary structures of an RNA molecule, with respect to the Nussinov-Jacobson energy model. Additionally, we apply our algorithm to analyze the folding landscape of selenocysteine insertion sequence (SECIS) elements from A. Bock (personal communication), hammerhead ribozymes from Rfam (Griffiths-Jones et al., 2003), and tRNAs from Sprinzl's database (Sprinzl et al., 1998). It had previously been reported that tRNA has lower minimum free energy than random RNA of the same compositional frequency (Clote et al., 2003; Rivas and Eddy, 2000), although the situation is less clear for mRNA (Seffens and Digby, 1999; Workman and Krogh, 1999; Cohen and Skiena, 2002), which plays no structural role. Applications of our algorithm extend knowledge of the energy landscape differences between naturally occurring and random RNA. Given an RNA molecule a_1, ..., a_n and an integer k ≥ 0, a k-locally optimal secondary structure S is a secondary structure on a_1, ..., a_n which has k fewer base pairs than the maximum possible number, yet for which no base pairs can be added without violation of the definition of secondary structure (e.g., introducing a pseudoknot). Despite the fact that the number numStr(k) of k-locally optimal structures for a given RNA molecule in general is exponential in n, we present an algorithm running in time O(n^4) and space O(n^3), which computes numStr(k) for each k. Structurally important RNA, such as SECIS elements, hammerhead ribozymes, and tRNA, all have a markedly smaller number of k-locally optimal structures than that of random RNA of the same dinucleotide frequency, for small and moderate values of k. This suggests a potential future role of our algorithm as a tool to detect noncoding RNA genes.
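
The "maximum possible number" of base pairs that the definition of k-locally optimal structures refers to can be computed with the textbook Nussinov-style dynamic program sketched below. It only returns that maximum and does not compute numStr(k); the allowed pairs (Watson-Crick plus G-U) and the minimum loop size of 3 unpaired bases are assumptions chosen here for illustration, as is the function name.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq`, requiring at least
    `min_loop` unpaired bases between the two ends of every pair."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    value = max(value, left + 1 + best[k + 1][j - 1])
            best[i][j] = value
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))  # 3
```

The cubic running time of this table fill is the usual cost of the basic recurrence; the O(n^4) time and O(n^3) space quoted in the abstract refer to the paper's own counting algorithm, not to this sketch.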
